Category Archives: commodore

Getting Off the Ground With the Other Commodore 8-bits

My Commodore work here has been restricted so far to the Commodore 64. However, that system was only one of many related systems they produced over the years:

  • The PET was the first home computer they made with a serious following. Its abilities made it a rival to systems like the TRS-80, but very little else. Still, that was enough to cement their reputation early on.
  • The VIC-20 was their first system to achieve truly widespread success. It was inexpensive and provided significant power for the price. Its existence, and price point, had largely annihilated any potential market space for the ZX81- and Spectrum-based Timex computers in the US. It was also the direct predecessor to the Commodore 64, and its physical appearance and programming model reflect that.
  • The Commodore Plus/4 and Commodore 16 were an attempt to capture the lower-end home computer and small business market, and specifically to compete with Timex computers. They had many enhancements to their BASIC that let the user make use of the graphics and sound capabilities more directly, but the graphics and sound hardware were replaced with a much simpler combined chip called the Text Editing Device, or TED. The Plus/4 also shipped with on-board productivity software. In the end, though, it appears that the market segment the Plus/4 and C16 were intended to contest with Timex didn’t really exist in the US at all, and all of these systems were failures there. (The on-board productivity software was also poorly received; BYTE magazine apparently described it as “an insult.”) The Plus/4s and C16s were mostly dumped into the Eastern European marketplace at huge discounts, where they ultimately did well enough to retain a niche following and a demoscene that lasts through to the present day.
  • The Commodore 128, released a year after the Plus/4, was the true direct successor to the Commodore 64. It added a new, more PC-like video interface, a separate Z80 CPU to let it run CP/M software, twice the RAM, the extra memory mapping and control support needed to make that RAM readily usable, and a vastly more sophisticated BASIC to take command of all of these new abilities as well as its older ones. The BASIC extensions from the Plus/4 were retained, along with new control flow options. Where the Plus/4’s extensions did not reach, new ones were added for proper high-level control of the sprites, the sound chip, and the bitmap display, and unlike the Plus/4, the 128 made no compromises compared to the 64. Unfortunately, it largely avoided compromises by including all the C64 hardware directly within it and switching into a “C64 mode” that was, from a software standpoint, nearly indistinguishable from an ordinary C64, including reimposing all the limitations that the 128 was supposed to break.

Commodore made quite a few other systems as well, from the ill-fated CBM-II through the hugely popular and influential Amiga line, but the machines above formed a clear 8-bit lineage based on the 6502 family of microprocessors and a broadly consistent firmware interface that they called the KERNAL.

The KERNAL was a list of memory locations that provided consistent ways to call into system ROM functionality independent of the particular implementation on any given system or ROM patch level. For example, $FFD2 was always the location of the CHROUT “output a character to the current output device” routine, which in turn means that the assembly language components of our Hello World program for the Commodore 64 are portable across all of these machines. In this way the KERNAL provides a functionality similar to what BIOS provided for IBM PC clones—a way for software to ignore some aspects of hardware even in the absence of a shared operating system or modular device driver facility.

But assembly language source compatibility does not translate to binary compatibility. We saw this, at one remove, when we looked at the relocating loader for Compute!‘s automatic proofreader. In the rest of this article we’ll look at the changes each system demands.

Continue reading


C64 Fat Sprite Workarounds

When writing about retro programming topics here on Bumbershoot, I’ve generally wound up oscillating between the absolute basics of getting anything to run at all and techniques to exploit weird corner cases of the hardware to let it do things it was never intended to do. This article is a more unfortunate cousin: a weird corner case that is fully-intended, but vaguely-documented at best and incredibly inconvenient when doing apparently reasonable things.

In particular, today we will be talking about scrolling sprites off the left side of the screen on the Commodore 64.

A Quick Survey of Sprites

A “sprite” is a generic name for any block of pixels that you want to animate over a background of some kind. These days a sprite is usually two triangles rendered with a partially transparent texture. In the 20th century, it was often a highly-optimized pixel-copy routine (a “blitter”, from BLT, itself an abbreviation of “block transfer”). By the 1990s this was sufficient to the needs of the age, but on 1980s hardware blitting was slow and fraught with compromises. It worked—the Commodore Plus/4, the Apple II, and the entire ZX Spectrum line got by with software blitters over a bitmapped display—but it didn’t look great, and the parts of 80s home-computer game animation that we remember fondly relied on more than just a blitter.

In particular, systems like the C64, the NES, and the Atari game consoles and home computers provided sprite definition and placement capabilities directly in their graphics hardware. This put a limit on the number of elements that could be efficiently rendered, but allowed the animation to be much more practical and sophisticated for a given CPU power.

Interestingly enough, though, hardware sprites were not necessarily a strict upgrade. The Atari 2600 and NES used their hardware sprite support to make actual bitmap support completely unnecessary. The C64’s 320×200 screen requires 9KB of RAM to hold a single screen’s worth of bitmap information—the NES’s graphics (without additional support circuitry on the cartridge) only had access to 2KB of RAM and 8KB of ROM. The result was a text-like display mode in which hardware sprites are the only way to get pixel-level control over the display. This general approach of separate tile and sprite layers was convenient and efficient enough that it was not only available on the bitmap-and-sprite-capable systems of the time (like the C64 and Atari) but also saw usage in lower-power and portable devices at least through the Nintendo DS.

Continue reading

Implementing SHA-256 on the 6502

After five implementations of various algorithms built mostly around bit shuffling, how about a properly modern, common, useful routine? We just need to find something that has a very similar structure, and then we can apply the knowledge we’ve gained in those other domains.

SHA-2 fits this bill neatly, and Wikipedia even has handy pseudocode we can base an implementation on. In looking at the implementations and specifications, it also seems like SHA-256 is a relatively good fit for the 6502, and may well qualify as the most sophisticated modern algorithm that the 6502 can still comfortably handle. We don’t really need to worry about endianness or word size—since we’re an 8-bit chip we can do multibyte mathematics in any direction we want and the size of the algorithm’s words really only alters our loop bounds—but SHA-256’s largest indexed elements are the round constants and the message schedule, each of which is 64 32-bit words long. That translates to tables exactly 256 bytes in size, which is precisely the size of a 6502 memory page and thus the largest amount of memory that can be efficiently indexed.
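As a reference point for everything that follows, here is that pseudocode rendered as a compact Python sketch. Python rather than 6502 assembly, purely for readability; generating the constants from integer roots is my own shorthand for the spec's "first 32 fractional bits of the roots of the first primes" definition, not something the original article does:

```python
import math

MASK = 0xffffffff

def first_primes(n):
    # trial-division prime generator; we only ever need the first 64
    ps, c = [], 2
    while len(ps) < n:
        if all(c % p for p in ps):
            ps.append(c)
        c += 1
    return ps

def icbrt(n):
    # integer cube root (floor), by binary search
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo

# Round constants: first 32 fractional bits of the cube roots of the first
# 64 primes. Initial state: the same, for square roots of the first 8.
K  = [icbrt(p << 96) & MASK for p in first_primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in first_primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256(data):
    h = list(H0)
    msg = bytearray(data)
    bitlen = len(msg) * 8
    msg.append(0x80)                          # padding: a 1 bit, then zeros...
    msg.extend(b'\x00' * ((56 - len(msg)) % 64))
    msg += bitlen.to_bytes(8, 'big')          # ...then the 64-bit message length
    for i in range(0, len(msg), 64):          # one 512-bit block at a time
        w = [int.from_bytes(msg[i + j:i + j + 4], 'big') for j in range(0, 64, 4)]
        for t in range(16, 64):               # extend to the 64-word schedule
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):                   # the 64 compression rounds
            S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + S1 + ch + K[t] + w[t]) & MASK
            S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (S0 + maj) & MASK
            hh, g, f, e = g, f, e, (d + t1) & MASK
            d, c, b, a = c, b, a, (t1 + t2) & MASK
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return b''.join(x.to_bytes(4, 'big') for x in h)
```

On the 6502, each of these 32-bit adds, rotates, and XORs becomes a little loop over 4-byte buffers, and K and the schedule w are exactly the two one-page, 256-byte tables discussed above.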

Initial Implementation

Translating the pseudocode from the Wikipedia article into code was not terribly challenging. I pre-defined tables of the necessary constants in the algorithm, and then set aside a fixed block of RAM to store all the working state that changes as a sum is computed. (In a modern system, we would probably instead operate on a pointer to arbitrary blocks of RAM, but the 6502 is much, much better at operating on blocks of memory at fixed addresses than it is at working with pointers, and I wanted to give the poor CPU every edge it could get.) I then defined several 4-byte blocks of memory for use as accumulators, along with a variety of small functions that would do 32-bit big-endian math based on the contents of the accumulators and which could move results into and out of the working state as needed. There were a few awkward bits here—I ended up replicating some functions because the 256-byte index limit got inconvenient at a few points—but nothing about the implementation of the algorithm really had to deviate much from the pseudocode. So the implementation step was quite straightforward, much as I had hoped.
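As an illustration of what those accumulator helpers do (the names and layout here are my own, not the original assembly's), a 32-bit big-endian add over 4-byte buffers transcribes into Python like this:

```python
def add32_be(acc, operand):
    # Add two 32-bit numbers stored as 4-byte big-endian buffers, one byte
    # at a time, the way a chain of 6502 ADC instructions would: start at
    # the low-order byte and propagate the carry upward. The final carry
    # out of the top byte is dropped, which gives the mod-2^32 wraparound
    # the algorithm's additions require.
    carry = 0
    for i in (3, 2, 1, 0):              # index 3 holds the low-order byte
        total = acc[i] + operand[i] + carry
        acc[i] = total & 0xff
        carry = total >> 8              # plays the role of the carry flag
    return acc
```

This is the sense in which the 6502 doesn't care about word size: making the loop four bytes long instead of two or eight only changes the loop bound.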

However, the end result did not come out the way I had hoped it would. While it did give the correct answers to my test vectors, it took 580 milliseconds to consume a single block—nearly 10 milliseconds per byte! I wasn’t willing to push this too far—after all, these algorithms are tuned to be feasible only on modern chips—but we can surely do better than this.

Continue reading

Stabilizing the C64 Display with Sprites

For the most part I came to the 8-bit programming world very late. By the time I’m looking into what makes a system tick, all the development on that platform has ceased or at least plateaued, with the basic tricks of the system (including any consistent undocumented or unintentional capabilities of the chips) well-codified. This isn’t universally true—the Sega Genesis, for instance, is much easier to emulate these days than it is to intentionally develop for unless you were there at the time—but it still definitely influences the way I’ve approached my own projects.

However, the development communities didn’t start out knowing these techniques. The main document I used to inform my Commodore 64 graphics work was published in 1996, two years after the C64 had been discontinued and long after it had been technologically eclipsed. For other techniques I’m mostly relying on what ultimately evolved as the community consensus. But effects were usually possible long before they were understood, and ultimately before the underlying principles were worked out thoroughly enough that a design could be settled on as optimal. In this article I’ll be looking at an early technique that was used to stabilize the C64 raster.

Revisiting the Problem

Ordinarily when you are programming C64 graphics, you don’t really have to worry about what the hardware is doing—you just write to the graphics registers and let it handle rendering the screen as needed. For more advanced techniques, such as split-screen displays, the graphics chip could interact with the CPU’s interrupt system to let you juggle the graphical configuration once a certain number of lines of display had been output. This is inherently imprecise. Not only will the CPU not react immediately to this interrupt, the graphics chip can also interfere with the CPU’s ability to run when it is fetching text or sprite data. In the absence of interference—or in cases where the amount of interference is known in advance—it is possible to establish lower and upper bounds on where in a scanline your display currently is. Taking advantage of that knowledge will let you derive timing information that constrains where you can put your reconfiguration code where it will change the display without producing noticeable flicker or other artifacts.

In practice, nobody ever really did this. Early advice would simply take the form of suggesting that you wait 16 cycles or so before writing your registers and everything would be fine, with the number 16 determined by experiment. Later advice, however, would suggest you remove the uncertainty entirely, using one of several techniques to synchronize precisely with the display. The Atari 2600, as we saw last year, gives us this capability for free with a memory location that, when written, halts the CPU until the exact microsecond the next scan line begins. We are not so fortunate on the Commodore 64. The most reliable method for getting this synchronization involves scheduling two interrupts in rapid succession followed by an additional trick to smooth out the final cycle of instability.

Once you have that synchronization, you can then start doing cycle-exact writes to the graphics registers to get horizontally stable visual effects like the ones we saw on the Atari 2600’s displays, such as changing the border or background colors to get a striping effect that exists nowhere in the actual display memory:

sync_stripes

The method I outlined back in 2015 is not the only way to do this. An alternate technique involving sprites was also well-known and got a good write-up in issue three of C=Hacking magazine. This was distributed as text and has a complete archive at the Fridge. Pasi Ojala (who has appeared earlier on this blog as the author of the excellent PUCRUNCH utility) had a column there called The Demo Corner where he would explain how various tricks worked, and he was writing back when the C64 was still being manufactured and sold, too.

The Demo Corner is a fine series and I do recommend it if you want to read more about how the Commodore 64 actually does its work—the remainder of this article will be me taking the technique from issue 3 and contextualizing it within the techniques I’ve already derived and worked through.

Continue reading

Dissecting Three Classic Automatic Proofreaders

I’ve been thinking about type-in programs again. In particular, I’ve been thinking about one of the features many magazines and books provided for type-in programs that I never actually saw back when I was a youth typing programs in: automatic proofreader programs that would provide verification codes for the program as you typed it in, thus saving you multiple passes through the program trying to figure out why it was giving you wrong answers.

In poking around through the Internet Archive’s collections, I’ve found three of note and in this article I’ll be picking them apart.

SWAT: Strategic Weapon Against Typos

I encountered the SWAT system from publications associated with SoftSide Magazine, which focused on the TRS-80, the Apple, and the Atari home computers. These have generally been a bit before the time I focus on, though I really do owe the Atari home computers a more thorough investigation. The earliest attestation of the system I’ve found is in the June 1982 issue, and it provides implementations for all three of its target systems.

SWAT was a program intended to be appended to the program that it was to check; one would then start running the program from that point instead of running the program proper. It would then compute a simple checksum of every byte in the program by adding them up and then printing them out in groups. You would then check these codes against a separate, shorter listing that provided the codes a correct listing would produce. If they didn’t match, one edited the program until they did.

This is somewhat interesting because this is much closer to how we would organize such a utility in this day and age. The program would be read in, and a SWAT code table would be printed out. The other systems we will see in this article essentially modify the code editor and require checking as one types.

SWAT takes three parameters: the boundaries of the program to check, the maximum number of lines per chunk (default 12), and the target number of bytes per chunk (default 500). It then scans through the program as it exists in memory, producing a running sum of every byte in the program, modulo 676. Once it reaches the end of a line, it checks to see if this is the maximum number of lines, or if the byte target has been exceeded. If it is, it emits a line on the SWAT table indicating the range of lines, the total number of bytes, and the resulting sum. Instead of printing the sum as a number between 0 and 675, it emits it as two letters. (676 is, after all, 26*26.) The first letter is the “tens digit” of the result, that is, the quotient after dividing by 26, and the second letter is the remainder.
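Pieced together from that description (the exact bookkeeping in the original BASIC listings may differ, so treat this as a sketch), the table generator behaves roughly like this:

```python
def swat_letters(total):
    # express a 0..675 sum as two base-26 letters, "tens digit" first
    return chr(65 + total // 26) + chr(65 + total % 26)

def swat_table(lines, max_lines=12, target_bytes=500):
    # lines: (line_number, tokenized_bytes) pairs, in program order
    table, total = [], 0
    count = size = 0
    first = last = None
    for num, data in lines:
        if first is None:
            first = num
        last = num
        total = (total + sum(data)) % 676   # running sum over the whole program
        count += 1
        size += len(data)
        if count >= max_lines or size >= target_bytes:
            table.append((first, last, size, swat_letters(total)))
            count = size = 0
            first = None
    if first is not None:                    # flush the final partial chunk
        table.append((first, last, size, swat_letters(total)))
    return table
```

One would run this over the freshly typed-in program and compare each chunk's two-letter code against the published SWAT listing.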

One interesting thing about this is that it does not operate on the actual text the user typed. The BASICs for these three systems analyze and reformat the instructions so that they may be executed more efficiently at run time (a process that documentation of the time often called crunching, but which modern writers would call tokenizing), and it is the tokenized form of the program that is summarized. This meshes extremely well with Applesoft BASIC, because its tokenizer actually also removes all user-supplied formatting, which means that all program lines are actually converted into a single canonical form. The TRS-80 preserved all user formatting, which meant that the program had to be entered by the user exactly as printed to match the SWAT codes. The Atari systems were particularly unusual—they normalized input lines like the Apple did, but some quirks of their tokenization process meant that how lines were tokenized would depend on the order in which they were entered, so skipping around in a program while entering it or editing typos along the way could actually corrupt your SWAT codes. Fortunately, there was a procedure for normalizing a program, and so SWAT simply required users to perform this procedure before running any checks.

As a checksum, this mostly did what it needed to, but it wasn’t ideal. In addition to its false positives, a simple sum of bytes will not catch transposition of characters, and for programs with a lot of DATA statements, this was the most dangerous and difficult-to-identify problem that a user was likely to cause. Summing the collapsed tokens, however, did mean that any misspelling of a word BASIC recognized would be immediately obvious, altering not only the final sum but even the length of the line. For the kinds of programs that SoftSide tended to publish, this was entirely adequate, though. Their programs tended to be pure BASIC and would not have large amounts of machine code or graphical data within them.
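That transposition blind spot is easy to demonstrate: a pure sum cannot distinguish two DATA lines that differ only in the order of their digits.

```python
# Two tokenized DATA lines that differ only by a transposed pair of digits
# produce identical byte sums, so the SWAT code cannot tell them apart.
good = b"DATA 23,164,208"
typo = b"DATA 23,146,208"   # "164" mistyped as "146"
assert good != typo
assert sum(good) % 676 == sum(typo) % 676
```

For a game that loads its sprites or machine code out of DATA statements, that is exactly the kind of typo you most need the checker to catch.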

That privilege would go to Compute!’s Gazette, which focused on the Commodore line, whose programs also required much more aggressive use of machine code and memory-mapped I/O to function.

Compute!’s Automatic Proofreader (1983-1986)

Compute!’s Gazette started out as a magazine for the VIC-20 and the Commodore 64. In October 1983 they introduced a pair of programs that provided automatic proofreading support for their type-in listings. The tighter focus of the magazine—and the close similarity of the operating systems of the two machines, even at the binary level—allowed the editors to provide tools that hooked much more deeply into the machine.

All the Commodore 8-bit computers provided a uniform interface for basic I/O operations, and also provided a number of points where the user could replace core functionality with custom routines. This low-level interface—which they called the KERNAL—allowed a lot of work to be done at the machine code level and still run acceptably across the entire line.

This program worked by copying itself into a block of memory that was only used for tape I/O and which was commonly used by BASIC programs as scratch space for small machine language programs. A simple BASIC loader copied it into place and then ran a routine that attached the bulk of the program to the KERNAL’s character-input routine. This routine, interestingly, wasn’t called when the user pressed a key; instead, once a line had been entered, the screen-editor logic decided which part of the screen constituted that line and then provided the contents of that line as input, followed by the RETURN key that kicked it all off.

This proofreader would read characters and add their codes to a running 8-bit total, wrapping around as necessary, and ignoring spaces. When the return key was detected, it would stash the output state, then move the cursor to the upper left corner of the screen, print out the final sum (from 0 to 255), and then set the cursor back the way it was. As a checksumming algorithm, this had the same problems with not detecting transposition of characters that SWAT did, and it also was less reliable about misspelled keywords (since this scan was happening before tokenization). On the plus side, a new code was generated for every line of text and you could check your work as you typed, or list an entire block and check it by going to the top of the program block and repeatedly pressing RETURN to evaluate each line.
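In Python terms, the per-line check amounts to something like this (on real hardware the character codes are PETSCII rather than ASCII, but the two agree for the digits, uppercase letters, and punctuation used here):

```python
def proofreader_v1(line):
    # 8-bit running sum of the typed line's character codes, skipping
    # spaces and wrapping around at 256, as the 1983 proofreader did.
    total = 0
    for ch in line:
        if ch != ' ':
            total = (total + ord(ch)) & 0xff
    return total
```

Both weaknesses fall straight out of the code: addition is order-blind, so transposed characters produce the same sum, and because the sum runs over raw untokenized text, a misspelled keyword shifts the code by only the letters actually mistyped.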

Early versions of the proofreader had two editions, one for the VIC-20 and one for the Commodore 64, but the only actual difference between the versions was that they called a routine in the BASIC ROM to convert the byte into a decimal number, and the BASIC ROM was mapped to a different part of memory in the two machines. The API for the functions was identical, and indeed the BASICs were so similar that this was the same routine, in the end.

Ultimately, later editions of this proofreader unified the two versions and used the original value of the “character read” routine that the proofreader hooked itself up to as a switch to decide where to call to print a decimal number. This added a dozen bytes or so to the final program, but even on the extremely cramped VIC-20 this was a cost that could be easily paid.

However, the tighter binding to the operating system produced some unique drawbacks as well. The CHRIN routine the proofreader extended was actually called for all kinds of character input, not just program lines. As a result, running a program with the proofreader active would have it corrupt the program’s display with handy check codes for every response the user gives to an INPUT statement. Worse, it would do the same for textual data read off of the disk or tape. Of course, the tape wouldn’t have time to do any reading; once the tape routines started using their temporary storage, this would trash the memory holding the proofreader, and the system would begin trying to execute random temporary data as code and probably crash extremely hard.

Compute!’s Automatic Proofreader (1986-)

Over the next few years, Compute!’s Gazette got more and more sophisticated programs in its lineup—many approaching or exceeding commercial quality—and it also got several more systems it needed to support. In February 1986, they updated their proofreader to use a more sophisticated technique. While they were at it, they also addressed all the shortcomings I listed above.

The most difficult issue to address was where to put the proofreader so that it would not be touched by the running system during normal operation. They fixed this by pushing the start of BASIC’s program memory forward 256 bytes and dedicating the freed space to the proofreader. However, this was a different place in memory for the five machines they supported, so they also needed to patch the program after loading so that the addresses pointed to the right place. The necessary information for patching turns out to be largely supplied in a portable way by the KERNAL, so this is not as heinous as it sounds, but it does still require the most sophisticated BASIC loader I have seen.

The other system-specific issues were solved by extending the “tokenize a line of BASIC text” function instead of the “read a character” function. This also lets the proofreader intervene less frequently and lets it process an entire line of text at once, guaranteed. User input and file I/O aren’t intercepted, and with the program relocated to the old start of BASIC RAM, tape I/O works fine too.

The final—and, for the user, the most important—change was to use a more sophisticated checksum algorithm that can actually reliably flag swapped characters and make it much less likely for typos to cancel each other out:

  1. The checksum is a 16-bit unsigned integer, and its initial value is the line number being processed.
  2. The line is preprocessed by removing all spaces that are not between quotes. So, for instance, 10 PRINT "HELLO WORLD!" becomes 10PRINT"HELLO WORLD!"
  3. Add the byte value of each character to the checksum, but before adding it, multiply it by its position in the line after extraneous blanks are removed. So, for our sample line, the checksum starts at 10, then gets 49*1 and 48*2 added for the line number 10, then 80*3 for the P in PRINT, and so on.
  4. XOR the high and low bytes of the checksum together to produce the final 8-bit checksum.
  5. Express the checksum as a two-letter code. This is basically a two-digit hexadecimal number, but the least significant digit comes first and instead of using the traditional 0123456789ABCDEF digits, it instead uses the letters ABCDEFGHJKMPQRSX.
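The whole five-step scheme condenses to a short Python sketch (assuming, as before, character codes that match ASCII for the characters involved; the function shape is mine, not the magazine's):

```python
def gazette_checksum(line_number, text):
    # `text` is the full line as typed, including the line number digits.
    kept, in_quotes = [], False
    for ch in text:                            # step 2: drop spaces outside quotes
        if ch == '"':
            in_quotes = not in_quotes
        if in_quotes or ch != ' ':
            kept.append(ch)
    s = line_number                            # step 1: seed with the line number
    for pos, ch in enumerate(kept, start=1):
        s = (s + ord(ch) * pos) & 0xffff       # step 3: position-weighted sum
    low = (s >> 8) ^ (s & 0xff)                # step 4: fold the two bytes together
    digits = "ABCDEFGHJKMPQRSX"                # step 5: hex digits, low nibble first
    return digits[low & 0x0f] + digits[low >> 4]
```

Unlike the earlier schemes, the position weighting makes transposed characters change the sum, which was exactly the failure mode the old proofreaders let through.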

This scheme was sufficiently effective that they never modified it afterwards and it continued in use until Compute! stopped publishing type-in programs in the early 1990s. That is a solid pedigree.

After the jump, I will dissect the sophisticated BASIC loader that was used to make the same core program work on five different computer models, and then present my reconstruction of the proofreader itself.

Continue reading

An Ambition, Completed

I belong to that somewhat narrow cohort that straddles Generation X and the Millennials—we grew up with the microcomputer omnipresent, but we remember a time before the Internet. The cohort has gotten various attempts at nicknames, but I’ve always been fond of the name Oregon Trail generation, after the famous educational software.

My parents were also somewhat technically inclined (or at least gadget-oriented), and as a result I had grown up alongside a number of game consoles and home computers. A lot of the work I did on those, both as a hobby and for schoolwork, shaped my talents in the years to come.

One of my goals here on Bumbershoot over the past few years has been to demystify the machines I grew up with. I’m now a genuine software professional, after all—surely I’ve learned enough to see how the tricks were done and to join in myself, if a bit late.

As such, once I realized that this was what I was doing, I formalized the nature of the task and started getting systematic about it. And now, with this latest release, I have finished it.

The Rules

Here are the rules I had set for this task.

  • Childhood ends in 1992. This is a somewhat arbitrary division point, but it turns out I can pick any year from there to my 20th birthday and not change the list of targeted systems, and 1992 means I don’t have to distinguish between when a platform was released in North America and when I got access to it.
  • Systems require a full year of exposure to count as “one I grew up with.” I had brief acquaintance with the Macintosh and the SNES, but both were systems I didn’t really have any experience with until adulthood. I first beat Super Metroid in 2010.
  • Programs must run on the metal. Interpreted languages are permitted only to the extent that they can then transfer control to machine code. This started out as “BASIC and LOGO don’t count” but eventually became a requirement that, for at least one project, I produce a binary where every single byte in the program exists because I put it there.
  • Programs must be for the platform. Writing a highly portable C program and jamming it through a compiler for each system doesn’t count. I should make use of a generous set of system-specific capabilities that let it show off what it’s good at. Any stunt that reaches demoeffect quality automatically achieves this goal.
  • The platform must have been intended to receive third-party programs. That means the graphing calculators we had in high school don’t count, even though they had Z80 chips in them and you could program them with special connectors and PC programs that no longer run on 64-bit operating systems.
  • Period tools are not required. Since half the targets are game consoles, cross-development tools were the norm even at the time.

So, without further ado, the systems I grew up with.

Continue reading

Lessons Learned: Color Chart Madness Revisited

As long as I’m revisiting old work, I might as well revisit one of the first projects I undertook for this blog, four years ago: Color Chart Madness. I took a sample program from one of my reference books, took it apart, then put it back together again, better.

What’s kind of funny, looking back on it, is that I really had no idea how the VIC-II chip actually worked, and I needed to understand a lot of details about it in order to grasp what was really going on. I didn’t, however, seem to need to understand those details in order to fix all the problems I saw with it. In retrospect, it’s a case study of my past self working around my own ignorance.

What I Thought Was Happening

The program I was dissecting was, at core, a simple rasterbar effect. It wanted to display every combination of foreground and background, so it printed out text in all sixteen colors and then changed the background color sixteen times a frame to produce all the combinations. The initial implementation of this shut off the clock interrupt and then spinlocked on the raster count register in $D012, updating the background color in $D021 every eight lines. The problem with it—beyond the way that it was essentially locking up the system so hard that the only way out was a soft-reset—was that the color change was happening a line too early. This meant that the character cells looked a bit uneven, and there was a spurious black line at the very bottom of the text display. Worse, setting the spinlock to terminate one line later ended up making the display incredibly unstable or, failing that, made the color changes happen on different incorrect lines.

cc1_0

My initial theory here was that there was somehow something wrong with the $D012 read—either the register was fundamentally unreliable, or the emulator was taking shortcuts. I rewrote the routine to instead be based on writes to $D012, replacing the old spinlock with a series of raster interrupts. That rewrite, with no other changes to the program constants, fixed the display completely. That only deepened my suspicions about the reliability of the data the first version depended on.

I couldn’t say I really understood what was going on fully, though, because there were still anomalies left behind. In particular, the raster values I had chosen matched the original version’s, but based on both Compute!’s Mapping The C64 and the C64 Programmer’s Reference Guide those values were one less than they should have been to change the color at the top of each letter. Furthermore, even though I had a display that looked the way I wanted it to, changing the raster target by a single line did not actually consistently change the color-change point by a single line.

As it turns out, these facts were only confusing me because I had gravely misunderstood the way the VIC-II builds its displays. My understanding of this system advanced in fits and starts over the course of about a year:

  • Cycle-exact Delays for the 6502, where I adapted some techniques I’d seen in an Atari 2600 demo to systematically trigger the anomalies I had encountered in Color Chart Madness.
  • Color Test: an Atari 2600 program, where I produced a complete Atari 2600 program on my own. This gave me direct experience with directing a raster display, and having to do that work “by hand” on the 2600 gave me the background I needed to understand how the VIC-II did it automatically.
  • Flickering Scanlines, where I take that background and finally understand the 1996-era document that explains the operation of the VIC-II chip and thus which lets me really understand the anomalies above.
  • A taxonomy of splitscreens, where I begin to apply this knowledge to extracting reasonable effects out of the C64.
  • VIC-II Interrupt Timing, where I finally sit down, do the math, and work out the precise timing tolerances required to get various effects without relying on the advanced, cycle-exact techniques the demoscene standardized on.

That’s a little over a year’s worth of fitful exploration, research, and experimentation. At this point we should be able to give a brief and accurate explanation of the issues I ran into and why the techniques I used fixed them.

What Was Actually Happening

The reason switching from a spinlock to a raster interrupt fixed the display is actually pretty trivial. There’s nothing at all wrong with reading $D012, and it is indeed that value that triggers the interrupt. Furthermore, it really was testing for the line before the start of any given character. However, it takes an extra 40 cycles or so to actually process the raster interrupt and get into code that we had, ourselves, written—and those 40 cycles were enough for the screen to display enough of that line in the old color to keep the display looking right. It was still setting it “too early”, but the grey background of the screen is actually an opaque foreground color so it masked our flaws. Take that away and the discontinuity is much more flagrant:

cc3_1

Explaining the flickering and discontinuity when we push the raster trigger one line forward is a little bit more involved. The key fact I was missing was that the C64’s CPU doesn’t actually get to run all the time—at the start of every line of text, the graphics chip (the VIC-II) steals 43 cycles’ worth of time from the CPU, monopolizing the memory bus so that it can load all the graphics information it needs for the next line. (Because these lines mess up your display timings, they are referred to by C64 programmers as badlines.) If we look at the initial spinlock-based implementation, we see that actually getting around to updating the background color will take between 11 cycles (if we check the raster the very cycle it changes) and 21 cycles (if it changes the cycle just after we check). On a badline, the CPU will go dormant 13 cycles after the raster changes. That means that depending on exactly how the spinlock syncs up with the display, it will change the color either before or after the full row. Furthermore, the full spinlock cycle is 10 cycles long, which means it won’t stay synced with the display; on different frames, or even at different points in the same frame, there is no consistency. Thus, the flicker.
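That loss of sync is easy to check numerically. Assuming a PAL machine's 63-cycle raster line (my number for illustration; the argument doesn't depend on the exact value), a 10-cycle polling loop eventually samples every possible offset within the line:

```python
LINE_CYCLES = 63   # cycles per raster line, PAL C64 (assumption noted above)
LOOP_CYCLES = 10   # length of one iteration of the spinlock

# Record the offset within the raster line at which each successive poll lands.
phases = set()
t = 0
for _ in range(200):
    t += LOOP_CYCLES
    phases.add(t % LINE_CYCLES)

# gcd(10, 63) = 1, so the poll drifts through every one of the 63 possible
# offsets: there is no stable phase for the spinlock to settle into.
assert phases == set(range(LINE_CYCLES))
```

The interrupt-based version dodges this entirely by letting the hardware, not a counting loop, decide when the handler starts.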

There’s less inconsistency with the interrupt-based implementation. It relies on the KERNAL to save and restore state and ultimately hand over control, so the 43-cycle delay of the badline will always be paid. However, adding that time into the rest of the display means that the color ends up changing halfway across the screen one line down, which means we get visible display tearing and in some sense the worst of all worlds.

But with an understanding of how that timing works, it at least is no longer surprising.