Author Archives: mcmartin1723

Genesis: Sprites and Controllers

I’m still writing a series of small test routines to put the Genesis through its paces. These haven’t been tremendously interesting in their own right, but if they’re all put together at once maybe we’ll get a complete-enough program out. This time, we’ll be reading the joystick and working with the sprite system.

Documentation on both of these subsystems was pretty thin. I ended up relying almost entirely on Charles MacDonald’s VDP docs, and controller information was taken from his hardware notes.

Continue reading


Variations on the 68000 ABI

Moving from the 8-bit computers to a 16-bit console system is a bit of cold water in the face—with no on-board operating system or BIOS-like software, I end up needing to write a large amount of support code that I’ll generally want to reuse. This is a bit of a problem, because assembly language provides no real protection against having various bits of your code stomp on other bits of your code. You need to keep everyone’s hands out of everyone else’s pockets and still let them efficiently hand things back and forth as needed.

For a lot of these older systems, or smaller programs, I’ve simply relied on documenting a contract for each function that’s exported: where values come in, where they come out, and what registers or memory are trashed as part of the process. The documentation of the C64 system calls in the Programmer’s Reference Guide and of DOS and BIOS system calls in the Interrupt List were both organized in this manner.

This doesn’t really scale.

It doesn’t scale both for trivial reasons—if you’re using a compiled language, it can’t read the comments you made—and for non-trivial reasons—there’s a lot of finicky stuff to keep track of and the cognitive load of adding to a system will just get larger and larger over time.

The solution for this is to adhere to some consistent contract that holds for all functions, or holds closely enough that you can pretend it does. We encountered this before in DOS when we were speeding up an old BASIC program. That provides a consistent mechanism for passing arguments, receiving return values, and dictating which registers must be left unchanged after any given routine returns. (While you might not trash all the registers that aren’t guaranteed to be preserved, anyone who calls you needs to pretend you did, so that you’re free to trash them six months later as part of a bug fix.)

But that really only buys you coexistence. A protocol like this will carry along with it assumptions about how your programs should be structured, and suggest ways to organize the internals of your routines as well. The disciplines that compilers use to keep everything interoperable should let our assembly language code scale as well, or give us some additional tricks we normally would not permit ourselves.

In this post, I’ll outline the technique used by Motorola 68000 compilers, adapt it to the Genesis, discuss what that means for program organization, and compare this protocol to ones used by other chips to show some more advanced techniques we can steal.

Continue reading

An Improvised Sega Genesis Toolchain

I suppose this isn’t entirely coincidence, since GameHut’s Coding Secrets videos were part of what pushed me to attack Genesis development sooner rather than later, but they seem to have also started a new series on Genesis homebrew development. If you’re following my articles with a hope of following along on your own, these videos will likely be a more useful alternate approach. I am most interested in the low-level command of the hardware and how things are put together, and how to organize those things. The GameHut series is much higher level and is starting with an established application shell that it then fills in as needed. It is top-down while my own experiments are bottom-up.

One caveat, though, should you attempt to follow both of these simultaneously: both he and I are using an assembler named asm68k, but they are different assemblers with slightly different dialects. The one he is using was apparently originally part of Psygnosis’s “Psy-Q” devkit, and it seems to be available from […]. I mostly remember Psygnosis because they published Lemmings, but in the sweep of the industry’s history they seem to have been very important devkit creators for a wide variety of platforms. (Indeed, it seems they wound up acquired by Sony and now exclusively work on those systems’ kits.)

Jon Burton also had extensive experience in the industry as part of a significant studio, and he has significant prior tooling and software engineering experience in this realm. I, on the other hand, am the equivalent of a young demoscener trying to put together a release with tools I have either scavenged from the systems I do have, or that I built myself.

For me, that is part of the fun, and if you’ve followed me regularly, I hope it’s part of the fun for you too. In this post I will outline the tools and workflows I’ve used to build the cartridge images I have described in previous articles. In one sense, this article is “how to build a Genesis cartridge using only stone tools”, but my initial starting point is a fully-equipped 21st-century development workstation. It might be a stone tool, but obsidian holds one heck of an edge.

Software Base

My development system runs Fedora Linux. Most of the software I’m using here is also present in Debian-descended systems, but no one distro has every tool I use. Fedora’s core repositories, however, are unusually convenient for people experimenting with cross-architecture development.

For my core programming tools, I am using the gcc and python2 packages to provide support for C and Python. These come standard with the system.

For my emulators, DGen has a good debugger, but the Gens KGen mod has a better visualizer. Neither is in the Fedora repos; you’ll need the wine package to run Gens, and the SDL-devel, SDL_image-devel, and mesa-libGL-devel packages to build DGen.

Getting Binaries Into ASMX

The ASMX family has two fairly significant gaps in it we’ll need to fill.

The first is that it lacks an equivalent to Ophis’s .incbin directive. If we have a blob of binary data that we want to incorporate into our program, we will need to convert it into a textual format. Our life is made a bit easier because it offers a very unformatted directive called HEX for textually inputting raw hex dumps. The following three lines are equivalent:

        dc.b    $01,$02,$ca,$fe
        dc.l    $0102cafe
        hex     01 02 ca fe

We can convert a binary file to a series of hex lines very easily. The DOS way of doing this would be to put together a tiny little C program that could do it and be compilable into a .COM file with no complaints. The UNIX way would exploit the various command-line tools that are installed as part of the base system and then promptly forgotten about by 99% of users:

od -An -t x1 -v blob.bin | sed 's/^/        HEX     /' > blob.s

Even relatively casual Linux users will have encountered sed: it’s a command-line text searching and editing utility. I’m using it here to replace “start of line” with the HEX directive, suitably indented, which puts the directive at the start of every line. The od utility is a bit more obscure. It’s shipped in Fedora as part of a package called coreutils, whose job is to let shell scripts do more without needing special ad-hoc code. The od utility in particular is an “octal dump” tool that, despite the name, makes text versions of binary files in several bases. The -An argument hides the address column it would normally provide, -t x1 specifies we want hex values output a byte at a time, and -v tells it to not summarize repeated blocks of data.
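The same conversion can also be sketched in a few lines of Python, which is handy on systems without od. This is an illustration rather than part of the pipeline above: the function name hex_directives and the 16-bytes-per-line width (chosen to mirror od's default) are assumptions.

```python
def hex_directives(data: bytes, width: int = 16, indent: str = "        ") -> str:
    """Render raw bytes as asmx-style HEX lines, `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two lowercase hex digits per byte, separated by spaces, like od -t x1.
        lines.append(indent + "HEX     " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines) + ("\n" if lines else "")


# Typical use (file names are placeholders):
#   with open("blob.bin", "rb") as src, open("blob.s", "w") as dst:
#       dst.write(hex_directives(src.read()))
```

Doing it this way also sidesteps the leading space that od puts in front of each row, though HEX does not seem to care about that either way.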

Getting Binaries Out Of ASMX

The other issue we have is that ASMX doesn’t actually produce binary output – it uses Intel Hex or Motorola S-Record textual formats. Happily, objcopy—a low-level binary manipulation utility we last saw when creating small ARM binaries for RISC OS—also understands both formats:

objcopy -I srec -O binary --gap-fill=0xff --pad-to=0x8000 rom.s19 rom.bin

This works, but it isn’t great for a few reasons:

  • The value of --pad-to there will need to change as your actual program grows.
  • The ASMX core actually has a bug when it’s outputting S28, so if your ROM is larger than 64KB, you need to use S37 or Intel Hex.
  • The metadata block in the Genesis cartridge format includes both a ROM size marker and a checksum based on the rest of the cartridge contents, and objcopy of course cannot correct this.
  • When I’m pulling raw binaries into projects, I have to be careful to distinguish source from destination .BIN files.

In addition to raw binary formats, Genesis emulators accept an interleaved format called .SMD that matches what was produced by the Super Magic Drive peripheral. The Motorola S-Record format is pretty simple and refreshingly stateless, so I used that as my input format and did all my rounding and interpretation in place. I wrote a modestly significant Python script to cover all of those issues at once.
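The script itself is not reproduced here, but the heart of such a converter can be sketched briefly. Treat everything below as an illustration under stated assumptions, not as the actual script: the function names are hypothetical, and the header offsets used (checksum word at $18E, ROM end address at $1A4, checksum computed as the 16-bit sum of big-endian words from $200 onward) are the ones given in commonly circulated Genesis header documentation.

```python
ADDRESS_BYTES = {"1": 2, "2": 3, "3": 4}  # S1/S2/S3 data records


def srec_to_image(text: str, fill: int = 0xFF) -> bytearray:
    """Lay S-record data records out into a flat image, padding gaps with `fill`."""
    image = bytearray()
    for line in text.splitlines():
        line = line.strip()
        if len(line) < 4 or line[0] != "S" or line[1] not in ADDRESS_BYTES:
            continue  # header (S0), terminators (S7-S9), blank lines
        raw = bytes.fromhex(line[2:])
        count, body, checksum = raw[0], raw[1:-1], raw[-1]
        # The count covers address + data + checksum; the checksum is the
        # one's complement of the low byte of the sum of everything before it.
        if count != len(raw) - 1 or (sum(raw[:-1]) + checksum) & 0xFF != 0xFF:
            raise ValueError(f"corrupt S-record: {line}")
        width = ADDRESS_BYTES[line[1]]
        address = int.from_bytes(body[:width], "big")
        data = body[width:]
        end = address + len(data)
        if end > len(image):
            image.extend([fill] * (end - len(image)))
        image[address:end] = data
    return image


def fix_header(image: bytearray) -> None:
    """Patch the ROM end address and checksum fields in place (offsets assumed)."""
    if len(image) < 0x200:
        raise ValueError("image is too short to contain a cartridge header")
    if len(image) % 2:
        image.append(0xFF)  # round up to a whole number of words
    image[0x1A4:0x1A8] = (len(image) - 1).to_bytes(4, "big")
    checksum = sum(
        int.from_bytes(image[i:i + 2], "big") for i in range(0x200, len(image), 2)
    ) & 0xFFFF
    image[0x18E:0x190] = checksum.to_bytes(2, "big")
```

Because each record carries its own address, the parser needs no state beyond the image it is filling in, which is what makes the format pleasant to work with.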

Converting and Editing Graphics

The core image I adapted to produce the Bumbershoot Software logo was a 256-color PNG that I needed to drop to 16 colors. The ImageMagick suite of tools is the thermonuclear hand grenade of image conversion; it will handle the resize and the color depth drop for us, and also put us in a position where I can improve the result further later:

convert logo.png -colors 65536 -resize 200x200 -colors 16 logo.xpm

This isn’t really nearly good enough, though. The color depth drop produces dithering to cover its lack of palette, and that dithering makes the nice sharp lines in the icon jagged and corrupted. We need to fix that up before we actually import it. That’s why I chose to convert it to the XPM format, which is a very intuitive text-based image format. This lets us edit pixels in any text editor we want. I used that to smooth out the jagged edges. (Entertainingly, emacs will graphically render the image for you at the push of a key; unfortunately, editing the XPM is insanely slow on my computer unless I switch it to fundamental mode during the edits.) After that, it’s just a matter of converting it into something the Genesis VDP wants.

The script I used to do this was more than a little ad-hoc, but it got the job done. A more serious project would craft the images in 16 colors directly and then use a library like SDL_image to process the PNGs directly.
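One small piece of that conversion is easy to show in isolation: squeezing each 24-bit palette entry down to the 9-bit color the VDP stores. The sketch below is not the ad-hoc script mentioned above. It assumes the commonly documented color RAM word layout of 0000 BBB0 GGG0 RRR0, and the names cram_word and palette_words are made up for the example.

```python
def cram_word(red: int, green: int, blue: int) -> int:
    """Reduce 8-bit-per-channel RGB to a 9-bit color word (layout assumed)."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("channels must be in the range 0-255")
    # Keep the top three bits of each channel and drop them into place.
    return ((blue >> 5) << 9) | ((green >> 5) << 5) | ((red >> 5) << 1)


def palette_words(colors):
    """Convert up to 16 (r, g, b) tuples into one palette's worth of words."""
    if len(colors) > 16:
        raise ValueError("a palette holds at most 16 colors")
    return [cram_word(r, g, b) for (r, g, b) in colors]
```

Truncating rather than rounding keeps the sketch short; a more careful converter might round to the nearest representable level instead.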

Sample Code

The sample programs from the past few posts are now posted to GitHub. I’ve adjusted the startup code so that I can share as much as possible between different programs without having to lock down our options too much.

The Sega Genesis Startup Code

Before we go too much further into making the Genesis do stuff, let’s dig into the startup code and see what pieces of the system need initializing. This is another one of those crunchy nothing-but-a-code-walkthrough posts, so if you’re not here for that, there won’t be much here for you.

The code I’m working on includes some of my own modifications, but it draws its ancestry through the version in the SGDK which goes back to an original routine by one Paul W. Lee. It incorporates further edits from Charles Coty and Stephane Dallongeville. Stephane in particular also curates the SGDK project, and I think the code there was derived from the XGCC startup code. Overall I have gotten the impression that this either was, or was derived from and based on, stock code that Sega had given to third-party developers, or was the result of reverse-engineering it.

This edition of that startup code is an attempt at synthesis and simplification.

We will start at the beginning, which is also the beginning of the cartridge image and the beginning of the 68000’s address space, at location zero.

Vectors and Metadata

The first 512 bytes in the cartridge don’t hold code. The first 4 bytes hold the initial value of the stack pointer, and the next four hold the initial value of the program counter. We set the first to zero, and the second to the value of our entry point label. I named it RESET because that’s how I’d named it on my 6502 projects.

Following this is 248 bytes of interrupt vectors, most of which represent serious program errors or other things that cause hardware traps. These, too, were addresses that would lead to any code you might like, and it was not an uncommon practice to ensure that your game never crashed by having these vectors point to routines that indicated some kind of secret had been unlocked. (This is why you unlocked Sonic 3D Blast‘s level select by physically striking the cartridge, and why speedrunners deliberately overloaded Mickey Mania‘s sprite logic to skip levels.)

(As an aside, it’s worth noting that the ability to specify these interrupt vectors on a per-program basis is why the cartridge ROM is mapped to the location it is. Here, and on the Z80 and its derivatives, the interrupt table is at the low addresses, so that is where the ROM goes, here and on the Game Boy. Meanwhile, on the 6502 and its derivatives, the interrupt vectors are at the highest addresses, so that is why the NES and Atari 2600 map their cartridges into the higher part of their address spaces. “Operating System” ROMs like the C64’s KERNAL were similarly positioned.)

After the interrupt vectors, the cartridge holds a bunch of metadata that describes the cartridge and the hardware it expects to have available or reasonably supports. We saw similar material with the Game Boy. If it’s too incorrect, the console will refuse to boot.
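The layout of those first 512 bytes is easy to check against a built image. Here is a minimal sketch that splits a ROM into the pieces just described; the function name and the dictionary keys are assumptions for the example, and the metadata block is left as raw bytes rather than being decoded.

```python
import struct


def split_header(rom: bytes) -> dict:
    """Split the first 512 bytes of a cartridge image into its three parts."""
    if len(rom) < 0x200:
        raise ValueError("image is shorter than the 512-byte header area")
    # Two big-endian longwords: initial stack pointer, then initial PC.
    initial_sp, initial_pc = struct.unpack_from(">II", rom, 0)
    # 248 bytes of remaining vectors is 62 longword addresses.
    vectors = struct.unpack_from(">62I", rom, 8)
    return {
        "initial_sp": initial_sp,
        "initial_pc": initial_pc,
        "vectors": vectors,
        "metadata": rom[0x100:0x200],
    }
```

The sizes add up as expected: 8 bytes of reset values plus 248 bytes of vectors fill the first 256 bytes, and the metadata takes the second 256.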

Immediately After Poweron

When we power on, we jump to the RESET label and begin executing instructions. The very first thing the code does is skip all the rest of the code in this article sequence if the I/O ports are configured for output:

RESET:  tst.l   $a10008
        bne.s   @SkipJoyDetect
        tst.w   $a1000c
@SkipJoyDetect:
        bne.s   @SkipSetup

I’m not sure what this is all about, but I assume this is checking to see if you’re operating in some kind of hardware debugging environment.

After that we initialize the registers we need to do the rest of our work for the entire initialization sequence:

        lea     @Table,a5
        movem.w (a5)+,d5-d7
        movem.l (a5)+,a0-a4

This is actually a really nice setup. The a5 register is loaded with the head of a single table at the bottom of the initialization routine. We do a series of post-increment reads through a5 to get basically all the data we ever need after that. The exceptions are values we reuse or values that aren’t always used.

Our first reads use the MOVEM (move multiple) instruction to pull a bunch of values into or out of registers all at once. You usually use this for spilling and restoring registers as you enter or leave functions, but here it’s used for mass initialization. The initial values for our data and address registers are these:

        ;; Initial values for d5-d7
        dc.w    $8000, $3fff, $0100
        ;; Initial values for a0-a4
        dc.l    $00a00000, $00a11100, $00a11200, $00c00000, $00c00004

The data values are a little unusual, but we’ll be seeing those later. The five addresses are all I/O locations, though. Speaking of I/O, the next operation is the hardware verification check that launched a historic lawsuit:

        ;; Check Version Number
        move.b  -$10ff(a1),d0
        andi.b  #$0f,d0
        beq.s   @VersionOK
        move.l  #$53454741,$2f00(a1)
@VersionOK:

We’re checking the hardware revision, using a1 as our base register for this block of I/O. If there’s any revision but zero, we write the number 1,397,049,153 to a different I/O target. If we don’t do this very soon after startup the system will lock up, and Sega claimed trademark control over this number. The reasoning here was twofold:

  1. If you write that value to that port, the console presented a display indicating SEGA had licensed the game.
  2. If you interpret 1,397,049,153 as four ASCII characters instead of as a longword, it’s the word ‘SEGA’.
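(The arithmetic in that fragment is quick to confirm. The snippet below is only a check of the constants quoted above; the variable names are invented for the example, and a1 is the value loaded from the table earlier.)

```python
MAGIC = 0x53454741          # the longword written by the startup code
A1 = 0x00A11100             # base register value loaded from the table

magic_decimal = MAGIC                    # the same number, read as decimal
magic_text = MAGIC.to_bytes(4, "big")    # the same four bytes, read as ASCII

version_port = A1 - 0x10FF   # target of  move.b -$10ff(a1),d0
security_port = A1 + 0x2F00  # target of  move.l #$53454741,$2f00(a1)

print(f"{magic_decimal:,} {magic_text!r}")
print(f"${version_port:06X} ${security_port:06X}")
```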

This argument didn’t carry the day, and that’s most likely a major part of why I was able to find this reverse-engineered startup code to work with. That done, we reset the VDP’s status by reading its status register, then we zero out d0, a6, and the previously unseen usp register:

        move.w  (a4),d0
        moveq   #$00,d0
        movea.l d0,a6
        move    a6,usp

It turns out that usp and ssp are two names for a7. The 68000 has two privilege modes, user and supervisor, and they each have their own stack pointer. On startup, we were in supervisor mode, so here we’re initializing the user’s stack pointer to 0 too.

Aggressively paranoid readers might object that location $000000 is in ROM, and that ROM is a terrible place to have a stack pointer. The trick here is that when you push values on the stack, you assign through a7 with a predecrement, so this wraps around and your first write is at $FFFFFFFC. But the 68000 only has 24 address pins, so that’s really $FFFFFC, the top of RAM. So it works out.
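The wraparound can be modeled in a couple of lines. This is just the address arithmetic from the paragraph above, with a hypothetical helper name; it does not model anything else about the bus.

```python
def push_long_address(sp: int):
    """Return (a7 after a longword predecrement, address seen on a 24-bit bus)."""
    new_sp = (sp - 4) & 0xFFFFFFFF   # a7 is a 32-bit register, so it wraps
    return new_sp, new_sp & 0xFFFFFF  # only the low 24 bits leave the chip


a7, bus_address = push_long_address(0x00000000)
print(f"a7=${a7:08X} bus=${bus_address:06X}")
```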

Initializing the VDP registers

This is the code that really impressed me enough to want to dedicate a post to the startup logic. It’s taking advantage of a lot of secondary aspects of the chip’s instructions:

        moveq   #$17,d1
@FillLoop:
        move.b  (a5)+,d5
        move.w  d5,(a4)
        add.w   d7,d5
        dbra    d1,@FillLoop

        move.l  (a5)+,(a4)
        move.w  d0,(a3)

This is powered by 24 byte values and a single longword:

        dc.b    $04, $14, $30, $2c, $07, $54, $00, $00
        dc.b    $00, $00, $00, $00, $81, $2b, $00, $01
        dc.b    $01, $00, $00, $ff, $ff, $00, $00, $80

        dc.l    $40000080

This code loads 24 values into the VDP registers. It begins by loading the number 23 into d1, which is the loop counter variable. Then it loads a byte into d5 and writes that register, as a word, into the VDP control port. The trick here is that a byte-sized move into a data register does not disturb the register’s upper bits. Because d5 starts at $8000 and is incremented by d7, which is $0100, on every pass, this neatly iterates through the values needed to do systematic, increasing writes into the VDP registers.
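To see the sequence of words that loop produces, here is a small simulation of it. It models only the low 16 bits of d5 and d7, which are all that the word-sized instructions touch, and the function name is an invention for the example; the table is the 24 bytes listed above.

```python
VDP_INIT = bytes([
    0x04, 0x14, 0x30, 0x2C, 0x07, 0x54, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x81, 0x2B, 0x00, 0x01,
    0x01, 0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00, 0x80,
])


def vdp_register_writes(table: bytes, d5: int = 0x8000, d7: int = 0x0100):
    """Return the words written to the control port by the fill loop."""
    words = []
    for value in table:              # dbra with d1 = 23 runs the body 24 times
        d5 = (d5 & 0xFF00) | value   # move.b (a5)+,d5 replaces only the low byte
        words.append(d5)             # move.w d5,(a4)
        d5 = (d5 + d7) & 0xFFFF      # add.w  d7,d5 steps to the next register
    return words


for word in vdp_register_writes(VDP_INIT)[:4]:
    print(f"${word:04X}")
```

The high byte of each word selects the register ($80 for register 0, $81 for register 1, and so on) while the low byte carries the value, so one add per pass is all the bookkeeping the loop needs.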

Continue reading

Genesis: Graphical Basics

Last time, we had managed to create a cartridge that ran the startup and initialization code, but did nothing else. Let’s get a real message up:


If that font looks familiar, it’s because it’s the one I used for the Game Boy project. The basics of Genesis graphics end up looking quite a bit like the basics of Game Boy graphics, it turns out. Let’s talk about that a bit before we dig into any actual code.

The Video Display Processor

All the graphical work is done by way of the Video Display Processor, or VDP. It’s got a resolution of 320×224, which is normally a 40×28 grid of 8×8 tiles. (For simpler displays this can be narrowed to 256×224, for 32×28.) Like the Game Boy, this screen is a viewport into a large map buffer. On the Genesis, we get two independent map buffers, each of which measures 32 or 64 tiles in each dimension. These can be filled and scrolled independently, and Scroll Layer A can additionally have a Game Boy-like Status Window attached to it. There is also, as is traditional, a standalone sprite layer that is independent of the scrolling layers. The data describing all these things lives in a special 64KB chunk of memory set aside for the VDP called VRAM. (There is also special-purpose memory set aside for color palettes and for configuring some of the more exotic scrolling techniques, but they don’t even add up to 256 bytes.)

Each 8×8 pixel character is built out of a 4 bit-per-pixel paletted bitmap, for a total of 32 bytes per character. Character data can live literally anywhere in VRAM as long as it’s 32-byte aligned, so the total number of possible characters is 2048. The four bits for each pixel index are interpreted by one of four palettes in Color RAM, which are in turn defined as 9-bit RGB values. This (given that color 0 is always transparent) means that on paper, under normal operation you get 61 possible colors out of a palette of 512.
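Those numbers fall out of a few lines of arithmetic, and the character format itself is simple enough to pack by hand. The sketch below assumes the usual arrangement of two pixels per byte with the left pixel in the high nibble; pack_tile and the constant names are made up for the example.

```python
VRAM_BYTES = 64 * 1024
TILE_BYTES = 8 * 8 * 4 // 8            # 64 pixels at 4 bits each = 32 bytes
MAX_TILES = VRAM_BYTES // TILE_BYTES   # how many aligned slots VRAM holds
PAPER_COLORS = 4 * 15 + 1              # 15 opaque entries per palette + backdrop


def pack_tile(pixels) -> bytes:
    """Pack 8 rows of 8 palette indices (0-15) into one 32-byte character."""
    if len(pixels) != 8 or any(len(row) != 8 for row in pixels):
        raise ValueError("a tile is exactly 8 rows of 8 pixels")
    packed = bytearray()
    for row in pixels:
        for x in range(0, 8, 2):
            left, right = row[x], row[x + 1]
            if not (0 <= left <= 15 and 0 <= right <= 15):
                raise ValueError("palette indices must be in the range 0-15")
            packed.append((left << 4) | right)  # left pixel in the high nibble
    return bytes(packed)
```

Each row of a tile comes out as four bytes, so a row can be written to the VDP as a single longword.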

Stepping off the paper a step, you actually get slightly more. The VDP is capable of 12-bit color, and it uses that to produce lighter or darker intermediate shades in the presence of certain kinds of transparency. Clever use of this technique, along with layering the two scroll layers along with sprites to allow for more than 16 colors per 8×8 block, allows for much richer graphical stills. Toy Story used this to good effect.

One other nice thing that falls out of this is that having a unique character in every screen slot only requires 40 * 28 = 1120 characters, which is much less than 2048 and thus fits very comfortably within VRAM:


(This image was made for me something like 15 years ago by someone I only knew as “Orin”. The fact that the umbrella stem vanishes whenever the logo is shrunk or has its colordepth dropped is more my fault than theirs.)

Now, with 11 bits to select a tile, that means that our map data has five extra bits per tile to play with. Two of those are used to select the palette individually for each tile. Another two let us flip the tile horizontally or vertically. These are both tremendously useful for getting more mileage out of a smaller number of tiles. The final bit is used to boost the tile’s priority to get more apparent layers out of a smaller number of actual layers.
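Packing one of these 16-bit map entries looks like the sketch below. The paragraph above gives the field sizes but not their positions; the bit positions used here (priority in bit 15, palette in bits 14-13, vertical flip in bit 12, horizontal flip in bit 11, tile number in bits 10-0) are taken from commonly circulated VDP documentation and should be treated as an assumption, as should the function name.

```python
def map_entry(tile: int, palette: int = 0, hflip: bool = False,
              vflip: bool = False, priority: bool = False) -> int:
    """Build one 16-bit tilemap entry (bit positions assumed, see above)."""
    if not 0 <= tile < 2048:
        raise ValueError("tile number must fit in 11 bits")
    if not 0 <= palette < 4:
        raise ValueError("palette number must fit in 2 bits")
    return (
        (int(priority) << 15)
        | (palette << 13)
        | (int(vflip) << 12)
        | (int(hflip) << 11)
        | tile
    )
```

For example, map_entry(5, palette=2) selects character 5 drawn with the third palette, and adding hflip=True mirrors it without needing a second copy of the character in VRAM.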

The Plan

For now, we’re taking the startup code we had as a given. It puts the graphics chip into a reasonable state: VRAM and CRAM are zeroed out, and the registers are initialized so that we have a 40×28 screen with Scroll Layer A at $C000 and Scroll Layer B at $E000. The displays are both 64×32, the smallest they can be while still representing a complete screen. The only thing that isn’t perfectly configured is that the display isn’t actually enabled.

So for both of the programs shown above, all we need to do is load up VRAM with character and map data, and then turn on the display. Seems easy enough.
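Loading VRAM means telling the VDP where each write should land, which is done by sending a 32-bit command to the control port. As a sketch of that bookkeeping, and assuming the commonly documented encoding for a VRAM write (address bits 13-0 in bits 29-16, address bits 15-14 in bits 1-0, and $40000000 as the write code), the command and the address of a given map cell can be computed like this. Both function names are hypothetical.

```python
def vram_write_command(address: int) -> int:
    """Encode 'set up a VRAM write at `address`' as a control port longword."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("VRAM addresses are 16 bits wide")
    return 0x40000000 | ((address & 0x3FFF) << 16) | (address >> 14)


def plane_address(base: int, column: int, row: int, width: int = 64) -> int:
    """VRAM address of one 2-byte map entry in a plane `width` tiles wide."""
    return base + (row * width + column) * 2


print(f"${vram_write_command(0xC000):08X}")             # start of Scroll Layer A
print(f"${vram_write_command(plane_address(0xC000, 0, 1)):08X}")
```

With a 64-tile-wide plane, each row of the map is 128 bytes long, so only the first 40 entries of each row are visible on a 40×28 screen until you start scrolling.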

Spoiler alert: thanks to sensible third-party boot code, it actually is! Code starts below the jump.

Continue reading

Getting Ready for Genesis Development

At the end of last year, I mentioned that I’d done a Bumbershoot Software release on every computer or game system I used as a child, except for the Game Boy and the Sega Genesis. I’ve targeted the Game Boy. Time to move on to the Sega Genesis.

The Hardware

The Genesis (also known as the Mega Drive) was a 16-bit system with two CPUs: a Motorola 68000 and a Zilog Z80 for a coprocessor. We’ve seen the Z80 before—it’s what powered the ZX Spectrum and the ZX81, and the Game Boy’s Sharp LR35902 was a close cousin. The 68000 (usually just called the m68k) is a 1970s-era microchip, but it’s a 32-bit processor with a 16-bit data bus. It was used in the classic Macintoshes, Amigas, and the Atari ST lines of computers. Debian Linux supported the architecture for some time and while it’s only unofficial support nowadays, that support does still exist.

The graphics are controlled by a custom chip, and the sound is handled by a relative of DOS’s venerable AdLib chip. These are both quite powerful, and overall the console’s feature list looks like someone took all the trouble everyone had in the 8-bit era and decided to attack all of it at once:

  • 16 megabyte address space, so you can have tons of room for peripherals, RAM, and programs up to 4MB without having to bother with bankswitching.
  • A dedicated sound coprocessor, so you can do precise sound timing without having to tie up your main code, or so background music is fire-and-forget without interfering with your graphics display.
  • Multiple, resizable, and independently scrollable background layers, in what feels like a generalization of the Game Boy’s backgrounds and window modes.
  • Extremely programmable raster interrupts, and raster interrupts that for the most part you won’t even need because the graphics chip can be preprogrammed with scroll tables and left to run on its own.
  • The hardware even provides split-screen scrolling with splits that divide the screen into left and right parts, which cannot be accomplished with raster tricks. Well, not without turning your TV sideways.
  • An 8MHz 32-bit CPU means that you have a lot more time to handle any raster effects that you can’t get out of what you already have for free.

Some of these advantages were a bit short-sighted; in particular, the Super Nintendo came out a bit later and had twice the RAM and 64 times as many colors in its palette; its sound chip also focused on digital audio, which meant that its soundtracks are usually better remembered (much like how in the non-console computer world, we remember Amiga MODs more fondly than Adlib tunes). Nevertheless, the Genesis brings an impressive amount of raw computing power to the table and it has strong opinions on how it would like to use it. Even with those limitations in place, the Genesis has been likened to a “lawnmower with a Lamborghini engine.” To me, though, the most hilariously striking thing (especially given their old marketing slogan about how Sega Does What Nintendon’t) was how so many of the features and designs felt like souped-up versions of features that I had just finished learning on Nintendo’s own Game Boy.

The Software

The Genesis is also where we begin to leave the kind of discipline that was omnipresent in the 8-bit era. Of the machines we’ve looked at here on this blog over the years, the closest match in power is actually DOS—and like DOS, most Genesis development was done in high level languages with only a bit of tuned assembly language to handle places where timing was critical or direct hardware access was necessary. Pure assembly language development was possible but it was unusual enough that games that did so were called out as exceptional.

This flies a bit in the face of how I’ve approached projects on this blog. The final products I’ve released here—even on platforms like DOS and Windows—are generally ones where every byte of output is there by my command. (On Windows, this elides the details of DLL calls into the operating system, but even there no standard library was present.) So despite the existence of high-level development tools, I will be largely leaving them alone for the purposes of my first Genesis project.

Tools and Documentation

Compared to the 8-bits, the Genesis homebrew community is a lot smaller and less robust. The primary focus of documentation I’ve found seems to be either ROM hacking or emulator development. The closest thing to a homebrew hub I’ve found is […], and it is sparse to the level of being nearly a personal home page. It includes a list with a number of tools, as well, but they seemed quite outdated. The documentation dump, however, is the most complete I have found so far.

That documentation, alas, is a bit of a mess. Even the copies of the old, official developer documentation actively contradict themselves on several important points. I’ll have to rely on emulator behavior and SDK library implementations to deduce what the protocols really are.

Less formally, GameHut’s Coding Secrets series digs into various effects achieved during the system’s heyday, as told by a developer at the time. If you’ve been following along with this blog for a while or had any kind of contact with the demoscene, be aware that he has a somewhat unfortunate tendency to label as “impossible” any effect that the hardware does not give you literally for free. Most of the techniques there are refreshingly sane compared to just about anything we got up to with the C64’s vertical scrolling register. Still, there’s a lot of good stuff there.

Good cross-assemblers for the 68000 series also seem to be thin on the ground—a distressing number of forums actually suggested finding period tools for the Amiga or Atari ST and just using those. (While the GCC toolchain does include an assembler as part of its suite, it has a long and proud tradition of choosing the most awful possible assembler syntaxes that can still be plausibly said to be compatible with the platform. The m68k toolchain is no different.) That search did, however, turn up the asmx multi-CPU assembler. Conversations amongst the C64 developers also pointed me to the more actively updated SGDK and the Gendev project that ports the toolchain over to non-Windows machines. These are built on top of GCC and are intended to mix C and assembly language.

For any other sort of project, SGDK is quite impressive indeed. It boasts a full library and extensive pre-created routines designed to make the hardware understand standardized data formats, and it also provides a suite of resource compilers to get your data where you need it. For this project, that is overkill, and I will mostly be relying on it for proof-of-concept prototypes or to resolve inconsistencies or unclear parts of the documentation I’ve found.

Our first task will be to get something running at all. Time to get on with that proof of concept prototype.

Continue reading

Lessons Learned: Color Chart Madness Revisited

As long as I’m revisiting old work, I might as well revisit one of the first projects I undertook for this blog, four years ago: Color Chart Madness. I took a sample program from one of my reference books, took it apart, then put it back together again, better.

What’s kind of funny, looking back on it, is that I really had no idea how the VIC-II chip actually worked, and I needed to understand a lot of details about it in order to grasp what was really going on. I didn’t, however, seem to need to understand those details in order to fix all the problems I saw with it. In retrospect, it’s a case study of my past self working around my own ignorance.

What I Thought Was Happening

The program I was dissecting was, at core, a simple rasterbar effect. It wanted to display every combination of foreground and background, so it printed out text in all sixteen colors and then changed the background color sixteen times a frame to produce all the combinations. The initial implementation of this shut off the clock interrupt and then spinlocked on the raster count register in $D012 updating the background color in $D021 every eight lines. The problem with it—beyond the way that it was essentially locking up the system so hard the only way out was a soft-reset—was that the color change was happening a line too early. This meant that the character cells looked a bit uneven, and there was a spurious black line at the very bottom of the text display. Worse, setting the spinlock to terminate one line later ended up making the display incredibly unstable or, failing that, made the color changes happen on different incorrect lines.


My initial theory here was that there was somehow something wrong with the $D012 read: either the register was fundamentally unreliable, or the emulator was taking shortcuts. I rewrote the routine to instead be based on writes to $D012, replacing the old spinlock with a series of raster interrupts. That rewrite, with no other changes to the program constants, fixed the display completely. That only deepened my suspicions about the reliability of the data the first version depended on.

I couldn’t say I really understood what was going on fully, though, because there were still anomalies left behind. In particular, the raster values I had chosen matched the original version’s, but based on both Compute!’s Mapping The C64 and the C64 Programmer’s Reference Guide those values were one less than they should have been to change the color at the top of each letter. Furthermore, even though I had a display that looked the way I wanted it to, changing the raster target by a single line did not actually consistently change the color-change point by a single line.

As it turns out, these facts were only confusing me because I had gravely misunderstood the way the VIC-II builds its displays. My understanding of this system advanced in fits and starts over the course of about a year:

  • Cycle-exact Delays for the 6502, where I adapted some techniques I’d seen in an Atari 2600 demo to systematically trigger the anomalies I had encountered in Color Chart Madness.
  • Color Test: an Atari 2600 program, where I produced a complete Atari 2600 program on my own. This gave me direct experience with directing a raster display, and having to do that work “by hand” on the 2600 gave me the background I needed to understand how the VIC-II did it automatically.
  • Flickering Scanlines, where I take that background and finally understand the 1996-era document that explains the operation of the VIC-II chip, which in turn lets me really understand the anomalies above.
  • A taxonomy of splitscreens, where I begin to apply this knowledge to extracting reasonable effects out of the C64.
  • VIC-II Interrupt Timing, where I finally sit down, do the math, and work out the precise timing tolerances required to get various effects without relying on the advanced, cycle-exact techniques the demoscene standardized on.

That’s a little over a year’s worth of fitful exploration, research, and experimentation. At this point we should be able to give a brief and accurate explanation of what issues I had run into and why the techniques I used fixed them.

What Was Actually Happening

The reason switching from a spinlock to a raster interrupt fixed the display is actually pretty trivial. There’s nothing at all wrong with reading $D012, and it is indeed that value that triggers the interrupt. Furthermore, it really was testing for the line before the start of any given character. However, it takes an extra 40 cycles or so to actually process the raster interrupt and get into code that we had, ourselves, written—and those 40 cycles were enough for the screen to display enough of that line in the old color to keep the display looking right. It was still setting it “too early”, but the grey background of the screen is actually an opaque foreground color so it masked our flaws. Take that away and the discontinuity is much more flagrant:


Explaining the flickering and discontinuity when we push the raster trigger one line forward is a little bit more involved. The key fact I was missing was that the C64’s CPU doesn’t actually get to run all the time: at the start of every line of text, the graphics chip (the VIC-II) steals 43 cycles’ worth of time from the CPU, monopolizing the memory bus so that it can load all the graphics information it needs for the next line. (Because these lines mess up your display timings, they are referred to by C64 programmers as badlines.) If we look at the initial spinlock-based implementation, we see that actually getting around to updating the background color will take between 11 cycles (if we check the raster the very cycle it changes) and 21 cycles (if it changes the cycle just after we check). On a badline, the CPU will go dormant 13 cycles after the raster changes. That means that depending on exactly how the spinlock syncs up with the display, it will change the color either before or after the full row. Furthermore, the full spinlock cycle is 10 cycles long, and that means it won’t stay synced with the display; thus on different frames or even different points in the same frame there is no consistency. Thus, the flicker.
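The arithmetic in that paragraph can be sketched in a few lines of Python. This is only a toy model built from the cycle counts quoted above (11 to 21 cycles of latency, a stall 13 cycles after the raster changes, 43 stolen cycles). The names are my own, and it assumes that a write finishing exactly on cycle 13 still beats the stall and that any later write is simply pushed back by the full 43 cycles.

```python
# Toy model of the spinlock's timing on a badline. The constants are the
# cycle counts quoted in the text; the function names are illustrative.
MIN_LATENCY = 11      # raster changes on the very cycle we check it
MAX_LATENCY = 21      # raster changes the cycle just after we check it
STALL_AT = 13         # cycles after the raster change when the CPU goes dormant
STOLEN_CYCLES = 43    # cycles the VIC-II takes from the CPU on a badline


def classify(latency):
    """Say whether the color write lands before or after the stall.

    Assumption: a write that completes by STALL_AT beats the stall.
    """
    return "before" if latency <= STALL_AT else "after"


def write_lands_at(latency):
    """Cycle (counted from the raster change) at which the write completes.

    Assumption: a late write is delayed by the whole stolen block.
    """
    if classify(latency) == "before":
        return latency
    return latency + STOLEN_CYCLES


latencies = range(MIN_LATENCY, MAX_LATENCY + 1)
early = [lat for lat in latencies if classify(lat) == "before"]
late = [lat for lat in latencies if classify(lat) == "after"]

print("lands before the stall:", early)
print("lands after the stall: ", late)
print("latest possible write: cycle", write_lands_at(MAX_LATENCY))
```

Under these assumptions only three of the eleven possible latencies beat the stall, so which side of the row the color change falls on depends entirely on where the loop happens to be when the raster count ticks over.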

There’s less inconsistency with the interrupt-based implementation. It relies on the KERNAL to save and restore state and ultimately hand over control, so the 43-cycle delay of the badline will always be paid. However, adding that time into the rest of the display means that the color ends up changing halfway across the screen one line down, which means we get visible display tearing and in some sense the worst of all worlds.
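To see why the change spills onto the following line, it helps to convert cycle counts into a position on the screen. The sketch below assumes a PAL machine, where a raster line is 63 cycles long and the VIC-II draws 8 pixels per cycle (NTSC lines are slightly longer). It also treats the roughly 40 cycles of interrupt overhead mentioned earlier as exact, and it measures from the start of the raster line rather than from the left edge of the visible screen. The function name is my own.

```python
CYCLES_PER_LINE = 63   # PAL; an assumption, NTSC machines use 64 or 65
PIXELS_PER_CYCLE = 8
IRQ_OVERHEAD = 40      # approximate figure from the text
BADLINE_STALL = 43     # cycles stolen on a badline, from the text


def beam_position(delay_cycles):
    """Return (lines_down, cycle_in_line, pixel_in_line) after a delay
    measured from the start of the raster line that fired the interrupt."""
    lines_down, cycle_in_line = divmod(delay_cycles, CYCLES_PER_LINE)
    return lines_down, cycle_in_line, cycle_in_line * PIXELS_PER_CYCLE


# Interrupt on the line before a badline: overhead only.
print(beam_position(IRQ_OVERHEAD))                  # (0, 40, 320)
# Interrupt on the badline itself: overhead plus the stolen cycles.
print(beam_position(IRQ_OVERHEAD + BADLINE_STALL))  # (1, 20, 160)
```

With these round numbers the first case stays on the triggering line, while the second lands one line down and 20 cycles in. The exact spot depends on the real interrupt overhead, but the extra 43 cycles are always enough to push the change onto the next line.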

But with an understanding of how that timing works, it at least is no longer surprising.