Atari 2600: A 48px Kernel

Let’s finish this.

The final thing to do is to add a logo to the top of the screen. I’m going to use three copies each of the two player sprites for twelve lines, producing a 48 by 12 pixel display:


To produce a solid block of centered pixels, we need to set the players into “three copies close” mode. Each of those three copies will be eight pixels wide and have eight pixels of space between them. The first part of the trick is that we will interleave the two players so that there aren’t any gaps. That means that, for a solid centered display Player 1 needs to be drawn 8 pixels to the right of Player 0 and the fourth player graphic (which is Player 1’s second copy) needs to land on pixel 80, the beginning of the second half of the screen. Working backwards from that we find that our target pixels are 56 and 64. We covered quite early on how to locate our sprites in fixed locations, but there’s a slight wrinkle this time. The closest we can get to our target pixels is to write on cycles 37 and 40. However, if we write on cycles 34 and 37, we save a couple bytes in our placement code and still remain within just within the range of a single corrective HMOVE:

        sta     WSYNC
        lda     #$03            ; +2 (2)
        sta     NUSIZ0          ; +3 (5)  Three copies close for P0 and P1
        sta     NUSIZ1          ; +3 (8)
        lda     #$1e            ; +2 (10)
        eor     idlemsk         ; +3 (13)
        sta     COLUP0          ; +3 (16)
        sta     COLUP1          ; +3 (19)
        sta     HMCLR           ; +3 (22)
        lda     #$80            ; +2 (24)
        sta     HMP0            ; +3 (27)
        lda     #$90            ; +2 (29)
        sta     HMP1            ; +3 (32)
        nop                     ; +2 (34)
        sta     RESP0           ; +3 (37)
        sta     RESP1

(We’re making the header bright yellow by default, but we’re deploying the idle mask here too so that it will change along with the other graphics when we’re in idle mode.)

With the players placed, we now need to set aside some space to draw the graphic appropriately. For now, I’m splitting the top empty space in half and removing six scanlines from it:

        ldy     #$0d
*       sta     WSYNC
        bne     -

        lda     #$ff
        sta     GRP0
        sta     GRP1
        ldy     #$0c
*       sta     WSYNC
        bne     -
        lda     #$00
        sta     GRP0
        sta     GRP1

        ldy     #$0d
*       sta     WSYNC
        bne     -

I then remove six extra scanlines from the blank area at the bottom to balance it out. This moves the game board down a bit but (using a placeholder graphic that’s just a solid block of color) shows us a decently centered total display:


Now to turn that into our logo.

The General Approach

The STA GRPn instructions take three cycles to execute, during which time the display will advance nine pixels. We need to time writes to these registers so that the graphics are changed just before they’re used. At the time that’s happening, the other set of graphics are being consulted, so we do a have a little leeway here. We also have the advantage of being able to preload some values into the graphics first, so if the first two graphics values are fine, we simply need to update the next four graphics blocks. The challenge here is that we only have the A, X and Y registers to work with, so we can only store three values in a row without some kind of load operation—and load operations all cost unacceptably large amounts of time. The crucial insight to surmounting this challenge—which I see credited primarily to Dave Staugas at Atari—is that there are a few extra places to stash data.

Vertical Delay and the Shadow Registers

GRP0 and GRP1 are the two sprite graphics registers that the TIA chip admits to having, and they’re the only ones that can be directly written. However, it also has some internal registers for storing older values of them that were intended to be used to shift certain graphics down a scanline within a display kernel that took more than one scanline per loop. (Hence the name of the control flags for this, which have names like VDELP0 for Vertical DELay Player 0.) What they actually do is switch to displaying the “older” values of that graphic. This technique—the TIA Hardware Guide I’ve been using for reference calls it “The Venerable Six-Digit Score Trick”, and I’ve also seen it called the “Staugas Score Kernel”—uses those shadow registers to preload enough graphics data to let us display the entire 48 pixels.

These shadow registers don’t simply record the previous value of a write to a graphics register, unfortunately. Instead, the way it works is that any time you write to one of the player’s graphics registers, the “current” value of the other player’s graphics register is copied to its “old” value. This means that for any string of writes to the registers, there can be at most three unique values stored. (since writing the third value has wiped out one of the earlier ones). Fortunately, three unique values is all we need, since we’ve got A, X, and Y to cover the pending values. We’ll preload the graphics registers with three values, preload the registers with the remaining three, and then juggle the values so they display properly across the whole scanline.

Continue reading


Lights-Out 2600: Cleanup and Refactoring

This will be a mostly invisible update; I’m just laying the groundwork to prepare for the final update where I complete the project. There’s a few interesting things in here, but a distressing amount of it works out to me struggling with tools that I myself built to get them to do things I didn’t quite design them to do. Still, it’s got to be done, so let’s get to work.

Continue reading

Lights-Out 2600: Adding Sound

Time to add some sound effects to this project.

The Atari 2600’s sound capabilities are rather modest and unambitious, all told. The overall principle is very similar to the one we saw with the SN76489—we have a number of channels, and they produce a square wave or a modified square wave of some kind. We pick our volume independently for each channel, and we select our frequency by picking a divisor for some fundamental tone. The key differences with the SN76489 are:

  • There are only two channels on the Atari 2600 instead of four.
  • Each channel on the Atari has a selectable waveform, as opposed to the SN76489’s fixed square waves on three channels and choice of two noise waveforms on the fourth.
  • The waveform (“distortion”) selector for the Atari has sixteen options. It turns out there is some overlap; only ten of the selections are unique, and two of them are effectively square waves with different base frequencies. The end result is nine different waveforms and ten waveform control options.
  • There are only 31 available divisors for the base frequency. While there were a handful of heroic attempts to render background music into arcade ports, the results are a bit cringeworthy at best. The 2600’s sound circuitry is intended for sound effects. If you do want to make a little ditty, though, at least you’ve got two waveform options with different frequency ranges.

These options are ultimately sufficiently limited that in order to identify what sounds we want to use we’re better off just exhaustively listing possibilities and experimenting with settings rather than trying to compute tones or effects from first principles. Duane Alan Hahn, aka Random Terrain, not only produced those exhaustive tables but also very nice program about ten years ago that allows this kind of experimentation called Tone Toy 2008:


Playing around with these a bit, I settle on a few sound effects I like.

  • Setting the waveform to 12 (“Pure Lower Tone”) and frequencies of 10 and 5 made some nice beep sounds I could use for making moves.
  • Setting the waveform to 15 (“Electronic Rumble”) and then repeatedly counting down the frequency divider made a little electric whooping sound that worked as a victory marker. The Tone Toy will let you do sweeps but it always sweeps at a speed of one value per frame; to get the sound I wanted it would need to be about twice that fast.

I’ll be working through the actual implementation below the jump.

Continue reading

Lights-Out 2600: Game Flow FSM

The coarse structure is there for the project now; let’s add some polish. In this post I’ll be improving the game flow, and to model what it’s doing I will be using a concept known as a Finite State Machine. This is a very powerful and general model of processing that pops up in a lot of places in computer science. I’ll talk about the general concept, touch upon some of the classic uses for it, work one out for our top-level gameplay, and then implement it.

What is an FSM?

If we plug “FSM” into Wikipedia, we find that it is the deity of the Church of the Flying Spaghetti Monster, who hold that said Flying Spaghetti Monster created the universe after a bout of excessive drinking.

This is a system that we can model, at a high level, with a finite state machine. There are a finite number of states the FSM can be in: sober and drunk. There are likewise two possible inputs into the FSM: alcohol and not-alcohol. We can model the primordial chaos, thus, with a machine that transitions the FSM into the “drunk” state on alcohol input, and into the “sober” state on non-alcoholic input. Then, outside of that machine, we would need to indicate that there was some chance of creating the universe every time we ended up in the “drunk” state.

This is the sort of machine we’d be using for the Lights-Out project—there are a very small number of states and the machine here is not so much doing work on its own as it is keeping track of what it’s supposed to be doing. This sort of FSM is therefore also very common when performing basic scripting or AI for characters in videogames and the like. (For example: if a monster can see you, go into Chase state; if it’s in range and chasing, go into Attacking state; if it loses sight of you, go into Wandering state; but you can’t go into Attacking from wandering so the player can set ambushes.)

FSMs can also be used to directly track progress or represent a computation. For our FSM-as-an-FSM example, instead of simply maintaining “drunk” or “sober” as states, we could instead have as our set of states the number of drinks currently coursing through the FSM’s noodly veins. We keep the state space finite by asserting that at some point the addition of more alcohol will only replace the alcohol that is being metabolized. This lets us represent the FSM as an integer and a function whose inputs represent additional drinks or time spent sobering up.

We can enrich this model further by postulating that universe creation can only happen in the maximally intoxicated state; we thus mark that state as special (an “acceptance” state). We can then imagine taking a series of alcoholic and non-alcoholic drinks and then computing, by starting in the sober state and then feeding the drinks in order to our FSM, whether or not we end up in a universe-creating mood. This kind of FSM is what one generally uses to match regular expressions, except those usually use letters instead of drinks.

An FSM to Control Lights-Out

As of the end of last post, I’m relying on the player to keep track of the game state—whether it’s time for a new game, whether they’ve won, and whether a game is running at all. Let’s make the program a little more self-aware. To do that, we will create three states and shift between them as needed:

  • IDLE: In this state there is not a game in progress. This is the state we are in at power-on, and it just draws an empty board with no cursor. The joystick is not read. The background color changes randomly every few seconds. If the RESET button is pressed, we shift into the STARTING state.
  • STARTING: We are beginning a new game. The cursor is still not visible in this state, but a new puzzle is generated every frame the RESET button is held down. Once the RESET button is released, center the cursor on the game board and transition to the PLAYING state.
  • PLAYING: The game is in progress. The cursor is visible and can be controlled by the joystick. This state will shift back into STARTING if RESET is pressed, and back into IDLE if all the lights on the board are out.

Implementing the Lights-Out FSM

It’s entirely possible to have an FSM be directly embodied in your code, and you can do it pretty efficiently too. However, the most efficient technique for this machine isn’t really to embody it directly—most of the state information ends up being equivalent to other values we have to track. In particular, we end up in the STARTING state exactly and only when the RESET switch is being pressed down. It makes more sense to have that be a standalone override, and then just have a state variable for whether or not we’re in IDLE.

We’d also like to not need different display routines for each state if possible. Fortunately, that’s pretty easy. We’re setting the playfield and background colors during VBLANK and then not changing them, so we can just set them to different colors when we’re in the IDLE state. Likewise, the cursor needs to only appear when we’re in the PLAYING state, but the cursor is otherwise extremely easy to hide; our display logic counts down from four to zero and checks against crsr_y each time, drawing the cursor if there is a match. We can simply set crsr_y to five or more, and this test will never line up, and the cursor will not be drawn that frame.

Continue reading

Lights-Out 2600: The Gameplay Logic

Last time, we did what it took to create a new game. Today we’ll start managing the game itself, and this turns out to be simple enough that by the end of this post we’ll have all the “game” aspects completely done, with a ROM image that will provide all the challenge it’s ever going to. Everything after this will be polish and chrome, though frankly there’s a fair amount of that left to do.

Continue reading

Lights Out 2600: Starting A New Game

The display code is now working more or less the way I’d like it to. Better still, it’s based on the current state of the game in RAM, so from here on out we don’t need to add any display logic to actually update the screen—all updates will be picked up by our display logic as needed.

In this post I’ll be handling the start of the game, where we generate a new puzzle for the player to solve.

The User Experience

In my home computer editions, I handled this in a pretty brain-dead way: I simply performed a thousand random moves and let the normal puzzle-solving machinery sort this out. This would produce a neat ‘scramble’ animation since that performance takes several frames to complete.

That approach won’t really work for us here, though, or at least not in any reasonably convenient way. The Atari 2600 can’t do much computation at once because it is only really free to do it during VBLANK, and furthermore we really would prefer to not have any actions that have to be tracked across multiple frames if we can at all help it. Our frame-update logic will end up needing to be some kind of state machine, or we’ll need to turn the display logic into a subroutine and call out into it multiple times, or something similar.

All the same, that gives us three slightly contradictory requirements: we’d like to have a visible scrambling animation, we’d like to be able to provide a fresh puzzle within a single frame, and we’d like to have both of these things without actually having to track any state across frames.

The first two are pretty simple to reconcile; if we provide a new puzzle every frame and present it, we’ll get a scrambling animation just from that. If we want half a second of scrambling, just provide thirty puzzles in sequence.

The latter is more problematic, but here we are rescued by the mechanical parts of the Atari 2600 hardware:


(Image courtesy Wikipedia)

This is the second version of the Atari 2600 hardware, with wood paneling, four mechanical switches in the front, and two difficulty switches in the back. The rightmost switch on this is the spring-loaded “Game Reset” switch. Normally one would throw this switch to start a game. This solves our problem very neatly; we will regenerate the puzzle on any frame where the GAME RESET switch is thrown. This will visibly scramble the puzzle for exactly as long as the switch is held down and no longer.

Rethinking Puzzle Generation

My earlier implementations have relied on the fact that doing a thousand random moves will put the puzzle in a state that won’t require a thousand moves to solve, but which should have a shortest solution that would be relatively evenly spaced throughout the solution space. If we want to create a fresh puzzle each frame, though, we’ll need a much more efficient mechanism for generating random puzzles.

That solution is suggested by the way that moves in Lights-Out are toggles. If the same move appears twice in a string of moves, you can erase both of them with no change in the final board state. This means that the vast majority of our thousand moves will be wiped out by the end. We should be able to get the same result simply by iterating through each cell and flipping a coin to see whether or not to make a move at that cell. Doing that process once should completely scramble the board. We don’t even need to clear it first—doing this to any valid puzzle will produce a puzzle just as random as any other.

It’s reasonable to suggest that maybe we could just feed random numbers directly into the board and have that be our puzzle. Unfortunately, not all grid combinations are actually solvable by the Lights-Out rules—if we want to guarantee a solvable puzzle we need to generate it by executing valid moves from a solvable start position. That is, as we have just seen, though, not a terrible imposition.

Continue reading

Atari 2600: Vertical Sprite Placement

We’ve pretty much squared away horizontal placement at this point. Now we need to handle the vertical confinement of our play elements—the cursor should only exist within its row as well as its column, and the players need to actually represent the current game board state.

This turns out to be both easier and harder than horizontal placement. It’s easier because the principle behind it is much easier to turn into code. It’s harder because once you’ve done the work on horizontal placement, it’s kind of fire-and-forget. Vertical placement requires more extended and ad-hoc programming effort.

The fundamental principle is pretty straightforward—the Television Interface Adapter has one scanline worth of graphics data, so we cause elements to appear by changing that data to enable or disable our graphics at appropriate times. This isn’t too bad if you aren’t doing much with it—we’ve already done exactly this with the playfield to draw our grid, after all—but once you have a more dynamic display it can quickly become an unmanageable nightmare. Taming this complexity takes a lot of different forms, but there’s a reasonably consistent terminology for talking about them.

The Display Kernel

The term “display kernel” comes up pretty quickly once you start getting into Atari 2600 programming. At its most general it seems to be used to describe the code responsible for producing the entire 192-line display, but the term is more useful in more specialized forms. The version I’ve seen that I’m most comfortable with is that a display kernel is a data-driven loop that renders a contiguous block of scanlines based on that data. A full display is created by a series of one or more display kernels, possibly supplemented with hardcoded graphics code.

Display kernels are measured in lines. By the definition I’ve given above, a kernel is defined by the number of lines its loop comprises. A 12-line kernel, for instance, would split the screen into sixteen lanes by its loop. Each lane would then probably be doing some sub-processing of its own—possibly with internal, smaller kernels—to produce the graphics data for that lane. A game like Space Invaders with enemies that come in in neat rows or within defined lanes would use a technique like this, followed by specialized kernels for the shields and lastly the player’s cannon. At the other extreme, Combat has one two-line kernel to render the score followed by another 2-line kernel that renders the entire rest of the screen. That sort of technique makes more sense for games where the players have the run of the entire screen, and thus where there is not a convenient notion of lanes.

As a rule, the tighter your kernel, the smaller your code and the higher-resolution your graphics will look. This isn’t a hard-and-fast rule—larger kernels are likely to update the graphics registers far more than once per loop iteration— but in the two-to-four-line range it’s likely that graphics will persist within an iteration and lead to blockier graphics.

Continue reading