It’s time to move on to actually implementing the core algorithm for the Cyclic Cellular Automaton. I’ll start by describing the plan in English, and then below the fold we’ll break it down into actual code.
This will end up being a two-parter, I think, because this was more of an adventure than I expected.
Before We Begin
I was recently linked this memoir of an adventure from the same era. It’s a fun read, a pretty good explanation of why memory overrun vulnerabilities are bad, and (OMINOUS FORESHADOWING) might serve as a counterpoint to some of the adventures I had in my own implementation.
The Overall Plan
At the very highest level, here’s our plan:
- We have two chunks of memory; one that is our current automaton state and one of which is an equally-sized scratch space.
- Check to see if the user’s pushed the START button.
- If they haven’t:
- Compute the next frame, putting it in the scratch space.
- Make the scratch space the new current animation state.
- If they have, randomize the contents of the automaton state.
- Hand the new frame to the renderer.
Let’s take each of these elements in turn.
Our automaton state is a 128×128 grid, which takes up 16KB of RAM. The scratch space is another 16KB. That’s half our RAM, and since the processor’s own stack for tracking procedure calls and such lives at the top of RAM, we’ll just dedicate the entire bottom half of RAM to our state and scratch space.
We will not, however, be fixing which part of that space is the automaton state and which part is the scratch space. We’ll instead be using pointers into the space to distinguish that as needed.
Checking the START Button
We’ve already covered the basics of reading the controllers, but our task here is a little bit more tricky. We don’t really want to simply check if the START button is down right now. If we do that, we’ll keep resetting the state each tick for as long as the button is held down, and if they push START but release it before the end of the frame, that input would be forgotten. What need is to know if the START button had transitioned from not-pressed to pressed since the last time we reset.
Computing the Next Frame
We have a 128×128 grid, and each cell in that grid can have one of 16 states. To get the next state, we need to do these steps for each cell:
- Load the current value for that cell.
- Compute the value that eats that value – 0 eats 15, and for all other x, x+1 eats it.
- If the cells north, south, east, or west of this cell (wrapping around if necessary) are of the color that eats this cell, write that value into the corresponding cell of the scratch space.
- Otherwise, write the current value into the corresponding cell of the scratch space.
Because we’re using pointers to identify the automaton state and the scratch space, the “make the scratch space the new automaton state” step is simply swapping the scratch space and automaton state pointers—a much cheaper operation than actually copying 16KB around.
In fact, you may have noticed that all our boundaries in this system are built out of powers of two—16 states, 16KB buffers, 128×128 grids. This lets most of our operations happen at the level of individual bits. In fact, since we simply allocated the entire automaton state to the 32KB starting at $FF0000, we can alter which half of it we point to simply by toggling the $4000 bit on the pointer.
Likewise, we can compute the value that eats our cell by adding one and masking out all bits but the lower 4 (15 = $0F + 1 = $10, $10 & $0F = $00 = 0), and when working out the neighbors of our cells, we can just increment or decrement our X and Y coordinate, mask out all bits but the ones that represent 0-127, and then slot those bits into the appropriate parts of our pointer. The way our rendering code works, the X coordinate occupies bits 0-6 exactly (which may be extracted from our pointer by applying the bitmask $7F), and the Y coordinate occupies bits 7-13 (for a mask of $3F8). Slotting those bits into place produces a value we can add to our base pointer to produce the address of the neighbor we care about.
Randomizing the State
This is pretty straightforward; it’s just filling each spot in the automaton state memory with a random value between 0 and 15. The only real tricky bit here is that we’d prefer that the user get different configurations each time they play, and since we’re initializing our PRNG from values in the ROM, the same patterns will come out unless we mix it up somehow. I pick a first value that lets the automaton work through all its various states, and then every time a new generation is computed, I advance the RNG one step by collecting a random number and throwing it away.
It’s not a perfect system—we’re consuming so many values on a reset that the difference of a few generations ends up producing very similar maps—but it’s better than nothing.
Handing the Frame to the Renderer
We’ve already set up a variable that can be used to signal the renderer that it’s got new data to copy. We preprocess the new state into a form the VDP can interpret, and then we set that signal. The only real wrinkle is that we only have one place to put that processed VDP data, and so we cannot start writing to it until we’ve finished rendering the previous frame. We thus first wait for the renderer to signal back that it’s done rendering. (It does that by clearing the signal we sent ourselves, so we can go back and forth with a single byte of state.)
Implementation and Complications
I’ll be walking through my first working implementation of this below, or, at least, my almost-first implementation. My first attempt did work, mind you, at least more or less…
… but it was incredibly, incredibly slow. I was worried in the earlier posts about whether I was going to have enough time between generations to worry about it taking two frames to actually blit it out. My first implementation required 558 frames per generation, for a total animation speed of about 0.11 FPS. It was still getting the correct results, though, and it wasn’t actually crashing or corrupting memory or anything. So that means it must be basically correct code, right?
Not so much, but given what the problem was, you’d normally think the consequences would be far more dire. Since I needed to compute the neighbors of each cell, which meant extracting the X and Y coordinates of each cell, I decided to cut out the middle man and just loop through each cell by looping through each row, and then looping through each column within each row. I could then build my pointer out of those values and go from there.
The mistake that I made was that instead of looping through X and Y coordinates from 0 to 127 decimal, I would up looping through them from 0 to 127 hex. Ordinarily this would mean that instead of keeping all my computation within the 16KB of an automaton state, it would blow out over 21KB past that point. This should have, at minimum, corrupted the graphics state, and possibly started consuming the actual data that controlled the simulation and display routines.
But that didn’t happen. Our renegade programs did not escape. Why not?
In the malfunctioning lightcycles program I linked up top, the computer opponent blasted a hole in the outer wall and drove out into system memory, where it began treating the contents of RAM overall as a set of obstacles to be avoided. It knew where it was and it was examining its surroundings. What saved CCA’s integrity here is that any operation that ever touched memory or even eventually touched memory was aggressively wrapped around so that only some of its bits were used to set its dimensions. No matter how far you run on the surface of the Earth, you’ll never be able to reach the Moon. Likewise, our mad loop variables couldn’t reach outside of the 16KB allotted for them.
So that’s why it didn’t crash. But not only did it not crash, it also was producing the right answer. This was, to handwave a little bit, because I’d written the CCAstep routine to be purely functional. It received a buffer (the current state) and then only read from it. It also read only from that state to compute the next state. That next state was also treated as write-only. This means that while the loop strayed wildly beyond its intended bounds, it got constrained to remain in one place and then, since access to its input was read-only, got the same answer every time. We were repeating our work excessively, but we would keep getting the same, correct answer.
Fixing those loop bounds sped it up by a factor of approximately five, bringing us to 124 frames per generation, or an 0.48 FPS animation speed.
We can do better than that, but premature optimization is the root of all evil, so let’s start by working through our simple, slow implementation and save optimization for the next post.