In the end, the only extra addition I made to Mandala Checkers was a little code to announce the computer’s move:
The final source code is here. I had a number of other ideas in mind, but ultimately I decided it wasn’t worth the effort to put into them. The rest of this post is a set of questions I had about the program and the porting process along the way, and the answers I found.
How Could We Improve the AI?
The original program’s AI was quite weak. It would take a capture if one were available, but it would otherwise move randomly. Even if we decide to not bother with proper move lookahead or search spaces, it’s pretty easy to devise a better set of heuristics:
- The game is won after making seven captures. We should therefore only make captures if we’re tied or ahead, or if the capture isn’t into a place where our opponent can just capture right back.
- Likewise, if at all possible, we just shouldn’t move pieces to where they’re being threatened.
It wouldn’t be too hard to alter our move-finder so that it can determine if a board position represents a potential capture for our opponent. The problem, however, is that this strategy completely breaks the game. This ruleset will effectively force a stalemate, as either player, right after the first move. All of the rules variations from American Checkers end up making this strategy more effective and thus making the game less playable:
- All pieces start out moving like kings, which means there is no forward pressure. After making an initial safe advance, all it has to do is move pieces deeper within the initial structure back and forth until the opponent provides an opportunity for attack.
- Capturing isn’t mandatory, so there’s no need to make captures that would expose you to unnecessary retaliation, when retreat is an option (or where we have the option of moving a piece over to make any threatened counterjump impossible).
- Since there are no multiple captures, a player who falls behind cannot meaningfully recover with a single bold stroke.
The end result is that good play produces an aggressively passive game that I’m not actually sure can ever be won except by exploiting an opponent’s blunders.
What Were the Changes in Memory Usage?
The original program took up about 5 kilobytes, with a kilobyte of that being metadata:
Program header 116 bytes 10- 100: Main loop 209 bytes 6000-6999: Computer move 1,843 bytes 7000-7999: Human move 624 bytes 8000-8999: Draw board 364 bytes 9000-9999: Init board 996 bytes Program footer 794 bytes ------------------------------------------ Total 4,946 bytes Total non-metadata 4,036 bytes
The reimplementation, on the other hand, is only just over one kilobyte, and only 161 bytes of metadata:
Program header 116 bytes BASIC stub 19 bytes Main loop 108 bytes Game select/init 190 bytes Draw board/score 204 bytes Human move 75 bytes Computer move 169 bytes Generic move engine 162 bytes Utility functions 99 bytes Program footer 26 bytes ------------------------------------------ Total 1,168 bytes Total non-metadata 1,007 bytes
The main loop has had its size halved, the init and drawing code dropped in size by about a third, and then the game-playing logic and program metadata each end up being smaller by a factor of six.
This is a frankly implausible amount of memory savings.
How Did We Save That Much Space?
It’s not really common for machine code to save that much space over an HLL. Aggresively hand-optimizing for space usually ends up saving you five to ten percent over a properly-tuned compiler. For smaller programs, in fact, an interpreted language like BASIC is often much smaller than the equivalent machine code—most of its standard library lives in ROM and complicated mathematical formulas can be encoded very compactly: A=A+B can take up as few as five bytes, but even if those are 8-bit values you’d spend eight to ten bytes doing the memory reads and writes you need. For a 16-bit integer value it’ll be quite a few more. For Mandala, though, the code doesn’t require any particularly fancy math. It’s mostly memory iteration and complicated control flow, which machine code is very good at. Furthermore, for the most complicated math and I/O, we simply call out into the BASIC ROM’s own routines to do that work for us instead of replicating it. The only routine where we replicate or replace behavior we could steal from the ROM is our random number generator, and that’s mainly because our xorshift-based generator lets us use the least significant bits in the results and thus restrict our randomness ranges with simple bitmasks instead of also needing to pack in a multiplier.
Furthermore, Sinclair BASIC has some serious disadvantages for its code density. Each program line has a 4-byte penalty, which isn’t too bad on its own… but the ZX81 and the Timex Sinclair 1000 alike do not permit multiple statements on a single line of BASIC code. This means you’ll be paying that 4-byte penalty more often, and often IF statements end up have to get repeated to simulate conditional blocks. Worse, every time a numeric constant appears in the code, there is a byte spent for each digit of the number’s decimal representation, but then an additional six bytes are spent to store its preparsed binary representation as well. There are a lot of numbers in Mandala Checkers and it doesn’t use any tricks to minimize the use of actual numeric constants. Those issues are somewhat compounded by the way that the computer-move logic flails around a bunch to make up for the bugs that the author seems to have noticed but actually isolated well enough to fix.
There are techniques to work around around these numeric-constant penalties—it’s faster but slower to store the constants as strings and parse them at runtime, a handful of numeric constants are hardcoded into the BASIC bytecode, and as a last resort an often-reused constant may be cached in a variable with a short name—but this program doesn’t bother with any of them. Once your program doesn’t run in 2KB the next smallest RAM size is 16KB, which is roomy enough that this is really no longer an issue. This BASIC program was optimized for speed, not size.
That leaves the 768 bytes that we saved out of the program footer. This is another side effect of the 2KB vs 16KB configuration of the ZX81: as we noted back when we first looked at the ZX81’s file formats, the screen memory gets saved out with every program, and the screen size is variable:
The screen memory (or “display file”, as they call it) is actually a list of 24 lines, each of which is individually terminated with a newline. If you have 4KB or more of RAM installed in the system, each line will be the full 32 characters long, which, with terminators (and an additional newline at the start that signals the beginning of the screen), takes up 793 bytes (33 per line, 25 per line, one initial newline). Under more constrained conditions, an empty screen is just 26 newlines.
My reimplementation is written for a 2KB machine, so it gets the fully compressed empty screen in its memory dump and thus the file ends up 768 bytes smaller. The total file size doesn’t shrink quite that much because I’ve got an extra 19 bytes to represent a tiny BASIC program that exists only to transfer control to the machine code.
Of course, the screen won’t stay 26 bytes. That leads us to the final question…
How Much Memory Does This Thing Actually Use?
I don’t bother with any memory allocation routines in this program, and because of the way the system expands and contracts its memory usage on its own, I also wasn’t comfortable with using any memory that existed past the edge of the screen memory. That meant that all my data exists, pre-initialized, within the file’s memory map itself. That’s handy because it means that our memory footprint can be measured directly from the file size. It’s not complete, but the only additional sources of are the expandable screen memory and the processor stack. So the question is thus: how much screen memory is actually used?
I looked into the system memory as the program was running to determine how much screen memory is used. In 16KB mode this stays solid at 794 bytes across the board; in 2KB mode, though, the last 12 bytes or so of each line are blank and thus cut out of the list, which drops the screen memory size by nearly 40%, to 487 bytes. This leaves 556 bytes spare for use by the processor stack in our own computing as well as in the calls the BIOS itself might make. With a fully-utilized screen, that headroom would drop to 199 bytes. That’s a point where I’d start getting nervous about the processor stack possibly trashing my system state.
But that’s also the point where I’m finished with this project. Time to move on to something new.