Category Archives: sinclair

Mandala Checkers: Wrap-Up

In the end, the only extra addition I made to Mandala Checkers was a little code to announce the computer’s move:


The final source code is here. I had a number of other ideas in mind, but ultimately I decided it wasn’t worth the effort to put into them. The rest of this post is a set of questions I had about the program and the porting process along the way, and the answers I found.

How Could We Improve the AI?

The original program’s AI was quite weak. It would take a capture if one were available, but it would otherwise move randomly. Even if we decide to not bother with proper move lookahead or search spaces, it’s pretty easy to devise a better set of heuristics:

  • The game is won after making seven captures. We should therefore only make captures if we’re tied or ahead, or if the capture isn’t into a place where our opponent can just capture right back.
  • Likewise, if at all possible, we just shouldn’t move pieces to where they’re being threatened.

It wouldn’t be too hard to alter our move-finder so that it can determine if a board position represents a potential capture for our opponent. The problem, however, is that this strategy completely breaks the game. This ruleset will effectively force a stalemate, as either player, right after the first move. All of the rules variations from American Checkers end up making this strategy more effective and thus making the game less playable:

  • All pieces start out moving like kings, which means there is no forward pressure. After making an initial safe advance, all it has to do is move pieces deeper within the initial structure back and forth until the opponent provides an opportunity for attack.
  • Capturing isn’t mandatory, so there’s no need to make captures that would expose you to unnecessary retaliation, when retreat is an option (or where we have the option of moving a piece over to make any threatened counterjump impossible).
  • Since there are no multiple captures, a player who falls behind cannot meaningfully recover with a single bold stroke.

The end result is that good play produces an aggressively passive game that I’m not actually sure can ever be won except by exploiting an opponent’s blunders.

What Were the Changes in Memory Usage?

The original program took up about 5 kilobytes, with a kilobyte of that being metadata:

           Program header       116 bytes
  10- 100: Main loop            209 bytes
6000-6999: Computer move      1,843 bytes
7000-7999: Human move           624 bytes
8000-8999: Draw board           364 bytes
9000-9999: Init board           996 bytes
           Program footer       794 bytes
           Total              4,946 bytes
           Total non-metadata 4,036 bytes

The reimplementation, on the other hand, is only just over one kilobyte, and only 161 bytes of metadata:

           Program header       116 bytes
           BASIC stub            19 bytes
           Main loop            108 bytes
           Game select/init     190 bytes
           Draw board/score     204 bytes
           Human move            75 bytes
           Computer move        169 bytes
           Generic move engine  162 bytes
           Utility functions     99 bytes
           Program footer        26 bytes
           Total              1,168 bytes
           Total non-metadata 1,007 bytes

The main loop has had its size halved, the init and drawing code dropped in size by about a third, and then the game-playing logic and program metadata each end up being smaller by a factor of six.

This is a frankly implausible amount of memory savings.

How Did We Save That Much Space?

It’s not really common for machine code to save that much space over an HLL. Aggresively hand-optimizing for space usually ends up saving you five to ten percent over a properly-tuned compiler. For smaller programs, in fact, an interpreted language like BASIC is often much smaller than the equivalent machine code—most of its standard library lives in ROM and complicated mathematical formulas can be encoded very compactly: A=A+B can take up as few as five bytes, but even if those are 8-bit values you’d spend eight to ten bytes doing the memory reads and writes you need. For a 16-bit integer value it’ll be quite a few more. For Mandala, though, the code doesn’t require any particularly fancy math. It’s mostly memory iteration and complicated control flow, which machine code is very good at. Furthermore, for the most complicated math and I/O, we simply call out into the BASIC ROM’s own routines to do that work for us instead of replicating it. The only routine where we replicate or replace behavior we could steal from the ROM is our random number generator, and that’s mainly because our xorshift-based generator lets us use the least significant bits in the results and thus restrict our randomness ranges with simple bitmasks instead of also needing to pack in a multiplier.

Furthermore, Sinclair BASIC has some serious disadvantages for its code density. Each program line has a 4-byte penalty, which isn’t too bad on its own… but the ZX81 and the Timex Sinclair 1000 alike do not permit multiple statements on a single line of BASIC code. This means you’ll be paying that 4-byte penalty more often, and often IF statements end up have to get repeated to simulate conditional blocks. Worse, every time a numeric constant appears in the code, there is a byte spent for each digit of the number’s decimal representation, but then an additional six bytes are spent to store its preparsed binary representation as well. There are a lot of numbers in Mandala Checkers and it doesn’t use any tricks to minimize the use of actual numeric constants. Those issues are somewhat compounded by the way that the computer-move logic flails around a bunch to make up for the bugs that the author seems to have noticed but actually isolated well enough to fix.

There are techniques to work around around these numeric-constant penalties—it’s faster but slower to store the constants as strings and parse them at runtime, a handful of numeric constants are hardcoded into the BASIC bytecode, and as a last resort an often-reused constant may be cached in a variable with a short name—but this program doesn’t bother with any of them. Once your program doesn’t run in 2KB the next smallest RAM size is 16KB, which is roomy enough that this is really no longer an issue. This BASIC program was optimized for speed, not size.

That leaves the 768 bytes that we saved out of the program footer. This is another side effect of the 2KB vs 16KB configuration of the ZX81: as we noted back when we first looked at the ZX81’s file formats, the screen memory gets saved out with every program, and the screen size is variable:

The screen memory (or “display file”, as they call it) is actually a list of 24 lines, each of which is individually terminated with a newline. If you have 4KB or more of RAM installed in the system, each line will be the full 32 characters long, which, with terminators (and an additional newline at the start that signals the beginning of the screen), takes up 793 bytes (33 per line, 25 per line, one initial newline). Under more constrained conditions, an empty screen is just 26 newlines.

My reimplementation is written for a 2KB machine, so it gets the fully compressed empty screen in its memory dump and thus the file ends up 768 bytes smaller. The total file size doesn’t shrink quite that much because I’ve got an extra 19 bytes to represent a tiny BASIC program that exists only to transfer control to the machine code.

Of course, the screen won’t stay 26 bytes. That leads us to the final question…

How Much Memory Does This Thing Actually Use?

I don’t bother with any memory allocation routines in this program, and because of the way the system expands and contracts its memory usage on its own, I also wasn’t comfortable with using any memory that existed past the edge of the screen memory. That meant that all my data exists, pre-initialized, within the file’s memory map itself. That’s handy because it means that our memory footprint can be measured directly from the file size. It’s not complete, but the only additional sources of are the expandable screen memory and the processor stack. So the question is thus: how much screen memory is actually used?

I looked into the system memory as the program was running to determine how much screen memory is used. In 16KB mode this stays solid at 794 bytes across the board; in 2KB mode, though, the last 12 bytes or so of each line are blank and thus cut out of the list, which drops the screen memory size by nearly 40%, to 487 bytes. This leaves 556 bytes spare for use by the processor stack in our own computing as well as in the calls the BIOS itself might make. With a fully-utilized screen, that headroom would drop to 199 bytes. That’s a point where I’d start getting nervous about the processor stack possibly trashing my system state.

But that’s also the point where I’m finished with this project. Time to move on to something new.


Mandala: Initial Port Complete

Last time we ported over and bugfixed the initialization and board-drawing code for Mandala Checkers. This time we’ll adapt the code that lets the players make moves.

We’ll end up reorganizing the code a bit from the BASIC original, though. In that, the human and computer code was completely separated into their own blocks of a thousand line numbers, and they would draw their moves on the board themselves. That was necessary for the BASIC version because the BASIC code for drawing the whole board was extremely and unacceptably slow. Our machine code drawing routine, on the other hand, is fast enough that we can just call it after each move is made. This will also clean up a little infelicity in the initial implementation: the players’ scores only appear after they’re scored a point in the original. We draw the scores as part of the board display.

This also means that the human and computer systems now have much more in common in their logic; making a move, or checking to see if a move is legal, now can use identical code. The only thing we need to verify separately is that the right kind of piece is in the start position.

Improvements Over The Original

That also said, a rather distressing number of corners were cut or accidentally filed off in the initial implementation. The human move code, for instance, completely trusted the human player regarding the validity of moves. Executing a move was a matter of erasing the source square and then putting a human piece on the destination square. If the move was not a single square in a diagonal direction, it would also clear the square whose coordinates were the average of your start and end square, and give you credit for a capture. This would let you move opposing pieces and turn them into your own. It would let you jump and capture your own pieces for points. It would let you jump and “capture” completely empty squares for points. Or, you could just move a piece anywhere on the board, get a point for a capture, and erase some totally disconnected square that happened to have an unfortunate coordinate:


This is not really acceptable. Since our computer-move code needs to verify moves independently, we will avail ourselves of that code in the player-move code as well.

The computer-move code is not in the greatest of shape, either. It will, at least, only ever perform legal moves. However, consider this code, from where it is systematically looking for legal capture moves:

6050 LET Y=-11
6055 IF Z+Y>88 OR Z+Y<11 OR Z+2*Y>88 OR Z+2*Y<11 THEN GOTO 6070
6060 IF A(Z+Y)=H AND A(Z+2*Y)=E THEN GOTO 6100
6070 LET Y=-9*(Y=-11)+9*(Y=-9)+10*(Y=9)+(Y=100)
6080 IF Y0 THEN GOTO 6055

The variable Y here is supposed to represent the distance from the start square Z, and it’s checking both that moves are in-bounds (in line 6055) and legal captures (in line 6060) and moving to do the capture if they are. If this is not a legal move, line 6070 is supposed to advance and consider the next direction. It is using the “do math with equality checks” trick that I discussed in the context of speeding up C64 BASIC programs. In Sinclair BASIC, true comparisons have a value of 1 instead of -1, but this test loses the plot very quickly. Y starts at -11 (thanks to line 6050) and it is supposed to cycle through -11 to -9 to 9 to 11 to cover the four directions it can move. However, instead of going to 11, it instead goes to 10, which will never yield a legal move, and then there is an additional, always false test against 100. The end result of this is that the AI will never actually consider any captures to the southeast.

This error is repeated for noncapture moves as well, so it also will not consider moves to the southeast. The implementation seems to have grasped that something was very wrong with this code, because it has multiple levels of flailing about trying to find valid moves, and if after all of it it still can’t find any, it concedes the game and quits.

That’s actually a valid thing to do—it requires incredibly contrived play, but it is at least theoretically possible to stalemate your opponent with this ruleset, and so if that happens the AI needs to give up instead of searching forever for a move that does not exist.

In porting over the computer’s move routines, though, I removed these faults. Taking the logic bit by bit and merging it together into a coherent whole, the rules it were using can be summarized neatly:

  • If a capture is possible, make the capture closest to the opponent’s edge.
  • If no captures are possible, randomly select a piece and attempt to make a non-capture move with it, preferring to advance on the opponent.
  • If, after 200 or so attempts to find a movable piece, you still have found no move, conclude you have stalemated and concede the game.

The BASIC implementation actually selects squares completely at random and treats that square not having a computer piece on it as a failed attempt at a move, which makes it much more likely to concede unnecessarily as it loses pieces. In my port, I only decrement the attempt counter when it finds a piece but finds it is blocked. This does mean that my implementation will hard-lock the computer if there are no pieces on the board at all, but this cannot happen according to the rules of the game; the human player will win when the computer is reduced to two pieces.

The Main Loop

The main loop is pretty simple. It’s a little more complicated than the BASIC version, because we need to manage selecting a game mode, and we let the player choose whether they want another game after a game ends instead of just quitting out to BASIC, but the basic concept is still very similar. It’s just a loop that alternates moves by the human and computer, drawing the board and checking for game-over after each move.

With these components in place this gives us a creditable port of both Mandala Checkers and Chopper Checkers, in a single program, that can load and run on a 2KB Timex Sinclair. That was my minimum goal, but I’m going to mess with it a little more before I conclude that I’m done with the project.

Mandala: Premature Optimization

Before moving on to tackle the implementation of the next part of Mandala Checkers, I did a quick review of my initialization and display code to comment it more thoroughly and to see where I could optimize or generalize it. I found a couple of places where I could make my support routines more powerful and this ended up shrinking my program-so-far from 492 bytes to 465—over 5%.

The optimizations I found are sound—that is, they don’t actually break anything—but they also look incredibly sleazy. With this post I’ll look a little bit at those optimizations and show how they work. It turns out they’re valuable in general and not just as one-off sneaky tricks.

Trick 1: Beyond Call Elimination

Some optimizations don’t depend on any of the context or instructions around them. These are called peephole optimizations—they are essentially a search-and-replace transform for sequences of logical but inefficient instructions that can be replaced with smaller, more efficient, and possibly more obscure instructions.

There’s not always more obscure, though. Here are two rules involving jumps and procedure calls that are extremely logical:

  • A jump instruction to the very next instruction may simply be deleted.
  • A procedure call instruction followed immediately by a procedure return instruction may be replaced with a simple jump instruction.

When we put these rules together, we get a less obviously logical rule:

  • When two procedures are defined back-to-back, and the last thing the first procedure does is call the second, both the call to the second routine and the first routine’s return instruction may be deleted.

This third rule follows immediately from applying the second rule and then the first rule, and it suggests that if we organize our program’s functions appropriately, we can save some space by allowing some function calls to “fall through” into other functions. It turns out that my initialize-board code can reasonably fall through into the draw-board-on-screen code. That looks fine now, but it will start looking weird later when I want to redraw the board after a player makes a move—it will sort of look like I’m calling into the middle of the initialize-and-draw-board subroutine.

It gets weirder when the first procedure calls the second procedure multiple times. I don’t have an example of this yet in Mandala, but a whole lot of my test code needs to print out 16-bit hex values. I do this with three routines; hexout_16 calls hexout_8 twice, and hexout_8 calls hexout_4 twice. In each case, the final call can be eliminated, and the resulting code is one where hexout_16 is doing procedure calls into the middle of itself:

        push    hl
        ld      a, h
        call    hexout_8
        pop     hl
        ld      a, l
        push    af
        call    hexout_4
        pop     af
        and     15
        add     28
        rst     $10

If you actually step through it, this code is doing some extremely strange things with its stack, especially if it’s in a context where hexout_16 is the only function that’s ever actually externally called. But if you break it up mentally into three functions that fall into each other, the design clears up right away.

Trick 2: Doing Trick 1 Backwards

The main thing I realized from my review of Mandala so far was that I was doing a lot of duplicative or unnecessary work when printing out status or prompt strings. My only routine for printing anything out was a simple routine named print_at, which took a pointer to a two-byte screen location code followed by a string of characters. It would move the cursor to that location code, then print out characters until it reached a value of 255. (Normally one stops at a character code of 0, but the ZX81’s wacky non-standard character set uses character zero for space.) Making this call cost six bytes, and so I had several places where I would print out a bunch of unnecessary newlines or spaces to spare myself the call, and there were other places where I would call it multiple times in a row to write at various parts of the screen.

This is inconvenient. I’d be far better off generalizing the function so that if it saw, say, a 254, that would make it treat the next two bytes as a new location code. The routine started out looking something like this:

  1. Read two bytes from the string and move the cursor accordingly.
  2. Advance the string pointer two bytes to skip over the location code.
  3. Read a byte from the string.
  4. If the byte is 255, return from the function.
  5. If the byte is not 254, go to step 8.
  6. Advance the string pointer one byte to skip past the 254.
  7. Go to step 1.
  8. Print out the character read.
  9. Advance the string pointer one byte.
  10. Go to step 3.

There are a number of things we can do with this. The most obvious is to notice that if I ever need to print something out without relocating the cursor first, I can set a label print in the middle of the function, letting me effectively say “start at step 3.” This produces a result identical to what would happen if print_at moved the cursor and then called print… as long as we don’t look too closely at the loopback in step 7.

The second thing we can do is notice that there’s no reason we ever have to start at the first step stated. Instead of jumping around with 254 case, we can change step 5 to “If the byte is 254, go to step 0“, remove step 6 entirely, and move step 5 to before step 1.

That’s cute, and sound, and reliable, and it preserves the exact semantics of the previous function, but it’s also ugly. Is there a better way?

There is. The whole reason we need to have these extra jumps is because we need to increment the string pointer and then jump somewhere, and our tests only let us jump or return as the result of a test. But the real problem here is that we’re advancing the string pointer too late, all the way down in step 9! If we move that string pointer advance to just after step 3, we can make the test simply be “If the byte is 254, go to step 1” and our work is entirely done. It’s both cleaner and smaller.

Adding functionality does still cost a bit of space, though: interpreting 254 specially added four bytes to my print_at routine. However, setting up the pointer and making the call costs six bytes, so being able to wipe out even one call as a result results in a space savings. I was able to wipe out much more than one call.

Harm Reduction

But wait, modern programmers might observe. This involves some pretty aggressive use of crisscrossing GOTO statements. GOTO statements are considered harmful, they might say. Aren’t I corrupting the youth of today, they might ask.

To this I answer, no, no I am not, and I am not for a variety of reasons. The first is that all of these jumps are within a single screenful of code, and this was considered acceptable by the more practical developers of that era. The second is that the structure programming wars of the “GOTO considered harmful” era also eschewed things we take for granted as part of well-structured code today, such as mid-function return statements, or break and continue statements within a loop.

And finally, I’m in the clear here because the actual control flow here doesn’t even require break or continue:

  • Repeat:
    • Consume two characters from the string
    • Move the cursor as specified by those two characters
    • Repeat:
      • Consume one character from the string
      • If it is not 254 or 255, print it

      While character consumed was not 254 or 255

    While character consumed was not 255

That’s rigid enough to satisfy even the most rigorously excessive of structural programming requirements. It’s not quite what I actually do, directly—the actual machine code uses the equivalents of break and mid-function return, but the Z80 instruction actually has “return immediately if condition is true” as a specialized one-byte instruction. There’s really no reason to not take advantage of it if it’s there.

Trick 3: Dual-purpose Data

With the new printing procedure in hand, I went through the code and started coalescing large collections of print statements into single calls. However, there was still one nasty knot of calls left. These calls revolved around printing out the current score. It needed to print out the labels for each score, and then it needed to compute the character codes based on the current score and print those out individually.

The trick here is that this is not remotely necessary. Since the scores in this game are only ever one digit long, we can just embed the scores directly in the score string, and look at the character we’d print to determine what the score was. That moves the interpret-score-as-character code out of the board display and into the has-a-player-won logic, but it lets us get rid of a bunch of unnecessary math and lets us coalesce several more print statements.

This is also an instance of a somewhat more general technique where you use your display data directly as model data. This is a bad idea in modern GUI applications, where the view can change unpredictably and you want a stable model to inform any redraws, but for fixed character-cell displays like these systems, there’s no reason to not use the video matrix as a repository in memory of what is currently on the screen.

Mandala Checkers: Initialization and Display

The wallpaper program last time was just a warm-up. In this post I’ll kick off my intended rescue project.

This program was published in 51 Game Programs for the Timex 1000 and 1500 as two very similar listings under the same chapter: Mandala Checkers and Chopper Checkers. I’m not convinced either is similar enough to checkers to deserve the moniker, though. Here are the rules:

  • Players take turns moving a single piece one square diagonally in any direction.
  • Alternatively, they may move a piece two squares diagonally in any direction if this would jump over an opposing piece. Doing so removes the opponent’s piece from the board and scores the player a point.
  • The first player to score seven points wins.

This is only a checkers variant if you allow every piece to start out as a king, captures being non-mandatory, and captures not involving multiple jumps to count as variations within the checkers space. I’ll keep the name for these ports, but I’ll let the record show how far we are from the stated inspiration.

Whether it’s really checkers or not, though, it does seem like an interesting little game to mess with. Unfortunately, it’s both reasonably large (about 4KB of actual code) and pretty much every aspect of the implementation is problematic at best.

Over the course of a few posts, I’m going to reimplement this in machine code instead of BASIC, and try to refine it into a fully respectable program. If it ends up actually being fun to play, too, then it might make another fun porting challenge the way Light’s-Out has been.

Since this is shaping up to be a much larger program, and since I’ve already done my walkthroughs of developing for the ZX81, I’m not going to walk through this project instruction by instruction the way I have for other, more recent programs. Instead, I’ll be committing updates to GitHub as I go along, and I will keep my discussion in these posts to more high-level decisions and analysis.

The Initial Implementation

The top level organization of the program is pretty simple, at least. The board is initialized and drawn, and then it alternates human and computer moves until somebody wins. Breaking it down by line range and program size, we get this table:

           Program header       116 bytes
  10- 100: Main loop            209 bytes
6000-6999: Computer move      1,843 bytes
7000-7999: Human move           624 bytes
8000-8999: Draw board           364 bytes
9000-9999: Init board           996 bytes
           Display file/vars    794 bytes
           Total              4,946 bytes 
           Total non-metadata 4,036 bytes

The only differences between Mandala Checkers and Chopper Checkers is the initial board layout and the graphics used to draw the pieces. Here’s the board it creates for Chopper Checkers:


Here are the issues I’ve identified with this:

  • The pieces on the right look awful. I think the idea is X vs O, but these Xs look more like an open white circle on a black background, which is confusing when the actual open background squares are white.
  • The players are left vs. right, when traditionally one would lay this out to be bottom vs top, with the human player on the bottom.
  • Having numbers on both axes makes moves ambiguous, and the sample move doesn’t really help with that.
  • Initialization and initial board drawing takes 8 seconds, and that’s with the speedup induced by disabling the display during setup. Without it setup takes more like 30 seconds.

So for our first steps into this rescue mission, let’s reimplement the initialization and fix the display:


Rotating the pieces into place is easy enough, and I settled on traditional algebraic notation from chess to represent rows and columns. This did mean that I needed to reverse the direction of the Y axis. Flipping the two rows for the computer pieces’ display made them look much nicer, and now this looks more like black vs white checkers.

The machine code for making the display was also quite a bit denser than the BASIC was. I added a little menu at the front to let me also swap within the same program between the Chopper setup and the Mandala setup:


Where the BASIC code for the initialization and display of even one of these was 1,360 bytes, the machine code was only 492. If this kind of savings keeps up, we may well be able to fit the whole program into the Timex Sinclair 1000’s native 2KB instead of requiring the 16KB RAM addition!

Continue reading

Rescuing a ZX81 Type-In Program

One of the first books of type-in programs I had as a child was 51 Game Programs for the Timex Sinclair 1000 and 1500, edited by Tim Hartnell. I’ve cited two games from it already on this blog—Surge was a nice demonstration of supplementing BASIC with machine language routines, and my NES project Galaxy Patrol was loosely based on the game from this book of the same name.

That said, very few of the listings in this book were actually good. Many of the programs were extremely buggy or incomplete, or had a lot of baffling design decisions that imply that these were programs that were written and then published without much of a formal editing pass. (Some clearly weren’t tested at all, with obvious crashes immediately on startup.) The original Galaxy Patrol itself did not even properly end the game; colliding with an asteroid or running out of fuel simply printed your score near the top of the screen, where it promptly scrolled off.

Some of this was probably to try to at least sort of function on the original 1KB ZX81, but this excuse holds less water when the foreword insists on the 16KB RAM expansion. (The US’s edition, the Timex 1000, had 2KB of RAM by default which allowed a reasonable number of the programs to operate unmodified, but also left room to have more intuitive user interaction flows that were nevertheless notably absent.) Some of it may also have simply been that the BASICs on these systems were extremely awkward and unpleasant to use and often very slow to run.

The end result is that there are a number of programs in this book that would actually be neat little programs, but which need to be rescued from themselves to actually work. Galaxy Patrol could itself be considered just such a rescue operation, but I’ve got two more now that I’d like to take on. One is simple enough I can do it in a single post; the other will be a longer-term project.

The program itself would ask you for your name and then use it to create an evolving mosaic display that would scroll down the screen forever:


And this is cute, but it’s also really slow. It takes about 30 seconds to fill the screen, and the speed is extremely inconsistent. We can do better than this.

The Original Program

Here is the original listing, as it was presented in the book:

  40 INPUT A$
  45 LET A$=A$+" "
  46 IF LEN A$<16 THEN GOTO 45
  47 LET A$=A$( TO 16)
  50 FOR G=1 TO 16
  60 IF RND>=.5 AND CODE A$(G)<128 THEN LET A$(G)=CHR$ (CODE A$(G)+128)
  70 IF RND>=.5 AND CODE A$(G)>127 THEN LET A$(G)=CHR$ (CODE A$(G)-128)
  80 NEXT G
 120 FOR H=1 TO 16
 130 FOR A=-16 TO 16
 140 IF A=0 THEN GOTO 160
 150 PRINT A$(ABS A);
 160 NEXT A
 180 LET A$=A$(2 TO )+A$(1)
 190 NEXT H
 200 GOTO 50

This is an example of what I mean about there not being a whole lot of editorial control at this point. Even though the program is only 22 lines long, it’s got extremely inconsistent line numbering, spends memory it doesn’t need to (there’s no reason for G and H to be separate variables), and introduces a weird assymetry in its handling of reverse video. Lines 60 and 70 both either set or remove inverse-video from a character based on whether it’s not already set/removed and whether or not a coin flip comes up heads. However, because the coin is always flipped twice, this also means that there’s a 50% chance of a reverse-video tile being turned off but only a 25% of a normal-video character being turned on. Most BASICs would let you put multiple statements after a THEN clause but the ZX81 BASIC doesn’t—in fact, it doesn’t even allow multiple statements per line at any time.

As for the rest of the program’s behavior…

  • Lines 10-40 get the input from the user.
  • Lines 45 to 47 expand it with spaces or truncate it as needed so that it is exactly 16 characters long.
  • Lines 50-80 flip a quarter of the normal-video characters to reverse video, and flip half the reverse-video characters to normal-video.
  • Lines 120 through 190 print out 16 lines of the wallpaper pattern using the current reverse-video settings, and more specifically…
  • Lines 130 through 160 print the name string backwards then forwards, a character at a time.
  • Line 170 pushes the screen up a line so that the program doesn’t stop due to printing off the bottom of the screen.
  • Line 180 rotates the string one character to the left, which is what lets the wallpaper pattern read sensibly top to bottom.
  • Finally, line 200 jumps back up to reassign the reverse-video status of the characters and then loops forever. The program stops when the user hits the BREAK key.

The big problem with this at runtime is really the slow and inconsistent output speed. The inconsistent output speed is mainly because lines 50-80 take over a second to run.

Improving the Implementation

The most direct solution to our speed problem is to port the display and update routines to machine language. However, the introductory code that gets and prepares the string in the first place is not so easy to replace. INPUT is a sophisticated and finicky beast and we don’t want to replicate it or try to interact with it outside of BASIC. But if we do let BASIC handle the early parts, we have the question of how to tell our machine language program about it. Variables live at pretty random locations and can change at runtime as a result of text being printed out on the screen. We would really prefer a fixed buffer somewhere, and if we want it to be fixed, that means it has to be part of the program itself. That’s a little annoying because it will show up in our listing.

But we can make a virtue of this annoyance: if we put a 16-byte comment at the top of our program we can use that comment as a buffer independent of the big mess of nonsense that our machine code will be shown as. --WALLPAPER ML-- is conveniently 16 bytes long. After counting out some more offsets in the code, we can nail down the BASIC part of our improved program:

   2 REM {a big mess of unreadable machine code}
  30 INPUT A$
  40 LET A$=A$+" "
  50 IF LEN A$<16 THEN GOTO 40
  60 FOR I=1 TO 16
  70 POKE 16513+I,CODE A$(I)
  80 NEXT I
  90 RAND USR 16536

Continue reading

An Ambition, Completed

I belong that that somewhat narrow cohort that straddles Generation X and the Millenials—we grew up with the microcomputer omnipresent but we remember a time before The Internet. The cohort has gotten various attempts at nicknames but I’ve always been fond of the name Oregon Trail generation, after the famous educational software.

My parents were also somewhat technically inclined (or at least gadget-oriented), and as a result I had grown up alongside a number of game consoles and home computers. A lot of the work I did on those, both as a hobby and for schoolwork, shaped my talents in the years to come.

One of my goals here on Bumbershoot over the past few years has been to demystify the machines I grew up with. I’m now a genuine software professional, after all—surely I’ve learned enough to see how the tricks were done and to join in myself, if a bit late.

As such, once I realized that this was what I was doing, I formalized the nature of the task and started getting systematic about it. And now, with this latest release, I have finished it.

The Rules

Here are the rules I had set for this task.

  • Childhood ends in 1992. This is a somewhat arbitrary division point, but it turns out I can pick any year from there to my 20th birthday and not change the list of targeted systems, and 1992 means I don’t have to distinguish between when a platform was released in North America and when I got access to it.
  • Systems require a full year of exposure to count as “one I grew up with.” I had brief acquaintance with the Macintosh and the SNES, but both were systems I didn’t really have any experience with until adulthood. I first beat Super Metroid in 2010.
  • Programs must run on the metal. Interpreted languages are permitted only to the extent that they can then transfer control to machine code. This started out as “BASIC and LOGO don’t count” but eventually became a requirement that, for at least one project, I produce a binary where every single byte in the program exists because I put it there.
  • Programs must be for the platform. Writing a highly portable C program and jamming it through a compiler for each system doesn’t count. I should make use of a generous set of system-specific capabilities that let it show off what it’s good at. Any stunt that reaches demoeffect quality automatically achieves this goal.
  • The platform must have been intended to receive third-party programs. That means the graphing calculators we had in high school don’t count, even though they had Z80 chips in them and you could program them with special connectors and PC programs that no longer run on 64-bit operating systems.
  • Period tools are not required. Since half the targets are game consoles, cross-development tools were the norm even at the time.

So, without further ado, the systems I grew up with.

Continue reading

ZX81: Mind Games

I was idly fiddling around with the ZX81 again, and I found a cute little anomaly.

Start with a simple line of code like 10 PRINT "HELLO". That gets stored in memory at the top of your BASIC program space, and the first two bytes of it are a 2-byte big-endian number. We can see that because if we then type POKE 16509,39 the line number changes to 9994, and POKE 16510,15 takes it up to 9999. A moment’s work will show that 39*256+15 is indeed 9999.

9999 is also the largest permissible line number in Sinclair BASIC. If you try to type in a line with a five-digit line number, a syntax error will be flagged on the line number itself.

You may also recall from back when we were hiding machine language programs away from Sinclair BASIC that we’d do that by putting a $FF byte right after the last real line of program code—corresponding to a notional line number of at least 65280. Perhaps too-high line numbers become invisible! We can see that if we POKE a 255 – or even a 64 – into 16509 – the line vanishes, and it comes back if we put a 39 back in.

But what happens if we leave the high byte alone and just poke a 16 into 16510, for a line number of 10000? It turns out it doesn’t vanish—instead, we get a line number of A000. That looks suspiciously like hex, but it isn’t (not least because $A000 isn’t 10,000, it’s 40,960). What we’re seeing is the top digit overflowing into the characters after the numbers. The ZX81’s wildly nonstandard character set puts the letters immediately after the numbers, so overrun goes directly to letters.

A little experimentation shows that the largest value we can get a line number to take before it disappears is “G383”, which corresponds to a hex value of $3FFF. That’s interesting because it shows how it’s deciding whether or not a line is “real”—it’s not checking to see if the value is above 9999, it’s checking to see if it fits in 14 bits, because that’s how many bits it takes to represent 9999 ($270F).

That’s all well and good, but are they really BASIC lines? Creating a full program that includes lines above 10000 is challenging but not impossible—we just have to work backwards with a set of commands like these:

  • 10 PRINT "END"
  • POKE 16509,42
  • POKE 16510,248
  • 10 PRINT "MIDDLE 2"
  • POKE 16509,41
  • POKE 16510,4
  • 9000 PRINT "MIDDLE 1"

That produces this listing:


Not only that, this program actually runs: it prints out BEGINNING, MIDDLE 1, MIDDLE 2, and END just as you’d expect. Branches even work: while you can’t refer to line B000, if you add a line 2000 GOTO 11000 to the program, it will dutifully only output BEGINNING and END.

So, is this good for anything?

Probably not, really. Lines of code past 9999 are uneditable inside the program, and given the line numbers and line lengths alone for 10,000 lines of BASIC would consume 40KB before you even had any actual code that actually did anything, even on a Spectrum 128 you’d run out of memory long before you ran out of line numbers.

But if you were creating the program with an external code generator and wanted to have a piece of code at the very end that could be seen and run but not edited, this would let you do it.