Category Archives: sinclair

An Ambition, Completed

I belong that that somewhat narrow cohort that straddles Generation X and the Millenials—we grew up with the microcomputer omnipresent but we remember a time before The Internet. The cohort has gotten various attempts at nicknames but I’ve always been fond of the name Oregon Trail generation, after the famous educational software.

My parents were also somewhat technically inclined (or at least gadget-oriented), and as a result I had grown up alongside a number of game consoles and home computers. A lot of the work I did on those, both as a hobby and for schoolwork, shaped my talents in the years to come.

One of my goals here on Bumbershoot over the past few years has been to demystify the machines I grew up with. I’m now a genuine software professional, after all—surely I’ve learned enough to see how the tricks were done and to join in myself, if a bit late.

As such, once I realized that this was what I was doing, I formalized the nature of the task and started getting systematic about it. And now, with this latest release, I have finished it.

The Rules

Here are the rules I had set for this task.

  • Childhood ends in 1992. This is a somewhat arbitrary division point, but it turns out I can pick any year from there to my 20th birthday and not change the list of targeted systems, and 1992 means I don’t have to distinguish between when a platform was released in North America and when I got access to it.
  • Systems require a full year of exposure to count as “one I grew up with.” I had brief acquaintance with the Macintosh and the SNES, but both were systems I didn’t really have any experience with until adulthood. I first beat Super Metroid in 2010.
  • Programs must run on the metal. Interpreted languages are permitted only to the extent that they can then transfer control to machine code. This started out as “BASIC and LOGO don’t count” but eventually became a requirement that, for at least one project, I produce a binary where every single byte in the program exists because I put it there.
  • Programs must be for the platform. Writing a highly portable C program and jamming it through a compiler for each system doesn’t count. I should make use of a generous set of system-specific capabilities that let it show off what it’s good at. Any stunt that reaches demoeffect quality automatically achieves this goal.
  • The platform must have been intended to receive third-party programs. That means the graphing calculators we had in high school don’t count, even though they had Z80 chips in them and you could program them with special connectors and PC programs that no longer run on 64-bit operating systems.
  • Period tools are not required. Since half the targets are game consoles, cross-development tools were the norm even at the time.

So, without further ado, the systems I grew up with.

Continue reading

Advertisements

ZX81: Mind Games

I was idly fiddling around with the ZX81 again, and I found a cute little anomaly.

Start with a simple line of code like 10 PRINT "HELLO". That gets stored in memory at the top of your BASIC program space, and the first two bytes of it are a 2-byte big-endian number. We can see that because if we then type POKE 16509,39 the line number changes to 9994, and POKE 16510,15 takes it up to 9999. A moment’s work will show that 39*256+15 is indeed 9999.

9999 is also the largest permissible line number in Sinclair BASIC. If you try to type in a line with a five-digit line number, a syntax error will be flagged on the line number itself.

You may also recall from back when we were hiding machine language programs away from Sinclair BASIC that we’d do that by putting a $FF byte right after the last real line of program code—corresponding to a notional line number of at least 65280. Perhaps too-high line numbers become invisible! We can see that if we POKE a 255 – or even a 64 – into 16509 – the line vanishes, and it comes back if we put a 39 back in.

But what happens if we leave the high byte alone and just poke a 16 into 16510, for a line number of 10000? It turns out it doesn’t vanish—instead, we get a line number of A000. That looks suspiciously like hex, but it isn’t (not least because $A000 isn’t 10,000, it’s 40,960). What we’re seeing is the top digit overflowing into the characters after the numbers. The ZX81’s wildly nonstandard character set puts the letters immediately after the numbers, so overrun goes directly to letters.

A little experimentation shows that the largest value we can get a line number to take before it disappears is “G383”, which corresponds to a hex value of $3FFF. That’s interesting because it shows how it’s deciding whether or not a line is “real”—it’s not checking to see if the value is above 9999, it’s checking to see if it fits in 14 bits, because that’s how many bits it takes to represent 9999 ($270F).

That’s all well and good, but are they really BASIC lines? Creating a full program that includes lines above 10000 is challenging but not impossible—we just have to work backwards with a set of commands like these:

  • 10 PRINT "END"
  • POKE 16509,42
  • POKE 16510,248
  • 10 PRINT "MIDDLE 2"
  • POKE 16509,41
  • POKE 16510,4
  • 1000 PRINT "BEGINNING"
  • 9000 PRINT "MIDDLE 1"

That produces this listing:

zx_bad_lines

Not only that, this program actually runs: it prints out BEGINNING, MIDDLE 1, MIDDLE 2, and END just as you’d expect. Branches even work: while you can’t refer to line B000, if you add a line 2000 GOTO 11000 to the program, it will dutifully only output BEGINNING and END.

So, is this good for anything?

Probably not, really. Lines of code past 9999 are uneditable inside the program, and given the line numbers and line lengths alone for 10,000 lines of BASIC would consume 40KB before you even had any actual code that actually did anything, even on a Spectrum 128 you’d run out of memory long before you ran out of line numbers.

But if you were creating the program with an external code generator and wanted to have a piece of code at the very end that could be seen and run but not edited, this would let you do it.

Compatibility Across ZX Spectrum Variants

The Spectrum is a new platform for me, and that means that as is traditional, I’ve ported Lights-Out to it. The final tape file weighs in at 1,050 bytes, which is the largest 8-bit implementation I’ve done to date, but that extra space is being used to buy something. One of those things is obvious:

spectrum_lights_out

This is my first implementation that actually makes use of custom graphics of any kind. The default graphics characters for the Spectrum are less comprehensive than even the ZX81, but a certain amount of custom graphics is painless and so that is clearly what developers are expected to use.

The second thing can’t be shown on screenshots, but also accounts for some of the space used. Despite the fact that the two machines are broadly incompatible, this program will run with no modifications and no model-detection code on both the 16KB Spectrum and the Timex Sinclair 2068. (By virtue of running on the 16KB Spectrum it also runs on the rest of the computer line, because within that line things stayed pretty consistent. There are a few ways you can go wrong, but not many.)

I’ve already covered the general design of this program, and the Spectrum/Timex port is really just an expansion and adaptation of the ZX81 port. I’ve uploaded the source code and updated the Lights-Out Collection download to include this version too.

Below the fold I’ll talk about the changes that had to be made to move from ZX81 to Spectrum, and the compatibility restrictions that permitted a 100% machine code program to run on the Spectrum and the TS2068.

Continue reading

SpectraLink: Creating tape files from scratch

Last time, we created a self-loading and auto-running BASIC/ML hybrid program and saved the combination out to tape. We built our program in the emulator using ordinary BASIC commands. That’s the most painless workflow yet for making a machine-code program with period tools—at least with what we’ve explored here.

But it’s 2017. We want to have cross-development workflows that don’t require us to manually fire up an emulator and mess around with memory injection and hand-written BASIC programs. Let’s get this up to speed.

Continue reading

Getting Started With the ZX Spectrum

It’s time to revisit an old friend.

spectral_sorcery

Well, that’s a lie. The ZX81 wasn’t really an old friend in the first place, and as we can see above, this isn’t a ZX81—we’ve got not just mixed-case text but working exclamation points!

This is its successor, the ZX Spectrum. This machine never reached American shores, unless you count the spectacularly ill-fated Timex Sinclair 2068, which you shouldn’t. The TS2068 had a completely different RAM layout and ROM system, which in turn meant that basically no software ran on it unless it was 100% BASIC, and maybe not then.

But the Spectrum was very well-beloved elsewhere in the world, and had many clones, and is a much cleaner platform for experimenting with Z80 assembly code than most of my other options.

If you want to work with these yourself, FUSE is the premier Spectrum emulator overall, while EightyOne, the ZX81 emulator I had recommended for Windows, also covers the primary Sinclair line.

In this post, I’ll be outlining what it takes to produce a machine code program for the Spectrum, and how to mix it with BASIC. This is a lot different than the systems we’ve looked at previously, to the point that it almost feels like this is the first system we’ve looked at that actually intended hybrid BASIC/ML code as a common use case.

Continue reading

Lights Out: ZX81 Release and Design Notes

So. On to my ZX81 program – an implementation of the Lights Out puzzle. This is one of my go-to simple programs for putting an interactive system through its paces.

lightsout81

As such, this is not the first time I’ve implemented the puzzle—I included C64 implementations on the first Bumbershoot collection back in 2015…

lightsout64

…and quietly included source for a DOS port that was otherwise not distributed…

lightsout86

…and after I’d worked out the basics of linking directly against the Windows layer in Hello World 4 Different Ways I also quietly did a Windows console port based on the DOS edition.

lightsout32

Despite all that, I’ve never really talked about implementing it. So let’s talk a little about the puzzle and how implementation strategies differ when developing for an old home computer compared to a more powerful system.

The general design

The puzzle itself is easily modeled as a 5×5 rectangular array of boolean values, with a move that selects an index and toggles its neighbors in the four cardinal directions, if they exist. A modern implementation would separate the model (this 5×5 array and operations upon it) from the view (the actual display of the puzzle).

As it happens, none of my implementations do this. Because in each case the display is text on the screen, the screen’s display of the puzzle itself is used in place of the 5×5 array. Moves directly manipulate the screen memory and the puzzle state is read by consulting it.

So despite being implemented four ways for four computers using four different instruction sets, all four implementations are broadly similar:

  1. The static parts of the display are drawn. This includes the title, the puzzle board in solved state, and some kind of message area at the bottom of the screen. If we need to do something special to have a screen to draw on, that also happens here.
  2. Generate a random but solvable puzzle.
  3. Read the keyboard and execute the requested move. (Alternately, if the user has requested a reset, go back to step 2, and if the user has requested to quit, proceed to the final step.)
  4. Check to see if the puzzle is solved. If it’s not, go back to step 3.
  5. Congratulate the user on a solved puzzle. Ask if they want another game, and if so go back to step 2.
  6. Clean up the screen and return to the context from which the program was invoked.

Let’s take each of these in turn.

Displaying the board

This is, at the end of the day, a bunch of fancy print statements. Different platforms handle things like colors and the edges of the screen differently, but the simplest approach usually involves blitting things directly into screen memory.

Generating Puzzles

I accomplish this by executing a thousand or so random moves. This is not the most efficient way to produce a guaranteed-solvable Lights-Out puzzle but it does produce a nice visual effect of the puzzle being scrambled. (Since making the same move twice perfectly undoes the move, the optimum way to generate a puzzle is to flip a coin for each of the 25 spots and execute a move on that spot if the coin comes up heads.

Making Moves

All my implementations use roughly the same algorithm for this. Each letter appears on the screen as the center of a notional button to press; I have a routine that computes the location of each letter within the screen memory. I can then examine screen memory at that location to flip the light there. I can also then compute the addresses of its neighbors to the north, south, east, and west. In each implementation the board and screen are designed such that moving off the board hits a point of valid screen memory that does not have a letter in it—this means I don’t need to bounds-check my moves but instead can simply see if there’s a letter at the point of interest.

Determining if the user has won

This is the same algorithm across all implementations. Go through each letter from A through Y, compute the part of screen memory that cell resides in, and examine it to see if it’s on. If it is, then we know the puzzle is yet unsolved. If we make it all the way through the loop, then we know that victory has been achieved.

Platform-Specific decisions

Making the program fit the platform requires more decisions than just the ones above, of course. I’ll run through these in the order I implemented them.

Commodore 64

This was the first implementation. As a result, it matches my description above almost exactly, with no additions or compromises. The board is drawn with character graphics, and the character cell with the letter in it is the only one that changes. If a letter is on or off, it is shifted in and out of inverse video, and then the color memory is independently changed to match. “On” was light red inverse video, and “off” was dark grey on a black background. Most of the rest of the text was in a lighter grey, but not quite reaching full white.

Random numbers are generated by calling out to the BASIC ROM, which is a bit opaque but is also very compact.

This is a reasonably compact program, weighing in at 977 bytes. It could be ported to the VIC-20 simply by altering a few constants.

MS-DOS

The character graphics on the IBM PC lend themselves more readily to designing displays like this, and with 80 columns to work in the result here feels the cleanest of all my implementations.

DOS (more properly, the PC BIOS) does not really have a notion of “inverse video,” though. Instead, background and foreground colors are encoded separately alongside each letter. This means that the solution for displaying lights as on or off ends up being rather more elegant, because we need only adjust or inspect these color attributes to make our moves.

The random number generator here is a simple linear congruential generator that lets the x86 chip handle the multiplication.

The DOS version is a .COM file that is the smallest of the implementations overall, at 812 bytes.

Microsoft Windows (Windows 2000 or later)

The core logic here is mostly an adaptation of the DOS implementation. However, working with the console is complicated because there’s no guarantee that the window will have a display large enough to hold the board in it. The Windows Console API addresses this by allowing the developer to allocate and provide their own text buffer. This buffer cannot be reliably accessed directly—there are API functions to do so instead—but it behaves in a manner roughly analogous to the old PC BIOS color text mode. Like the DOS implementation, we only ever read or write color data (“attributes”, as the API calls them) after drawing the initial board.

The Windows Console is also a Unicode environment, so to get access to our box-drawing characters and such all of our strings are represented as UTF-16. Fortunately for us, NASM has a convenient macro for that.

The end result is a reasonably faithful translation the DOS implementation into 32-bit x86 code (notable mainly for shortening the RNG routine from 24 instructions to 10) and replaces all reads of screen memory and syscalls to BIOS or MS-DOS with Windows ABI calls. (I give simpler examples of how to do this in my old article about Hello World four different ways. The end result depends only on kernel32.dll, but the alignment requirements for a Windows executable make this the largest of the programs, weighing in at 5,120 bytes.

ZX81/TS1000

The game logic here is largely the same as the other three implementations, but the display logic is almost entirely different. The general C64 trick of relying on inverse video for a letter to represent a light’s status has been kept, but the ZX81 doesn’t have the kind of box-drawing graphics characters that any of my previous platforms used. Instead, it offers sixteen characters which fill in the four quadrants of the character cell in all possible ways, and then a somewhat more restricted dithered-grey.

My solution to the display of the board, then, was simply to not draw any walls at all—an “off” light is a 3×3 grid of inverted spaces with the letter in inverse video in the center, and an “on” light retains a half-character-cell-wide black border but leaves the letter in normal video. I am actually very, very happy with how this looks, and if I had done this before the DOS and Windows versions I’d have seriously considered using this rendering technique instead of what I actually went with.

This does make the actual “flip” operation more expensive, since instead of writing three color bytes (DOS, Windows) or one character byte and one color byte (C64) we need to instead write nine bytes scattered across screen memory. This cost is more than offset by the fact that the initial screen draw is now just a matter of filling the whole screen with inverted spaces.

Filling the whole screen with inverted spaces—in effect, having the program provide white text on a black background—also has the happy side effect of ensuring that every line is its full fixed length long. That makes computing the addresses of letters in screen memory much easier, because it is simply the address (start of display file)+33*row+column+1.

The final size of LIGHTSOUT.P is 903 bytes, but much of that isn’t the program. The raw binary is actually the smallest of all our platforms, at 704 bytes. Some of this might be me getting better at implementing the program with practice, but the far more likely reason is the simplification of the initial board display.

ZX81 compatibility concerns

It took several tries to get the ZX81 build to really work the way I wanted it to. Once the basic implementation had been debugged, my code ran fine in the sz81 emulator, but tests on the more accurate and more configurable EightyOne emulator showed that despite having a core binary size of (at the time) 710 bytes, attempting to load it into a 1K ZX81 would lock the system and attempting to load it into a 2K system would successfully load, but would not run. As long as there was at least 4KB of expansion memory, though, it would run fine.

This is a bit mysterious, as thanks to the screen-memory-as-game-state trick this program should require no more memory than it actually takes to load. But, it turns out, that is the trick: the system does need 2KB to hold both the program and the full display. We use every byte of the screen in making the display thanks to the inverse-video background, but even if we didn’t do that, the puzzle display is more than enough to blow past our limits. So that’s why we were getting the lockup, at least; my original linking program was including a full 793-byte display file at the end of the program, and this was enough to make the load operation blow past all of RAM in the 1K case. We can fix that by replacing it with a 25-byte “compressed” display file instead. At that point it loads in 1KB, but still seems to need 4KB to run. What’s going on? Why 4 instead of 2?

The issue turns out to be in my assumption of getting to have a fully expanded chunk of screen memory even if there’s room for it. If you have less than 3.5KB of RAM, then the Sinclair ROM will re-compress the display file back down to 25 bytes as your program starts. If you have more, then it will re-expand it to the full 793. I was relying on the latter behavior in my board display code.

This happens inside the ROM’s implementation of the CLS command, which starts at location $0A2A. Early on it does indeed check the value of RAMTOP to decide whether or not to do a collapsed or expanded display. Unfortunately, it does the check about halfway through. I solve this by replicating the first half of the routine and then jumping to the expanded display-file case.

I think the accepted practice was actually to use a ROM routine at $0918, which was the machine code implementation of PRINT AT. Subtract your target row from 24 and put that in register B. then subtract your target column from 33 and put that in register C. (Yes, this puts 0, 0 just off the lower-right hand side of the screen. I don’t know either. I think the idea here is to count down how many characters are left in each direction within the screen as a whole.) This routine will move the cursor there, expand the display file as necessary (thus doing a gigantic overlapping memory memory-blit) pretty much up to the top of the machine stack) and then loads location $400e with the memory location of you wanted.

One of the nice things about having an inverse-video screen is that I don’t have to mess with that, so I don’t.

The final issue was more trivial, but annoying nevertheless. I had to drastically shorten my “out of memory” message because there wasn’t enough room left to have a display file that could actually display it. Thus my reasonably grammatical original sentence got crammed down to “2KB+ RAM REQUIRED, SORRY”. So it goes.

But now it’s done!

Downloads

I’ve collected all four implementations in binary form into one zip file. The source code for each version is in the Github repository as usual.

References

  1. Logan, I. and O’Hara, F. The Complete Timex TS1000/Sinclair ZX81 ROM Disassembly. Melbourne House, 1982.
  2. Baker, Toni. Mastering Machine Code on Your ZX81. Reston, 1982.

Getting a Decent and Fast PRNG Out of an 8-Bit Chip

Working on this Z80 project has been burning my brain a bit. I’m going to step back a bit and play around specifically with one part of it: the pseudo-random number generator.

Reader alert: There’s a whole lot of grindy code grinding in this post. If that’s not your thing, you can probably skip this article. On the other hand, if you want to see how my thought process works when I’m trying to get a clean and functional assembly language implementation of something, this article is the good stuff. Do as you will.

The usual technique I’ve used for PRNGs is the Linear Congruential Generator, but this usually relies on having a chip that can multiply for you. That’s fine for x86, but it’s much more obnoxious on the 6502 or the Z80. When I was working on the C64, I just called out to the floating point routines for RND, but when I needed an RNG for Galaxy Patrol on the NES I had no choice but to roll my own 32-bit multiplier and go with that.

I have, however, more recently become aware of the Xorshift family of PRNGs, and in particular the investigations by one Brad Forschinger for high-quality constants in the 16-bit space. His winning algorithm, in C, is as follows:

uint16_t rnd_xorshift_32() {
    static uint16_t x=1,y=1;
    uint16_t t=(x^(x<<5)); 
    x=y; 
    return y=(y^(y>>1))^(t^(t>>3));
}

He also provides an implementation in 16-bit (self-modifying!) x86 assembly, which weighs in at 23 instructions and 43 bytes.

It’s worth noting that the ARM chip, which is what powers basically every cell phone ever and also every Nintendo handheld from the Game Boy Advance on, is incredibly efficient at encoding this stuff, because nearly every ARM instruction includes the option of shifting arguments on the way in. So, for instance, the line t=x^(x<<5) becomes the two instructions:

        ldrh    r0, [r2]
        eor     r0, r0, r0, lsl #5

Running the C routine through GCC reveals that it gets it down to 12 instructions and 4 bytes of supporting data (x and y), for a total of 52 bytes and code that reads like a straightforward hand-translation. That’s in the less space-efficient ARM instruction encoding, though: when using the more compact THUMB2 encoding the same number of instructions drops to 40 bytes. Not bad at all! It’s not every day you see a compiler emit code that’s more compact than hand-tuned self-modifying assembly code.

(Older ARM chips such as the one powering the Game Boy Advance used a different compact encoding now called THUMB1. GCC can still emit this, with command flags like -marm7tdmi -mfp=softfp -mthumb. Unlike its successor, THUMB1 loses a lot of expressive power, and this means extra instructions are necessary and the routine grows to 44 bytes. Still not bad, but it’s no longer beating the x86 code.)

Meanwhile, I’m now adapting this to the Z80. This turns out to be a superb level of hassle, and working through it has been more of a chore than I’m used to it being. Below the cut, my first attempt of porting the routine to Z80 assembly language:

Continue reading