Category Archives: retrocoding

How to Not Learn Assembly Language in DOS, Part 2

Last time, we talked about writing memory directly and invoking interrupts. This time, we’ll cover writing I/O ports directly and writing our own interrupts.

The Project

Our first draft of the Smoking Clover program relied on the delay() routine in Borland’s DOS extensions to control animation speed, and used BIOS interrupts to rewrite the palette. This produces very inconsistent animation depending on CPU speed. In this article we will reorganize the animation to be governed by IRQ0, the timer interrupt. This also means that, since we’re changing the palettes inside an interrupt, we can’t use interrupts to do it; we’ll need to use the VGA chip’s I/O ports to alter colors. We’ll also need to program the timer properly. In the end, we should also have a reusable timer callback system.

Code begins below the break.

Continue reading

How to Not Learn Assembly Language in DOS, Part 1

After a couple of weeks of ZX81 work, let’s jump ten years forward in time and port Bill Gosper’s Smoking Clover effect to DOS and VGA. Basically every description I’ve seen of this, including the one on Wiki, is some variation of the minimal definition in ESR’s old Jargon File:

Many convergent lines are drawn on a color monitor in such a way that every pixel struck has its color incremented…. The color map is then repeatedly rotated.

The latest edition of the Jargon File actually specifies that the lines are drawn such that one endpoint is in the middle of the screen and the others are one pixel apart around the perimeter of a large square. This is more or less what I attempted when I first implemented the effect back in high school—but doing this turns out not to produce the clover effect. You get a more tunnel-like effect instead.

What worked for me was to have my endpoints march, exactly one pixel at a time, around a circle much larger than the display screen. This in turn isn’t exactly the same thing as points equally spaced along the perimeter of a circle, either. What I ultimately did was solve x² + y² = r² for successive values of x, in exactly the range where x <= y. I then reflected that line across the X and Y axes as well as the line y=x. This finally produced the result I wanted:
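As a rough illustration of that endpoint scheme, here is a minimal sketch in Python (not the Turbo C the article itself uses). The function name and the choice of rounding down with an integer square root are my own assumptions; the original program may round differently.

```python
import math

def clover_endpoints(r):
    """Endpoints around a circle of radius r >= 1, stepping x one pixel at a time.

    Only the octant where x <= y is solved directly. The rest come from
    reflecting across the X axis, the Y axis, and the line y = x.
    """
    points = set()
    x = 0
    while True:
        # Integer solution of x^2 + y^2 = r^2, rounded down (an assumption).
        y = math.isqrt(r * r - x * x)
        if x > y:
            break
        for px, py in ((x, y), (y, x)):      # reflect across y = x
            for sx in (1, -1):               # reflect across the Y axis
                for sy in (1, -1):           # reflect across the X axis
                    points.add((sx * px, sy * py))
        x += 1
    return points
```

Each returned point would then be the far end of a line drawn from the center of the screen, with the radius chosen much larger than the display.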

clover_001

Somewhat unusually for this blog, this program is actually written with no use of assembly language whatsoever. Back when I was exploring the HAT function, I demonstrated how to invoke interrupts and memory-mapped I/O in Turbo Pascal 5.5.

In this article I will bring C up to speed. Like my earlier work here, this will be written using Borland’s Turbo C 2.0.1, still available free of charge from Embarcadero Software.

What We Need

So, aside from a perverse breed of bragging rights, what does assembly language buy us, anyway? Mostly, what it buys us is direct access to the hardware:

  • We can trap to the operating system, firmware, or hypervisor. In DOS we do that with the INT instruction. We’ve used this all over our DOS work here, for everything from changing graphics modes to printing messages to exiting the program.
  • We may directly read and write video memory or other kinds of memory-mapped I/O. In DOS, we’ve used this just to control graphics memory—on the 6502-based systems, it’s how we controlled all our peripherals.
  • We may directly read and write the I/O ports. Not all chips have these—the 6502 and its cousins in particular do not— but they figure heavily in both the Z80 and the x86 line of chips. We didn’t touch the I/O ports on the ZX81, but we’ve used them in DOS to produce nonstandard CGA modes and to program the Adlib and Sound Blaster chips.
  • We may override the operating system’s own interrupt vectors and provide our own. We used this when we coerced digital sound out of the PC speaker.

All of these things are possible within Borland’s HLLs. In this article, we’ll focus on the first two, and leave the last two for another time.

Continue reading

Lights Out: ZX81 Release and Design Notes

So. On to my ZX81 program – an implementation of the Lights Out puzzle. This is one of my go-to simple programs for putting an interactive system through its paces.

lightsout81

As such, this is not the first time I’ve implemented the puzzle—I included C64 implementations on the first Bumbershoot collection back in 2015…

lightsout64

…and quietly included source for a DOS port that was otherwise not distributed…

lightsout86

…and after I’d worked out the basics of linking directly against the Windows layer in Hello World 4 Different Ways I also quietly did a Windows console port based on the DOS edition.

lightsout32

Despite all that, I’ve never really talked about implementing it. So let’s talk a little about the puzzle and how implementation strategies differ when developing for an old home computer compared to a more powerful system.

The general design

The puzzle itself is easily modeled as a 5×5 rectangular array of boolean values, with a move that selects an index and toggles that cell along with its neighbors in the four cardinal directions, if they exist. A modern implementation would separate the model (this 5×5 array and operations upon it) from the view (the actual display of the puzzle).

As it happens, none of my implementations do this. Because in each case the display is text on the screen, the screen’s display of the puzzle itself is used in place of the 5×5 array. Moves directly manipulate the screen memory and the puzzle state is read by consulting it.

So despite being implemented four ways for four computers using four different instruction sets, all four implementations are broadly similar:

  1. The static parts of the display are drawn. This includes the title, the puzzle board in solved state, and some kind of message area at the bottom of the screen. If we need to do something special to have a screen to draw on, that also happens here.
  2. Generate a random but solvable puzzle.
  3. Read the keyboard and execute the requested move. (Alternately, if the user has requested a reset, go back to step 2, and if the user has requested to quit, proceed to the final step.)
  4. Check to see if the puzzle is solved. If it’s not, go back to step 3.
  5. Congratulate the user on a solved puzzle. Ask if they want another game, and if so go back to step 2.
  6. Clean up the screen and return to the context from which the program was invoked.

Let’s take each of these in turn.

Displaying the board

This is, at the end of the day, a bunch of fancy print statements. Different platforms handle things like colors and the edges of the screen differently, but the simplest approach usually involves blitting things directly into screen memory.

Generating Puzzles

I accomplish this by executing a thousand or so random moves. This is not the most efficient way to produce a guaranteed-solvable Lights-Out puzzle but it does produce a nice visual effect of the puzzle being scrambled. (Since making the same move twice perfectly undoes the move, the optimum way to generate a puzzle is to flip a coin for each of the 25 spots and execute a move on that spot if the coin comes up heads.)
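The coin-flip approach can be sketched as follows. This is a Python model using a plain 5×5 array, which none of the actual implementations do; the names press and generate are made up for illustration.

```python
import random

SIZE = 5

def press(board, row, col):
    """Toggle the chosen cell and whichever cardinal neighbors exist."""
    for dr, dc in ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < SIZE and 0 <= c < SIZE:
            board[r][c] = not board[r][c]

def generate(rng=random):
    """Flip a coin for each of the 25 spots and press the ones that come up heads."""
    board = [[False] * SIZE for _ in range(SIZE)]
    moves = []
    for row in range(SIZE):
        for col in range(SIZE):
            if rng.random() < 0.5:
                press(board, row, col)
                moves.append((row, col))
    return board, moves
```

Because every move is its own inverse, replaying the returned moves in any order solves the puzzle, which is why the result is guaranteed solvable.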

Making Moves

All my implementations use roughly the same algorithm for this. Each letter appears on the screen as the center of a notional button to press; I have a routine that computes the location of each letter within the screen memory. I can then examine screen memory at that location to flip the light there. I can also then compute the addresses of its neighbors to the north, south, east, and west. In each implementation the board and screen are designed such that moving off the board hits a point of valid screen memory that does not have a letter in it—this means I don’t need to bounds-check my moves but instead can simply see if there’s a letter at the point of interest.

Determining if the user has won

This is the same algorithm across all implementations. Go through each letter from A through Y, compute the part of screen memory that cell resides in, and examine it to see if it’s on. If it is, then we know the puzzle is yet unsolved. If we make it all the way through the loop, then we know that victory has been achieved.
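To make the screen-memory-as-state idea concrete, here is a small Python sketch. The 7×7 "screen", the one-cell border, and the use of upper and lower case as a stand-in for inverse video or color attributes are all assumptions for illustration; the real programs work on actual screen memory.

```python
WIDTH = 7    # hypothetical screen: the 5x5 board plus a one-cell border
HEIGHT = 7
screen = [" "] * (WIDTH * HEIGHT)   # stand-in for screen memory

def cell_address(letter):
    """Offset in screen memory of the cell labelled A through Y."""
    row, col = divmod(ord(letter.upper()) - ord("A"), 5)
    return (row + 1) * WIDTH + (col + 1)

def draw_board():
    """Draw the solved board: lowercase letters mean the light is off."""
    for i in range(25):
        letter = chr(ord("a") + i)
        screen[cell_address(letter)] = letter

def flip(addr):
    """Flip the light at addr, but only if a letter lives there."""
    ch = screen[addr]
    if ch.isalpha():                 # border cells hold spaces, so they are skipped
        screen[addr] = ch.swapcase()

def move(letter):
    """Flip the chosen cell and its north, south, west, and east neighbors."""
    addr = cell_address(letter)
    for delta in (0, -WIDTH, WIDTH, -1, 1):
        flip(addr + delta)

def solved():
    """Walk A through Y and report whether every light is off."""
    return not any(
        screen[cell_address(chr(ord("A") + i))].isupper() for i in range(25)
    )
```

Note that move never bounds-checks: stepping off the board lands on a border space, and flip ignores anything that is not a letter.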

Platform-Specific decisions

Making the program fit the platform requires more decisions than just the ones above, of course. I’ll run through these in the order I implemented them.

Commodore 64

This was the first implementation. As a result, it matches my description above almost exactly, with no additions or compromises. The board is drawn with character graphics, and the character cell with the letter in it is the only one that changes. If a letter is on or off, it is shifted in and out of inverse video, and then the color memory is independently changed to match. “On” was light red inverse video, and “off” was dark grey on a black background. Most of the rest of the text was in a lighter grey, but not quite reaching full white.

Random numbers are generated by calling out to the BASIC ROM, which is a bit opaque but is also very compact.

This is a reasonably compact program, weighing in at 977 bytes. It could be ported to the VIC-20 simply by altering a few constants.

MS-DOS

The character graphics on the IBM PC lend themselves more readily to designing displays like this, and with 80 columns to work in the result here feels the cleanest of all my implementations.

DOS (more properly, the PC BIOS) does not really have a notion of “inverse video,” though. Instead, background and foreground colors are encoded separately alongside each letter. This means that the solution for displaying lights as on or off ends up being rather more elegant, because we need only adjust or inspect these color attributes to make our moves.

The random number generator here is a simple linear congruential generator that lets the x86 chip handle the multiplication.

The DOS version is a .COM file that is the smallest of the implementations overall, at 812 bytes.

Microsoft Windows (Windows 2000 or later)

The core logic here is mostly an adaptation of the DOS implementation. However, working with the console is complicated because there’s no guarantee that the window will have a display large enough to hold the board in it. The Windows Console API addresses this by allowing the developer to allocate and provide their own text buffer. This buffer cannot be reliably accessed directly—there are API functions to do so instead—but it behaves in a manner roughly analogous to the old PC BIOS color text mode. Like the DOS implementation, we only ever read or write color data (“attributes”, as the API calls them) after drawing the initial board.

The Windows Console is also a Unicode environment, so to get access to our box-drawing characters and such all of our strings are represented as UTF-16. Fortunately for us, NASM has a convenient macro for that.

The end result is a reasonably faithful translation of the DOS implementation into 32-bit x86 code (notable mainly for shortening the RNG routine from 24 instructions to 10) that replaces all reads of screen memory and syscalls to BIOS or MS-DOS with Windows ABI calls. (I give simpler examples of how to do this in my old article about Hello World four different ways.) The executable depends only on kernel32.dll, but the alignment requirements for a Windows executable make this the largest of the programs, weighing in at 5,120 bytes.

ZX81/TS1000

The game logic here is largely the same as the other three implementations, but the display logic is almost entirely different. The general C64 trick of relying on inverse video for a letter to represent a light’s status has been kept, but the ZX81 doesn’t have the kind of box-drawing graphics characters that any of my previous platforms used. Instead, it offers sixteen characters which fill in the four quadrants of the character cell in all possible ways, and then a somewhat more restricted set of dithered-grey characters.

My solution to the display of the board, then, was simply to not draw any walls at all—an “off” light is a 3×3 grid of inverted spaces with the letter in inverse video in the center, and an “on” light retains a half-character-cell-wide black border but leaves the letter in normal video. I am actually very, very happy with how this looks, and if I had done this before the DOS and Windows versions I’d have seriously considered using this rendering technique instead of what I actually went with.

This does make the actual “flip” operation more expensive, since instead of writing three color bytes (DOS, Windows) or one character byte and one color byte (C64) we need to instead write nine bytes scattered across screen memory. This cost is more than offset by the fact that the initial screen draw is now just a matter of filling the whole screen with inverted spaces.

Filling the whole screen with inverted spaces—in effect, having the program provide white text on a black background—also has the happy side effect of ensuring that every line is its full fixed length long. That makes computing the addresses of letters in screen memory much easier, because it is simply the address (start of display file)+33*row+column+1.
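That address formula is simple enough to write down directly. The sketch below is Python, with a made-up function name and a hypothetical display-file start address passed in by the caller; on the real machine that start address comes from the system variables.

```python
ROW_BYTES = 33   # 32 characters plus the newline byte that ends each line

def dfile_address(d_file, row, column):
    """Address of (row, column) in a fully expanded display file.

    The +1 skips the single newline byte at the very start of the display file.
    """
    return d_file + ROW_BYTES * row + column + 1
```

As a sanity check, 24 rows of 33 bytes plus that leading byte gives the 793-byte figure for a full display file.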

The final size of LIGHTSOUT.P is 903 bytes, but much of that isn’t the program. The raw binary is actually the smallest of all our platforms, at 704 bytes. Some of this might be me getting better at implementing the program with practice, but the far more likely reason is the simplification of the initial board display.

ZX81 compatibility concerns

It took several tries to get the ZX81 build to really work the way I wanted it to. Once the basic implementation had been debugged, my code ran fine in the sz81 emulator, but tests on the more accurate and more configurable EightyOne emulator showed that despite having a core binary size of (at the time) 710 bytes, attempting to load it into a 1K ZX81 would lock the system and attempting to load it into a 2K system would successfully load, but would not run. As long as there was at least 4KB of expansion memory, though, it would run fine.

This is a bit mysterious, as thanks to the screen-memory-as-game-state trick this program should require no more memory than it actually takes to load. But, it turns out, that is the trick: the system does need 2KB to hold both the program and the full display. We use every byte of the screen in making the display thanks to the inverse-video background, but even if we didn’t do that, the puzzle display is more than enough to blow past our limits. So that’s why we were getting the lockup, at least; my original linking program was including a full 793-byte display file at the end of the program, and this was enough to make the load operation blow past all of RAM in the 1K case. We can fix that by replacing it with a 25-byte “compressed” display file instead. At that point it loads in 1KB, but still seems to need 4KB to run. What’s going on? Why 4 instead of 2?

The issue turns out to be my assumption that I would get a fully expanded chunk of screen memory whenever there’s room for it. If you have less than 3.5KB of RAM, then the Sinclair ROM will re-compress the display file back down to 25 bytes as your program starts. If you have more, then it will re-expand it to the full 793. I was relying on the latter behavior in my board display code.

This happens inside the ROM’s implementation of the CLS command, which starts at location $0A2A. Early on it does indeed check the value of RAMTOP to decide whether or not to do a collapsed or expanded display. Unfortunately, it does the check about halfway through. I solve this by replicating the first half of the routine and then jumping to the expanded display-file case.

I think the accepted practice was actually to use a ROM routine at $0918, which was the machine code implementation of PRINT AT. Subtract your target row from 24 and put that in register B. Then subtract your target column from 33 and put that in register C. (Yes, this puts 0, 0 just off the lower-right hand side of the screen. I don’t know either. I think the idea here is to count down how many characters are left in each direction within the screen as a whole.) This routine will move the cursor there, expand the display file as necessary (thus doing a gigantic overlapping memory-blit pretty much up to the top of the machine stack), and then load location $400E with the memory location you wanted.

One of the nice things about having an inverse-video screen is that I don’t have to mess with that, so I don’t.

The final issue was more trivial, but annoying nevertheless. I had to drastically shorten my “out of memory” message because there wasn’t enough room left to have a display file that could actually display it. Thus my reasonably grammatical original sentence got crammed down to “2KB+ RAM REQUIRED, SORRY”. So it goes.

But now it’s done!

Downloads

I’ve collected all four implementations in binary form into one zip file. The source code for each version is in the Github repository as usual.

References

  1. Logan, I. and O’Hara, F. The Complete Timex TS1000/Sinclair ZX81 ROM Disassembly. Melbourne House, 1982.
  2. Baker, Toni. Mastering Machine Code on Your ZX81. Reston, 1982.

Getting a Decent and Fast PRNG Out of an 8-Bit Chip

Working on this Z80 project has been burning my brain a bit. I’m going to step back a bit and play around specifically with one part of it: the pseudo-random number generator.

Reader alert: There’s a whole lot of grindy code grinding in this post. If that’s not your thing, you can probably skip this article. On the other hand, if you want to see how my thought process works when I’m trying to get a clean and functional assembly language implementation of something, this article is the good stuff. Do as you will.

The usual technique I’ve used for PRNGs is the Linear Congruential Generator, but this usually relies on having a chip that can multiply for you. That’s fine for x86, but it’s much more obnoxious on the 6502 or the Z80. When I was working on the C64, I just called out to the floating point routines for RND, but when I needed an RNG for Galaxy Patrol on the NES I had no choice but to roll my own 32-bit multiplier and go with that.

I have, however, more recently become aware of the Xorshift family of PRNGs, and in particular the investigations by one Brad Forschinger for high-quality constants in the 16-bit space. His winning algorithm, in C, is as follows:

#include <stdint.h>

uint16_t rnd_xorshift_32() {
    static uint16_t x=1,y=1;
    uint16_t t=(x^(x<<5));
    x=y;
    return y=(y^(y>>1))^(t^(t>>3));
}
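For experimenting with the generator away from a C compiler, here is a Python port. It assumes the shift constants are 5, 3, and 1 with both state words seeded to 1; the class name is my own, and the explicit masking stands in for the truncation that uint16_t does implicitly.

```python
MASK = 0xFFFF   # keep every intermediate value to 16 bits, as uint16_t would

class Xorshift16x2:
    """Two 16-bit state words, 32 bits of state in total."""

    def __init__(self, x=1, y=1):
        self.x = x
        self.y = y

    def next(self):
        t = (self.x ^ (self.x << 5)) & MASK
        self.x = self.y
        self.y = ((self.y ^ (self.y >> 1)) ^ (t ^ (t >> 3))) & MASK
        return self.y
```

The only left shift is the one in t, so that is the only place the mask actually discards bits. That detail matters on an 8-bit chip, where every shift has to be done a byte at a time.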

He also provides an implementation in 16-bit (self-modifying!) x86 assembly, which weighs in at 23 instructions and 43 bytes.

It’s worth noting that the ARM chip, which is what powers basically every cell phone ever and also every Nintendo handheld from the Game Boy Advance on, is incredibly efficient at encoding this stuff, because nearly every ARM instruction includes the option of shifting arguments on the way in. So, for instance, the line t=x^(x<<5) becomes the two instructions:

        ldrh    r0, [r2]
        eor     r0, r0, r0, lsl #5

Running the C routine through GCC reveals that it gets it down to 12 instructions and 4 bytes of supporting data (x and y), for a total of 52 bytes and code that reads like a straightforward hand-translation. That’s in the less space-efficient ARM instruction encoding, though: when using the more compact THUMB2 encoding the same number of instructions drops to 40 bytes. Not bad at all! It’s not every day you see a compiler emit code that’s more compact than hand-tuned self-modifying assembly code.

(Older ARM chips such as the one powering the Game Boy Advance used a different compact encoding now called THUMB1. GCC can still emit this, with command flags like -mcpu=arm7tdmi -mfloat-abi=softfp -mthumb. Unlike its successor, THUMB1 loses a lot of expressive power, and this means extra instructions are necessary and the routine grows to 44 bytes. Still not bad, but it’s no longer beating the x86 code.)

Meanwhile, I’m now adapting this to the Z80. This turns out to be a superb level of hassle, and working through it has been more of a chore than I’m used to it being. Below the cut, my first attempt at porting the routine to Z80 assembly language:

Continue reading

ZX81: Alternatives

Last time we fired up the ZX81, we managed to create and link a machine language program to print out a message. That was great, but it also looked like this:

zx81hello_rem

This was the accepted practice of the time, but it’s still a bit ugly. Can we do better? Toni Baker’s book Mastering Machine Code on Your ZX81 suggested three other ways to store machine code alongside this one. In the first, you lower the amount of space available to BASIC and then store your machine language program inside the space left over. This protects the program well, but you cannot save it out, nor can you load it directly. A more capable BASIC that can load machine language programs into arbitrary locations—such as its successor the Spectrum, or any of the Apple or Commodore BASICs—can use this technique pretty readily, though. I don’t personally like it because it implies you need a separate loader program to set up memory right and then once the program stops it can be tricky to restart. In the second, you store the program in a string variable and call it out of the program variable space. This combines the disadvantage of an inconsistent program location with the disadvantage of a program that is obliterated if the end user ever types RUN.

That’s not ideal. The final suggestion was to store the machine language program inside the last line of the program and then make that last line disappear by POKEing a $FF into the most significant line number byte, making this line disappear from BASIC but nevertheless maintained as part of the display file. Ms. Baker did not recommend this technique, largely because the code’s location would vary as the BASIC program did, so calling it was a bit of a chore and any routine that wasn’t fully relocatable would fail if the program was modified.

But that’s perfect for us. We’re aiming for a small fixed program that never changes. Our target program would be something like

10 RAND USR 16514

That number is wrong, of course, but we can compute the right one. The BASIC program starts at 16509; two bytes each for the line number and line size, then one each for RAND and USR, then five for the digits of the address, and then six for the floating-point representation, and finally a newline to end the line. We then need a $FF to force the next line number to be invalid, and we can start our machine code right after that. A bit of addition shows that this means our machine code starts at 16528, or $4090 in hex. A pleasingly round number.
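The bit of addition, spelled out as a Python sketch. The byte counts are the ones listed above; the constant and function names are invented for the example.

```python
PROGRAM_START = 16509   # where the BASIC program begins

# Pieces of the line "10 RAND USR nnnnn", in order, with their sizes in bytes.
LINE_BYTES = [
    ("line number", 2),
    ("line length", 2),
    ("RAND token", 1),
    ("USR token", 1),
    ("address digits", 5),
    ("floating-point form", 6),
    ("newline", 1),
]

def machine_code_origin():
    """Address of the first machine code byte, just past the $FF marker."""
    ff_marker = PROGRAM_START + sum(size for _, size in LINE_BYTES)
    return ff_marker + 1

print(machine_code_origin(), hex(machine_code_origin()))
```

The $FF byte itself sits at 16527, one below the origin.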

We can get the bytes for that program by typing the line in and saving that one-line program to a .P file. The bytes we need are then easy to extract from a hex dump.

Of course, in order to actually make use of all this, I need to make my own version of the appmake program we used last time. This isn’t difficult; it’s mostly just concatenating stuff together and then computing and loading in a few constants based on the length of the program. That ends up being a hundred lines or so of Python, most of them just arrays of constants. After reassembling our Hello World program with an origin of $4090 instead of $4082, we get a program that runs identically but has a much more pleasing listing:

zx81hello_post

And there you have it. Now I just have to do something more interesting than Hello World.

From the Archives

I’ve kind of been neglecting my Commodore stuff lately. That said, even though I haven’t done much new, I’ve finished cleaning up and commenting some of the earliest C64 code I wrote that’s still good enough to use. These are a simple bitmap library and a simple music playroutine. Both date from when I was writing things that would have been simple BASIC programs in a more sophisticated BASIC than the C64 had and wanted some better primitives.

The Bitmap Library

Bitmapped graphics on the C64 manage the difficult trick of being simultaneously perfectly logical and a gigantic nightmare. It’s a 320×200 bitmap, and to set a point (x, y), you first compute the relevant memory address A=320*INT(Y/8)+8*INT(X/8)+(Y%8). Then you must POKE A, PEEK(A) OR (2^(7-INT(X%8))). So that’s the nightmare part. The logical part is that this is how you would set up a bitmap if every 8×8 block were a custom character, so if you’ve already done some work setting up graphics for work with custom character graphics, you can translate it to the bitmap screen with no work whatsoever.
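The address and mask computation translates to Python like this. It is only a sketch of the formula above, not the library's 6502 code: the offsets are relative to the start of the bitmap, and the function names and the 8000-byte bytearray are my own stand-ins.

```python
def bitmap_offset_and_mask(x, y):
    """Byte offset within the bitmap and the bit mask for pixel (x, y)."""
    offset = 320 * (y // 8) + 8 * (x // 8) + (y % 8)
    mask = 1 << (7 - (x % 8))
    return offset, mask

def set_pixel(bitmap, x, y):
    """The equivalent of POKE A, PEEK(A) OR mask."""
    offset, mask = bitmap_offset_and_mask(x, y)
    bitmap[offset] |= mask

bitmap = bytearray(8000)   # 320x200 pixels at one bit each
set_pixel(bitmap, 9, 10)
```

Each group of eight consecutive bytes is one 8×8 cell, which is exactly the layout a custom character definition uses.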

This bitmap library basically just does a bunch of unoptimized 16-bit math to compute that and then does a similarly straightforward hand-translation of Bresenham’s algorithm for drawing lines. Over the years I’d made little variants of it to experiment with ways to make it smaller or faster, but it doesn’t seem like I ever really found any refactorings that I liked enough to keep. Most of them revolved around not having to recompute the pixel masks or addresses, but these kept either being not much smaller, or being just a little smaller and much slower.

I haven’t really revisited this routine with an eye to optimizing it, because it works well enough, at 600-odd bytes it’s not a huge imposition for a program that needs it, and most work in bitmaps doesn’t really want arbitrary line-drawing anyway because you can’t do it with acceptable speed on a 1MHz machine.

The music player

It’s been said that a rite of passage for every Commodore programmer is writing their own music-playing routine. This was mine, I guess. Pretty much every program I’ve written that needs to play a tune of some kind has included this in it, but if I ever make a proper demo, I sure won’t be using this.

The basic notion is straightforward; each voice gets an array of notes and durations and it just steps through the arrays as needed, and at different rates. The “at different rates” is a little interesting because most modern playroutines work like the old Amiga trackers and keep the time index fixed for all voices. This doesn’t do that, which makes the results a lot easier to write out, either by hand or with the help of a script.

Also amusing, though I never actually did anything with it, is the fact that the three voices don’t all have to be the same length, because they can loop independently as well.
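The per-voice stepping can be modeled in a few lines of Python. This is a sketch of the idea only: the Voice class, the note names, and the tick granularity are assumptions, and the real playroutine also handles control commands that are left out here.

```python
class Voice:
    """One voice stepping through its own list of (pitch, duration) pairs."""

    def __init__(self, notes):
        self.notes = notes      # durations are in ticks and must be at least 1
        self.index = 0
        self.remaining = 0
        self.current = None

    def tick(self):
        """Advance one tick and return the pitch sounding during it."""
        if self.remaining == 0:
            self.current, self.remaining = self.notes[self.index]
            # Wrap around independently of the other voices.
            self.index = (self.index + 1) % len(self.notes)
        self.remaining -= 1
        return self.current

melody = Voice([("C", 2), ("E", 1)])
drone = Voice([("G", 3), ("F", 1)])
for _ in range(8):
    print(melody.tick(), drone.tick())
```

Here the melody loops every three ticks and the drone every four, so the combined pattern only repeats after twelve.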

The reasons not to use this library mostly boil down to instrument control. The note arrays also include control commands that let you set the waveform characteristics of the notes, and also how much time to give to the “release” of each note so that it has a bit of time to fade out before the next note starts. A big reason that all the respectable players are tracker-based is that tracker-based systems let you alter the timbre and effects on a note while it’s still being held, and that’s a very important part of SID music. This system is fundamentally incapable of making that happen.

Still, to date I’ve never needed anything more powerful than this, so it’s crept into about a dozen programs over the years. Never underestimate the utility of being Basically Good Enough As It Is.

ZX81: On to Machine Code

All right. It’s time to step away from BASIC and start working with machine code. By the end of this post we’ll have a technique for mixing machine code with BASIC programs, and also have a functioning Hello World program that’s 100% machine language while still being as easy to load and run as any BASIC program.

Sinclair BASIC offers us only one mechanism to get into machine code from BASIC: the USR function. This takes the address of a machine code routine and jumps to it. The ABI rules are pretty simple:

  • The I and IX registers are sacred to the display routines in the ROM, and so we are not permitted to use those if the display is active.
  • EDIT: ADDED 11 Mar 2017: The AF' shadow register is also reserved by the display routines, which means that you may not use the associated EX command. EXX is still fine, however.
  • The IY register is set to $4000, the start of system variable space. BASIC expects this to be the case but will put some reasonable effort into forcing it to that value. In practice, though, if you need IY at all $4000 is probably where you were hoping to put it anyway.
  • You return to BASIC with the RET instruction or one of its conditional variants. The value in BC is treated as an unsigned integer and this becomes the value of the USR function.

An Example From the Literature

Let’s look at the game “Surge”, by Tim Rogers. This was a short type-in program about flying a spaceship through an asteroid belt. To check for collisions, the program consults the value of the screen memory where it is about to draw the ship’s next position. This task is complicated by the fact that lines of screen memory might be different lengths, so you can’t simply compute the location of a point on the screen from the base address. The location might not even exist. To address this situation, location $400E (named DF_CC by the documentation) holds the address of the point in screen memory the cursor is pointed at. So, to check the screen memory at row R, column C, place the cursor with PRINT AT R,C;, which prints nothing at that location and keeps the cursor there, and then check screen memory with PEEK(PEEK 16398 + 256*PEEK 16399).
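To see what that nested PEEK is doing, here is the same lookup in Python against a fake 64K memory image. The bytearray, the helper names, and the sample address $4321 are all invented for the example.

```python
DF_CC = 16398   # $400E: low byte here, high byte in the next location

def peek_word(memory, address):
    """Read a 16-bit little-endian value, as PEEK a + 256*PEEK (a+1) does."""
    return memory[address] + 256 * memory[address + 1]

def char_under_cursor(memory):
    """Follow the DF_CC pointer and fetch the screen byte it points at."""
    return memory[peek_word(memory, DF_CC)]

memory = bytearray(65536)
memory[DF_CC] = 0x21          # pretend the cursor sits at $4321
memory[DF_CC + 1] = 0x43
memory[0x4321] = 0x17         # arbitrary byte standing in for a screen character
print(char_under_cursor(memory))
```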

That PEEK statement is pretty cumbersome, so Tim Rogers rendered it as a USR routine instead, which is seven bytes long:

2A 0E 40    LD HL, ($400E)
4E          LD C, (HL)
06 00       LD B, 0
C9          RET

(Oh, um, I seem to have not actually mentioned this yet, but the ZX81 uses a Zilog Z80A for its CPU. This is actually the first time Z80 code has appeared on Bumbershoot Software.)

An interesting property of this code is that every single byte in it except for one is either a character code for a symbol on your keyboard or the token of a BASIC keyword. That means you can almost completely type it in, in its binary form, into a program comment. This is exactly what Surge does:

surge_01

And since it’s the first line of the program, it’s at a very consistent location, so making use of it is pretty easy too:

surge_80

The untypable byte is POKEd into place by the third line of the program, but there’s nothing stopping you from entering the line, entering the POKE as an immediate command, and then leaving the now-perfect line as-is.

And in case you’re curious, the untypable byte isn’t $00. It’s $4E. $00 is space.

Generalizing the Technique

I’ve always wanted to do this trick on the Commodore 64, but you really can’t do it in Microsoft-descended BASICs. Those BASICs use null bytes as line terminators, and it’s very difficult to write a useful assembly language routine with no $00s in it. Each program line also includes a pointer to the next line, but the BASIC does not trust those pointers, rewriting them on every load and using them only to quickly search the program for a specific line number. Every other operation relies on the null terminators.

Sinclair BASIC, however, prepends each line with its length and this length value is the only source of truth. There are thus no real restrictions on this technique at all. Unprintable bytes show up as question marks in listings and do no harm. $76 (newline) and $7E (the byte that indicates the next five bytes are a parsed floating-point number) will garble the listing so that you can’t reconstruct the size of the program from inspection, but otherwise don’t hurt anything. A program large enough that it doesn’t fit on the screen might confuse the ROM enough to soft-lock it when you LIST the program, and that’s extra nasty because as we’ve seen Sinclair BASIC likes to fill the top of the screen with program listing when it’s not doing anything else, but this isn’t a problem in practice for three reasons. First, it doesn’t always happen. Second, as we noted last time it’s very easy to make the program autorun right after loading so there’s no chance to list it. And finally, the saved-out state of the BASIC interpreter includes the state that indicates where the listing is scrolled to, so a long program can simply be scrolled off the screen before it’s ever loaded.

That does leave one problem, though: there is still no equivalent of SYS. The USR function returns a value, and in BASIC, unlike C, values may not be ignored. Functions and statements are fundamentally different things in BASIC. There seem to be two traditional solutions:

  • If your program is a meaningful mix of BASIC and machine language, machine language routines all return a status code of some kind that the BASIC framework uses to decide what to do next. That’s probably going to mean “assign the result to a variable”, but a real slick dispatch system could get a lot of mileage out of a statement like GOTO USR 16514.
  • If your whole program is machine code, take the result from USR and use it to reseed the random number generator. Variations on RAND USR 16514 seem to show up a lot.

In the end, that meant that this technique was the one that everyone actually used. There are alternatives—one of which I will explore in detail later on—but jamming your machine code into REM statements is unquestionably the community standard.

Getting In On the Act

It’s time to start writing some machine code ourselves. This is the first Z80-based system we’ve worked with here, so we’ll need a new toolchain. I ultimately went with the z88dk suite, which targets tons of systems (including the ZX81) out of the box, includes its own C compiler, and is also part of the stock repos of both Fedora and Debian. It got me off the ground pretty much immediately and I’m very happy with the result.

Let’s get started with a simple Hello World program:

Continue reading