Category Archives: retrocoding

Compatibility Across ZX Spectrum Variants

The Spectrum is a new platform for me, and that means that as is traditional, I’ve ported Lights-Out to it. The final tape file weighs in at 1,050 bytes, which is the largest 8-bit implementation I’ve done to date, but that extra space is being used to buy a couple of things. One of them is obvious:


This is my first implementation that actually makes use of custom graphics of any kind. The default graphics characters for the Spectrum are less comprehensive than even the ZX81’s, but a certain amount of custom graphics is painless and so that is clearly what developers are expected to use.

The second thing can’t be shown on screenshots, but also accounts for some of the space used. Despite the fact that the two machines are broadly incompatible, this program will run with no modifications and no model-detection code on both the 16KB Spectrum and the Timex Sinclair 2068. (By virtue of running on the 16KB Spectrum it also runs on the rest of the computer line, because within that line things stayed pretty consistent. There are a few ways you can go wrong, but not many.)

I’ve already covered the general design of this program, and the Spectrum/Timex port is really just an expansion and adaptation of the ZX81 port. I’ve uploaded the source code and updated the Lights-Out Collection download to include this version too.

Below the fold I’ll talk about the changes that had to be made to move from ZX81 to Spectrum, and the compatibility restrictions that permitted a 100% machine code program to run on the Spectrum and the TS2068.

Continue reading


SpectraLink: Creating tape files from scratch

Last time, we created a self-loading and auto-running BASIC/ML hybrid program and saved the combination out to tape. We built our program in the emulator using ordinary BASIC commands. That’s the most painless workflow yet for making a machine-code program with period tools—at least with what we’ve explored here.

But it’s 2017. We want to have cross-development workflows that don’t require us to manually fire up an emulator and mess around with memory injection and hand-written BASIC programs. Let’s get this up to speed.

Continue reading

Getting Started With the ZX Spectrum

It’s time to revisit an old friend.


Well, that’s a lie. The ZX81 wasn’t really an old friend in the first place, and as we can see above, this isn’t a ZX81—we’ve got not just mixed-case text but working exclamation points!

This is its successor, the ZX Spectrum. This machine never reached American shores, unless you count the spectacularly ill-fated Timex Sinclair 2068, which you shouldn’t. The TS2068 had a completely different RAM layout and ROM system, which in turn meant that basically no software ran on it unless it was 100% BASIC, and maybe not then.

But the Spectrum was very well-beloved elsewhere in the world, and had many clones, and is a much cleaner platform for experimenting with Z80 assembly code than most of my other options.

If you want to work with these yourself, FUSE is the premier Spectrum emulator overall, while EightyOne, the ZX81 emulator I had recommended for Windows, also covers the primary Sinclair line.

In this post, I’ll be outlining what it takes to produce a machine code program for the Spectrum, and how to mix it with BASIC. This is a lot different than the systems we’ve looked at previously, to the point that it almost feels like this is the first system we’ve looked at that actually intended hybrid BASIC/ML code as a common use case.

Continue reading

Structured C64 BASIC: A Complete Example

Now that we’ve worked through the basic principles, let’s go through a worked example. I don’t work in BASIC much, but the largest program I’ve written that’s pure BASIC is a directory-editing program for disk images, weighing in at about 120 lines of code. It should serve to demonstrate the core principles.

The Problem

When I was organizing some of my projects and other programs into disk images, I kept running into a host of minor irritations endemic to disk work on the C64. My initial list of goals was:

  • The ability to reorder directory entries, including entries that didn’t refer to any files. You see, instead of storing directory entries as a linked list, CBM DOS stored them as an array, and when a file is created the first “empty” entry is used. That means that if you’ve deleted files, your newly created files appear seemingly randomly in the middle of the listing, instead of at the end. That’s super-annoying so I wanted to be able to move those gaps down to the end of the list.
  • Undelete files.
  • Hide files so that they existed on the disk but didn’t show up in directory listings.
  • Create unloadable delimiter files that did exist in the directory listing, as dividers or headings.

Deletion and renaming were easily managed with ordinary disk commands, so I wasn’t as interested in those. I didn’t need a DOS shell or directory commander—I needed a program that would let me do things to the directory structure that the DOS didn’t provide as commands.

Now, as it turns out, that whole list can be boiled down to three operations:

  • Exchange two entries in the directory listing.
  • Alter the file type of a file in the directory listing.
  • Optionally scan the directory structure and adjust the block allocation map to match it.

This gives us all the operations I want to perform. Any permutation can be built out of an exchange primitive. Deletion and undeletion involve changing the file type to or from “not a file” and then performing the scan for the block structure. Hiding files is accomplished by changing the file type to “not a file” and then not performing the scan, leaving its blocks protected and inaccessible. And delimiter files can be created from any other file by setting their file type to the special delimiter type.

The Initial Drafts

The basic implementation strategy falls out of the considerations above. To do reordering and file type manipulation, I would load all the disk blocks that hold the directory into memory, edit them in place, and then write them all back out at the end. The DOS provides a special “validate” command that does the directory structure scan and block allocation reconciliation. I would also add a “move slot A to slot B” command that performed repeated exchanges, to simplify moving gaps in the directory to the end of the list.
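As a rough illustration of how a "move slot A to slot B" command can be built from nothing but exchanges, here is a minimal Python sketch. The names `exchange` and `move` and the list-of-strings directory are assumptions made for this example; they are not taken from the BASIC program.

```python
def exchange(entries, i, j):
    """Swap two directory slots in place (the single primitive)."""
    entries[i], entries[j] = entries[j], entries[i]


def move(entries, src, dst):
    """Move the entry in slot src to slot dst using only adjacent exchanges."""
    step = 1 if dst > src else -1
    for k in range(src, dst, step):
        exchange(entries, k, k + step)


# Hypothetical four-slot directory; "" stands in for an empty slot.
directory = ["GAME", "", "NOTES", "DEMO"]
move(directory, 1, 3)  # push the gap down to the end of the list
print(directory)
```

Every other entry keeps its relative order, which is what makes this useful for pushing gaps to the end of the listing.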

My first draft stored the directory blocks in a two-dimensional array of integers. One array dimension was the disk blocks, and the other was the bytes within them. This worked, and it was relatively simple to implement, working much like our sector-dumper program from last time did. However, it was impossibly slow; exchanging directory entries involved multiple loops that would copy values around, each executing dozens of times. If I wanted to get decent performance out, I’d need to write it in machine code, or create a completely external tool to handle it. Neither of these options appealed to me much.

I then set about trying to implement the core operations more efficiently, which in turn meant changing the data structures. The alterations were pretty drastic.

First, instead of reading and writing every single sector in track 18 (where the directories and allocation maps were stored), I would only read entries in the directory listing itself. These are arranged as a singly linked list. I created an array SN% that stored the order of sectors to write back and a variable NS for the actual number of sectors we read. I then created a parallel array D% to hold “dirty bits”—these all started as zeroes and got a 1 written to them if I ever altered anything in that relevant sector. At the end of the program, I would only need to write out those sectors that I actually changed. In the extreme this could result in me doing seventeen times less work.

That leaves the contents of the directory entries themselves. These were 30-byte data structures that my exchange-directory-entries routine would be copying back and forth as blocks. Instead of copying blocks of array entries back and forth, I kept each directory entry as a string and simply edited them or swapped references around as needed. These entries were stored in an array named DE$.
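The bookkeeping described above can be sketched in Python as follows. This is only an approximation of the idea, not the BASIC code. The attributes `sn`, `dirty`, and `de` stand in for SN%, D%, and DE$. The `Directory` class, its method names, and the sample sector numbers are assumptions made for this example. It also assumes eight directory entries per 256-byte sector.

```python
ENTRIES_PER_SECTOR = 8  # assumption: eight entries per 256-byte directory sector


class Directory:
    def __init__(self, sector_numbers, entries):
        self.sn = list(sector_numbers)   # plays the role of SN%
        self.de = list(entries)          # plays the role of DE$
        self.dirty = [0] * len(self.sn)  # plays the role of D%

    def _touch(self, slot):
        # Mark the sector that holds this slot as needing a write-back.
        self.dirty[slot // ENTRIES_PER_SECTOR] = 1

    def exchange(self, i, j):
        # Swapping string references is cheap; no byte-by-byte copying.
        if i == j:
            return
        self.de[i], self.de[j] = self.de[j], self.de[i]
        self._touch(i)
        self._touch(j)

    def sectors_to_write(self):
        # Only sectors whose dirty bit is set go back to disk.
        return [s for s, d in zip(self.sn, self.dirty) if d]


# Illustrative three-sector directory; the sector numbers are made up.
d = Directory([1, 4, 7], [f"ENTRY{n}" for n in range(24)])
d.exchange(0, 3)    # both slots live in the first sector
d.exchange(2, 17)   # touches the first and third sectors
print(d.sectors_to_write())
```

The second sector is never marked, so it would not be rewritten at the end of the session.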

Finally, to keep myself honest, I added a couple of variables to check to see if I’d ever deleted or undeleted any files to remind me whether or not I should validate the disk.

All of these changes were a fairly significant rewrite. I did the implementation in C64List (which was also the initial import into GitHub), and while I made some basic efforts to make the code look nice when LISTed, I was pretty lackadaisical about it. Code flow jumped all over the place, modules were huge and all had differently-sized ranges of lines assigned to them. You couldn’t ever really type LIST with a line range and have any idea of what you would get.

So as written, in the modern era, with modern cross-development tools, this draft was basically fine. It would not have been fine in the 1980s. It would have been extremely difficult to understand or modify without printing out the entire source code and studying it as hardcopy.

So, before I do a walkthrough of the program itself, let’s fix that.

Structuring Constraints

For the purposes of this project, I set the strictest set of constraints that I’d commonly seen used in programs published with the intent of being typed in by the end user:

  • The code will be organized into blocks, each of which is assigned one hundred line numbers. Block 0 is lines 0-99, block 1 is lines 100-199, and so on.
  • Line 0 of each block (the line that is an even multiple of 100) will be a comment explaining what the block does.
  • GOTO statements will only ever target locations within the same block.
  • GOSUB statements will always target the beginning of a block and nowhere else.
  • Blocks will be self-contained: any block you enter with GOSUB will be exited with RETURN. (The main program is a special case, and we’ll cover that below.)
  • A reader should be able to type the code verbatim into a C64 without using input abbreviations or tricks. (The C64 line editor had a limit of 80 characters, but BASIC lines could be up to 255 characters long, and this limit wasn’t enforced until after all the keywords had been compressed to a single byte. Clever users could use special keyboard abbreviations to make lines that were over 80 bytes long when LISTed, or use program generators to produce lines that were much longer. This constraint demands that we structure the code so that this is never necessary.)
  • A block will fit on the screen when LISTed. There needs to be room for both the initial comment and the reiterated BASIC prompt after the LIST completes.
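Several of the rules above are mechanical enough to check with a cross-development script. The following Python sketch is one possible linter, not a tool used in the original project. The `check_blocks` function and the sample listing are hypothetical. It only recognizes explicit GOTO and GOSUB followed by a line number, so ON...GOTO lists, the THEN-line-number shorthand, and keywords inside strings or REM text are not handled.

```python
import re

# Matches GOTO/GOSUB followed directly by a line number.
JUMP = re.compile(r"\b(GOTO|GOSUB)\s*(\d+)", re.IGNORECASE)


def check_blocks(listing):
    """Return a list of rule violations found in a BASIC listing."""
    problems = []
    for raw in listing.strip().splitlines():
        raw = raw.strip()
        if not raw:
            continue
        number, _, body = raw.partition(" ")
        line = int(number)
        if len(raw) > 80:
            problems.append(f"{line}: longer than 80 characters")
        if line % 100 == 0 and not body.upper().startswith("REM"):
            problems.append(f"{line}: block must open with a REM comment")
        for keyword, target in JUMP.findall(body):
            target = int(target)
            if keyword.upper() == "GOTO" and target // 100 != line // 100:
                problems.append(f"{line}: GOTO {target} leaves block {line // 100}")
            if keyword.upper() == "GOSUB" and target % 100 != 0:
                problems.append(f"{line}: GOSUB {target} is not the start of a block")
    return problems


# Hypothetical listing with two deliberate violations.
sample = """
0 REM MAIN PROGRAM
10 GOSUB 100
20 GOTO 150
30 END
100 REM PRINT GREETING
110 PRINT "HI":GOSUB 210
120 RETURN
"""

for problem in check_blocks(sample):
    print(problem)
```

The screen-fit rule is left out because it depends on how lines wrap when LISTed, which this sketch does not model.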

I then added some additional structure that made sense for this program.

  • Blocks would be further grouped into megablocks of ten blocks each. These would comprise related blocks that cooperated to perform some task.
  • Megablock 0 would hold the main program and general utilities. Any GOSUB statement that called into a routine would either be calling a block in megablock 0, a block within its own megablock, or the first block in some other megablock (that is, an even multiple of 1000).
  • Block 0 would be the entire main program. The only things it would do would be initialize globals and call into the other megablocks, and the last line in the block would be END. Blocks 1 through 9 would be reserved for general-purpose routines. (This was uncommon. Generally, 9 blocks wouldn’t be enough for your general purpose routines, and so a set of megablocks near the end of the program would be used for them. Likewise the main program would be kept near the end instead of the beginning. Extremely low-numbered blocks would hold routines that needed to run very quickly, taking advantage of the way backwards branches are faster at lower line numbers. Our directory editor is I/O bound basically all the time and doesn’t care about this.)

Of all of these restrictions, only the requirement that each block fit on a C64 screen when LISTed made my task harder. The others provided a structure that made the outlining and implementation simpler.

Structure is your friend. That’s not unique to BASIC, either.

Let’s take a walk through the final program.

Continue reading

C64 BASIC: Disk I/O

It’s time to start actually doing floppy disk I/O in C64 BASIC. This was a little trickier than one might like, because the Programmer’s Reference Guide only covers the most generic forms of the OPEN command, and the drives’ own manuals were infamously opaque. Still, there’s enough information in them to let us do the work.

We’ll start with the basics; we’ll create a data file and put some text in it, then read it back out. Here’s a program that writes the text:

   10 OPEN 1,8,2,"HELLO WORLD,S,W"
   20 PRINT#1,"HELLO WORLD"
   40 CLOSE 1

And here’s the very similar program to read it back out:

   10 OPEN 1,8,2,"HELLO WORLD,S"
   20 INPUT#1,A$:PRINT A$
   30 IF ST=0 THEN 20
   40 CLOSE 1

OPEN takes four arguments, normally. The first argument (the 1) is BASIC’s name for this open file. The second is the device number, which is 8 as usual for the first floppy drive. The third is specific to the device—disk drives let us use any number from 2 to 14 for this—and it serves a very similar purpose to the first argument. For now it’s arbitrary, but it will become relevant later when we are giving the disk commands directly. We go with 2 as the lowest legal value. Lastly, the final argument is the filename and any optional parameters. The “,S” at the end tells us that this is a SEQuential file.

ST is a special variable that is set whenever we do any I/O. This is 0 on success, and various bits are set if the operation fails in various ways. The result 2, for instance, means the file was not found, and as we can see…


64 is End of File. Unlike feof() in C, though, this is set on the last legal read, not the first illegal one. That makes the loop logic look a bit different than we might otherwise expect.
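To make the difference in loop shape concrete, here is a small Python simulation of a channel whose status word behaves the way ST is described above. `FakeChannel` and `read_all` are invented for this sketch and do not correspond to any real disk API; the sketch also assumes the file holds at least one record.

```python
EOF = 64  # the end-of-file value of ST, as described above


class FakeChannel:
    """Stand-in for an open sequential file; purely illustrative."""

    def __init__(self, records):
        self.records = list(records)
        self.st = 0

    def input(self):
        value = self.records.pop(0)
        if not self.records:
            self.st |= EOF  # flagged on the last legal read, not after it
        return value


def read_all(channel):
    lines = []
    while True:
        lines.append(channel.input())  # read first...
        if channel.st != 0:            # ...then test, as line 30 does
            break
    return lines


print(read_all(FakeChannel(["HELLO", "WORLD"])))
```

Testing the status before reading, as one would with a feof()-style check, would drop the final record here because the flag is already set by the time that record has been delivered.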

To do anything more fun, though, we’ll need to work at the byte level, not the formatted BASIC level. We’ll do that below the fold.

Continue reading

C64 BASIC: Performance Tuning

There’s a bunch of heuristics and rules for getting BASIC code to run slightly faster. Some of them even actually help. The core principle, though, is that command interpretation is work and so you want to do as little of it as possible. This has some occasionally surprising side effects. That means that to be sure what’s going on you’ll need to do some measurement.

Our Sample Task

For our initial experiments, we will put the checkerboard graphic at a hundred random places on the screen. Now, while Sinclair BASIC would let us place the cursor with a command like

PRINT AT INT (RND*22),INT (RND*32);"{H}";

we don’t have PRINT AT in Commodore BASIC. The KERNAL, however, does have a routine called PLOT at $FFF0 which will position the cursor for us. Thus, here is our first cut at the routine, in petcat format:

  10 print "{clr}{gry3}";
  20 s=ti
  30 for i=1 to 100
  40 y=int(rnd(1)*24):x=int(rnd(1)*40)
  50 poke 781,y:poke782,x:poke783,0:sys 65520:print"{CBM-+}";
  60 next i
  70 e=ti
  80 print "{home}{lblu}total time:";(e-s)/60

UPDATE, 10 Sep 2017: This listing has been updated to fix the POKE to 783 (clearing the status flags). The original post zeroed 780 instead, which cleared the accumulator. The PLOT command requires the carry flag to be clear in order to move the cursor. (Otherwise it reads the cursor position instead of setting it.) This edit does not alter the timing information.

We time our program by consulting the TI variable. This is initialized to 0 at power-on and increments every time the clock-tick IRQ is processed. That’s 60 times per second under normal operation, and 0 times per second when we’ve disabled interrupts. We don’t have to worry about interrupts this time, so TI works fine and will give us frame-level accuracy.


Running this program shows an average runtime of 4.47 seconds. That will be our baseline.

Lowering Instruction Count

Our inner loop is executing seven full BASIC commands per loop, all to perform cursor positioning. This isn’t even the usual way people did cursor positioning, though; usually they would exploit the way you could put cursor-move characters into strings. String operations permit a lot of work to be done and can be surprisingly fast. If we add a new line and then replace line 50:

  15 y$="{home}{24 down}"
  50 print left$(y$,y+1);tab(x);"{CBM-+}";

This speeds up the average runtime to 3.87 seconds, which is nearly 15% faster. That’s not bad at all, and clever use of LEFT$ and RIGHT$ can be used to get reasonable horizontal scrolling effects in pure BASIC.

Continue reading

C64 BASIC: Coexisting With Non-BASIC Data

One of the most defining aspects of C64 BASIC is that there is almost no high-level support for the various features the hardware provides. This means that you can usually tell when a BASIC program is for the Commodore because the vast majority of its work is doing memory-mapped I/O with the POKE and PEEK statements. This mixes very uneasily with the requirements of an interpreted, garbage-collected language like BASIC. The VIC-II graphics chip, in particular, tends to want to deal with contiguous chunks of memory at well-defined locations. In machine code programs, we can simply not put our own code or data in those locations, but with a higher-level language, we need to somehow ensure that the language runtime won’t do so. (This problem is in no way unique to BASIC, but BASIC shares the dilemma.)

Now, there’s some regions of memory that BASIC and the KERNAL beneath it guarantee won’t be touched. There are four important regions there:

  1. $C000-$CFFF (49152 to 53247 decimal) is 4 kilobytes of contiguous space that are free for application use. Machine language programs that expand BASIC’s capabilities do well here.
  2. $033C-$03FB (828-1019 decimal) is technically the cassette I/O buffer, but if you aren’t in the middle of a tape operation it won’t be touched. It’s also part of the default 16KB of memory that the VIC-II can see, so it’s a good place to stuff sprites or very small machine language support routines.
  3. $030C-$030E (780-782 decimal) let BASIC affect the registers at the start of a SYS command. It’s more common to dedicate some memory elsewhere for communication with machine language routines, but if you want to make a syscall direct from BASIC, these locations will let you do that.
  4. $FB-$FE (251-254 decimal) are 4 bytes of zero page that can be used for pointers by machine language routines without interfering with any BASIC or KERNAL operations. This is important because you must indirect through the zero page any time you’re doing pointer-like operations, and you’re going to want to do that a lot.

None of these are suitable for storing custom character sets—it’s very inconvenient and slow to set up the VIC-II to use the $C000 range for text and still have things like PRINT and INPUT work—and none of them are even minimally sufficient for bitmaps. If we want to do anything serious with the VIC-II, we’ll have to reconfigure BASIC’s usage of memory.
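Since BASIC's POKE and PEEK take decimal addresses while most references give hexadecimal, it is easy to slip when converting between the two. The short Python sketch below cross-checks the four regions listed above and reports their sizes. The region labels are shorthand invented for this example.

```python
# (label, start, end, decimal start, decimal end) for the four safe regions.
regions = [
    ("free RAM block", 0xC000, 0xCFFF, 49152, 53247),
    ("cassette buffer", 0x033C, 0x03FB, 828, 1019),
    ("SYS register bytes", 0x030C, 0x030E, 780, 782),
    ("free zero page", 0x00FB, 0x00FE, 251, 254),
]

sizes = {}
for name, lo, hi, dec_lo, dec_hi in regions:
    # The hexadecimal and decimal forms must describe the same range.
    assert (lo, hi) == (dec_lo, dec_hi), name
    sizes[name] = hi - lo + 1
    print(f"{name}: ${lo:04X}-${hi:04X} = {lo}-{hi} ({sizes[name]} bytes)")
```

The cassette buffer comes out at 192 bytes, which is room for three 64-byte sprite slots and helps explain why it suits only sprites or very small routines.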

Continue reading