Atari 800: Properly Mixing Basic and Machine Language

In my previous article, I wrote a small routine to print out “Hello world” on the Atari 800. I used READ and POKE to get the program into a fixed location in memory (1536, or $0600 in hex), and then called it with X=USR(1536). Memory locations $0600 through $06FF (page six) are reserved by BASIC for machine language routines that the programmer might want to use.

256 bytes for all our machine-code support is pretty limiting, though, and this was not usually a BASIC developer’s first resort for machine code. Like on the other 8-bit home computer platforms we’ve worked with, you almost always would prefer your machine code to be in the same memory as the BASIC program’s workspace, but somehow protected from it.

Unfortunately for us, our options on the Atari are quite aggravatingly limited, precisely because the system was so flexible. Where BASIC’s memory began or ended depended on what graphics mode you were in, whether any cartridges were inserted, how many disk drives were connected, and which version of DOS you had booted from. With BASIC, DOS, and the underlying OS all eating away at both edges of your RAM, it also wasn’t usually feasible to move some edge of the program away, either. If you were going to mix machine code in with your BASIC program, you would generally be best off if your program were relocatable—that is, if it refers to no fixed addresses inside itself. In this scheme, you would load machine code routines that got called by other machine code routines into page six, and then BASIC would call into relocatable routines that would run fine wherever you happened to be stashing them this time. Furthermore, if you don’t care about where a block of bytes is, there was a very easy way to declare a block of bytes to be in use and not otherwise touchable by BASIC: we can keep our program inside a string variable. BASIC will maintain the space for our routine inside its own memory manager, and we don’t have to worry about it being overwritten by accident. All we have to worry about is how to call it, how to get away with not using hardcoded addresses for anything, and how to load it in in the first place.

Calling the function is very easy, at least. Atari BASIC includes a primitive ADR, which returns the address of the contents of a string variable. We can use this to turn any string into a pointer to its contents.

Getting away with not using addresses is much harder, especially for something like Hello World—one way or another, we need a way to get the address of the string to print into the I/O control block. For unconditional jumps and the like we can get good use out of branch instructions like BNE and BEQ, taking advantage of our knowledge of the flags to make relative unconditional jumps. And the fact that we’ll only be writing little snippets of machine code and weaving them together with BASIC means many of our operations simply won’t need long-range jumps or calls to subroutines on their own.

But there’s one extra trick up our sleeve that lets us get addresses of data more easily: the USR command can take arbitrary numbers of arguments. Remember how when we wrote our original program our first instruction pulled a value off the stack and threw it out? That value was the number of arguments passed to the routine. If we keep pulling values off of the stack, we’ll get each argument, left to right, high byte first. High byte first is a bit backwards, but that means we can take the string to print and its length as arguments and copy them from the stack directly into the I/O control block. This makes the main routine much shorter:


        pla                     ; Argument count; safe to discard
        ldx     #$00
        lda     #$0b
        sta     $0342,x
        pla
        sta     $0345,x         ; High byte first!
        pla
        sta     $0344,x
        pla
        sta     $0349,x
        pla
        sta     $0348,x
        jmp     $e456

We can then define space for some strings and create our machine code routine and our data string. Our data string becomes a lot easier to create, too, since it is just a string and so we can just type it in. The one wobble we hit is that our string is supposed to end with an end-of-line character (which for some reason is character 155 in Atari’s ASCII variant instead of the 13 or 10 that it normally is), and so we have to append that on our own in line 60. Otherwise, things look very similar to our original program.

10 DIM A$(128),B$(38)
20 FOR I=1 TO 27
30 READ A:A$(I)=CHR$(A)
40 NEXT I
50 B$="Hello, relocatable world!"
60 B$(LEN(B$)+1)=CHR$(155)
70 X=USR(ADR(A$),ADR(B$),LEN(B$))
80 DATA 104,162,0,169,11,157,66,3,104
90 DATA 157,69,3,104,157,68,3,104,157
100 DATA 73,3,104,157,72,3,76,86,228

(An aside, for those familiar with the other BASICs I’ve worked with on this blog: in Atari BASIC, you have to DIMension strings to their length, more like C than the BASICs we’re used to. This also means that we don’t have the ability to declare arrays of strings, or concatenate them in the ways we’re used to. Instead, subscript notation gets abused a bit to perform appends, and—unlike the strings of the other BASICs—our strings are mutable so we can just assign directly to each character as we see in line 30 above.)

Actually Gaining Something From This

That’s all well and good, one might think, but this doesn’t really save us much beyond just poking stuff into page 6 as needed. Why bother with this? This brings us to our final little bonus. Unlike most of the computer systems we’ve messed with here, it is possible to type all 256 possible character codes on the Atari keyboard, through a combination of shift states and input modes. So, as long as our program doesn’t have any newlines ($9B) or quotation marks ($22) in it, we can just type the whole program in as a string constant, for an effective preparation time of zero:


This is how you’d generally want to set up your machine code in a mixed BASIC/assembly language environment. However, I am lazy, and I’ve already typed these numbers in. Can’t I just get a program to write it for me? Well, yes, of course, but Atari BASIC gives us some interesting new options. In the Commodore and Apple II BASICs (and, for that matter, BBC BASIC on the Raspberry Pi), we needed to use a special program called a tokenizer to convert BASIC program text into the bytecoded format that the interpreter actually expects. Atari BASIC has a bytecode of its own—with some very strange properties that mean that how the program was typed in will affect what different byte values mean even for programs that are identical when listed—but Atari BASIC includes its own tokenizer and detokenizer. In addition to the traditional SAVE and LOAD commands to copy bytecoded BASIC programs to and from the disk, there are also LIST and ENTER, which will work with untokenized files. Furthermore, there is nothing special about files in Atari DOS; we are entirely within our rights to write a program that itself writes a program by PRINTing each line to a file. That program can then be ENTERed back into our memory as we wish. Better yet, ENTER, unlike LOAD, does not actually destroy the previously existing program, so we can use this to merge together individual bits of program into one full one.

For actually outputting the string itself, though, the PUT statement, which outputs a single arbitrary byte to a channel, is really a bit more convenient. I appended these lines to my first program in this post:

110 OPEN #1,8,0,"D:BUMB.LST"
120 PRINT #1;"10 DIM A$(128),B$(38)"
130 PRINT #1;"20 A$=";CHR$(34);
140 FOR I=1 TO 27
150 PUT #1,ASC(A$(I))
160 NEXT I
170 PRINT #1;CHR$(34)
180 CLOSE #1

After that, all I had to do was put a blank formatted disk in the drive, RUN the program, ENTER "D:BUMB.LST" to merge in the lines it printed, and then delete all the bits of the program that created the file or interacted with DATA statements. That gave me the program in the screenshot.

A Quick Note on Filenames

The D: in the filename there means that this is a file on the disk drive. Other devices we have available to us are C: (the cassette drive), E: (the “editor”—this is actually IO Control Block 0, which we’ve been using for our basic text output), K: (the keyboard, for when you need to pull in data a keystroke at a time), and S: (the “screen”, which is different from the editor because it includes the graphical parts of the display). When BASIC has a graphics mode open, the S: device is opened for interaction with IO Control Block 6. We played around a bit with that last time, too.

What’s Next?

We’ve now pretty much sorted out the techniques you’d use to mix machine code and BASIC. But BASIC required an optional cartridge and it also made big chunks of RAM on the 800XL permanently inaccessible. So next time we will look into how to write pure machine-code programs that can coexist with DOS irrespective of BASIC’s presence, and then look into taking full control of the machine so that not even DOS can vex us.


Atari 800: Stumbling Into a New System

Last year I finished my retrocoding ambition to produce some low-level software for every computer system of my youth. Since then, I’ve poked at various other systems to see if there were things there worth exploring. At VCF West last year I had a long talk with fans of the Atari 8-bit home computer line—not the Atari 2600, but the 400, 800, 800XL, and so forth. (Despite excluding the 2600, its immediate successor, the Atari 5200, is usually grouped with these computers as well—apparently the architecture is very similar.) I spent a month or so reading old books and magazines and playing around with the various Atari computers in emulation. This was quite a lot of fun, and one particular joy was seeing how many pioneers in the industry got their start on the Atari. Here, for example, was a simple program that animated some little pixelly cowboys, demonstrating the basics of how sprites worked on the system:


The main author cedes the virtual floor to “his more talented partner Sid” for some of the subtler bits. That is actually Sid Meier there animating your pixel cowboys. Yes, that Sid Meier. This was definitely worth the trip just for that, even in emulation.

Now, I say “even in emulation” there, but the Atari emulation community has an interesting attitude towards their project. Like most computers, the system will not do much without a BIOS and usually a DOS of some kind. DOS is optional on the Atari thanks to its cartridge and cassette ports, but you do still need the BIOS and occasionally a separate cartridge for BASIC. These were copyrighted by Atari, of course, but long ago permission was given to use them with the freeware emulator “Xformer” (recently given new life) and later open-source emulators like Atari800 included a complete old copy of Xformer and piggybacked on the ROM images it included.

The emulator recommended to me at VCF, though, Altirra, does not rely on this. Instead, it includes custom, rewritten, but largely compatible implementations of the BIOS and of BASIC. The BASIC is in fact much more optimized than the period interpreter, and I had trouble with some type-in games because of this—they actually ran too fast. Other than that, though, the improved responsiveness made the experience a lot less annoying.

This strategy of making open-source clones of the firmware extends to the 16- and 32-bit Atari computers, and this tradition in fact started there. The Atari ST’s core OS (“TOS”, glossed as being named after the CEO of Atari at the time but widely recognized as actually standing for “The Operating System”) was evolved from then-standard technologies and was sort of a mishmash of CP/M, GEM, and POSIX. Hobbyists completed the POSIX layer into full compatibility as part of the MINT project (“MINT Is Not TOS”), and then the other layers were cloned with a layer called EmuTOS. Atari ST emulators such as Hatari generally run on an EmuTOS core. (As for MINT, Atari eventually made it official and re-expanded the acronym to “MINT Is Now TOS.”)

Hello, Atari

Let’s run some machine code. It turns out that the Atari environment is extremely finicky, so the way we go about various tasks will depend on what kind of program we’re running and from where. We’ll start with BASIC and use a READ/POKE based loader that’s not unlike what we used on the Commodore 64 when we needed to load a small program while otherwise in pure BASIC:

 10 FOR I=1536 TO 1580
 20 READ A:POKE I,A
 30 NEXT I
 40 X=USR(1536)
 50 DATA 104,162,0,169,11,157,66,3,169
 60 DATA 31,157,68,3,169,6,157,69,3
 70 DATA 169,14,157,72,3,169,0,157,73
 80 DATA 3,76,86,228,72,101,108,108
 90 DATA 111,44,32,119,111,114,108,100
100 DATA 33,155

This does what we want:


Now let’s have some fun with it. Let’s modify one byte of the program in place:

POKE 1538,96

This change won’t do anything good immediately, but if we first switch our display mode…



Now the text is appearing in the graphics screen. This double-width text matches what we saw in Sid Meier’s demo up above, and the bits that previously controlled upper vs. lowercase, and normal vs. reverse video, now control the color of the letters. Hence we see the bulk of the text in odd colors and all in uppercase. (This is not unlike what we saw with the C64’s extended text mode, but with the foreground color instead of the background.)

Continue reading

The Xorshift-Star PRNG for Three 32-Bit Chips

Once I moved beyond the 6502, I ended up porting a 16-bit PRNG from 16-bit x86 to provide me with a decent source of randomness for my projects. After that, in one of my nonretrocoding endeavours, I ended up making use of an enhanced 32-bit RNG that would pass a demanding suite of tests for PRNG quality. That’s actually quite nice, and the source code for the generator is also pleasingly tiny:

static uint64_t rng_state = 1;  /* 64 bits of state; must be seeded nonzero */

uint32_t xorshift64star(void)
{
    uint64_t x = rng_state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    rng_state = x;
    return (x * 0x2545F4914F6CDD1DLLU) >> 32;
}

This code is compact enough that it doesn’t really need to be optimized. In fact, on a fully modern 64-bit system, every operation in this function can be done in a single machine instruction, so there’s really nothing to optimize in the first place.

On a less fully modern machine, though, we only get 32-bit registers to work with, and all of these operations involve multiprecision math. I have been known to occasionally race against compilers to see how much better I can make hand-tuned assembly code run. When compiling the 16-bit PRNG, gcc’s 32-bit ARM compiler was clever enough that it was able to produce smaller code than hand-written, self-modifying x86 code. Can it do as well when compiling 64-bit math for a 32-bit system? (Spoiler alert: no. No, it can’t. But in both the ARM32 and 32-bit x86 cases, it used some tricks I’d missed in my first pass that let me improve the handwritten versions even further.)

So, What Are We Doing Here?

We’ll be making hand-coded implementations of this function for three 32-bit architectures, each of which will need to approach the problems of multiprecision math differently: ARMv4 (1994), Intel 386 (1985), and the Motorola 68000 (1979). Each of these chips made different decisions about how to handle multiprecision operations, and between them we can explore a variety of techniques for addressing the issues multiprecision math can pose.

Our basic problem is that there are four 64-bit operations that we are doing here, and we need to do them on a 32-bit chip:

  • Assignment
  • Exclusive Or
  • Bit shifts
  • Multiplication

Assignment and exclusive-or are basically free. Because each bit in the result is copied or mutated individually, one 64-bit operation can simply be done as two 32-bit operations (or indeed four 16-bit operations or eight 8-bit ones—the solutions here are structurally similar all the way back to the Z80 and 6502, if we were so inclined). The only real optimization we can do here is that ARM and 68000 have “load/store multiple” instructions that let us save some instruction space while doing multiword copies into or out of registers.
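
As a concrete sketch of why these come for free, here is a C rendering of a 64-bit value held as two 32-bit halves; the type and function names here are my own, not anything from the assembly versions:

```c
#include <stdint.h>

/* A 64-bit value stored as two 32-bit halves, the way a 32-bit
   register file forces us to hold it. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* 64-bit assignment is just a struct copy: two 32-bit moves.
   64-bit XOR is two independent 32-bit XORs, because no bit of
   the result depends on anything in the other half. */
static u64_pair xor64(u64_pair a, u64_pair b)
{
    a.lo ^= b.lo;
    a.hi ^= b.hi;
    return a;
}
```

Nothing crosses between `lo` and `hi`, which is exactly why the same structure scales down to the Z80 and 6502, just in more pieces.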

The other two cases are more fun. We’ll cover them below the fold.

Continue reading

Stabilizing the C64 Display with Sprites

For the most part I came to the 8-bit programming world very late. By the time I’m looking into what makes a system tick, all the development on that platform has ceased or at least plateaued, with the basic tricks of the system (including any consistent undocumented or unintentional capabilities of the chips) well-codified. This isn’t universally true—the Sega Genesis, for instance, is much easier to emulate these days than it is to intentionally develop for unless you were there at the time—but it’s still definitely influencing the way I’ve approached my own projects.

However, the development communities didn’t start out knowing these techniques. The main document I used to inform my Commodore 64 graphics work was published in 1996, two years after the C64 had been discontinued and long after it had been technologically eclipsed. For other techniques I’m mostly relying on what ultimately evolved as the community consensus. But effects were usually possible long before they were understood, and ultimately before the underlying principles were worked out thoroughly enough that a design could be settled on as optimal. In this article I’ll be looking at an early technique that was used to stabilize the C64 raster.

Revisiting the Problem

Ordinarily when you are programming C64 graphics, you don’t really have to worry about what the hardware is doing—you just write to the graphics registers and let it handle rendering the screen as needed. For more advanced techniques, such as split-screen displays, the graphics chip could interact with the CPU’s interrupt system to let you juggle the graphical configuration once a certain number of lines of display had been output. This is inherently imprecise. Not only will the CPU not react immediately to this interrupt, the graphics chip can also interfere with the CPU’s ability to run when it is fetching text or sprite data. In the absence of interference—or in cases where the amount of interference is known in advance—it is possible to establish lower and upper bounds on where in a scanline your display currently is. Taking advantage of that knowledge will let you derive timing information that constrains where you can put your reconfiguration code where it will change the display without producing noticeable flicker or other artifacts.

In practice, nobody ever really did this. Early advice would simply take the form of suggesting that you wait 16 cycles or so before writing your registers and everything would be fine, with the number 16 determined by experiment. Later advice, however, would suggest you remove the uncertainty entirely, using one of several techniques to synchronize precisely with the display. The Atari 2600, as we saw last year, gives us this capability for free with a memory location that, when written, halts the CPU until the exact microsecond the next scan line begins. We are not so fortunate on the Commodore 64. The most reliable method for getting this synchronization involves scheduling two interrupts in rapid succession followed by an additional trick to smooth out the final cycle of instability.

Once you have that synchronization, you can then start doing cycle-exact writes to the graphics registers to get horizontally stable visual effects like we saw on the Atari 2600’s displays, like changing the border or background colors to get a striping effect that exists nowhere in the actual display memory:


The method I outlined back in 2015 is not the only way to do this. An alternate technique involving sprites was also well-known and got a good write-up in issue three of C=Hacking magazine. This was distributed as text and has a complete archive at the Fridge. Pasi Ojala (who has appeared earlier on this blog as the author of the excellent PUCRUNCH utility) had a column there called The Demo Corner where he would explain how various tricks worked, and he was writing back when the C64 was still being manufactured and sold, too.

The Demo Corner is a fine series and I do recommend it if you want to read more about how the Commodore 64 actually does its work—the remainder of this article will be me taking the technique from issue 3 and contextualizing it within the techniques I’ve already derived and worked through.

Continue reading

Dissecting Three Classic Automatic Proofreaders

I’ve been thinking about type-in programs again. In particular, I’ve been thinking about one of the features many magazines and books provided for type-in programs that I never actually saw back when I was a youth typing programs in: automatic proofreader programs that would provide verification codes for the program as you typed it in, thus saving you multiple passes through the program trying to figure out why it was giving you wrong answers.

In poking around through the Internet Archive’s collections, I’ve found three of note and in this article I’ll be picking them apart.

SWAT: Strategic Weapon Against Typos

I encountered the SWAT system from publications associated with SoftSide Magazine, which focused on the TRS-80, the Apple, and the Atari home computers. These have generally been a bit before the time I focus on, though I really do owe the Atari home computers a more thorough investigation. The earliest attestation of the system I’ve found is in the June 1982 issue, and it provides implementations for all three of its target systems.

SWAT was a program intended to be appended to the program that it was to check; one would then start running the program from that point instead of running the program proper. It would then compute a simple checksum of every byte in the program by adding them up and then printing them out in groups. You would then check these codes against a separate, shorter listing that provided the codes a correct listing would produce. If they didn’t match, one edited the program until they did.

This is somewhat interesting because it is much closer to how we would organize such a utility in this day and age. The program would be read in, and a SWAT code table would be printed out. The other systems we will see in this article essentially modify the code editor and require checking as one types.

SWAT takes three parameters: the boundaries of the program to check, the maximum number of lines per chunk (default 12), and the target number of bytes per chunk (default 500). It then scans through the program as it exists in memory, producing a running sum of every byte in the program, modulo 676. Once it reaches the end of a line, it checks to see if this is the maximum number of lines, or if the byte target has been exceeded. If it is, it emits a line on the SWAT table indicating the range of lines, the total number of bytes, and the resulting sum. Instead of printing the sum as a number between 0 and 675, it emits it as two letters. (676 is, after all, 26*26.) The first letter is the “26s digit” of the result, and the second letter is the remainder.
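
The encoding step is simple enough to sketch in a few lines of C (a modern rendering of my own, not SoftSide’s; their implementations were in BASIC and assembly for each target machine):

```c
#include <stdint.h>

/* SWAT's two-letter code: the running sum is kept modulo
   676 = 26*26 and printed as two base-26 "digits" A-Z. */
static void swat_code(unsigned int sum, char out[3])
{
    sum %= 676;
    out[0] = (char)('A' + sum / 26);  /* the "26s digit" */
    out[1] = (char)('A' + sum % 26);  /* the remainder */
    out[2] = '\0';
}
```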

One interesting thing about this is that it does not operate on the actual text the user typed. The BASICs for these three systems analyze and reformat the instructions so that they may be executed more efficiently at run time (a process that documentation of the time often called crunching, but which modern writers would call tokenizing), and it is the tokenized form of the program that is summarized. This meshes extremely well with Applesoft BASIC, because its tokenizer actually also removes all user-supplied formatting, which means that all program lines are actually converted into a single canonical form. The TRS-80 preserved all user formatting, which meant that the program had to be entered by the user exactly as printed to match the SWAT codes. The Atari systems were particularly unusual—they normalized input lines like Apple did, but some quirks of its tokenization process meant that how lines were tokenized would depend on the order in which they were entered, so skipping around in a program while entering it or editing typos along the way could actually corrupt your SWAT codes. Fortunately, there was a procedure for normalizing a program, and so SWAT simply required users to perform this procedure before running any checks.

As a checksum, this mostly did what it needed to, but it wasn’t ideal. Beyond the risk of typos simply canceling each other out, a simple sum of bytes will not catch transposition of characters, and for programs with a lot of DATA statements, this was the most dangerous and difficult-to-identify problem that a user was likely to cause. Summing the collapsed tokens, however, did mean that any misspelling of a word BASIC recognized would be immediately obvious, altering not only the final sum but even the length of the line. For the kinds of programs that SoftSide tended to publish, though, this was entirely adequate. Their programs tended to be pure BASIC and would not have large amounts of machine code or graphical data within them.

That privilege would go to Compute!’s Gazette, which focused on the Commodore line, whose programs also required much more aggressive use of machine code and memory-mapped I/O to function.

Compute!’s Automatic Proofreader (1983-1986)

Compute!’s Gazette started out as a magazine for the VIC-20 and the Commodore 64. In October 1983 they introduced a pair of programs that would provide automatic proofreading support. The tighter focus of the magazine—and the close similarity of the operating systems of the two machines, even at the binary level—allowed the editors to provide tools that hooked much more deeply into the machine.

All the Commodore 8-bit computers provided a uniform interface for basic I/O operations, and also provided a number of points where the user could replace core functionality with custom routines. This low-level interface—which Commodore called the KERNAL—allowed a lot of work to be done at the machine code level and still run acceptably across the entire line.

This program worked by copying itself into a block of memory that was only used for tape I/O and which was commonly used by BASIC programs as scratch space for small machine language programs. A simple BASIC loader copied it into place and then ran a routine that attached the bulk of the program to the KERNAL’s character-input routine. This routine, interestingly, wasn’t called when the user pressed a key; instead, once a line had been entered, the screen-editor logic decided which part of the screen constituted that line and then provided the contents of that line as input, followed by the RETURN key that kicked it all off.

This proofreader would read characters and add their codes to a running 8-bit total, wrapping around as necessary, and ignoring spaces. When the return key was detected, it would stash the output state, then move the cursor to the upper left corner of the screen, print out the final sum (from 0 to 255), and then set the cursor back the way it was. As a checksumming algorithm, this had the same problems with not detecting transposition of characters that SWAT did, and it also was less reliable about misspelled keywords (since this scan was happening before tokenization). On the plus side, a new code was generated for every line of text and you could check your work as you typed, or list an entire block and check it by going to the top of the program block and repeatedly pressing RETURN to evaluate each line.
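
In outline, the check amounts to the following (a C sketch of my own; the real routine was 6502 machine code operating on PETSCII codes, which match ASCII for digits and uppercase letters):

```c
#include <stdint.h>

/* Sketch of the 1983 proofreader's per-line check: an 8-bit
   accumulator that wraps modulo 256 and ignores spaces. */
static uint8_t proofread_1983(const char *line)
{
    uint8_t sum = 0;
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == ' ')
            continue;           /* spaces don't count */
        sum += (uint8_t)*p;     /* wraps around as necessary */
    }
    return sum;
}
```

Both weaknesses described here fall straight out of the code: PRINT and PIRNT sum to the same value, and inserting or deleting spaces changes nothing.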

Early versions of the proofreader had two editions, one for the VIC-20 and one for the Commodore 64, but the only actual difference between the versions was that they called a routine in the BASIC ROM to convert the byte into a decimal number, and the BASIC ROM was mapped to a different part of memory in the two machines. The API for the functions was identical, and indeed the BASICs were so similar that this was the same routine, in the end.

Ultimately, later editions of this proofreader unified the two versions and used the actual original value of the “character read” routine that the proofreader hooked itself up to as a switch to decide where to call to print a decimal number. This added a dozen bytes or so to the final program, but even on the extremely cramped VIC-20 this was a cost that could be easily paid.

However, the tighter binding to the operating system produced some unique drawbacks as well. The CHRIN routine the proofreader extended was actually called for all kinds of character input, not just program lines. As a result, running a program with the proofreader active would have it corrupt the program’s display with handy check codes for every response the user gives to an INPUT statement. Worse, it would do the same for textual data read off of the disk or tape. Of course, the tape wouldn’t have time to do any reading; once the tape routines started using their temporary storage, this would trash the memory holding the proofreader, and the system would begin trying to execute random temporary data as code and probably crash extremely hard.

Compute!’s Automatic Proofreader (1986-)

Over the next few years, Compute!’s Gazette got more and more sophisticated programs in its lineup—many approaching or exceeding commercial quality—and it also got several more systems it needed to support. In February 1986, they updated their proofreader to use a more sophisticated technique. While they were at it, they also addressed all the shortcomings I listed above.

The most difficult issue to address was where to put the proofreader so that it would not be touched by the running system during normal operation. They fixed this by pushing the start of BASIC’s program memory forward 256 bytes and dedicating the freed space to the proofreader. However, this was a different place in memory on each of the five machines they supported, so they also needed to patch the program after loading so that the addresses pointed to the right place. The necessary information for patching turns out to be largely supplied in a portable way by the KERNAL, so this is not as heinous as it sounds, but it does still require the most sophisticated BASIC loader I have seen.

The other system-specific issues were solved by extending the “tokenize a line of BASIC text” function instead of the “read a character” function. This also lets the proofreader intervene less frequently and lets it process an entire line of text at once, guaranteed. User input and file I/O aren’t intercepted, and with the program relocated to the old start of BASIC RAM, tape I/O works fine too.

The final—and, for the user, the most important—change was to use a more sophisticated checksum algorithm that can actually reliably flag swapped characters and make it much less likely for typos to cancel each other out:

  1. The checksum is a 16-bit unsigned integer, and its initial value is the line number being processed.
  2. The line is preprocessed by removing all spaces that are not between quotes. So, for instance, 10 PRINT "HELLO WORLD!" becomes 10PRINT"HELLO WORLD!"
  3. Add the byte value of each character to the checksum, but before adding it, multiply it by its position in the line after extraneous blanks are removed. So, for our sample line, the checksum starts at 10, then gets 49*1 and 48*2 added for the line number 10, then 80*3 for the P in PRINT, and so on.
  4. XOR the high and low bytes of the checksum together to produce the final 8-bit checksum.
  5. Express the checksum as a two-letter code. This is basically a two-digit hexadecimal number, but the least significant digit comes first and instead of using the traditional 0123456789ABCDEF digits, it instead uses the letters ABCDEFGHJKMPQRSX.
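
The five steps above can be sketched as follows (a hypothetical C rendering of my own; the genuine article was 6502 machine code working on PETSCII, which agrees with ASCII for digits and uppercase letters):

```c
#include <stdint.h>

static const char PROOFREADER_DIGITS[] = "ABCDEFGHJKMPQRSX";

/* Compute the two-letter code for one line of BASIC text.  The
   line is passed exactly as typed, line number included; the
   function and variable names are mine. */
static void proofread_1986(unsigned int line_number, const char *line,
                           char out[3])
{
    uint16_t sum = (uint16_t)line_number;   /* step 1 */
    int in_quotes = 0;
    unsigned int pos = 1;
    for (const char *p = line; *p != '\0'; p++) {
        if (*p == '"')
            in_quotes = !in_quotes;
        if (*p == ' ' && !in_quotes)
            continue;                       /* step 2: strip blanks */
        sum += (uint16_t)((unsigned char)*p * pos++);   /* step 3 */
    }
    uint8_t final = (uint8_t)((sum >> 8) ^ (sum & 0xFF));  /* step 4 */
    out[0] = PROOFREADER_DIGITS[final & 0x0F];  /* step 5: low digit first */
    out[1] = PROOFREADER_DIGITS[final >> 4];
    out[2] = '\0';
}
```

Weighting each character by its position is what catches transpositions: swapping two different characters now changes the sum instead of leaving it alone.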

This scheme was sufficiently effective that they never modified it afterwards and it continued in use until Compute! stopped publishing type-in programs in the early 1990s. That is a solid pedigree.

After the jump, I will dissect the sophisticated BASIC loader that was used to make the same core program work on five different computer models, and then present my reconstruction of the proofreader itself.


2018 Compilation and Review

2018 has come and gone, so it’s time for me to do a summary post and collection of my work on Bumbershoot over the year.


The various projects I did on this blog in 2018 are now collected for download in a single zip file. 2018 was marked more by a series of larger projects than by a swarm of small programs. I had four sizable projects I worked through line by line and built from first principles:

In addition to those, there were a handful of smaller programs created along the way to test my build systems or my grasp of specific hardware techniques:


2018 was the most active year for the blog by a factor of about two; this is the first year I cracked 5,000 page views, 50 articles, and 100,000 words. The most popular articles this year were largely the same as last year, covering weird CGA modes and file formats and machine code linking on the ZX81. Sneaking into the top 5 was my article about the VSP Glitch on the C64, which seems to have rocketed above my other C64 articles thanks to a Reddit comment linking to it as an explanation for how Mayhem in Monsterland managed its high-speed scrolling.

The three articles I wrote this year that got the most views were the beginning and the end of my Atari 2600 project, and the post on the legacy sound chip on the Sega Genesis. All of them seem to, again, have risen above the other articles due to links in forums. As usual, though, Bumbershoot Software mostly works as a standing reference, and search engines drive more traffic than everything else combined by an order of magnitude.

Other Stuff

Off of my usual topics, 2018 was also interesting in that it saw the release of five games that were very different from one another but also targeted quite narrowly at my current gameplay interests. I can’t really rank them against each other for a top five list, so here they are in alphabetical order:

  • Celeste. I played a lot of Thorson’s early work—most notably the Jumper series—and while those were occasionally a bit rough-hewn I consider them foundational to the “challenge platformer” subgenre, which also includes games like VVVVVV and Super Meat Boy, but which distinguishes itself from “masocore” games like I Wanna Be The Guy by always honestly presenting what the current challenge is. (This measure, which I discussed as part of what “perfect play” means across genres, does mean that Limbo also stands with I Wanna Be The Guy despite having much more forgiving platforming challenges.) Celeste is an extremely well-polished challenge platformer and quite possibly the best example of the subgenre now extant. It achieves this through excellent controls and map design but also through accessibility—while much has been made of Celeste’s Assist Mode, even a player whose training and reflexes are a match for the intended design will find that the most punishing or abstruse stages are hidden behind clearly optional unlocks. I have observed that Super Meat Boy is in part about testing the player to destruction, and that as a result its plot and ending cutscenes and such are all extremely perfunctory. Celeste actually wants to tell a story alongside its challenges and it puts the more generally-inaccessible stages in places where no story is being told. It’s a very effective combination.
  • EXAPUNKS. Zachtronics games are, in effect, a series of programming challenges. I like them, but I often have trouble sticking with them because it’s hard to motivate myself to write assembly language programs for pretend computers when I could instead be writing assembly language programs for real computers. The earlier Opus Magnum avoided this fate for me by not being as obviously a programming exercise even though it was one (you schedule motions of mechanical arms to assemble alchemical compounds), but EXAPUNKS seems to avoid it by having a sufficiently exotic programming model. Commanding cute little spider robots to run rampant through a pretend network seems to be far enough from the retrocoding projects I actually do to keep the challenges from interfering with each other in my motivation.
  • La-Mulana 2. The original La-Mulana was an homage to the MSX generally and Konami’s Knightmare II: Maze of Galious specifically. However, Maze of Galious was in my opinion an unplayable mess, while (with a few exceptions) La-Mulana managed to be crammed full of tricks and secrets and still mostly work. It did this by (apparently unconsciously, since they’re open about their inspirations and didn’t list this one) lifting a lot of design aesthetics from Myst—overwhelm the player with information and have all of it be relevant to something eventually. This is then layered on top of fairly-traditional action-adventure exploration gameplay. When that game was refined and modernized for the Wii, the parts that were problematic in the design were polished away and the combat was rebalanced and generally improved. At that point it stopped being a quirky obscure freeware game and started being an interesting genre-jam game that didn’t get imitated. The sequel is in some sense more of the same, but since the original hasn’t been imitated since its release more of the same was very welcome.
  • Return of the Obra Dinn. This is a first-person adventure game in the Myst mold, but manages to evolve the formula there in meaningful ways. Standard Mystlike games tend to involve using information in the environment to bypass obstacles, which in practice often reduces to replacing “find a key somewhere and use it to unlock a door somewhere else on the map” with “find a combination written on a wall somewhere and use it to open a combination lock somewhere else on the map.” Sometimes you’ll have to build the combination out of a lot of disparate pieces—Riven and the La-Mulana series both excelled at this—but Obra Dinn evolves the formula by requiring more aggressive deductive work on the part of the player to get the answers required. Despite being technically just a new, improved twist on a classic game design, it’s been long enough since Riven that this game was received like a bolt of lightning from a cloudless sky. It’s not that good but it is very, very good—and if you didn’t play the old Myst and La-Mulana games, you may indeed have never seen anything like this before.
  • Yoku’s Island Express. This is a super-cheerful action-adventure game built around pinball controls, starring a dung beetle turned postmaster drafted into a plan to save the island’s local gods. Despite all that it remains relentlessly cheerful all the way through (you have a dedicated button to blow a party horn and possibly throw confetti around—while there are in-game reasons to do this you are free to deploy it at any point) and I found the difficulty to stay well within reasonable bounds. I’m not very good at pinball, but I was able to work my way through the game without too much trouble, and it also neatly avoided what I think of as the biggest problem with high-level pinball play—the path to high scores usually involves finding some technique that’s reasonably high scoring and that one can perform with extreme consistency, and then doing that thing for as long as your endurance and precision can hold out. Here, because you have actual plot objectives to accomplish and a cap on “score”—where a normal pinball game would grant bonus points, you get bonus money instead, and there’s a cap on how much money you can carry at a time—you are always encouraged to attempt sequences of more varied skill shots to progress. It’s an interesting case study in how mixing another genre of gameplay into a game can address shortcomings in the original genre’s gameplay.

Bumbershoot Software in 2019

2018 saw me complete my ambition of doing a software release for every platform I grew up with. As such, I don’t have any big pressing projects bearing down on me as things I really want to attack going into the new year. That said, this was the state I was in for most of 2018 as well, and I still wrote more than I ever have and got several releases out to go with it.

As such, I’ll be walking into 2019 with no firm plans for the blog but confident that something interesting will ultimately come up. Onward we go!

Forcing an Aspect Ratio in 3D with OpenGL

OK, enough of Cocoa. Let’s go play with something else for a bit.

I’ve done three posts now on freely-scaled aspect-corrected 2D images.

So let’s do it in 3D today instead, for a change of pace.

An Initial Caveat

For most 3D applications, you really don’t want to do an aspect-corrected scaling system like this. Instead you should let the end user specify an FOV angle and then render to the aspect ratio that your drawing area has; the user will just get to see more or less of your 3D world, as needed. This technique is really only interesting if you’re rendering a smallish region that can’t be meaningfully expanded, and thus need to maintain a 4:3 or 16:9 aspect ratio irrespective of the actual display size.

Restricting the Display in OpenGL

OpenGL is very convenient about letting us only render to part of the screen—the glViewport function lets us pick any fraction of the window’s rectangle to be the part we render to.

Actually getting the window size is OS-specific, but if you’re using SDL or SDL2 to manage your OpenGL context, the library will handle that for you. We can compute the viewport we wish to draw in a manner similar to what we did with Cairo:

    int width, viewport_width;
    int height, viewport_height;
    int viewport_x = 0;
    int viewport_y = 0;
    SDL_GL_GetDrawableSize(window, &width, &height);
    viewport_width = width;
    viewport_height = height;
    if (width * 3 > height * 4) {
        /* Window is wider than 4:3: pillarbox */
        viewport_width = height * 4 / 3;
        viewport_x = (width - viewport_width) / 2;
    } else if (width * 3 < height * 4) {
        /* Window is taller than 4:3: letterbox */
        viewport_height = width * 3 / 4;
        viewport_y = (height - viewport_height) / 2;
    }
    glViewport(viewport_x, viewport_y, viewport_width, viewport_height);

We can then prepare our OpenGL display as if it were any other 4:3 aspect-ratio display, but there is a small problem that may appear:


Even though we’ve set the viewport to cover only part of the window, it turns out that glClear will clear the entire window. Normally we would want our pillarboxes or letterboxes to be a different color from our scene’s clear color.

Clearing Just The Viewport

The solution for this in OpenGL involves taking advantage of the scissor test. This is a different clipping rectangle that can be turned on and off and is independent of the viewport itself. In this case, however, we want it to be the same as the viewport, and we set it by calling glScissor with the same arguments as the ones we computed for glViewport. It turns out that glClear doesn’t respect the viewport, but it does respect the scissor test if you turn it on. Thus, we get our two-color clear (black for the pillarboxes, bright blue for the viewport) with this sequence of GL calls:

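A sketch of that sequence, reusing the viewport variables computed above (the specific colors and the inclusion of the depth-buffer bit are my assumptions):

```c
    /* Clear the whole window to black for the pillarbox/letterbox bars */
    glDisable(GL_SCISSOR_TEST);
    glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    /* Restrict clears to the viewport rectangle and clear it bright blue */
    glEnable(GL_SCISSOR_TEST);
    glScissor(viewport_x, viewport_y, viewport_width, viewport_height);
    glClearColor(0.0f, 0.0f, 1.0f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
```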

This will give us the display we want.

About the Sample Screenshot

The sample code and screenshots here are from my final project from a graphics class I took back in 2004 or so; at the time it would force the screen resolution to 640×480 and its procedurally-generated terrain did not extend much past the viewable area. When I was experimenting with this pillarboxing technique, I was porting its window management code to SDL2, and that meant altering its fullscreen code so that it would still look right when the actual window was 1920×1080, or whatever other resolution the user’s desktop had.