Category Archives: commodore

Implementing SHA-256 on the 6502

After five implementations of various algorithms built mostly around bit shuffling, how about a properly modern, common, useful routine? We just need to find something that has a very similar structure and then we can deploy the knowledge we’ve learned in those other domains.

SHA-2 fits this bill neatly, and Wikipedia even has handy pseudocode we can base an implementation on. In looking at the implementations and specifications, it also seems like SHA-256 is a relatively good fit for the 6502, and may well qualify as the most sophisticated modern algorithm that the 6502 can still comfortably handle. We don’t really need to worry about endianness or word size—since we’re an 8-bit chip we can do multibyte mathematics in any direction we want and the size of the algorithm’s words really only alters our loop bounds—but SHA-256’s largest indexed elements are the round constants and the schedule arrays, each of which are 64 32-bit words long. That translates out to tables exactly 256 bytes in size, which is exactly the size of a 6502 memory page and thus the largest amount of memory that can be efficiently indexed.

Initial Implementation

Translating the pseudocode from the wikipedia article into code was not terribly challenging. I pre-defined tables of the necessary constants in the algorithm, and then set aside a fixed block of RAM to store all the working state that changes as a sum is computed. (In a modern system, we would probably instead operate on a pointer to arbitrary blocks of RAM, but the 6502 is much, much better at operating on blocks of memory at fixed addresses than it is at working with pointers, and I wanted to give the poor CPU every edge it could get.) I then defined several 4-byte blocks of memory for use as accumulators—I defined a variety of small functions that would do 32-bit big-endian math based on the contents of the accumulators and which could store results into and out of the working state as needed. There were a few awkward bits here—I ended up replicating a few functions because the 256-byte index limit got inconvenient at a few points—but nothing about the implementation of the algorithm really ended up having to deviate much from the pseudocode. So the implementation step was quite simple and easy, much as I had hoped.

However, the end result did not come out the way I had hoped it would. While it did give the correct answers to my test vectors, it took 580 milliseconds to consume a single block—nearly 10 milliseconds per byte! I wasn’t willing to push this too far—after all, these algorithms are tuned to be feasible only on modern chips—but we can surely do better than this.

Continue reading

Advertisements

Stabilizing the C64 Display with Sprites

For the most part I came to the 8-bit programming world very late. By the time I’m looking into what makes a system tick, all the development on that platform has ceased or at least plateaued, with the basic tricks of the system (including any consistent undocumented or unintentional capabilities of the chips) well-codified. This isn’t always really consistent—the Sega Genesis, for instance, is much easier to emulate these days than it is to intentionally develop for unless you were there at the time—but it’s still definitely influencing the way I’ve approached my own projects.

However, the development communities didn’t start out knowing these techniques. The main document I used to inform my Commodore 64 graphics work was published in 1996, two years after the C64 had been discontinued and long after it had been technologically eclipsed. For other techniques I’m mostly relying on what ultimately evolved as the community consensus. But effects were usually possible long before they were understood, and ultimately before the underlying principles were worked out thoroughly enough that a design could be settled on as optimal. In this article I’ll be looking at an early technique that was used to stabilize the C64 raster.

Revisiting the Problem

Ordinarily when you are programming C64 graphics, you don’t really have to worry about what the hardware is doing—you just write to the graphics registers and let it handle rendering the screen as needed. For more advanced techniques, such as split-screen displays, the graphics chip could interact with the CPU’s interrupt system to let you juggle the graphical configuration once a certain number of lines of display had been output. This is inherently imprecise. Not only will the CPU not react immediately to this interrupt, the graphics chip can also interfere with the CPU’s ability to run when it is fetching text or sprite data. In the absence of interference—or in cases where the amount of interference is known in advance—it is possible to establish lower and upper bounds on where in a scanline your display currently is. Taking advantage of that knowledge will let you derive timing information that constrains where you can put your reconfiguration code where it will change the display without producing noticeable flicker or other artifacts.

In practice, nobody ever really did this. Early advice would simply take the form of suggesting that you wait 16 cycles or so before writing your registers and everything would be fine, with the number 16 determined by experiment. Later advice, however, would suggest you remove the uncertainty entirely, using one of several techniques to synchronize precisely with the display. The Atari 2600, as we saw last year, gives us this capability for free with a memory location that, when written, halts the CPU until the exact microsecond the next scan line begins. We are not so fortunate on the Commodore 64. The most reliable method for getting this synchronization involves scheduling two interrupts in rapid succession followed by an additional trick to smooth out the final cycle of instability.

Once you have that synchronization, you can then start doing cycle-exact writes to the graphics registers to get horizontally stable visual effects like we saw on the Atari 2600’s displays, like changing the border or background colors to get a striping effect that exists nowhere in the actual display memory:

sync_stripes

The method I outlined back in 2015 is not the only way to do this. An alternate technique involving sprites was also well-known and got a good write-up in issue three of C=Hacking magazine. This was distributed as text and has a a complete archive at the Fridge. Pasi Ojala (who has appeared earlier on this blog as the author of the excellent PUCRUNCH utility) had a column there called The Demo Corner where he would explain how various tricks worked, and he was writing back when the C64 was still being manufactured and sold, too.

The Demo Corner is a fine series and I do recommend it if you want to read more about how the Commodore 64 actually does its work—the remainder of this article will be me taking the technique from issue 3 and contextualizing it within the techniques I’ve already derived and worked through.

Continue reading

Dissecting Three Classic Automatic Proofreaders

I’ve been thinking about type-in programs again. In particular, I’ve been thinking about one of the features many magazines and books provided for type-in programs that I never actually saw back when I was a youth typing programs in: automatic proofreader programs that would provide verification codes for the program as you typed it in, thus saving you multiple passes through the program trying to figure out why it was giving you wrong answers.

In poking around through the Internet Archive’s collections, I’ve found three of note and in this article I’ll be picking them apart.

SWAT: Strategic Weapon Against Typos

I encountered the SWAT system from publications associated with SoftSide Magazine, which focused on the TRS-80, the Apple, and the Atari home computers. These have generally been a bit before the time I focus on, though I really do owe the Atari home computers a more thorough investigation. The earliest attestation of the system I’ve found is in the June 1982 issue, and it provides implementations for all three of its target systems.

SWAT was a program intended to be appended to the program that it was to check; one would then start running the program from that point instead of running the program proper. It would then compute a simple checksum of every byte in the program by adding them up and then printing them out in groups. You would then check these codes against a separate, shorter listing that provided the codes a correct listing would produce. If they didn’t match, one edited the program until they did.

This is somewhat interesting because this is much closer to how we would organize such a utility in this day and age. The program would be read in, and a SWAT code table would be printed out. The other systems we will see in this article essentially modify the code editor and require checking as one types.

SWAT takes three parameters: the boundaries of the program to check, the maximum number of lines per chunk (default 12), and the target number of bytes per chunk (default 500). It then scans through the program as it exists in memory, producing a running sum of every byte in the program, modulo 676. Once it reaches the end of a line, it checks to see if this is the maximum number of lines, or if the byte target has been exceeded. If it is, it emits a line on the SWAT table indicating the range of lines, the total number of bytes, and the resulting sum. Instead of printing the sum as a number between 0 and 676, it emits it as two letters. (676 is, after all, 26*26.) The first letter is the “tens digit” of the result.

One interesting thing about this is that it does not operate on the actual text the user typed. The BASICs for these three systems analyze and reformat the instructions so that they may be executed more efficiently at run time (a process that documentation of the time often called crunching, but which modern writers would call tokenizing), and it is the tokenized form of the program that is summarized. This meshes extremely well with Applesoft BASIC, because its tokenizer actually also removes all user-supplied formatting, which means that all program lines are actually converted into a single canonical form. The TRS-80 preserved all user formatting, which meant that the program had to be entered by the user exactly as printed to match the SWAT codes. The Atari systems were particularly unusual—they normalized input lines like Apple did, but some quirks of its tokenization process meant that how lines were tokenized would depend on the order in which they were entered, so skipping around in a program while entering it or editing typos along the way could actually corrupt your SWAT codes. Fortunately, there was a procedure for normalizing a program, and so SWAT simply required users to perform this procedure before running any checks.

As a checksum, this mostly did what it needed to, but it wasn’t ideal. In addition to its false positives, a simple sum of bytes will not catch transposition of characters, and for programs with a lot of DATA statements, this was the most dangerous and difficult-to-identify problem that a user was likely to cause. Summing the collapsed tokens, however, did mean that any misspelling of a word BASIC recognized would be immediately obvious, altering not only the final sum but even the length of the line. For the kinds of programs that SoftSide tended to publish, this was entirely adequate, though. Their programs tended to be pure BASIC and would not have large amounts of machine code or graphical data within them.

That privilege would go to Compute!’s Gazette, which focused on the Commodore line, which also required much more aggressive use of machine code and memory-mapped I/O to function.

Compute!’s Automatic Proofreader (1983-1986)

Compute!’s Gazette started out as a magazine for the VIC-20 and the Commodore 64. In October 1983 they introduced a pair of programs that would provide proofreading support for automatic proofreading. The tighter focus of the magazine—and the close similarity of the operating systems of the two machines, even at the binary level—allowed the editors to provide tools that hooked much more deeply with the machine.

All the Commodore 8-bit computers provided a uniform interface for basic I/O operations, and also provided a number of points where they user could replace core functionality with custom routines. This low-level interface—which they called the KERNAL—allowed a lot of work to be done at the machine code level and still run acceptably across the entire line.

This program worked by copying itself into a block of memory that was only used for tape I/O and which was commonly used by BASIC programs as scratch space for small machine language programs. A simple BASIC loader copied it into place and then ran a routine that attached the bulk of the program to the KERNAL’s character-input routine. This routine, interestingly, wasn’t called when the user pressed a key; instead, once a line had been entered, the screen-editor logic decided which part of the screen constituted that line and then provided the contents of that line as input, followed by the RETURN key that kicked it all off.

This proofreader would read characters and add their codes to a running 8-bit total, wrapping around as necessary, and ignoring spaces. When the return key was detected, it would stash the output state, then move the cursor to the upper left corner of the screen, print out the final sum (from 0 to 255), and then set the cursor back the way it was. As a checksumming algorithm, this had the same problems with not detecting transposition of characters that SWAT did, and it also was less reliable about misspelled keywords (since this scan was happening before tokenization). On the plus side, a new code was generated for every line of text and you could check your work as you typed, or list an entire block and check it by going to the top of the program block and repeatedly pressing RETURN to evaluate each line.

Early versions of the proofreader had two editions, one for the VIC-20 and one for the Commodore 64, but the only actual difference between the versions was that they called a routine in the BASIC ROM to convert the byte into a decimal number, and the BASIC ROM was mapped to a different part of memory in the two machines. The API for the functions was identical, and indeed the BASICs were so similar that this was the same routine, in the end.

Ultimately later editions of this proofreader unified the two versions and usde the actual original value of the “character read” routine that the proofreader hooked itself up to as a switch to decide where to call to print a decimal number. This added a dozen bytes or so to the final program but even on the extremely cramped VIC-20 this was a cost that could be easily paid.

However, the tighter binding to the operating system produced some unique drawbacks as well. The CHRIN routine the proofreader extended was actually called for all kinds of character input, not just program lines. As a result, running a program with the proofreader active would have it corrupt the program’s display with handy check codes for every response the user gives to an INPUT statement. Worse, it would do the same for textual data read off of the disk or tape. Of course, the tape wouldn’t have time to do any reading; once the tape routines started using their temporary storage, this would trash the memory holding the proofreader, and the system would begin trying to execute random temporary data as code and probably crash extremely hard.

Compute!’s Automatic Proofreader (1986-)

Over the next few years, Compute!’s Gazette got more and more sophisticated programs in its lineup—many approaching or exceeding commercial quality—and it also got several more systems it needed to support. In February 1986, they updated their proofreader to use a more sophisticated technique. While they were at it, they also addressed all the shortcomings I listed above.

The most difficult issue to address was where to put the proofreader so that it would not be touched by the running system during normal operation. They fixed this by pushing the start of BASIC’s program memory forward 256 bytes and using the space freed for that as dedicated to the proofreader. However, this was a different place in memory for the five machines they supported, so they also needed to patch the program after loading so that the addresses pointed to the right place. The necessary information for patching turns out to be largely supplied in a portable way by the KERNAL, so this is not as heinous as it sounds, but it does still require the most sophisticated BASIC loader I have seen.

The other system-specific issues were solved by extending the “tokenize a line of BASIC text” function instead of the “read a character” function. This also lets the proofreader intervene less frequently and lets it process an entire lin eof text at once, guaranteed. User input and file I/O aren’t intercepted, and with the program relocated to the old start of BASIC RAM, tape I/O works fine too.

The final—and, for the user, the most important—change was to use a more sophisticated checksum algorithm that can actually reliably flag swapped characters and make it much less likely for typos to cancel each other out:

  1. The checksum is a 16-bit unsigned integer, and its initial value is the line number being processed.
  2. The line is preprocessed by removing all spaces that are not between quotes. So, for instance, 10 PRINT "HELLO WORLD!" becomes 10PRINT"HELLO WORLD!"
  3. Add the byte value of each character to the checksum, but before adding it, multiply it by its position in the line after extraneous blanks are removed. So, for our sample line, the checksum starts at 10, then gets 49*1 and 48*2 added for the line number 10, then 80*3 for the P in PRINT, and so on.
  4. XOR the high and low bytes of the checksum together to produce the final 8-bit checksum.
  5. Express the checksum as a two-letter code. This is basically a two-digit hexadecimal number, but the least significant digit comes first and instead of using the traditional 0123456789ABCDEF digits, it instead uses the letters ABCDEFGHJKMPQRSX.

This scheme was sufficiently effective that they never modified it afterwards and it continued in use until Compute! stopped publishing type-in programs in the early 1990s. That is a solid pedigree.

After the jump, I will dissect the sophisticated BASIC loader that was used to make the same core program work on five different computer models, and then present my reconstruction of the proofreader itself.

Continue reading

An Ambition, Completed

I belong that that somewhat narrow cohort that straddles Generation X and the Millenials—we grew up with the microcomputer omnipresent but we remember a time before The Internet. The cohort has gotten various attempts at nicknames but I’ve always been fond of the name Oregon Trail generation, after the famous educational software.

My parents were also somewhat technically inclined (or at least gadget-oriented), and as a result I had grown up alongside a number of game consoles and home computers. A lot of the work I did on those, both as a hobby and for schoolwork, shaped my talents in the years to come.

One of my goals here on Bumbershoot over the past few years has been to demystify the machines I grew up with. I’m now a genuine software professional, after all—surely I’ve learned enough to see how the tricks were done and to join in myself, if a bit late.

As such, once I realized that this was what I was doing, I formalized the nature of the task and started getting systematic about it. And now, with this latest release, I have finished it.

The Rules

Here are the rules I had set for this task.

  • Childhood ends in 1992. This is a somewhat arbitrary division point, but it turns out I can pick any year from there to my 20th birthday and not change the list of targeted systems, and 1992 means I don’t have to distinguish between when a platform was released in North America and when I got access to it.
  • Systems require a full year of exposure to count as “one I grew up with.” I had brief acquaintance with the Macintosh and the SNES, but both were systems I didn’t really have any experience with until adulthood. I first beat Super Metroid in 2010.
  • Programs must run on the metal. Interpreted languages are permitted only to the extent that they can then transfer control to machine code. This started out as “BASIC and LOGO don’t count” but eventually became a requirement that, for at least one project, I produce a binary where every single byte in the program exists because I put it there.
  • Programs must be for the platform. Writing a highly portable C program and jamming it through a compiler for each system doesn’t count. I should make use of a generous set of system-specific capabilities that let it show off what it’s good at. Any stunt that reaches demoeffect quality automatically achieves this goal.
  • The platform must have been intended to receive third-party programs. That means the graphing calculators we had in high school don’t count, even though they had Z80 chips in them and you could program them with special connectors and PC programs that no longer run on 64-bit operating systems.
  • Period tools are not required. Since half the targets are game consoles, cross-development tools were the norm even at the time.

So, without further ado, the systems I grew up with.

Continue reading

Lessons Learned: Color Chart Madness Revisited

As long as I’m revisiting old work, I might as well revisit one of the first projects I undertook for this blog, four years ago: Color Chart Madness. I took a sample program from one of my reference books, took it apart, then put it back together again, better.

What’s kind of funny, looking back on it, is that I really had no idea how the VIC-II chip actually worked, and I needed to understand a lot of details about it in order to grasp what was really going on. I didn’t, however, seem to need to understand those details in order to fix all the problems I saw with it. In retrospect, it’s a case study of my past self working around my own ignorance.

What I Thought Was Happening

The program I was dissecting was, at core, a simple rasterbar effect. It wanted to display every combination of foreground and background, so it printed out text in all sixteen colors and then changed the background color sixteen times a frame to produce all the combinations. The initial implementation of this shut off the clock interrupt and then spinlocked on the raster count register in $D012 updating the background color in $D021 every eight lines. The problem with it—beyond the way that it was essentially locking up the system so hard the only way out was a soft-reset—was that the color change was happening a line too early. This meant that the character cells looked a bit uneven, and there was a spurious black line at the very bottom of the text display. Worse, setting the spinlock to terminate one line later ended up making the display incredibly unstable or, failing that, made the color changes happen on different incorrect lines.

cc1_0

My initial theory here as that there was somehow something wrong with the $D012 read—either the register was fundamentally unreliable, or the emulator was taking shortcuts. I rewrote the routine to instead be based on writes to $D012, replacing the old spinlock with a series of raster interrupts. That rewrite, with no other changes to the program constants, fixed the display completely. That only deepened my suspicions about the reliability of the data the first version depended on.

I couldn’t say I really understood what was going on fully, though, because there were still anomalies left behind. In particular, the raster values I had chosen matched the original version’s, but based on both Compute!’s Mapping The C64 and the C64 Programmer’s Reference Guide those values were one less than they should have been to change the color at the top of each letter. Furthermore, even though I had a display that looked the way I wanted it to, changing the raster target by a single line did not actually consistently change the color-change point by a single line.

As it turns out, these facts were only confusing me because I had gravely misunderstood the way the VIC-II builds its displays. My understanding of this system advanced in fits and starts over the course of about a year:

  • Cycle-exact Delays for the 6502, where I adapted some techniques I’d seen in an Atari 2600 demo to systematically trigger the anomalies I had encountered in Color Chart Madness.
  • Color Test: an Atari 2600 program, where I produced a complete Atari 2600 program on my own. This gave me direct experience with directing a raster display, and having to do that work “by hand” on the 2600 give me the background I needed to understand how the VIC-II did it automatically.
  • Flickering Scanlines, where I take that background and finally understand the 1996-era document that explains the operation of the VIC-II chip and thus which lets me really understand the anomalies above.
  • A taxonomy of splitscreens, where I begin to apply this knowledge to extracting reasonable effects out of the C64.
  • VIC-II Interrupt Timing, where I finally sit down, do the math, and work out the precise timing tolerances required to get various effects without relying on the advanced, cycle-exact techniques the demoscene standardized on.

That’s a little over a year’s worth of fitful exploration, research, and experimentation. At this point we should be able to just do a brief and accurate explanation of what issues I had run into and why the techniques I used fixed it.

What Was Actually Happening

The reason switching from a spinlock to a raster interrupt fixed the display is actually pretty trivial. There’s nothing at all wrong with reading $D012, and it is indeed that value that triggers the interrupt. Furthermore, it really was testing for the line before the start of any given character. However, it takes an extra 40 cycles or so to actually process the raster interrupt and get into code that we had, ourselves, written—and those 40 cycles were enough for the screen to display enough of that line in the old color to keep the display looking right. It was still setting it “too early”, but the grey background of the screen is actually an opaque foreground color so it masked our flaws. Take that away and the discontinuity is much more flagrant:

cc3_1

Explaining the flickering and discontinuity when we push the raster trigger one line forward is a little bit more involved. The key fact I was missing was that the C64’s CPU doesn’t actually get to run all the time—at the start of every line of text, the graphics chip (the VIC-II) steals 43 cycles worth of time from the CPU, monopolizing the memory bus so that it can load all the graphics information it needs for the next line. (Because these lines mess up your display timings, they are referred to by C64 programmers as badlines.) If we look at the initial spinlock-based implemenation, we see that actually getting around to updating the background color will take between 11 cycles (if we check the raster the very cycle it changes) and 21 cycles (if it changes the cycle just after we check). On a badline, the CPU will go dormant 13 cycles after the raster changes. That means that depending on exactly how the spinlock syncs up with the display, it will change the color either before or after the full row. Furthermore, the full spinlock cycle is 10 cycles long, and that means it won’t stay synced with the display; thus on different frames or even different points in the same frame there is no consistency. Thus, the flicker.

There’s less inconsistency with the interrupt-based implementation. It relies on the KERNAL to save and restore state and ultimately hand over control, so the 43-cycle delay of the badline will always be paid. However, adding that time into the rest of the display means that the color ends up changing halfway across the screen one line down, which means we get visible display tearing and in some sense the worst of all worlds.

But with an understanding of how that timing works, it at least is no longer surprising.

From the Archives: Diffusion Chamber

One of the first programs I wrote when I was learning to develop NES stuff was an animated simulation called “Diffusion Chamber.” I had a DOS version of it, too, in some of my old childhood programs. However, neither of them actually worked properly; the display for the DOS version was completely corrupted and the NES version would occasionally allow the walls to diffuse along with the gas molecules.

I stumbled across the original inspiration for those programs recently; it was from a collection of Scientific American columns by A.K. Dewdney and was mentioned in passing with no real code or discussion:

The diffusion program I call BLEND treats the molecules of a liquid as balls that are able to move only one step horizontally or vertically at a time. A 31-by-31 array called tank holds 930 balls, which are set up initially somewhat like a checkers game….

BLEND proceeds by selecting an array position at random, then selecting a random adjacent position. If a ball occupies the first position and no ball occupies the second, the ball is erased at its current position and redrawn at the new one. If a ball is absent at the first position or present at the second, BLEND could not care less. With microseconds of time on its hands, a movable ball will sooner or later turn up. How long will it take for the two stylized liquids to diffuse completely?

I’m not really sure why I had so much trouble with this in the past. I scale it down to 15×15 to fit on the screen, but it otherwise was extremely straightforward to implement:

diffusion

How straightforward, you ask? You can fit a complete implementation in a single screen of BASIC code:

diffusion_listing

That said, I don’t recommend actually running this code. I’m not sure what kind of computer system Mr. Dewdney was running when this column was written but it had a whole lot more power than Commodore 64 BASIC. That listing is impossibly slow, with waits of multiple seconds required before a viable move is found. You could speed it up trivially by selecting a a gap by index instead of a ball by location, but that isn’t the algorithm he described.

It runs more reasonably quickly if you implement it in assembly language instead, but it doesn’t really get us any closer to answering the question it posed.

Modernizing the Code

Porting the implementation to C and making it be 31×31 instead of 15×15 is pretty trivial, but there’s a nasty complication if we want to run it for any length of time. The C standard library uses multiply-and-add PRNGs, and those have very low entropy in their lower bits and you’re not really supposed to pull more numbers out of them than the square root of their period. For drand48 that only permits about 16 million calls. Given how many random numbers we go through to find a valid move, 16 million calls hardly mixes our fluids at all.

So I went looking for a good, fast, high-period random number generator and wound up back in the xorshift space, with the “xorshift*” algorithm that, apparently, passes all the tests in the BigCrush PRNG test suite:

uint32_t xorshift64star(void)
{
    uint64_t x = rng_state;
    x ^= x >> 12; 
    x ^= x << 25; 
    x ^= x >> 27; 
    rng_state = x;
    return (x * 0x2545F4914F6CDD1DLLU) >> 32;
}

Once we have that, we need some metric for how diffused the two substances are. I decided to simply sum up the number of excess balls of each color in each column, and then average that over columns. This metric starts at 30, and then should drop over time.

I originally had hopes that this metric would approach 1, but it seems like once it gets around 3 it keeps bouncing about in the 2.5-5 range:

diffusion_graph

Based on this, I’d have to conclude that the answer is that the substances end up properly mixed after about two million cycles. From there on out it seems like the simulated Brownian motion is as likely to separate them as mix them further.

Getting More Out of the C64’s Text Mode

I implemented my first version of the Lights-Out puzzle for the C64 in 2015. I was mostly aiming to make it be small and to be as straightforward as it could possibly be. This worked, but as it has since become one of my standard practice programs, when I revisited the 2015 version it seemed very shabby.

I revisited the original program and cleaned it up a bit:

lightsout64_nicer

I think this version looks the nicest of all my current implementations now. For reference, this is what I started with:

lightsout64

This incorporates a best of the changes in presentation I made across implementation platforms:

  • Instead of using simple reverse video, which made the letters black on red, I make the text bright white on red, like I did on all other color-capable platforms.
  • Make the cells 3×3 instead of 1×1 or 3×1, following the design the ZX81 imposed by necessity, but which looked really nice.
  • Keep the grid.
  • Extend the light all the way to the grid border instead of small shapes within them like the Spectrum port used, or allowing gaps between the “light” and the divisors like on the PC or the original C64 edition.
  • A small amount of custom graphics to give a bit of a rounded edge to the game’s play cells.

Implementation

The implementation was a bit trickier than one might expect, though. On the PC and the Spectrum, the foreground and background were controllable at the level of individual character cells. The C64’s text mode allows character-level control of the foreground, but there’s only one background color and it’s applied to the entire screen.

There’s a bitmap mode that works very similarly to the Spectrum, though. We could use that. It’s also very easy to copy text into the bitmap display, because the odd memory layout of the bitmap memory will let you copy memory directly from the character ROM and get legible text out. The high-resolution bitmap display also gives direct control over the foreground and background of each cell.

So that would work, but there’s a simpler alternative. While there is normally a single background color (in $D021), there are actually three background color registers (from $D021 through $D024). These are only used in a special “extended background color mode”. We only need two background colors—black and red—so that’s perfect for us.

Well, almost. The top two bits of our character code are used to store the color of the background. That only leaves us 64 characters to use. Fortunately, that’s the C64’s screen codes, and not ASCII, or we couldn’t even use letters, but as it is, we still only get letters, numbers, and punctuation if we stick to the ordinary character ROM. As a result, drawing the grid means we have to use custom graphics to draw the grid, even if we restricted our selves to the display that we started with. As a consolation prize, at least we can keep the code that turned things inverse-video to swap between black and red backgrounds.

Download

I’ve updated the Lights Out Compilation to include this new version as well as the old.