I’ve been thinking about type-in programs again. In particular, I’ve been thinking about one of the features many magazines and books provided for type-in programs that I never actually saw back when I was a youth typing programs in: automatic proofreader programs that would provide verification codes for the program as you typed it in, thus saving you multiple passes through the program trying to figure out why it was giving you wrong answers.
In poking around through the Internet Archive’s collections, I’ve found three of note and in this article I’ll be picking them apart.
SWAT: Strategic Weapon Against Typos
I encountered the SWAT system from publications associated with SoftSide Magazine, which focused on the TRS-80, the Apple, and the Atari home computers. These have generally been a bit before the time I focus on, though I really do owe the Atari home computers a more thorough investigation. The earliest attestation of the system I’ve found is in the June 1982 issue, and it provides implementations for all three of its target systems.
SWAT was a program intended to be appended to the program that it was to check; one would then start running the program from that point instead of running the program proper. It would then compute a simple checksum of every byte in the program by adding them up and then printing them out in groups. You would then check these codes against a separate, shorter listing that provided the codes a correct listing would produce. If they didn’t match, one edited the program until they did.
This is somewhat interesting because this is much closer to how we would organize such a utility in this day and age. The program would be read in, and a SWAT code table would be printed out. The other systems we will see in this article essentially modify the code editor and require checking as one types.
SWAT takes three parameters: the boundaries of the program to check, the maximum number of lines per chunk (default 12), and the target number of bytes per chunk (default 500). It then scans through the program as it exists in memory, producing a running sum of every byte in the program, modulo 676. Once it reaches the end of a line, it checks to see if it has reached the maximum number of lines, or if the byte target has been exceeded. If either condition holds, it emits a line on the SWAT table indicating the range of lines, the total number of bytes, and the resulting sum. Instead of printing the sum as a number between 0 and 675, it emits it as two letters. (676 is, after all, 26*26.) The first letter is the “tens digit” of the result and the second is the “ones digit.”
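As a concrete illustration, here is a Python sketch of the procedure just described. The function name and data representation are mine, and I am assuming the running sum carries across chunk boundaries rather than resetting per chunk, since the sum is described as covering every byte in the program.

```python
def swat_codes(lines, max_lines=12, byte_target=500):
    """Sketch of SWAT's chunked checksum (hypothetical Python rendering).

    `lines` is a list of (line_number, tokenized_bytes) pairs, in program
    order. Returns (first_line, last_line, chunk_bytes, code) tuples.
    """
    codes = []
    total = 0       # running sum over the whole program, modulo 676
    size = 0        # bytes seen in the current chunk
    count = 0       # lines seen in the current chunk
    first = None    # first BASIC line number of the current chunk
    for number, data in lines:
        if first is None:
            first = number
        for b in data:
            total = (total + b) % 676
        size += len(data)
        count += 1
        if count >= max_lines or size > byte_target:
            # Encode the sum as two letters: "tens digit" (quotient by 26)
            # first, then "ones digit" (remainder).
            code = chr(ord('A') + total // 26) + chr(ord('A') + total % 26)
            codes.append((first, number, size, code))
            size = count = 0   # note: the running sum is NOT reset
            first = None
    if count:  # emit any partial final chunk
        code = chr(ord('A') + total // 26) + chr(ord('A') + total % 26)
        codes.append((first, number, size, code))
    return codes
```

With a one-line-per-chunk limit, each emitted code reflects everything typed so far, which is the behavior a whole-program running sum implies.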
One interesting thing about this is that it does not operate on the actual text the user typed. The BASICs for these three systems analyze and reformat the instructions so that they may be executed more efficiently at run time (a process that documentation of the time often called crunching, but which modern writers would call tokenizing), and it is the tokenized form of the program that is summarized. This meshes extremely well with Applesoft BASIC, because its tokenizer actually also removes all user-supplied formatting, which means that all program lines are converted into a single canonical form. The TRS-80 preserved all user formatting, which meant that the program had to be entered by the user exactly as printed to match the SWAT codes. The Atari was particularly unusual—it normalized input lines much as the Apple did, but quirks of its tokenization process meant that how a line was tokenized could depend on the order in which lines were entered, so skipping around in a program while entering it, or editing typos along the way, could actually corrupt your SWAT codes. Fortunately, there was a procedure for normalizing a program, and so SWAT simply required users to perform this procedure before running any checks.
As a checksum, this mostly did what it needed to, but it wasn’t ideal. In addition to its false positives, a simple sum of bytes will not catch transposition of characters, and for programs with a lot of DATA statements, this was the most dangerous and difficult-to-identify problem that a user was likely to cause. Summing the collapsed tokens, however, did mean that any misspelling of a word BASIC recognized would be immediately obvious, altering not only the final sum but even the length of the line. For the kinds of programs that SoftSide tended to publish, this was entirely adequate, though. Their programs tended to be pure BASIC and would not have large amounts of machine code or graphical data within them.
That privilege would go to Compute!’s Gazette, which focused on the Commodore line, a line whose programs required much more aggressive use of machine code and memory-mapped I/O to function.
Compute!’s Automatic Proofreader (1983-1986)
Compute!’s Gazette started out as a magazine for the VIC-20 and the Commodore 64. In October 1983 they introduced a pair of programs that provided automatic proofreading support. The tighter focus of the magazine—and the close similarity of the operating systems of the two machines, even at the binary level—allowed the editors to provide tools that hooked much more deeply into the machine.
All the Commodore 8-bit computers provided a uniform interface for basic I/O operations, and also provided a number of points where the user could replace core functionality with custom routines. This low-level interface—which they called the KERNAL—allowed a lot of work to be done at the machine code level and still run acceptably across the entire line.
This program worked by copying itself into a block of memory that was only used for tape I/O and which was commonly used by BASIC programs as scratch space for small machine language programs. A simple BASIC loader copied it into place and then ran a routine that attached the bulk of the program to the KERNAL’s character-input routine. This routine, interestingly, wasn’t called when the user pressed a key; instead, once a line had been entered, the screen-editor logic decided which part of the screen constituted that line and then provided the contents of that line as input, followed by the RETURN key that kicked it all off.
This proofreader would read characters and add their codes to a running 8-bit total, wrapping around as necessary, and ignoring spaces. When the return key was detected, it would stash the output state, move the cursor to the upper left corner of the screen, print out the final sum (from 0 to 255), and then restore the cursor to where it had been. As a checksumming algorithm, this had the same problem SWAT did of not detecting transposed characters, and it was also less reliable about misspelled keywords (since this scan was happening before tokenization). On the plus side, a new code was generated for every line of text and you could check your work as you typed, or list an entire block and check it by going to the top of the program block and repeatedly pressing RETURN to evaluate each line.
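The per-line sum is simple enough to sketch in a few lines of Python. This is a reconstruction, not the original code: the real routine summed PETSCII byte values as they arrived through the KERNAL input vector, so plain ord() here is a stand-in.

```python
def proofreader_1983(line):
    """8-bit running sum of a line's characters, skipping spaces.

    Sketch only: ord() stands in for the PETSCII codes the real machine
    fed through its character-input routine.
    """
    total = 0
    for ch in line:
        if ch == ' ':
            continue  # spaces are ignored, so spacing differences don't matter
        total = (total + ord(ch)) & 0xFF  # wrap around at 8 bits
    return total
```

The weakness is easy to demonstrate: swapping two characters leaves the sum unchanged, so proofreader_1983("AB") and proofreader_1983("BA") both come out to 131.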
Early versions of the proofreader had two editions, one for the VIC-20 and one for the Commodore 64, but the only actual difference between the versions was that they called a routine in the BASIC ROM to convert the byte into a decimal number, and the BASIC ROM was mapped to a different part of memory in the two machines. The API for the functions was identical, and indeed the BASICs were so similar that this was the same routine, in the end.
Ultimately, later editions of this proofreader unified the two versions and used the actual original value of the “character read” routine that the proofreader hooked itself up to as a switch to decide where to call to print a decimal number. This added a dozen bytes or so to the final program, but even on the extremely cramped VIC-20 this was a cost that could be easily paid.
However, the tighter binding to the operating system produced some unique drawbacks as well. The CHRIN routine the proofreader extended was actually called for all kinds of character input, not just program lines. As a result, running a program with the proofreader active would corrupt the program’s display with handy check codes for every response the user gave to an INPUT statement. Worse, it would do the same for textual data read off of the disk or tape. Of course, the tape wouldn’t have time to do any reading; once the tape routines started using their temporary storage, this would trash the memory holding the proofreader, and the system would begin trying to execute random temporary data as code and probably crash extremely hard.
Compute!’s Automatic Proofreader (1986-)
Over the next few years, Compute!’s Gazette got more and more sophisticated programs in its lineup—many approaching or exceeding commercial quality—and it also got several more systems it needed to support. In February 1986, they updated their proofreader to use a more sophisticated technique. While they were at it, they also addressed all the shortcomings I listed above.
The most difficult issue to address was where to put the proofreader so that it would not be touched by the running system during normal operation. They fixed this by pushing the start of BASIC’s program memory forward 256 bytes and dedicating the freed space to the proofreader. However, this space was at a different place in memory on each of the five machines they supported, so they also needed to patch the program after loading so that its addresses pointed to the right place. The necessary information for patching turns out to be largely supplied in a portable way by the KERNAL, so this is not as heinous as it sounds, but it does still require the most sophisticated BASIC loader I have seen.
The other system-specific issues were solved by extending the “tokenize a line of BASIC text” function instead of the “read a character” function. This also lets the proofreader intervene less frequently and lets it process an entire line of text at once, guaranteed. User input and file I/O aren’t intercepted, and with the program relocated to the old start of BASIC RAM, tape I/O works fine too.
The final—and, for the user, the most important—change was to use a more sophisticated checksum algorithm that can actually reliably flag swapped characters and make it much less likely for typos to cancel each other out:
- The checksum is a 16-bit unsigned integer, and its initial value is the line number being processed.
- The line is preprocessed by removing all spaces that are not between quotes. So, for instance, 10 PRINT "HELLO WORLD!" becomes 10PRINT"HELLO WORLD!"
- Add the byte value of each character to the checksum, but before adding it, multiply it by its position in the line after extraneous blanks are removed. So, for our sample line, the checksum starts at 10, then gets 49*1 and 48*2 added for the line number 10, then 80*3 for the P in PRINT, and so on.
- XOR the high and low bytes of the checksum together to produce the final 8-bit checksum.
- Express the checksum as a two-letter code. This is basically a two-digit hexadecimal number, but the least significant digit comes first and instead of using the traditional 0123456789ABCDEF digits, it instead uses the letters ABCDEFGHJKMPQRSX.
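The steps above translate into a short Python sketch. As before, ord() stands in for PETSCII, and the function and constant names are my own invention:

```python
# Stand-ins for hex digits 0 through 15, as listed above.
LETTERS = "ABCDEFGHJKMPQRSX"

def proofreader_1986(line_number, text):
    """Sketch of the 1986 checksum: seed with the line number, strip
    spaces outside quotes, position-weight each character, fold 16 bits
    down to 8, and encode as two letters (least significant digit first)."""
    # Step 1: remove spaces that are not inside a quoted string.
    stripped = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ' ' and not in_quotes:
            continue
        stripped.append(ch)
    # Step 2: position-weighted 16-bit sum, starting from the line number.
    total = line_number
    for pos, ch in enumerate(stripped, start=1):
        total = (total + pos * ord(ch)) & 0xFFFF
    # Step 3: XOR the high and low bytes together.
    check = (total >> 8) ^ (total & 0xFF)
    # Step 4: two letters, low nibble first.
    return LETTERS[check & 0x0F] + LETTERS[check >> 4]
```

The position weighting is what catches transpositions: swapping two different characters changes the weighted sum, where a plain sum would be unaffected. Working through the arithmetic by hand, the line 10 PRINT sums to 2144, which is 0x0860; folding gives 0x68, which encodes as JG.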
This scheme was sufficiently effective that they never modified it afterwards and it continued in use until Compute! stopped publishing type-in programs in the early 1990s. That is a solid pedigree.
After the jump, I will dissect the sophisticated BASIC loader that was used to make the same core program work on five different computer models, and then present my reconstruction of the proofreader itself.