Author Archives: mcmartin1723

iGlossolalia: Mixing All of Apple’s Officially Supported Compiled Languages

Apple’s official application ecosystem is built around four languages:

  • Swift is Apple’s flagship language. They’ve made some effort to promote and develop it as a general-purpose language, but at the time of this writing it’s only really a credible force in iOS software development and maybe also macOS.
  • Objective-C is a superset of C, adopted by Apple for OS X after Steve Jobs came back to Apple with a suitcase full of NeXTStep technology. Developers nicknamed it “Objectionable-C” for years, but Apple’s development of the language continued over the years, and by the release of the macOS 10.8 SDK in 2012, it had reached a mostly-unobjectionable final form. Two years after that, the first version of Swift would be released.
  • C is the venerable yet still omnipresent systems programming language. Objective-C is a strict and true superset of C—every C program is also a valid Objective-C program. Xcode is built on top of the clang/llvm compiler toolchain, and inherits its excellent support. Apple’s application-level APIs, however, tend to be written in Objective-C, and either have no C wrappers or only very inconvenient ones. As a rule, pure C appears in iOS projects as some portable code being incorporated from some external library.
  • C++ is in a similar position to C, but it has evolved faster and more thoroughly. The clang toolchain also has excellent C++ support (and develops the libc++ library that is the only real alternative to libstdc++ on many platforms), but Apple does not backport features into earlier runtimes. If you’re hoping to use C++17 features, you’re pretty much stuck targeting iOS 12 and macOS 10.13 or later.

All of these languages can share data to some degree. That’s not the exciting part—most languages provide a foreign function interface for interoperation. These, in effect, are a series of C functions and data structures that provide a uniform representation that can then be glued into the other language’s own foreign function interface. (Alternately, if the language being interoperated with is C in the first place, no additional glue is needed.) In this way, C’s (very simple) data and function call model becomes a lingua franca for many other languages.

That is not how these four languages interoperate.

These four languages can directly share data structures and call one another’s functions. This is true despite the fact that there are three memory-management schemes and two object models between them. Furthermore, to the extent that there is a lingua franca here, it is Objective-C, not C, that serves as such despite it not being the least common denominator.

In this article, I will outline how objects and memory management work in these four languages, how to construct composite objects across them, and what behavior arises when their computation models come into conflict.

A NOTE OF WARNING: Apple doesn’t guarantee much of this, and a lot isn’t even documented. I’m doing my investigations here on Xcode 10, but if it isn’t in the official documentation it probably should not be relied upon long-term. These solutions should serve just fine as stopgaps, though.


Atari 800: Repairing a Mad Science Type-In

Back in February I wrote three posts about writing code for the 8-bit Atari home computers and then didn’t do anything else with it. A big part of that is that I didn’t have any good ideas for things to do with the system. My usual project is the Lights Out puzzle, but when I tried to design that for the Atari 800, I ended up designing it for the Atari 2600 instead. It’s still basically true that the display I got on the 2600 looks better than any display I’ve designed for the 800 that doesn’t more or less clone it.

What I really needed were some examples of people putting the system to good use but without pushing its apparent power far beyond its design. I needed examples of what it was supposed to be doing. I dove back into the old archives and looked through the collections of sample programs, simple games, and basic utilities. I wanted a program that was short, but which made heavy use of the Atari’s unique hardware, and which did something interesting enough to be worth the effort of reverse engineering it.

What I found was Front Attack, a game by Robert J. Neep that was published in 1985 in Compute!’s Atari Collection, Volume 2. It’s a fast-moving arcade game with joystick control, sprites, and a scrolling background. It does all of these things at machine-language speeds, but it is 100% BASIC and never calls out to a machine code routine at any point. It also, however, didn’t work when I typed it in, and kept not working even after I triple-checked the listing. Enough worked that I could tell that this was going to be a program worth investigating. I decided I would debug the game and then write up the bugfixed version. I’ve now done this, and it looks pretty good in-game:


As a bonus, along the way I got enough of a feel for the Atari 800’s improvements over the 2600 that I have a pretty good idea of how to make a proper Lights-Out game for it, too. We’ll see where that goes.

Below the fold, I’ll outline how it provides the various things it needs.


NES Splitscreens, Demoeffects, and “Undefined Behavior”

I’ve written extensively in the past about various techniques for getting effects out of old chips. This experience has very little to do with modern hardware, though, even when writing native-code applications, and even if that development is done in assembly language. Application developers nowadays write to published, third-party API standards, and if behavior is anomalous, we blame the device drivers. (This is not entirely true, of course. Back when Valve was porting Left 4 Dead 2 to Linux, they observed that an engine ported to a new platform could be expected to deliver about six frames per second. On the flip side, though, part of the improvement process involved working with hardware vendors to improve their graphics driver performance. Indeed, nVidia’s endless series of “Game Ready” drivers are primarily per-game performance upgrades. But even so, in both of these cases, the game developers are relying on a third-party, mostly-opaque abstraction—OpenGL or Direct3D for the graphics cards, and the C library’s wrapper around the OS system calls for the rest.) The huge diversity of PC hardware—even within a single company’s line of GPUs, as in nVidia’s case—makes targeting any lower level of abstraction unfeasible.

The 8-bit consoles and home computers, however, did not work this way. The supporting hardware on the system was absolutely fixed, and quirks of the hardware were preserved through the life of the console. If revisions do need to be made—the VIC-II used in the Commodore 64 had three design revisions with differences visible to the programmer—the general operating principles of the chips are very unlikely to change.

So it is in this context that I find myself lifting an eyebrow at the headline in this article: Zelda Screen Transitions Are Undefined Behaviour. The text of the article puts enough caveats on that headline that my eyebrow does lower back down, and it is also an excellent exegesis of the NES’s scrolling hardware. If you are at all interested in seeing how the NES balances its limited memory and its more generous address space to actually construct its display with proper panning hardware (this was, after all, its major advance over the previous generation of home computers like the C64), this article is worth reading carefully. It also goes deep enough into the console’s internals to show how it can be confused in useful ways. The technique used is so ubiquitous later, though, that it’s reasonable to question my use of the word “confused” here. After all, if the code behaves consistently (which it does) and it was used widely within the NES canon (which it is), then how do we get away with saying that these techniques are “confusing” the chip?

We can get away with this by looking at the rest of the programming API. In particular, there are two facts that let us conclude with high confidence that the behavior described in the article, while clearly “designed in” to the chip, was not entirely intended:

  • Deploying the technique causes a graphical glitch not “predicted” by other parts of the chip’s interface. This is the two-pixel scroll anomaly the article seeks to explain.
  • The intended effect—a mid-frame alteration in the vertical scrolling value—has a natural implementation (rewrite the vertical scroll value through $2005, the same as you would during VBLANK) but this doesn’t work, despite being a situation that it explicitly handles.

These traits look familiar. In fact, these traits are both shared by the trio of techniques I describe as partial badlines on the C64. Those were definitely considered “tricks” in their own right, but they were also part of a much broader spectrum of advanced VIC-II programming techniques. If we approach the Zelda split-screen effect with the same mindset that I brought to C64 demoeffects, we get a very similar kind of summary:

  • What you do with it: Get around the PPU’s restriction on altering midline scrolling.
  • What is actually happening: The PPU’s VRAM register is overwritten by the PPU while rendering the display, and while the CPU’s writes to the actual vertical scroll value are ignored mid-frame, no other writes are. The CPU can thus alter the base address mid-frame for 8-line-precise scrolling.
  • How to do it: Rewrite the base address through $2006 midframe (ideally during HBLANK) to point to the start of the row you want to display. This will corrupt the Y scroll value to act as if you’d also scrolled down an additional two pixels.
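The two-pixel offset falls out of how the PPU packs its scroll state into the same internal register that $2006 writes. Here is a minimal sketch in Python, assuming the register layout documented on the NESdev wiki (fine Y in bits 12-14, nametable select in bits 10-11, coarse Y in bits 5-9, coarse X in bits 0-4). The function name decode_ppu_address is made up for this illustration.

```python
def decode_ppu_address(addr):
    """Split a 15-bit PPU address into the scroll fields it doubles as."""
    return {
        "coarse_x": addr & 0x1F,          # tile column, bits 0-4
        "coarse_y": (addr >> 5) & 0x1F,   # tile row, bits 5-9
        "nametable": (addr >> 10) & 0x3,  # nametable select, bits 10-11
        "fine_y": (addr >> 12) & 0x7,     # pixel row within the tile, bits 12-14
    }

# Point the PPU at the start of tile row 10 in the first nametable ($2000).
row_start = 0x2000 + 10 * 32              # 32 tiles per row
fields = decode_ppu_address(row_start)
print(fields)
```

Any address in the $2000-$2FFF nametable range has bit 13 set and bit 12 clear, so the fine Y field always decodes as 2. That matches the extra two pixels of vertical scroll described above.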

As it happens, a modern developer would probably not do this. With more clever manipulation of the two internal address registers, complete control of mid-screen scroll state is possible. Still, that technique is more complex and may not have been developed until later in the console’s lifespan. In particular, Life Force uses a very similar technique to The Legend of Zelda but has no graphical glitches because it put its status bar at the bottom of the screen where a nonzero but fixed vertical scroll value will be unnoticeable.

Further Reading

The NESdev wiki is the central clearinghouse for NES programming information, and while it got a bit of a late start compared to the C64 in terms of deep investigation, it has long since caught up. Most relevant to the discussion here is the PPU Scrolling page which describes the interaction of the various graphics chip registers with the scrolling mechanisms in great detail, and outlines four different mechanisms for scrolling control.

Getting Off the Ground With the Other Commodore 8-bits

My Commodore work here has been restricted so far to the Commodore 64. However, that system was only one of many related systems they produced over the years:

  • The PET was the first home computer they made with a serious following. Its abilities made it a rival to systems like the TRS-80, but very little else. Still, that was enough to cement their reputation early on.
  • The VIC-20 was their first system to achieve truly widespread success. It was inexpensive and provided significant power for the price. Its existence, and price point, had largely annihilated any potential market space for the ZX81- and Spectrum-based Timex computers in the US. It was also the direct predecessor to the Commodore 64, and its physical appearance and programming model reflect that.
  • The Commodore Plus/4 and Commodore 16 were an attempt to capture the lower-end home computer and small business market, and specifically to compete with Timex computers. They had many enhancements to the BASIC that let the user make use of their graphics and sound capabilities more directly, but the graphics and sound hardware were replaced with a much simpler combined chip called the Text Editing Device, or TED. The Plus/4 also shipped with on-board productivity software. In the end, though, it appears that the market segment in which the Plus/4 and C16 were intended to compete with Timex didn’t really exist in the US at all, and all of the systems were failures there. (The on-board productivity software was also poorly received; BYTE magazine apparently described it as “an insult.”) The Plus/4s and C16s were mostly dumped into the Eastern European marketplace at huge discounts, where they did ultimately do well enough to retain a niche following and a demoscene that lasts through to the present day.
  • The Commodore 128, released a year after the Plus/4, was the true direct successor to the Commodore 64. It added a new, more PC-like video interface, a separate Z80 CPU to let it run CP/M software, twice the RAM, the extra memory mapping and control support needed to make that RAM readily usable, and a vastly more sophisticated BASIC to take command of all of these new abilities as well as its older ones. The BASIC extensions from the Plus/4 were retained, along with new control flow options. Where the extensions were not taken, new ones were added for proper high-level control of the sprites, the sound chip, and the bitmap display, and unlike the Plus/4, the 128 made no compromises compared to the 64. Unfortunately, it largely avoided compromises by including all the C64 hardware directly within it and switching into a “C64 mode” that was, from a software standpoint, nearly indistinguishable from an ordinary C64, including reimposing all the limitations that the 128 was supposed to break.

Commodore made quite a few other systems as well, from the ill-fated CBM-II through the hugely popular and influential Amiga line, but the machines above formed a clear 8-bit lineage based on the 6502 family of microprocessors and a broadly consistent firmware interface that they called the KERNAL.

The KERNAL was a list of memory locations that provided consistent ways to call into system ROM functionality independent of the particular implementation on any given system or ROM patch level. For example, $FFD2 was always the location of the CHROUT “output a character to the current output device” routine, which in turn means that the assembly language components of our Hello World program for the Commodore 64 are portable across all of these machines. In this way the KERNAL provides a functionality similar to what BIOS provided for IBM PC clones—a way for software to ignore some aspects of hardware even in the absence of a shared operating system or modular device driver facility.

But assembly language source compatibility does not translate to binary compatibility. We saw this, at one remove, when we looked at the relocating loader for Compute!‘s automatic proofreader. In the rest of this article we’ll look at the changes each system demands.


C64 Fat Sprite Workarounds

When writing about retro programming topics here on Bumbershoot, I’ve generally wound up oscillating between the absolute basics of getting anything to run at all and techniques to exploit weird corner cases of the hardware to let it do things it was never intended to do. This article is a more unfortunate cousin: a weird corner case that is fully-intended, but vaguely-documented at best and incredibly inconvenient when doing apparently reasonable things.

In particular, today we will be talking about scrolling sprites off the left side of the screen on the Commodore 64.

A Quick Survey of Sprites

A “sprite” is a generic name for any block of pixels that you want to animate over a background of some kind. These days a sprite is usually two triangles rendered with a partially transparent texture. In the 20th century, it was often a highly-optimized pixel-copy routine (a “blitter”, from BLT, itself an abbreviation of “block transfer”). By the 1990s this was sufficient to the needs of the age, but on 1980s hardware, blitting was slow and fraught with compromises. It worked (the Commodore Plus/4, the Apple II, and the entire ZX Spectrum line got by with software blitters over a bitmapped display), but it didn’t look great, and those parts of 80s home-computer game animation that we remember fondly relied on more than just a blitter.

In particular, systems like the C64, the NES, and the Atari game consoles and home computers provided sprite definition and placement capabilities directly in their graphics hardware. This put a limit on the number of elements that could be efficiently rendered, but allowed the animation to be much more practical and sophisticated for a given CPU power.

Interestingly enough, though, hardware sprites were not necessarily a strict upgrade. The Atari 2600 and NES used the fact of their hardware sprite support to make actual bitmap support completely unnecessary. The C64’s 320×200 screen requires 9KB of RAM to hold a single screen’s worth of bitmap information, while the NES’s graphics (without additional support circuitry on the cartridge) only had access to 2KB of RAM and 8KB of ROM. This produced a text-like display mode, and hardware sprites are the only way to get pixel-level control over the display on the NES. This general approach of separate tile and sprite layers was convenient and efficient enough that it not only was available on the bitmap-and-sprite-capable systems of the time (like the C64 and Atari), but saw usage in lower-power or portable devices at least through the Nintendo DS.


Migrating From Python 2 to 3

Python 2 is retiring on the first day of 2020. There’s even a countdown clock of doom. Python 3, the language that is replacing it, introduces some incompatible changes and normally is treated as if it were a separate language entirely. Even with only a few months of support left, though, Python 2 remains heavily used.

This is honestly a bit odd. Python 3 is ten years old now, and Python 2 was supposed to leave support in 2015. There was a large outcry at the time that this would not provide enough time to migrate existing code, though, and it was pushed out five years to 2020. A cynic might conclude that this basically resulted in four additional years of not bothering to upgrade, but library support really did improve a lot over that time and I suspect that this time the retirement will stick.

I’ve been sticking with Python 2 in my code here on Bumbershoot mainly because macOS has shipped with Python 2 exclusively all this time, and installing Python 3 cleanly can be something of an ordeal. I can’t really put off upgrading any longer, though. Some of the Linux distros out there are already running Python 3 by default. I won’t be altering any of the code snippets I’ve put in posts here—most of them should work in both versions, anyway, since they’re usually just math. How I’ve chosen to deal with each project, however, has varied a bit.


My largest Python project is the Ophis assembler. When I first wrote it to teach myself Python, it was written for version 2.1. It has been updated over the years to take advantage of new features (like the addition of True and False in 2.3) and it’s normally been shipped as a standalone application that contains its own interpreter. I thus chose to update Ophis to Python 3 and drop support for Python 2 systems completely.

Most of the work was done by the 2to3 tool. This script ships with Python and it automates most of the necessary changes, where library calls alter their syntax or types are subsumed into more general ones and removed. However, this wasn’t quite enough to get Ophis up and running:

  • Python-2 Ophis was not Unicode-aware, while Python 3 has all strings be Unicode by default. This was a bit of a problem because Ophis makes heavy use of strings-as-byte-sequences to represent assembled code. The strings that were really byte sequences needed to be manually annotated as such.
  • Now that byte sequences were actually, explicitly, byte sequences, it was no longer necessary to use ord and chr to translate between bytes-as-integers and bytes-as-characters as needed.
  • The range and map functions are now “lazy”: they don’t produce values until those values are asked for as part of some other computation. Under the hood, they are now returning iterator objects instead of collections. That’s usually fine (in fact, it’s usually a savings in both space and time), but I had a few places where I really needed an actual collection and not just a series of values to be returned later. I thus needed to pass the result of map or range to the list constructor to force it to be properly realized.
  • My one use of a Windows-specific API had been changed incompatibly and I needed to modify it.
  • I had an unfortunate tendency to open files with the file constructor instead of the open function, and in Python 3 only the latter works without some extra configuration that I generally shouldn’t be bothering to do. I needed to change those as well.
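A few of these differences are easy to see in isolation. The snippet below is a generic Python 3 illustration, not code taken from Ophis, and the variable names are made up for the example.

```python
# Byte sequences must be marked explicitly, and indexing one yields an int,
# so no ord() call is needed on the result.
code = b"\xa9\x00\x60"               # LDA #$00 : RTS, as raw 6502 bytes
first_opcode = code[0]               # already the integer 0xA9

# bytes([n]) takes the place of chr(n) when building a one-byte sequence.
patched = bytes([0xA2]) + code[1:]   # swap the LDA for an LDX

# map is lazy; wrap it in list() when a real collection is needed.
lazy = map(lambda b: b ^ 0xFF, code)
inverted = list(lazy)                # every byte with its bits flipped

print(first_opcode, patched, inverted)
```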

The Bumbershoot Scripts

According to GitHub’s statistics, 3.8% of the code in the Bumbershoot Software repository is in Python. These are all in that uncomfortable space between one-shot scripts and actual reusable distributable programs. I’d like them to be trivially runnable anywhere, but I also don’t want to have to actually impose any kind of installation step. Ideally a user should be able to basically just throw a single file anywhere and have it work.

The earliest versions of Python 3 were released alongside new versions of Python 2, and the languages were much further apart at that point. With the release of 3.2 the developers made an effort to ensure that there was some subset of the language that would behave identically in both Python 2 and 3. It’s possible to import things from the notional __future__ module in Python 2 to make 2.6 or 2.7 behave more like Python 3, which helps for some of the more unavoidable differences in default behavior.

My goal then for these was to update the scripts so that they would run happily ignorant of which version of Python was actually running them. Python 2.7 is nine years old at this point and I have no qualms about insisting on that version if Python 2 is to be used at all.

Half my scripts are, broadly speaking, “linking” scripts: their job is to either consume or emit binary blobs and decorate them appropriately so that they can be fed to emulators or into actual hardware. These will suffer from the same kinds of issues regarding binary and Unicode strings that we saw in Ophis, but there is an extra wrinkle now: getting a byte out of a Python 2 string requires use of the ord function, but getting a byte out of a Python 3 bytestring forbids it. This remains true even though 2.7 and 3.x both accept b'ABC' as representing the byte sequence 65-66-67.

My solution for this was to rely on a different builtin data type that is shared between the versions: bytearray. This behaves in a manner very similar to Python 3 byte strings in both Python 2 and 3. Furthermore, the bytearray type is mutable, and while mutability often makes programs harder to reason about, here it actually makes things like inserting checksums into binary images after the fact far easier.
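As a rough sketch of that approach, the function below appends a checksum to a binary blob and runs unchanged on 2.7 and 3.x. The one-byte additive checksum and the name append_checksum are hypothetical, chosen only to keep the example short; real image formats each define their own scheme.

```python
def append_checksum(image):
    """Return a mutable copy of image with a one-byte additive checksum appended."""
    data = bytearray(image)          # accepts a str in Python 2.7, bytes in Python 3
    data.append(sum(data) & 0xFF)    # iterating a bytearray yields ints in both
    return data

blob = append_checksum(b"ABC")
blob[0] = 0x61                       # in-place patching works because it is mutable
print(list(blob))
```

Because indexing and iteration both produce plain integers, neither ord nor chr appears anywhere in the code.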

Some of these scripts also relied on doing division, which had its behavior change between 2 and 3—in Python 3, 5/2 yields 2.5, while in Python 2 it truncates to 2. Python 2.7 will give you the Python 3 behavior if you import division from __future__, but in my case I generally wanted truncating division because I was generally trying to count out integral numbers of file blocks or something. Fortunately, both 2.7 and 3.x offer // as an integer-divide operator that behaves like Python 2. I had already been using that in Ophis, and I mirrored it in these scripts.
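A small sketch of the block-counting case, which behaves the same on 2.7 and 3.x. The block size and the function name are assumptions for illustration, not values from any particular script here.

```python
BLOCK_SIZE = 254    # hypothetical number of payload bytes per disk block

def blocks_needed(length):
    """Number of whole blocks required to hold `length` bytes."""
    # // is integer division on both 2.7 and 3.x, so the result is always an
    # int, even if a script has imported division from __future__.
    return (length + BLOCK_SIZE - 1) // BLOCK_SIZE

print(5 // 2, blocks_needed(1000))
```

Adding BLOCK_SIZE - 1 before dividing rounds up, so a partly filled final block is still counted.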

The scripts that didn’t deal with binary files were all content generation scripts that operated purely on strings and lists and floating point numbers, and those became runnable in both 2.7 and 3.x simply by running 2to3 on them and requiring no additional changes.

Going Forward

My plan for future posts where I make use of scripting languages is to provide them in Python 3. Tools that are intended to be generally useful will stick to hybrid 2.7/3.x as before, but tools that would be useful on older machines themselves might instead be provided in C so that they could be compiled to run on the older systems. I’d already done this with bin2asmx earlier, but I suspect I won’t be making a habit of it.

Implementing SHA-256 on the 6502

After five implementations of various algorithms built mostly around bit shuffling, how about a properly modern, common, useful routine? We just need to find something that has a very similar structure and then we can deploy the knowledge we’ve learned in those other domains.

SHA-2 fits this bill neatly, and Wikipedia even has handy pseudocode we can base an implementation on. In looking at the implementations and specifications, it also seems like SHA-256 is a relatively good fit for the 6502, and may well qualify as the most sophisticated modern algorithm that the 6502 can still comfortably handle. We don’t really need to worry about endianness or word size (since we’re an 8-bit chip we can do multibyte mathematics in any direction we want, and the size of the algorithm’s words really only alters our loop bounds), but SHA-256’s largest indexed elements are the round constants and the schedule arrays, each of which is 64 32-bit words long. That translates out to tables exactly 256 bytes in size, which is exactly the size of a 6502 memory page and thus the largest amount of memory that can be efficiently indexed.

Initial Implementation

Translating the pseudocode from the Wikipedia article into code was not terribly challenging. I pre-defined tables of the necessary constants in the algorithm, and then set aside a fixed block of RAM to store all the working state that changes as a sum is computed. (In a modern system, we would probably instead operate on a pointer to arbitrary blocks of RAM, but the 6502 is much, much better at operating on blocks of memory at fixed addresses than it is at working with pointers, and I wanted to give the poor CPU every edge it could get.) I then defined several 4-byte blocks of memory for use as accumulators, along with a variety of small functions that would do 32-bit big-endian math based on the contents of the accumulators and which could move values between the accumulators and the working state as needed. There were a few awkward bits here (I ended up replicating a few functions because the 256-byte index limit got inconvenient at a few points), but nothing about the implementation of the algorithm really ended up having to deviate much from the pseudocode. So the implementation step was quite simple and easy, much as I had hoped.
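To give a feel for what those small accumulator routines have to do, here is a sketch in Python rather than 6502 assembly. It models a 32-bit big-endian add as four byte-sized additions with a carry passed between them, which is roughly what a chain of ADC instructions does. The function name and the list-of-bytes representation are assumptions for illustration, not the routines from the actual implementation.

```python
def add32_be(acc, operand):
    """Add two 32-bit values stored as 4 big-endian bytes, one byte at a time."""
    result = [0, 0, 0, 0]
    carry = 0
    for i in (3, 2, 1, 0):             # the least significant byte is stored last
        total = acc[i] + operand[i] + carry
        result[i] = total & 0xFF       # keep the low 8 bits
        carry = total >> 8             # carry into the next byte up
    return result                      # final carry is dropped: arithmetic is mod 2**32

# 0x6A09E667 is SHA-256's first initial hash value;
# 0x428A2F98 is its first round constant.
h0 = [0x6A, 0x09, 0xE6, 0x67]
k0 = [0x42, 0x8A, 0x2F, 0x98]
print([hex(b) for b in add32_be(h0, k0)])
```

Dropping the final carry is what SHA-256 wants, since all of its additions are defined modulo 2^32.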

However, the end result did not come out the way I had hoped it would. While it did give the correct answers to my test vectors, it took 580 milliseconds to consume a single block—nearly 10 milliseconds per byte! I wasn’t willing to push this too far—after all, these algorithms are tuned to be feasible only on modern chips—but we can surely do better than this.
