
Getting Off the Ground With the Other Commodore 8-bits

My Commodore work here has been restricted so far to the Commodore 64. However, that system was only one of many related systems they produced over the years:

  • The PET was the first home computer they made with a serious following. Its abilities made it a rival to systems like the TRS-80, but to little else. Still, that was enough to cement their reputation early on.
  • The VIC-20 was their first system to achieve truly widespread success. It was inexpensive and provided significant power for the price. Its existence, and price point, had largely annihilated any potential market space for the ZX81- and Spectrum-based Timex computers in the US. It was also the direct predecessor to the Commodore 64, and its physical appearance and programming model reflect that.
  • The Commodore Plus/4 and Commodore 16 were an attempt to capture the lower-end home computer and small business market, and specifically to compete with Timex computers. They had many enhancements to their BASIC that let the user make use of the graphics and sound capabilities more directly, but the graphics and sound hardware were replaced with a much simpler combined chip called the Text Editing Device, or TED. The Plus/4 also shipped with on-board productivity software. In the end, though, it appears that the market segment the Plus/4 and C16 were intended to compete with Timex in didn’t really exist in the US at all, and all of these systems were failures there. (The on-board productivity software was also poorly received; BYTE magazine apparently described it as “an insult.”) The Plus/4s and C16s were mostly dumped into the Eastern European marketplace at huge discounts, where they did ultimately do well enough to retain a niche following and a demoscene that lasts through to the present day.
  • The Commodore 128, released a year after the Plus/4, was the true direct successor to the Commodore 64. It added a new, more PC-like video interface, a separate Z80 CPU to let it run CP/M software, twice the RAM, the extra memory mapping and control support needed to make that RAM readily usable, and a vastly more sophisticated BASIC to take command of all of these new abilities as well as its older ones. The BASIC extensions from the Plus/4 were retained, along with new control flow options. Where the Plus/4’s extensions left gaps, new ones were added for proper high-level control of the sprites, the sound chip, and the bitmap display, and unlike the Plus/4, the 128 made no compromises compared to the 64. Unfortunately, it largely avoided compromises by including all the C64 hardware directly within it and switching into a “C64 mode” that was, from a software standpoint, nearly indistinguishable from an ordinary C64, including reimposing all the limitations that the 128 was supposed to break.

Commodore made quite a few other systems as well, from the ill-fated CBM-II through the hugely popular and influential Amiga line, but the machines above formed a clear 8-bit lineage based on the 6502 family of microprocessors and a broadly consistent firmware interface that they called the KERNAL.

The KERNAL was a fixed list of memory locations that provided consistent ways to call into system ROM functionality, independent of the particular implementation on any given system or ROM patch level. For example, $FFD2 was always the location of the CHROUT “output a character to the current output device” routine, which in turn means that the assembly language components of our Hello World program for the Commodore 64 are portable across all of these machines. In this way the KERNAL provided functionality similar to what the BIOS provided for IBM PC clones—a way for software to ignore some aspects of the hardware even in the absence of a shared operating system or modular device driver facility.

But assembly language source compatibility does not translate to binary compatibility. We saw this, at one remove, when we looked at the relocating loader for Compute!’s automatic proofreader. In the rest of this article we’ll look at the changes each system demands.

Continue reading


C64 Fat Sprite Workarounds

When writing about retro programming topics here on Bumbershoot, I’ve generally wound up oscillating between the absolute basics of getting anything to run at all and techniques to exploit weird corner cases of the hardware to let it do things it was never intended to do. This article is a more unfortunate cousin: a weird corner case that is fully intended, but vaguely documented at best and incredibly inconvenient when doing apparently reasonable things.

In particular, today we will be talking about scrolling sprites off the left side of the screen on the Commodore 64.

A Quick Survey of Sprites

A “sprite” is a generic name for any block of pixels that you want to animate over a background of some kind. These days a sprite is usually two triangles rendered with a partially transparent texture. In the 20th century, it was often a highly-optimized pixel-copy routine (a “blitter”, from BLT, itself an abbreviation of “block transfer”). By the 1990s this was sufficient to the needs of the age, but on 1980s hardware blitting was slow and fraught with compromises. It worked—the Commodore Plus/4, the Apple II, and the entire ZX Spectrum line got by with software blitters over a bitmapped display—but it didn’t look great, and those parts of 80s home-computer game animation that we remember fondly relied on more than just a blitter.

In particular, systems like the C64, the NES, and the Atari game consoles and home computers provided sprite definition and placement capabilities directly in their graphics hardware. This put a limit on the number of elements that could be efficiently rendered, but allowed the animation to be much more practical and sophisticated for a given CPU power.

Interestingly enough, though, hardware sprites were not necessarily a strict upgrade. The Atari 2600 and NES used the fact of their hardware sprite support to make actual bitmap support completely unnecessary. The C64’s 320×200 screen requires about 9KB of RAM to hold a single screen’s worth of bitmap information; the NES’s graphics (without additional support circuitry on the cartridge) only had access to 2KB of RAM and 8KB of ROM. This produced a text-like, tile-based display mode, and hardware sprites are the only way to get pixel-level control over the display on the NES. This general approach of separate tile and sprite layers was convenient and efficient enough that such layers were not only available on the bitmap-and-sprite-capable systems of the time (like the C64 and Atari), but saw usage in lower-power or portable devices at least through the Nintendo DS.

Continue reading

Migrating From Python 2 to 3

Python 2 is retiring on the first day of 2020. There’s even a countdown clock of doom. Python 3—the language that is replacing it—introduces some incompatible changes and is normally treated as if it were a separate language entirely. Even with only a few months of support left, though, Python 2 remains heavily used.

This is honestly a bit odd. Python 3 is ten years old now, and Python 2 was supposed to leave support in 2015. There was a large outcry at the time that this would not provide enough time to migrate existing code, though, and it was pushed out five years to 2020. A cynic might conclude that this basically resulted in four additional years of not bothering to upgrade, but library support really did improve a lot over that time and I suspect that this time the retirement will stick.

I’ve been sticking with Python 2 in my code here on Bumbershoot mainly because macOS has shipped with Python 2 exclusively all this time, and installing Python 3 cleanly can be something of an ordeal. I can’t really put off upgrading any longer, though. Some of the Linux distros out there are already running Python 3 by default. I won’t be altering any of the code snippets I’ve put in posts here—most of them should work in both versions, anyway, since they’re usually just math. How I’ve chosen to deal with each project, however, has varied a bit.

Ophis

My largest Python project is the Ophis assembler. When I first wrote it to teach myself Python, it was written for version 2.1. It has been updated over the years to take advantage of new features (like the addition of True and False in 2.3), and it’s normally been shipped as a standalone application that contains its own interpreter. I thus chose to update Ophis to Python 3 and drop support for Python 2 systems completely.

Most of the work was done by the 2to3 tool. This script ships with Python and automates most of the necessary changes: the places where library calls have altered their syntax, or where types have been subsumed into more general ones and removed. However, this wasn’t quite enough to get Ophis up and running:

  • Python-2 Ophis was not Unicode-aware, while Python 3 makes all strings Unicode by default. This was a bit of a problem because Ophis makes heavy use of strings-as-byte-sequences to represent assembled code. The strings that were really byte sequences needed to be manually annotated as such.
  • Now that byte sequences were actually, explicitly, byte sequences, it was no longer necessary to use ord and chr to translate between bytes-as-integers and bytes-as-characters as needed.
  • The range and map functions are now “lazy”—they don’t produce values until those values are asked for as part of some other computation. Under the hood, they are now returning iterator objects instead of collections. That’s usually fine—in fact, it’s usually a savings in both space and time—but I had a few places where I really needed an actual collection and not just a series of values to be returned later. I thus needed to pass the result of map or range to the list constructor to force it to be properly realized. (There’s a short sketch of these changes after this list.)
  • My one use of a Windows-specific API had been changed incompatibly and I needed to modify it.
  • I had an unfortunate tendency to open files with the file constructor instead of the open function, and in Python 3 only the latter works without some extra configuration that I generally shouldn’t be bothering to do. I needed to change those as well.
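To make those last few items concrete, here is a schematic sketch of the kinds of edits involved. It is written for Python 3 (since Ophis no longer targets Python 2), and the names in it are made up for illustration rather than taken from the Ophis source:

# A byte sequence is annotated as such with a bytes literal:
OPCODE_NOP = b'\xea'

# Indexing a bytes object already yields an integer, so no ord() is needed:
value = OPCODE_NOP[0]                                   # 0xea

# map() and range() are lazy; wrap them in list() where a real list is needed:
offsets = list(map(lambda x: x + 2, range(0, 12, 4)))   # [2, 6, 10]

# Files are opened with open(), never the old file constructor:
with open('program.bin', 'wb') as f:
    f.write(OPCODE_NOP * 4)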

The Bumbershoot Scripts

According to GitHub’s statistics, 3.8% of the code in the Bumbershoot Software repository is in Python. These are all in that uncomfortable space between one-shot scripts and actual reusable distributable programs. I’d like them to be trivially runnable anywhere, but I also don’t want to have to actually impose any kind of installation step. Ideally a user should be able to basically just throw a single file anywhere and have it work.

The earliest versions of Python 3 were released alongside new versions of Python 2, and the languages were much further apart at that point. With the release of 3.2 the developers made an effort to ensure that there was some subset of the language that would behave identically in both Python 2 and 3. It’s possible to import things from the notional __future__ module in Python 2 to make 2.6 or 2.7 behave more like Python 3, which helps with some of the more unavoidable differences in default behavior.
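The sort of boilerplate I have in mind looks something like the snippet below. It is a hypothetical script header rather than a quote from any particular Bumbershoot script:

# Opt in to Python 3 behavior when running under Python 2.7; these imports
# are accepted (and are effectively no-ops) under Python 3.
from __future__ import print_function, division

import sys

# print is a function in both versions now, and / is true division in both.
print("Running under Python", sys.version_info[0])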

My goal for these scripts, then, was to update them so that they would run happily ignorant of which version of Python was actually running them. Python 2.7 is nine years old at this point and I have no qualms about insisting on that version if Python 2 is to be used at all.

Half my scripts are, broadly speaking, “linking” scripts—their job is to either consume or emit binary blobs and decorate them appropriately so that they can be fed to emulators or into actual hardware. These will suffer from the same kinds of issues regarding binary and Unicode strings that we saw in Ophis, but there is an extra wrinkle now—getting a byte out of a Python 2 string requires use of the ord function, but getting a byte out of a Python 3 bytestring forbids it. This remains true even though 2.7 and 3.x both accept b'ABC' as representing the byte sequence 65-66-67.

My solution for this was to rely on a different builtin data type that is shared between the versions: bytearray. This behaves in a manner very similar to Python 3 byte strings in both Python 2 and 3. Furthermore, the bytearray type is mutable, and while mutability often makes programs harder to reason about, here it actually makes things like inserting checksums into binary images after the fact far easier.
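As a rough illustration of what that buys you (this is a sketch, not code from any of the actual scripts), consider building a small binary image and patching a checksum into it after the fact. The same code runs unchanged under 2.7 and 3.x:

# bytearray indexes and iterates as integers under both Python 2.7 and 3.x,
# and it is mutable, so values can be patched in after the fact.
header = bytearray(b'\x00\x00')           # placeholder for a 16-bit field
payload = bytearray(b'\x01\x02\x03\x04')  # stand-in for an assembled blob

image = header + payload
image[0] = 0x01                           # patch the placeholder in place
image[1] = 0x08
image.append(sum(image) & 0xff)           # append a simple 8-bit checksum

with open('image.bin', 'wb') as f:        # hypothetical output file name
    f.write(image)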

Some of these scripts also relied on doing division, which had its behavior change between 2 and 3—in Python 3, 5/2 yields 2.5, while in Python 2 it truncates to 2. Python 2.7 will give you the Python 3 behavior if you import division from __future__, but in my case I generally wanted truncating division because I was generally trying to count out integral numbers of file blocks or something. Fortunately, both 2.7 and 3.x offer // as an integer-divide operator that behaves like Python 2. I had already been using that in Ophis, and I mirrored it in these scripts.
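The difference is easy to demonstrate with a tiny, made-up block-counting example that behaves identically under 2.7 and 3.x:

BLOCK_SIZE = 256                        # hypothetical block size
data_len = 4100                         # hypothetical payload length in bytes

full_blocks = data_len // BLOCK_SIZE    # 16 under both 2.7 and 3.x
leftover = data_len % BLOCK_SIZE        # 4 under both
# Plain data_len / BLOCK_SIZE would be 16 in Python 2 but 16.015625 in Python 3.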

The scripts that didn’t deal with binary files were all content generation scripts that operated purely on strings, lists, and floating point numbers, and those became runnable in both 2.7 and 3.x simply by running 2to3 on them, with no additional changes required.

Going Forward

My plan for future posts where I make use of scripting languages is to provide them in Python 3. Tools that are intended to be generally useful will stick to hybrid 2.7/3.x as before, but tools that would be useful on older machines themselves might instead be provided in C so that they could be compiled to run on the older systems. I’d already done this with bin2asmx earlier, but I suspect I won’t be making a habit of it.

Implementing SHA-256 on the 6502

After five implementations of various algorithms built mostly around bit shuffling, how about a properly modern, common, useful routine? We just need to find something that has a very similar structure and then we can deploy the knowledge we’ve learned in those other domains.

SHA-2 fits this bill neatly, and Wikipedia even has handy pseudocode we can base an implementation on. In looking at the implementations and specifications, it also seems like SHA-256 is a relatively good fit for the 6502, and may well qualify as the most sophisticated modern algorithm that the 6502 can still comfortably handle. We don’t really need to worry about endianness or word size—since we’re on an 8-bit chip we can do multibyte mathematics in any direction we want, and the size of the algorithm’s words really only alters our loop bounds—but SHA-256’s largest indexed elements are the round constant table and the schedule array, each of which is 64 32-bit words long. That translates out to tables exactly 256 bytes in size, which is exactly the size of a 6502 memory page and thus the largest amount of memory that can be efficiently indexed.

Initial Implementation

Translating the pseudocode from the Wikipedia article into code was not terribly challenging. I pre-defined tables of the necessary constants in the algorithm, and then set aside a fixed block of RAM to store all the working state that changes as a sum is computed. (In a modern system, we would probably instead operate on a pointer to arbitrary blocks of RAM, but the 6502 is much, much better at operating on blocks of memory at fixed addresses than it is at working with pointers, and I wanted to give the poor CPU every edge it could get.) I then defined several 4-byte blocks of memory for use as accumulators, along with a variety of small functions that would do 32-bit big-endian math on the contents of those accumulators and could move results into and out of the working state as needed. There were a few awkward bits here—I ended up replicating a few functions because the 256-byte index limit got inconvenient at a few points—but nothing about the implementation of the algorithm really ended up having to deviate much from the pseudocode. So the implementation step was quite straightforward, much as I had hoped.
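For a sense of the structure being translated, here is the core of that pseudocode rendered as Python. This follows the standard SHA-256 definition rather than transcribing the 6502 source, and the helper names are my own:

MASK32 = 0xffffffff

def rotr(x, n):
    # Rotate a 32-bit word right by n bits.
    return ((x >> n) | (x << (32 - n))) & MASK32

# The four sigma functions used by the compression rounds and the schedule:
def big_sigma0(a): return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
def big_sigma1(e): return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
def small_sigma0(x): return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
def small_sigma1(x): return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def ch(e, f, g): return (e & f) ^ (~e & g)
def maj(a, b, c): return (a & b) ^ (a & c) ^ (b & c)

# Extending a 16-word message block into the 64-entry schedule array, the
# table that, at 4 bytes per entry, fills exactly one 256-byte page:
def extend_schedule(w):
    for i in range(16, 64):
        w.append((w[i - 16] + small_sigma0(w[i - 15]) +
                  w[i - 7] + small_sigma1(w[i - 2])) & MASK32)
    return w

These are, roughly, the 32-bit operations that the accumulator routines described above have to provide.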

However, the end result did not come out the way I had hoped it would. While it did give the correct answers to my test vectors, it took 580 milliseconds to consume a single block—nearly 10 milliseconds per byte! I wasn’t willing to push this too far—after all, these algorithms are tuned to be feasible only on modern chips—but we can surely do better than this.

Continue reading

Some Notes on Player Complicity

A lot of the earlier articles on Bumbershoot Software were written “from my old notebooks”—old notes or forum posts or conversations that I’d kept around and then worked into full articles. That’s fine for retrocoding stuff, where the state of the art moves fairly slowly, but it’s not useful for fields that are still rapidly innovating. My old notebooks still have a lot of material from my 2004-2014 span of involvement with the interactive fiction community, but a lot of the figures there are now reasonably well-known names in academia and industry, so many of my notes on what we’d now put under the umbrella of “interactive narrative” are extremely outdated. I do sporadically check in on Nick Montfort’s and Emily Short’s blogs to sample the academic and avant-garde commercial viewpoints in the space, but even there I’ve drifted far enough away from both the state of the art and the goals of the craft that there isn’t much I can talk about or grapple with.

However, Emily Short has a new post up on “Choice Poetics” that struck a chord. Back when I’d write nine or ten thousand words each year about the annual Interactive Fiction Competition, I ended up refining some principles and approaches to the entries in the competition that served me pretty well in subsequent years when approaching computer games more generally. The papers linked in that article—particularly the one formally analyzing Papers, Please—have forced me to re-evaluate some of those old conclusions.

So, let’s talk about complicity in video games. I’m using this term in both a narrow and a broader sense. In the narrow sense, it’s talking about things a game can do to make a player feel bad about atrocities they commit or have their character commit within a game. More broadly, I want to move this beyond the crimes that the word “complicity” suggests and consider situations where the player feels responsible for their actions within or to a game world.

I first attempted to formalize this in 2010 when reviewing Jason McIntosh’s The Warbler’s Nest, because it was brief and extremely pointed and thus crystallized for me what made me feel like it was laying the responsibility for the events of the game directly at my feet. To wit, if a player is to feel responsible for an atrocity:

  1. The atrocity must be optional. Otherwise the player is no more responsible for these events than a reader is made responsible for the events in a story by the act of turning a page.
  2. The atrocity must be the easy path. Otherwise it’s just Bonus Cruelty you have to work harder for.
  3. The player must know the stakes. Otherwise their actions do not have the necessary import. Lock a player in a room with two unlabeled buttons, and make it clear that the game will do nothing else until one of the two buttons is pressed, and the player will feel no responsibility for the result. Indeed, this is only fair; were such a situation to obtain in real life, the responsibility for the consequences of this action would be firmly on the head of whoever rigged up the buttons, or locked the button-pusher in the room.

That was almost a decade ago. By now, I don’t think any of these hold up as well as they did when I originally devised them. Here are the main problems or complications that, I think, have emerged with them:

  • The player’s interaction mode may be too far removed for the analysis to apply. If a game offers a predefined good path and evil path, the default assumption for many players is that they are expected to be completionist. The atrocities on the “evil path” may be optional, but if the player seeks to see the whole of the game’s content—particularly if certain maps, challenges, characters, or other such content is gated behind progress within it—they cease being optional when in pursuit of that goal. More simply, a player who has decided to rain meteors and dinosaur attacks upon their SimCity metropolis will feel no more remorse about this than they would for knocking over a tower of Jenga sticks. A player interacting with a game as an artifact to experiment with, or as a library to exhaust, has already made a decision that removes them from the analysis.
  • People feel responsible for their actions in real life even if they didn’t know the stakes. Blind or actively deceptive choices still fail, but there’s a lot more leeway here than I had originally formulated. Allowing the player character to feed a starving child a peanut butter sandwich and then rewarding the player by having the child die in front of them from anaphylactic shock would be a rather mean thing to do, but a player might feel some responsibility for the action, since they acted without some obvious information (the child’s peanut allergy) that they in principle “should” have known. On the other hand, one could imagine a hospital/surgery simulation where the epilogue reveals that the player character was not a surgeon at all, but a deluded lunatic who was just cutting people up in a barn. In such a game the player would be unlikely to feel even fleeting remorse for the actions presented at the time as saving lives.
  • If an atrocity is difficult enough to perform, the analysis breaks down completely. This is hard to explain without actually working through examples, so I’ll defer that to the worked examples below.

Below the fold, I’ll work through a number of games and show how the original and the modified analyses interact with each. That also means I will, by necessity, be spoiling some of the narrative mechanics in them, so I will list the games here as the spoiler warning. Proceed at your own risk if you have not played Ultima IV, BioShock, The Warbler’s Nest, Spec Ops: The Line, Fallen London, Papers, Please, and Undertale. I’ll try to be vague about exact details that are not commonly discussed, but it’s still going to require going further than I’d like without spoiler warnings.

Continue reading

Working with DLLs on Windows, Mac, and Linux

So.

Let’s talk about dynamic libraries.

How to write them, how to build them, how to connect them to your programs.

How to get a result that actually keeps working once you send your program off to someone else.

Exactly how this stuff works, and what you need to do, varies depending on your operating system and build tools. There are four major cases I plan to cover:

  • Building on Linux with gcc or clang
  • Building on Mac with clang
  • Building on Windows under MSYS2 with gcc or clang
  • Building on Windows with Visual Studio

Irrespective of your operating system, gcc and clang often end up behaving very similarly, and clang also has a mode called clang-cl that apparently mimics the command-line interface of the Visual Studio tools.

There are two semi-major cases that I’m not going to cover: directly using the Visual Studio command-line tools, and the BSD-like systems, which have a different binary format from the three OSes I am covering.

I’m skipping the Visual Studio command line tools because they are verbose, arcane, and exceptionally easy to find documentation for, assuming you can make it happen in Visual Studio itself. If it is very important to track down the relevant flags for some build, Visual Studio’s Preferences dialog boxes are extremely transparent about what each entry in each dialog box controls, listing the relevant command-line options either in the description of the setting or alongside each possible option. I’ll only bring it into play if there’s something that’s so much easier to do with the CLI that you’d normally break out of Visual Studio to do it.

BSD, on the other hand, I’m skipping because I know basically nothing about it other than that sometimes it is surprisingly and distressingly different. My advice for gcc/clang on Linux or Mac may perhaps be applicable.

Some Words of Encouragement Before The Abyss

Getting a program to load and run while depending on shared libraries is not generally considered a difficult problem. Indeed, it has been a solved problem for decades. As such, the purpose of this article is twofold.

  1. To be an example of what exactly it means for something to be “a solved problem for decades.” One gets the impression that the details of this stuff are not widely known not because they’re too simple to bother explaining, but because people have been aggressively forgetting it as hard as they can once it works well enough that they can walk away.
  2. To be a usable tutorial for people who have ended up on the rusted pointy end of building or distributing software that has custom shared libraries in it. They may be suffering impostor syndrome, wondering how they could be screwing up such an obviously basic thing. Let my barely-contained ranting reassure you, and with luck the actual instructions in here may get you where you need to be.

Continue reading

Fun With Symbol Collision

One of the basic rules of C is that you don’t get to use the same function name twice.

If I write a file with some function in it, say, lib1a.c:

#include <stdio.h>

void my_awesome_function(void)
{
    printf("This is my awesome function!\n");
}

And then I write another file lib1b.c with a different implementation of the same function:

#include <stdio.h>

void my_awesome_function(void)
{
    printf("This is my ENTIRELY DIFFERENT awesome function!\n");
}

Then if I write a simple main program to call that function…

#include "lib1.h"

int main(int argc, char **argv)
{
    my_awesome_function();
    return 0;
}

It won’t know which one I mean, and when I try to build the program I will get an error message.

/usr/bin/ld: /tmp/cc30kjUg.o: in function `my_awesome_function':
lib1b.c:(.text+0x0): multiple definition of `my_awesome_function'; /tmp/cce04mSV.o:lib1a.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

But this error message reveals that I wasn’t quite accurate in my first sentence there. This is not a basic rule of C. It’s a basic rule of C’s linker. That linker is shared with many other languages, so similar rules end up applying to assembly language as well. Languages like C++ work around this by having the function names in the program not quite match up with the function names that the linker gets to see: C++ is said to “mangle” its symbols on the way to the linker.

But even without that, we have a little leeway. The linker still isn’t as strict as that. It only requires that each specific linker product not export the same symbol twice. This means that C can define the same function multiple times as long as it uses the static directive to constrain it to a single file. The C compiler entirely conceals these functions from the linker, so it never gets a chance to notice the duplication.

Linking the Same Symbol Twice, on Purpose

We can be a little crazier, too. After all, a full program is usually made of many individual linker products. We can create two dynamic libraries that define the same symbol, and Mac, Linux, and Windows will all permit us to link one program against both libraries simultaneously.

What happens in these cases is, notionally, system-dependent, but when you build with GCC or Clang, the library listed first on the link command line is the one that gets priority.

Linking the Same Symbol Twice, by Accident

We’re pretty unlikely to actually run into the situation we described above in a real system, though. Here’s a more insidious example. Let’s keep our lib1a and lib1b libraries as we had them—but now let’s create lib2a and lib2b libraries to go with them. Here is lib2a.c:

#include "lib1.h"

void function1(void)
{
    my_awesome_function();
}

And here is lib2b.c:

#include "lib1.h"

void function2(void)
{
    my_awesome_function();
}

If we then link lib2a against lib1a and lib2b against lib1b, each of these libraries looks nicely independent; neither exports any symbols that conflict with anybody else, and if we have a main program that looks like this:

#include "lib2.h"

int main(int argc, char **argv)
{
    function1();
    function2();
    return 0;
}

We don’t even need to link against anything other than the apparently-non-conflicting 2a and 2b libraries. We would expect, when we run this, to get output that looks like this:

This is my awesome function!
This is my ENTIRELY DIFFERENT awesome function!

And, indeed, on Windows and Mac, we do. But on Linux, we don’t; we get two copies of the same sentence, and which sentence we get depends on whether we linked 2a in first or 2b. What gives?

One- vs. Two-Level Namespaces

What gives is that Linux uses a one-level namespace for its linking and loading, and Mac and Windows do not. This is a fairly subtle distinction that almost never matters, but it does here so let’s dig into it a bit.

In a two-level linking scheme, the name of the library a symbol is from becomes, effectively, part of the symbol. Both the PE32 format used by Windows and the Mach-O format used by macOS have this mechanism, and are fairly explicit about it—Windows stack traces name their functions things like kernel32.dll!CreateFileW, and the otool -Lv command on macOS will enumerate any library’s dependencies very precisely. (Usually, given the way macOS applications are expected to work, it’s too precise for my tastes, but that is a tale for another time.)

Linux, on the other hand, uses the ELF format, which provides a single-level (or flat) namespace. The ldd command will produce a list of library dependencies much like otool. However, there is a difference: otool does not list anything that is not encoded in some way within the binary itself. The output of ldd, on the other hand, is computed each time it is run, for its results depend on other files in the system, and possibly even the environment variables at the point the process is started.

Another difference between ldd and otool becomes apparent if we run them over our main2 program. On a Mac, otool reports that main2 links against lib2a and lib2b, just like we commanded. On Linux, though, ldd will produce a list that includes not just those two libraries, but also their dependencies. All of these are loaded in at run time just as if the main program had linked directly against them. This means that, as far as Linux is concerned, we are in the exact situation that we were in when we directly linked conflicting libraries into place, and it behaves accordingly.

We can, however, work around this. The POSIX standard provides a function called dlsym that can look up any symbol in any loaded library, even libraries that you did not link against in the first place. That’s a mechanism that’s sufficiently general that a library could implement whatever linker/loader scheme it wants. It’s got some neat nonstandard-but-widely-available extensions too. GCC and compatible systems (including macOS’s clang compiler) define a special pseudo-handle called RTLD_NEXT which will find the next copy of a function, starting just past the library calling dlsym.

We can have some fun with that. Let’s!

Continue reading