Python 2 is retiring on the first day of 2020. There’s even a countdown clock of doom. Python 3, the language that is replacing it, introduces some incompatible changes and is normally treated as if it were a separate language entirely. Even with only a few months of support left, though, Python 2 remains heavily used.
This is honestly a bit odd. Python 3 is ten years old now, and Python 2 was supposed to leave support in 2015. There was a large outcry at the time that this would not provide enough time to migrate existing code, though, and it was pushed out five years to 2020. A cynic might conclude that this basically resulted in four additional years of not bothering to upgrade, but library support really did improve a lot over that time and I suspect that this time the retirement will stick.
I’ve been sticking with Python 2 in my code here on Bumbershoot mainly because macOS has shipped with Python 2 exclusively all this time, and installing Python 3 cleanly can be something of an ordeal. I can’t really put off upgrading any longer, though. Some of the Linux distros out there are already running Python 3 by default. I won’t be altering any of the code snippets I’ve put in posts here—most of them should work in both versions, anyway, since they’re usually just math. How I’ve chosen to deal with each project, however, has varied a bit.
My largest Python project is the Ophis assembler. When I first wrote it to teach myself Python, it was written for version 2.1. It has been updated over the years to take advantage of new features (like the addition of True and False in 2.3) and it’s normally been shipped as a standalone application that contains its own interpreter. I thus chose to update Ophis to Python 3 and drop support for Python 2 systems completely.
Most of the work was done by the 2to3 tool. This script ships with Python and it automates most of the necessary changes, where library calls alter their syntax or types are subsumed into more general ones and removed. However, this wasn’t quite enough to get Ophis up and running:
- Python-2 Ophis was not Unicode-aware, while Python 3 has all strings be Unicode by default. This was a bit of a problem because Ophis makes heavy use of strings-as-byte-sequences to represent assembled code. The strings that were really byte sequences needed to be manually annotated as such.
- Now that byte sequences were actually, explicitly, byte sequences, it was no longer necessary to use ord and chr to translate between bytes-as-integers and bytes-as-characters as needed.
- The range and map functions are now “lazy”: they don’t produce values until those values are asked for as part of some other computation. Under the hood, they now return lazy objects instead of lists. That’s usually fine (in fact, it’s usually a savings in both space and time) but I had a few places where I really needed an actual collection and not just a series of values to be returned later. I thus needed to pass the result of map or range to the list constructor to force it to be properly realized.
- My one use of a Windows-specific API had been changed incompatibly and I needed to modify it.
- I had an unfortunate tendency to open files with the file constructor instead of the open function, and in Python 3 only the latter works without some extra configuration that I generally shouldn’t be bothering to do. I needed to change those as well.
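The kinds of manual fixes in the list above can be sketched roughly as follows. This is only an illustration and not code from Ophis: the byte values, the `write_image` helper, and the variable names are all made up for the example.

```python
# Hypothetical data; none of these names come from Ophis itself.
assembled = b"\xa9\x01\x8d"  # bytes literal: an explicit byte sequence

# Indexing a bytes object in Python 3 yields an integer directly,
# so no ord() call is needed.
first_opcode = assembled[0]

# Building bytes from integers no longer needs chr() either.
rebuilt = bytes([0xA9, 0x01, 0x8D])

# map() and range() are lazy in Python 3; wrap them in list() when a real
# collection is required (for indexing, len(), or repeated traversal).
doubled = list(map(lambda b: b * 2, range(4)))


def write_image(path, data):
    """Write a byte sequence to disk using open(), not the old file()."""
    with open(path, "wb") as out:
        out.write(data)
```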
The Bumbershoot Scripts
According to GitHub’s statistics, 3.8% of the code in the Bumbershoot Software is in Python. These are all in that uncomfortable space between one-shot scripts and actual reusable distributable programs. I’d like them to be trivially runnable anywhere, but I also don’t want to have to actually impose any kind of installation step. Ideally a user should be able to basically just throw a single file anywhere and have it work.
The earliest versions of Python 3 were released alongside new versions of Python 2, and the languages were much further apart at that point. With the release of 3.2 the developers made an effort to ensure that there was some subset of the language that would behave identically in both Python 2 and 3. It’s possible to import things from the notional __future__ module in Python 2 to make 2.6 or 2.7 behave more like Python 3, which helps for some of the more unavoidable differences in default behavior.
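As a minimal sketch of what leaning on __future__ looks like in a script meant to run under either version: the `summary` function and its numbers are invented for illustration, and the imports shown are just a common selection, not necessarily the ones any particular script needs.

```python
# These imports must come first in the file. Under Python 3 they are
# no-ops; under 2.6/2.7 they switch on the Python 3 behavior.
from __future__ import division, print_function, unicode_literals


def summary(total_bytes, block_size):
    """Describe a size in blocks (hypothetical helper for this example)."""
    # With division imported, / is true division in 2.7 as well as 3.x.
    ratio = total_bytes / block_size
    return "{0} bytes is {1:.2f} blocks".format(total_bytes, ratio)


if __name__ == "__main__":
    # print is a function in both versions once print_function is imported.
    print(summary(640, 256))
```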
My goal then for these was to update the scripts so that they would run happily ignorant of which version of Python was actually running them. Python 2.7 is nine years old at this point and I have no qualms about insisting on that version if Python 2 is to be used at all.
Half my scripts are, broadly speaking, “linking” scripts: their job is to either consume or emit binary blobs and decorate them appropriately so that they can be fed to emulators or into actual hardware. These will suffer from the same kinds of issues regarding binary and Unicode strings that we saw in Ophis, but there is an extra wrinkle now: getting a byte out of a Python 2 string requires use of the ord function, but getting a byte out of a Python 3 bytestring forbids it. This remains true even though 2.7 and 3.x both accept b'ABC' as representing the byte sequence 65-66-67.
My solution for this was to rely on a different builtin data type that is shared between the versions: bytearray. This behaves in a manner very similar to Python 3 byte strings in both Python 2 and 3. Furthermore, the bytearray type is mutable, and while mutability often makes programs harder to reason about, here it actually makes things like inserting checksums into binary images after the fact far easier.
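Here is a rough sketch of the bytearray approach. The additive 8-bit checksum and both helper names are assumptions made up for the example. They are not taken from any of the actual scripts.

```python
def read_byte(data, index):
    """Return the byte at `index` as an int, with no ord() call."""
    # Indexing a bytearray yields an integer in both 2.7 and 3.x.
    return bytearray(data)[index]


def patch_checksum(image, offset):
    """Store a (hypothetical) 8-bit additive checksum at `offset`, in place."""
    image[offset] = 0                  # exclude any old value from the sum
    image[offset] = sum(image) & 0xFF  # bytearray is mutable, so this works
    return image


image = bytearray(b"ABC\x00")
patch_checksum(image, 3)  # 65 + 66 + 67 = 198 = 0xC6
```

Because the image is modified in place, there is no need to slice it apart and glue it back together around the checksum byte, which is what an immutable string or bytes object would require.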
Some of these scripts also relied on doing division, which had its behavior change between 2 and 3—in Python 3, 5/2 yields 2.5, while in Python 2 it truncates to 2. Python 2.7 will give you the Python 3 behavior if you import division from __future__, but in my case I generally wanted truncating division because I was generally trying to count out integral numbers of file blocks or something. Fortunately, both 2.7 and 3.x offer // as an integer-divide operator that behaves like Python 2. I had already been using that in Ophis, and I mirrored it in these scripts.
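A small sketch of block counting with the // operator, which gives the same result under 2.7 and 3.x. The 256-byte block size and the `blocks_needed` helper are assumptions chosen for the example.

```python
def blocks_needed(size_in_bytes, block_size=256):
    """Number of whole blocks required to hold `size_in_bytes` bytes."""
    # Round up using integer arithmetic only, so no float ever appears.
    return (size_in_bytes + block_size - 1) // block_size


full_blocks = 1000 // 256   # 3 complete blocks fit in 1000 bytes
leftover = 1000 % 256       # 232 bytes spill into a partial block

# Note that // rounds toward negative infinity rather than toward zero.
# This only matters for negative operands, which block counts never have.
negative_example = -5 // 2  # -3, not -2
```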
The scripts that didn’t deal with binary files were all content generation scripts that operated purely on strings, lists, and floating point numbers. Those became runnable in both 2.7 and 3.x simply by running 2to3 on them, with no additional changes required.
My plan for future posts where I make use of scripting languages is to provide them in Python 3. Tools that are intended to be generally useful will stick to hybrid 2.7/3.x as before, but tools that would be useful on older machines themselves might instead be provided in C so that they could be compiled to run on the older systems. I’d already done this with bin2asmx earlier, but I suspect I won’t be making a habit of it.