[UPDATE, 20 April 2019: There has been a very large uptick in readers for this article following the release of some C64 software that openly made heavy use of the technique. This is an informal explainer from when I was just learning how the C64 actually did its graphics work. If you want to see a worked example of the technique, see this article from six months later. For a broader look at the actual way the C64’s graphics chip builds a screen, and its consequences for things like status bars and split-screen scrolling, this 2018 retrospective is a good place to start; it collects links to the more detailed articles in handy lists.]
High-level work with computers often ends up being compared either to sorcery (well, OK, “wizardry”) or to martial arts as presented by action movies (The Codeless Code is a fairly impressive series of parables in that vein). Both of those traditions include a notion of “forbidden” spells or techniques, too dangerous to use and possibly violating the usual laws of causality.
The most sophisticated Bad Line manipulation is to create a Bad Line Condition within cycles 15-53 of a raster line in the display window in which the graphics data sequencer is in idle state, e.g. by modifying register $d011 so that YSCROLL is equal to the lower three bits of RASTER…. With this, it is possible to scroll the complete screen sideways by large distances… without having to move the graphics memory with the processor. If you now combine DMA Delay with FLD and Linecrunch, you can scroll complete graphics screens without using much computing time by almost arbitrarily large distances in all directions.
It seems that here in the 21st century the technique is usually called “Variable Screen Positioning” rather than “DMA delay”, presumably because of its ability to scroll complete graphics screens. If you take nothing else away from this article, take this: Do not use this technique. It is calling up that which you cannot put down. The best you can hope for is to be independent of the damage it will inevitably cause.
A secondary thing to take away from it is to be careful when mixing vertical scrolling with split-screen, and to be sure that you’re in HBLANK or VBLANK when changing the scrolling value. That should guarantee you never perform the technique “by accident” while trying to do something else.
A Somewhat Speculative History
A few months back I had the privilege of hearing old war stories from the developers of the 1986 proto-MMO Habitat. One of those stories was that some runs of the VIC-II had a hardware fault that made them crash the system they were on if you pushed them too hard, and that this had caused them some grief when developing their software back in the mid-1980s. I don’t have slam-dunk evidence that they had run afoul of this, but the symptoms they describe actually match up pretty well with this technique being triggered. In fact, if “pushing the VIC-II too hard” meant occasionally running out of cycles during HBLANK, you could easily end up accidentally triggering this hardware bug.
Fast forward to 1996, when the VIC article quoted above was written. I suspect that at this time the dangers of the technique were not fully known, because otherwise they would presumably have been listed in what was then and still is a comprehensive article. Between 1996 and 2013, the idea that use of VSP makes some C64s crash sometimes (and does so via somehow corrupting the RAM) gradually became common knowledge. It is quite possible that this is because the 1996 article really laid out what the chip does, which encouraged more people to try the technique and so to isolate what the software trigger was. “The VSP Crash” seems to have been a notion that required no further explanation for much of this time, though there were no solid theories for what exactly caused it or could fix it. It was at least known that not everyone hit it and that power-cycling sometimes helped.
That changed shockingly recently. In February 2013, a member of the C64 scene named Lft released a demonstration fully outlining the nature of the fault and how to avoid it. It includes a mode that limits the damage to graphical corruption alone, in which the corruption can be enabled and disabled. The text in the demonstration is a full description of the fault and why it happens; here I hope merely to summarize.
Like most microcomputers, the Commodore 64 used DRAM, which needs to be regularly accessed with read/write cycles to keep the memory values stable. When neither the VIC-II graphics chip nor the CPU has anything better to do, the VIC-II is responsible for keeping the DRAM occupied. There are two signals that the VIC-II chip must generate to do this: it must generate a sequence of addresses, and it must also inform the RAM chips that an address is ready to be processed. The design flaw that causes the VSP crash is that these two signals are independent. Under conditions that VSP reliably creates, the VIC-II can tell the DRAM to “refresh” a location that does not have an electrically stable address yet, resulting in bits from one location being used to refresh another. This is the cause of the RAM corruption. It turns out that a lot of it depends on transient electrical properties, and the signals apparently have to line up to within nanoseconds for the fault to produce a detectable failure. Thus temperature, the phase of independent signals set randomly at power-on, and subtle manufacturing details of the motherboard traces may all be factors in making a binary work on one machine but break on another.
All of which is to say, extreme kudos to Lft for his detective work.
What It Looks Like and What To Do About It
I’m going to outsource most of this explanation to Lft’s demo since he does a good job of describing the problem and what to do about it:
Let us call memory locations ending in 7 or F fragile. Sometimes when VSP is performed, several fragile memory cells are randomly corrupted according to the following rule: each bit in a fragile memory cell might be changed into the corresponding bit of another fragile cell within the same page.
[We can defend against this bug] in several ways: one approach is to ensure that every fragile byte in a page is identical. If the page contains code, for instance, corruption is avoided if all the fragile bytes are $ea (nop). Similarly, in font definitions, the bottom line of each character could be blank.
Another technique is to simply avoid all fragile memory locations. The undocumented opcode $80 (nop immediate) can be used to skip them. Data structures can be designed to have gaps in the critical places…. Data that cannot have gaps, i.e. graphics, is continuously restored from safe copies elsewhere in memory.
Thus, for the first time, the VSP crash has been tamed.
His explanation then goes into greater detail about the signals that misbehave and what that means; the link above includes the full scrolltext in comments.
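To make the quoted rule concrete, here is a minimal sketch in Python. It is not taken from Lft’s demo. It assumes a memory image that starts on a page boundary, and the names `is_fragile`, `vulnerable_bits`, and `unsafe_pages` are made up for illustration. The reasoning is that a bit in a fragile cell can only change if some other fragile cell in the same page holds a different value for that bit, so the bits at risk are exactly those on which the fragile bytes of the page disagree.

```python
from typing import List

PAGE_SIZE = 256  # a 6502 page


def is_fragile(addr: int) -> bool:
    """True for addresses whose last hex digit is 7 or F (low three bits all set)."""
    return (addr & 0x07) == 0x07


def vulnerable_bits(page: bytes) -> int:
    """Bit mask of positions where the fragile bytes of one page disagree.

    Under the quoted rule, only these bit positions can be corrupted.
    A result of 0 means every fragile byte in the page is identical.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one 256-byte page")
    all_or = 0x00
    all_and = 0xFF
    for offset, value in enumerate(page):
        if is_fragile(offset):
            all_or |= value
            all_and &= value
    # A bit is at risk if it is set in some fragile byte but not in all of them.
    return all_or & ~all_and & 0xFF


def unsafe_pages(memory: bytes) -> List[int]:
    """Page numbers (relative to the start of `memory`) with any bit at risk.

    Assumption: `memory` starts on a page boundary; a trailing partial
    page is ignored.
    """
    result = []
    for start in range(0, len(memory) - PAGE_SIZE + 1, PAGE_SIZE):
        if vulnerable_bits(memory[start:start + PAGE_SIZE]):
            result.append(start // PAGE_SIZE)
    return result
```

A page filled with $ea passes this check, matching the nop example in the quote. A page where a single fragile byte differs from the rest is flagged, and the mask shows which bits are exposed.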
Can It Happen By Accident?
In order to trigger this bug, the following things must all be true:
- The VIC-II must be actively drawing.
- There is no character data left to draw (the “idle state”, where each 8-pixel value is the value of $3FFF).
- Either the vertical scroll or the graphics-enable bit has just been changed in a way that makes the VIC-II decide it’s time to start fetching character data.
If we make any of these conditions provably not hold, we’re safe:
- If we only mess with vertical scroll during HBLANK or VBLANK, or while the border is being drawn, we’re fine.
- If the VIC-II is in the middle of drawing a character, we’re fine.
- If the value we are scrolling to is not going to begin a character this scanline, we’re fine.
My guess is that I’ve basically listed the conditions in the order of how difficult it is to meet them; making sure that we’re only changing the scroll value while the border is being drawn would be happening as a matter of course anyway. The only time in normal operation the VIC-II is ever visibly idle is if you’ve decided to keep the chip in 25-row mode while adjusting vertical scroll. If you’re setting vertical scroll multiple times in a screen update, you may have a few lines of idleness in there, but under most circumstances that’s going to be as a result of the change you just made to the vertical scroll. The last requirement may be the hardest: if you’re trying to do a split-screen with one screen doing smooth vertical scrolling and the other fixed, the scanline you’d like to split on will be a badline every eighth frame. It seems like even getting an acceptably stable display may be challenging in that case.
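The trigger conditions listed above can be written down as a small predicate. This is a sketch in Python for reasoning about a planned register write, not emulator-accurate code. The cycle window 15-53 and the YSCROLL comparison come from the VIC article passage quoted earlier. The raster range $30-$f7 and the display-enable requirement are my recollection of the same article’s definition of a Bad Line Condition and should be checked against it. The function names and the `idle` and `den_seen` flags are hypothetical simplifications.

```python
FIRST_BAD_LINE = 0x30        # assumed from the VIC article, not from this post
LAST_BAD_LINE = 0xF7         # assumed from the VIC article, not from this post
VSP_CYCLES = range(15, 54)   # cycles 15-53, as in the quoted passage


def is_bad_line(raster: int, yscroll: int, den_seen: bool = True) -> bool:
    """Bad Line Condition: YSCROLL equals the lower three bits of RASTER.

    `den_seen` stands in for "the display-enable bit was set at some
    point during raster line $30", which is a simplification.
    """
    in_range = FIRST_BAD_LINE <= raster <= LAST_BAD_LINE
    return den_seen and in_range and (raster & 0x07) == (yscroll & 0x07)


def write_triggers_vsp(raster: int, cycle: int, old_yscroll: int,
                       new_yscroll: int, idle: bool,
                       den_seen: bool = True) -> bool:
    """Would changing YSCROLL at this raster line and cycle perform VSP?

    All of the following must hold: the sequencer is idle, the write
    lands inside the cycle window, and the write creates a Bad Line
    Condition that was not already present on this line.
    """
    return (idle
            and cycle in VSP_CYCLES
            and not is_bad_line(raster, old_yscroll, den_seen)
            and is_bad_line(raster, new_yscroll, den_seen))


def bad_line_scroll_values(raster: int) -> list:
    """The YSCROLL values (0-7) that make the given raster line a bad line."""
    return [y for y in range(8) if is_bad_line(raster, y)]
```

For any raster line inside the display range, `bad_line_scroll_values` returns exactly one of the eight scroll values, which is why a fixed split line collides with a bad line for one scroll position in every eight during a smooth vertical scroll.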