I had done a fair amount with DOS as a youth, but I’d never really gotten sound to work. Let’s close that gap.
This is a challenge because music and sound are very much about precise timing. On the other systems I’ve talked about in this blog, we could get precise timing by cycle-counting the CPU. However, the IBM PC and its successors never had this option, because their architecture was and is fundamentally different.
The original PC’s CPU, the 8088, was a 16-bit chip talking to an 8-bit data bus, with some extra circuitry handling the buffering of 16-bit reads or writes. Furthermore, I/O was handled to some degree by the devices themselves, all sharing a data bus. This meant that until quite late in the PC lifecycle (around the 486), the thing that most determined how long your code took to run was bus contention. This had two major results:
- Optimizing for size was optimizing for speed.
- No operation can be said to take a predictable amount of time, because instruction fetches are subject to bus contention.
This is a nightmare for the techniques that we’ve been exploring in the past, but it proved a great boon for the system architectures as a whole. In order to be able to steadily measure time in the original system, IBM added a timer chip (the Programmable Interval Timer, or PIT) to its system, independent of the CPU. And that furthermore meant that ever-faster CPUs could be introduced and software would remain compatible with proper timings, if it relied on that programmable timer. This gave software a longevity that the 8-bits couldn’t match.
It also meant that we got to have a programmable timer with a lot more modes than were, perhaps, strictly necessary. We’ll get to have some fun with that.
The PC Speaker, As Intended
The PC Speaker’s design is actually quite similar to the Apple II’s. It’s a one-bit speaker that only really knows about “in” or “out”, and you get the classic PC Speaker beep by making it change state regularly. There are three major differences:
- It’s controlled not via memory-mapped I/O, but through a special I/O port. This isn’t terribly exciting; it just turns out that most peripherals on the x86 are controlled through a separate address space. Our code will be using OUT instructions instead of ordinary memory writes.
- Instead of toggling its state when accessed, you actively select the state—a 1 bit means out, a 0 bit means in. That’s not terribly exciting on its own, but it becomes very exciting when combined with the final difference…
- To compensate for the imprecision of CPU timing, the input bit can be tied to the output of the PIT. The PIT has three independent timers numbered 0-2, and timer 2 is dedicated to driving the PC Speaker.
This makes playing simple tones on the PC speaker really easy:
- Set PIT channel 2 into square-wave mode (“mode 3”).
- Load the timer’s wavelength with the desired value, which turns out to be 0x1234DD / F, where F is the frequency you want to play.
- Tell the PC Speaker to take its input from the PIT. The sound will begin.
- Once you’re done with the sound, tell it to stop taking its input from the PIT.
It’s all very asynchronous and concise. Here’s a complete program showing the technique, playing the C-major scale. It times the notes by busy-waiting on the BIOS tick count, which isn’t incredibly clever but does get the job done.
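As a rough illustration of the arithmetic in the steps above, here is a small Python model (not the DOS program itself) that works out the divisor for each note and the port writes that would load it. The note frequencies are rounded equal-temperament values I picked for illustration, and the helper names are hypothetical. The control byte 0xB6 selects channel 2, low-byte-then-high-byte access, and mode 3.

```python
PIT_HZ = 0x1234DD            # PIT input clock: 1,193,181 Hz

PIT_COMMAND_PORT = 0x43      # PIT mode/command register
PIT_CH2_DATA_PORT = 0x42     # PIT channel 2 data port
CH2_SQUARE_WAVE = 0xB6       # channel 2, lobyte/hibyte access, mode 3, binary

# Assumption: equal-tempered C major scale from middle C, in Hz (rounded).
C_MAJOR_HZ = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]


def pit_divisor(freq_hz):
    """Return the 16-bit reload value that makes the PIT oscillate at freq_hz."""
    divisor = round(PIT_HZ / freq_hz)
    if not 1 < divisor <= 0xFFFF:
        raise ValueError(f"{freq_hz} Hz is out of range for a 16-bit divisor")
    return divisor


def port_writes(freq_hz):
    """List the (port, value) pairs an OUT sequence would send for this tone."""
    divisor = pit_divisor(freq_hz)
    return [
        (PIT_COMMAND_PORT, CH2_SQUARE_WAVE),
        (PIT_CH2_DATA_PORT, divisor & 0xFF),   # low byte first
        (PIT_CH2_DATA_PORT, divisor >> 8),     # then high byte
    ]


if __name__ == "__main__":
    for freq in C_MAJOR_HZ:
        print(f"{freq:7.2f} Hz -> divisor {pit_divisor(freq)}")
```

Connecting the speaker to the timer afterwards means setting the low two bits of port 0x61, and clearing them again stops the tone.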
Speaking of the BIOS tick count…
That BIOS tick count is also controlled by the PIT. PIT timer 0 controls the system timer. The BIOS sets it to mode 2 with a wavelength of 0. The counter works by decrementing and then checking for zero, so setting the wavelength to 0 means that it waits 65536 counts (about 55 milliseconds) before firing. Unlike our PC Speaker protocol, which uses mode 3, this uses timer mode 2. Instead of switching levels halfway through the cycle, the output stays high and dips low for a single timer tick (a bit under a microsecond) at the end of each cycle. That’s far more useful here, because timer 0’s output is tied to IRQ 0. We can interfere with that by replacing interrupt 08h.
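The 55 millisecond figure falls straight out of the timer’s input clock. A minimal Python sketch of that arithmetic, with hypothetical helper names, treating a reload value of 0 as 65536 counts as described above:

```python
PIT_HZ = 0x1234DD  # PIT input clock: 1,193,181 Hz


def counts_per_cycle(reload):
    """A 16-bit reload value of 0 behaves as 65536 counts."""
    return reload if reload != 0 else 65536


def period_ms(reload):
    """Time between IRQ 0 firings, in milliseconds."""
    return counts_per_cycle(reload) * 1000 / PIT_HZ


def rate_hz(reload):
    """How many times per second IRQ 0 fires."""
    return PIT_HZ / counts_per_cycle(reload)


if __name__ == "__main__":
    print(f"BIOS default: {period_ms(0):.3f} ms per tick, {rate_hz(0):.4f} ticks/s")
```

With the default setting this works out to roughly 54.9 ms per tick, or about 18.2 ticks per second.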
We can also reprogram its timing just like anything else. That leads to our second PC Speaker technique.
The PC Speaker, 1-Bit Digital Audio Channel
DOS machines run way faster than we’re used to for our projects here. We can totally set IRQ0 to fire at 16kHz or so without overloading. Given that, we have a simple strategy for playing digital audio: have the interrupt load the next sample and then feed its top bit into the PC speaker. It’ll be scratchy (we are, in practice, overloading the speaker so hard that it’s clipping 100% of the time) but it’s still recognizable.
My sample file is a 16kHz, 8-bit unsigned linear PCM file I exported from Audacity. We’ll only be paying attention to the top bit for now.
The core of the IRQ0 processing is pretty straightforward. We mostly just have to make sure that we are properly wrangling far pointers:
    lds     bx, [dataptr]       ; Load our data pointers
    mov     si, [offset]
    cmp     si, datasize        ; past the end?
    jae     .nosnd
    mov     ah, [ds:bx+si]      ; If not, load up the value
    rol     ah, 1               ; Move bit 7 to bit 1
    rol     ah, 1
    and     ah, 2               ; And mask out everything else
    in      al, 0x61            ; Or that with the state of port 0x61
    and     al, 0xFC            ; (setting the PC speaker's target
    or      al, ah              ; state to the sample's high bit)
    out     0x61, al
    inc     si                  ; Update pointer
    mov     [offset], si
    jmp     .intend             ; ... and jump to end of interrupt
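To check the bit-shuffling in that handler, here is a small Python model of the same read-modify-write on port 0x61. It only simulates the arithmetic and touches no hardware. The function names and the sample port state 0x33 are made up for illustration.

```python
def rol8(value, count=1):
    """Rotate an 8-bit value left by count bits, like the x86 ROL instruction."""
    count %= 8
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


def speaker_port_value(sample, port_state):
    """Compute the byte written back to port 0x61 for one 8-bit sample.

    Bit 7 of the sample is rotated into bit 1 (the speaker data bit), and
    bits 0 and 1 of the existing port state are cleared before merging, so
    the timer 2 gate is off and only the sample bit drives the speaker.
    """
    speaker_bit = rol8(sample, 2) & 0x02
    return (port_state & 0xFC) | speaker_bit


if __name__ == "__main__":
    for sample in (0x00, 0x7F, 0x80, 0xFF):
        print(f"sample {sample:#04x} -> port {speaker_port_value(sample, 0x33):#04x}")
```

Two rotations followed by the mask give the same result as shifting bit 7 down to bit 1 directly, and the other six bits of the port are left untouched.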
There is one other issue, though. If we simply acknowledge the interrupt and continue on our way, then our system clock will freeze, going out of sync for the duration of our audio. But if we forward it on, then we’ll be registering 55 milliseconds of time 16,000 times a second. That’s even worse.
The solution is to only call it when it needs calling. We keep a running count of how many timer cycles have been registered, and when it passes the BIOS-demanded threshold, we forward the call. This is easier than it sounds. The timer cycles are exactly the value we fed to the timer in the first place, and our target number is 65536. We can simply do 16-bit addition on a running total and the carry bit does all the work we need:
    mov     ax, [subtick]       ; Add timer tick count to the counter
    add     ax, counter
    mov     [subtick], ax
    jnc     .nobios             ; If carry, it's time for a BIOS call
    mov     bx, biostick        ; Point DS:BX at our saved address...
    pushf                       ; and PUSHF/CALL FAR to simulate an
    call    far [ds:bx]         ; interrupt
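The carry trick can be sanity-checked with a quick simulation. This Python sketch models the 16-bit running total and counts how often the BIOS handler would be called. The reload value of 75 (about 0x1234DD / 16000) is my assumption for a roughly 16kHz interrupt rate, and the function name is hypothetical.

```python
def count_bios_calls(reload, interrupts):
    """Simulate the 16-bit subtick accumulator and count the carries.

    Each interrupt adds the timer reload value to a 16-bit running total.
    A carry out of bit 15 means another 65536 timer ticks have elapsed,
    which is when the original BIOS handler is due to run.
    """
    subtick = 0
    calls = 0
    for _ in range(interrupts):
        total = subtick + reload
        subtick = total & 0xFFFF      # keep the low 16 bits, as AX would
        if total > 0xFFFF:            # the carry flag would be set here
            calls += 1
    return calls


if __name__ == "__main__":
    # Assumed reload of 75 ticks; 15909 interrupts is about one second.
    print(count_bios_calls(75, 15909))
```

Over any run of interrupts, the number of forwarded calls matches the total elapsed timer ticks divided by 65536, so the BIOS clock keeps its usual pace while the sample interrupt fires far more often.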
The full program is here. One extra little trick I do here is that when I’m looping in the main program, I use the HLT instruction to put the processor to sleep until the next interrupt. This was apparently uncommon to do in DOS’s heyday, first, because you always had something you were doing outside of interrupts, and second, because some machines in the 486 line would mask interrupts out on their way to sleep, so HLT would hard-lock your machine forever. Still, DOSBox is fine with it, and so is my laptop, so I’m keeping it in.
It’s now time to move beyond dumb tricks and into complete implausible madness.
PC Speaker: 7-bit Digital Audio With Pulse Width Modulation
It’s sometimes easy to forget this, but computers are physical objects that exist in the real world. That means that the PC speaker actually has to spend some time altering its state to move from the “in” to the “out” position, and vice versa. That amount of time turns out to be standardized at roughly 60 microseconds. That has a couple of implications. One of them is that the volume of the speaker will drop as we reach frequencies of about 10 kHz; the square wave will be so fast that the speaker can’t make it all the way up to full extension, and a smaller wave means a quieter sound.
The other is that this means that if we intentionally swap it too fast, we can pick consistent extension levels for the speaker by tuning the frequency. This technique is far more general than just the PC speaker, and its effectiveness is unreasonably good.
No cheating here beyond a little amplification after recording. That is a PC Speaker. So how do we make this happen?
A reasonable first attempt is to set the frequency to a quarter of the 8-bit PCM value. That would give pulses of the desired length without overdriving, but the problem is that we can’t make sure that our IRQ will hit at a time that lets us reset the speaker between samples. We’ll drift to an edge and stay there.
What we do instead is use PIT mode 0, which is designed for one-shot timers. The signal goes low immediately when you load the counter in this mode, and then when it runs out the signal goes high and stays there. That’s perfect for our needs, as long as we have enough time to get back by the time it’s done. My original recording was a bit quiet, so even at 16kHz I only have to divide the 8-bit values by 2 and I’ll still always be done in plenty of time. The core IRQ logic becomes this:
    lds     bx, [dataptr]       ; Load our data pointers
    mov     si, [offset]
    cmp     si, datasize        ; past the end?
    jae     .nosnd
    mov     ah, [ds:bx+si]      ; If not, load up the value
    shr     ah, 1               ; Make it a 7-bit value
    mov     al, 0xb0            ; And program PIT Channel 2 to
    out     0x43, al            ; deliver a pulse that many
    mov     al, ah              ; timer ticks long
    out     0x42, al
    xor     al, al
    out     0x42, al
    inc     si                  ; Update pointer
    mov     [offset], si
    jmp     .intend             ; ... and jump to end of interrupt
It’s not much more complicated than the 1-bit digital audio case. It’s also another case, like the CGA composite mode, where programming this special effect involves just sending data in its natural format to a target associated with the effect’s system. The complete code is here.
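The "done in plenty of time" condition is easy to check numerically. The Python sketch below models the mapping from an 8-bit sample to a one-shot count and tests whether the pulse finishes before the next interrupt. The reload value of 75 timer ticks per sample is my assumption for a roughly 16kHz rate, and the helper names are hypothetical.

```python
PIT_HZ = 0x1234DD        # PIT input clock: 1,193,181 Hz
ASSUMED_RELOAD = 75      # assumption: timer 0 reload for about 16 kHz
CH2_ONE_SHOT = 0xB0      # channel 2, lobyte/hibyte access, mode 0, binary


def one_shot_count(sample):
    """Halve an 8-bit unsigned sample to get the 7-bit pulse length in ticks."""
    return (sample & 0xFF) >> 1


def pulse_us(count):
    """Convert a pulse length in timer ticks to microseconds."""
    return count * 1_000_000 / PIT_HZ


def fits_in_sample_period(sample, reload=ASSUMED_RELOAD):
    """True if the one-shot pulse ends before the next sample interrupt."""
    return one_shot_count(sample) < reload


def port_writes(sample):
    """List the (port, value) pairs the handler sends for one sample."""
    return [(0x43, CH2_ONE_SHOT), (0x42, one_shot_count(sample)), (0x42, 0x00)]


if __name__ == "__main__":
    for sample in (64, 128, 149, 150, 255):
        count = one_shot_count(sample)
        print(sample, count, f"{pulse_us(count):.1f} us", fits_in_sample_period(sample))
```

Under that assumed reload value, samples up to 149 finish in time and louder ones do not, which is consistent with a quiet recording centered on 128 getting away with a simple divide by 2.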
This technique was apparently perfected some time in the early 1990s and licensed to game developers under the name “RealSound”. It was pretty popular; I had at least two games (Star Control 2 and Eric the Unready) that used it. Unfortunately, few players ever really heard it in use, because by the early 1990s pretty much every PC gamer had a proper sound card.
Almost everything I did here started with information I got from the OSDev Wiki, which provided extensive information about the PIT and also taught me the basics of the PC Speaker. It only hinted at the Pulse Width Modulation technique, but that was enough.