Beyond Beep-Boop: Mastering the PC Speaker

I had done a fair amount with DOS as a youth, but I’d never really gotten sound to work. Let’s close that gap.

This is a challenge because music and sound are very much about precise timing. On the other systems I’ve talked about in this blog, we could get precise timing by cycle-counting the CPU. However, the IBM PC and its successors never had this option, because their architecture was and is fundamentally different.

The original PC paired a 16-bit CPU (the 8088) with an 8-bit data bus, with some extra circuitry handling the buffering of 16-bit reads or writes. Furthermore, I/O was handled to some degree by the devices themselves, all sharing a data bus. This meant that until quite late in the PC lifecycle (around the 486), the thing that most determined how long your code took to run was bus contention. This had two major results:

  • Optimizing for size was optimizing for speed.
  • No operation could be said to take a predictable amount of time, because instruction fetches were subject to bus contention.

This is a nightmare for the techniques that we’ve been exploring in the past, but it proved a great boon for the system architecture as a whole. To measure time reliably on the original system, IBM added a timer chip (the Programmable Interval Timer, or PIT), independent of the CPU. That in turn meant that ever-faster CPUs could be introduced and software would keep its timings, as long as it relied on that programmable timer. This gave software a longevity that the 8-bits couldn’t match.

It also meant that we got to have a programmable timer with a lot more modes than were, perhaps, strictly necessary. We’ll get to have some fun with that.

The PC Speaker, As Intended

The PC Speaker’s design is actually quite similar to the Apple II’s. It’s a one-bit speaker that only really knows about “in” or “out”, and you get the classic PC Speaker beep by making it change state regularly. There are three major differences:

  • It’s controlled not via memory-mapped I/O, but through a special I/O port. This isn’t terribly exciting; it just turns out that most peripherals on the x86 are controlled through a separate address space. Our code will be using IN and OUT instructions instead of MOV.
  • Instead of toggling its state when accessed, you actively select the state—a 1 bit means out, a 0 bit means in. That’s not terribly exciting on its own, but it becomes very exciting when combined with the final difference…
  • To compensate for the imprecision of CPU timing, the input bit can be tied to the output of the PIT. The PIT has three independent timers numbered 0-2, and timer 2 is dedicated to driving the PC Speaker.

This makes playing simple tones on the PC speaker really easy:

  1. Set PIT channel 2 into square-wave mode (“mode 3”).
  2. Load the timer’s wavelength (its reload count) with the desired value, which turns out to be 0x1234DD / F, where F is the frequency you want to play in Hz. The constant 0x1234DD is 1,193,181, the PIT’s input clock rate in Hz.
  3. Tell the PC Speaker to take its input from the PIT. The sound will begin.
  4. Once you’re done with the sound, tell it to stop taking its input from the PIT.
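As a rough sketch, the four steps above might look like this as a tiny DOS .COM program. It is a minimal illustration, not a full music player: it plays one tone for about a second and exits. The 440 Hz pitch, the 18-tick delay, the label names, and the choice to read the tick count directly from the BIOS data area at 0040:006C are all assumptions made for the example.

```nasm
; Hypothetical single-tone example for NASM, built as a flat .COM binary.
        cpu     8086
        org     0x100

freq    equ     440                     ; assumed pitch in Hz (not from the text)
divisor equ     0x1234DD / freq         ; step 2's formula; 2711 for 440 Hz
ticks   equ     18                      ; about one second of ~55 ms BIOS ticks

start:  mov     al, 0xB6                ; Step 1: channel 2, low/high byte
        out     0x43, al                ; access, mode 3 (square wave), binary
        mov     ax, divisor             ; Step 2: load the wavelength,
        out     0x42, al                ; low byte first...
        mov     al, ah
        out     0x42, al                ; ...then the high byte
        in      al, 0x61                ; Step 3: set bits 0 and 1 of port
        or      al, 3                   ; 0x61 so the speaker follows timer 2
        out     0x61, al

        mov     ax, 0x40                ; Busy-wait on the low word of the
        mov     es, ax                  ; BIOS tick count at 0040:006C
        mov     bx, [es:0x6C]
.wait:  mov     ax, [es:0x6C]
        sub     ax, bx                  ; ticks elapsed since we started
        cmp     ax, ticks
        jb      .wait

        in      al, 0x61                ; Step 4: clear the same two bits
        and     al, 0xFC                ; to silence the speaker
        out     0x61, al
        mov     ax, 0x4C00              ; exit to DOS
        int     0x21
```

Something like `nasm -f bin tone.asm -o tone.com` should produce a binary that runs under DOSBox. Because only the low word of the tick count is compared, the wait may end early if it happens to straddle midnight, which is harmless for a one-second beep.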

It’s all very asynchronous and concise. Here’s a complete program showing the technique, playing the C-major scale. It times the notes by busy-waiting on the BIOS tick count, which isn’t incredibly clever but does get the job done.

Speaking of the BIOS tick count…

That BIOS tick count is also controlled by the PIT. PIT timer 0 controls the system timer. The BIOS sets it to mode 2 with a wavelength of 0. The timer works by decrementing its counter and then checking for zero, so setting the wavelength to 0 means that it waits 65536 counts (about 55 milliseconds) before firing. Unlike our PC Speaker protocol, which uses mode 3, mode 2 doesn’t flip the output halfway through the cycle; instead it pulses the output for a single timer clock (a bit under a microsecond) at the end of each cycle. That’s far more useful here, because timer 0’s output is tied to IRQ 0. We can interfere with that by replacing interrupt 08h.

We can also reprogram its timing just like anything else. That leads to our second PC Speaker technique.

The PC Speaker, 1-Bit Digital Audio Channel

DOS machines run way faster than we’re used to for our projects here. We can totally set IRQ0 to fire at 16kHz or so without overloading. Given that, we have a simple strategy for playing digital audio: have the interrupt load one bit per sample and feed that bit into the PC speaker. It’ll be scratchy (we are, in practice, overloading the speaker so hard that it’s clipping 100% of the time), but it’s still recognizable.

My sample file is a 16kHz, 8-bit unsigned linear PCM file I exported from Audacity. We’ll only be paying attention to the top bit for now.

The core of the IRQ0 processing is pretty straightforward. We mostly just have to make sure that we are properly wrangling far pointers:

        lds     bx, [dataptr]   ; Load our data pointers
        mov     si, [offset]
        cmp     si, datasize    ; past the end?
        jae     .nosnd
        mov     ah, [ds:bx+si]  ; If not, load up the value
        rol     ah, 1           ; Move bit 7 to bit 1
        rol     ah, 1
        and     ah, 2           ; And mask out everything else
        in      al, 0x61        ; Or that with the state of port 0x61
        and     al, 0xFC        ; (setting the PC speaker's target
        or      al, ah          ; state to the sample's high bit)
        out     0x61, al
        inc     si              ; Update pointer
        mov     [offset], si
        jmp     .intend         ; ... and jump to end of interrupt

There is one other issue, though. If we simply acknowledge the interrupt and continue on our way, then our system clock will freeze, going out of sync for the duration of our audio. But if we forward it on, then we’ll be registering 55 milliseconds of time 16,000 times a second. That’s even worse.

The solution is to only call it when it needs calling. We keep a running count of how many timer cycles have been registered, and when it passes the BIOS-demanded threshold, we forward the call. This is easier than it sounds. The timer cycles are exactly the value we fed to the timer in the first place, and our target number is 65536. We can simply do 16-bit addition on a running total and the carry bit does all the work we need:

        mov     ax, [subtick]   ; Add this interrupt's PIT tick count
        add     ax, counter
        mov     [subtick], ax
        jnc     .nobios         ; If carry, it's time for a BIOS call
        mov     bx, biostick    ; Point DS:BX at our saved address...
        pushf                   ; and PUSHF/CALL FAR to simulate an
        call    far [ds:bx]     ; interrupt

The full program is here. One extra little trick I do here is that when I’m looping in the main program, I use the HLT instruction to put the processor to sleep until the next interrupt. This was apparently uncommon to do in DOS’s heyday, first, because you always had something you were doing outside of interrupts, and second, because some machines in the 486 line would mask interrupts out on their way to sleep, so HLT would hard-lock your machine forever. Still, DOSBox is fine with it, and so is my laptop, so I’m keeping it in.
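The handler fragments above leave out the scaffolding around them. Here is one way it might look, as a sketch under stated assumptions: a .COM program that hooks INT 08h, speeds timer 0 up to roughly 16kHz, sleeps with HLT for a fixed number of interrupts, and then puts everything back. It plays no sound. The names `counter`, `subtick`, and `biostick` follow the fragments above, and the `counter` expression is the one a commenter quotes from the full program; `irq0`, `settimer`, `remaining`, and `nticks` are hypothetical. The sketch reloads timer 0 in mode 2 because that is the mode the text attributes to the BIOS, and it chains to the BIOS with a far jump rather than the PUSHF/CALL FAR shown above, which leaves the end-of-interrupt and IRET to the BIOS handler.

```nasm
; Hypothetical timer-hook scaffold for NASM, built as a flat .COM binary.
        cpu     8086
        org     0x100

counter equ     (0x1234DC / 16000) & 0xFFFE ; PIT ticks per interrupt (74)
nticks  equ     32000                   ; assumed run length: ~2 s at 16kHz

start:  mov     ax, 0x3508              ; DOS: get INT 08h vector in ES:BX
        int     0x21
        mov     [biostick], bx
        mov     [biostick+2], es
        mov     ax, 0x2508              ; DOS: set INT 08h vector to DS:DX
        mov     dx, irq0
        int     0x21
        mov     bx, counter
        call    settimer                ; IRQ0 now fires at roughly 16kHz

.wait:  hlt                             ; sleep until the next interrupt
        cmp     word [remaining], 0
        jne     .wait

        xor     bx, bx                  ; a count of 0 means 65536 again
        call    settimer
        push    ds
        lds     dx, [biostick]          ; DS:DX = the original handler
        mov     ax, 0x2508
        int     0x21
        pop     ds
        mov     ax, 0x4C00              ; exit to DOS
        int     0x21

settimer:                               ; load BX into PIT channel 0
        cli
        mov     al, 0x34                ; channel 0, low/high byte, mode 2
        out     0x43, al
        mov     al, bl
        out     0x40, al
        mov     al, bh
        out     0x40, al
        sti
        ret

irq0:   push    ax
        push    ds
        push    cs
        pop     ds                      ; our data lives in our code segment
        cmp     word [remaining], 0
        je      .tick
        dec     word [remaining]
.tick:  add     word [subtick], counter ; carry = 65536 PIT ticks have passed
        jnc     .eoi
        pop     ds
        pop     ax
        jmp     far [cs:biostick]       ; chain: BIOS sends EOI and IRETs
.eoi:   mov     al, 0x20
        out     0x20, al                ; acknowledge IRQ0 at the PIC ourselves
        pop     ds
        pop     ax
        iret

biostick  dd    0                       ; saved INT 08h vector
subtick   dw    0                       ; running PIT tick total
remaining dw    nticks                  ; interrupts left before we quit
```

With a count of 74, the carry fires about once every 885 or 886 interrupts, so the BIOS still sees close to its usual 18.2 ticks per second while the program runs.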

It’s now time to move beyond dumb tricks and into complete implausible madness.

PC Speaker: 7-bit Digital Audio With Pulse Width Modulation

It’s sometimes easy to forget this, but computers are physical objects that exist in the real world. That means that the PC speaker actually has to spend some time altering its state to move from the “in” to the “out” position, and vice versa. That amount of time turns out to be standardized at roughly 60 microseconds. That has a couple of implications. One of them is that the volume of the speaker will drop as we reach frequencies of about 10 kHz; the square wave will be so fast that the speaker can’t make it all the way up to full extension, and a smaller wave means a quieter sound.

The other is that this means that if we intentionally swap it too fast, we can pick consistent extension levels for the speaker by tuning the frequency. This technique is far more general than just the PC speaker, and its effectiveness is unreasonably good.

No cheating here beyond a little amplification after recording. That is a PC Speaker. So how do we make this happen?

A reasonable first attempt is to set the frequency to a quarter of the 8-bit PCM value. That would give pulses of the desired length without overdriving, but the problem is that we can’t make sure that our IRQ will hit at a time that lets us reset the speaker between samples. We’ll drift to an edge and stay there.

What we do instead is use PIT mode 0, which is designed for one-shot timers. The signal goes low immediately when you load the counter in this mode, and then when it runs out the signal goes high and stays there. That’s perfect for our needs, as long as we have enough time to get back by the time it’s done. My original recording was a bit quiet, so even at 16kHz I only have to divide the 8-bit values by 2 and I’ll still always be done in plenty of time. The core IRQ logic becomes this:

        lds     bx, [dataptr]   ; Load our data pointers
        mov     si, [offset]
        cmp     si, datasize    ; past the end?
        jae     .nosnd
        mov     ah, [ds:bx+si]  ; If not, load up the value
        shr     ah, 1           ; Make it a 7-bit value
        mov     al, 0xb0        ; And program PIT Channel 2 (mode 0)
        out     0x43, al        ; to deliver a pulse that many PIT
        mov     al, ah          ; ticks (~0.84 us each) long
        out     0x42, al
        xor     al, al
        out     0x42, al
        inc     si              ; Update pointer
        mov     [offset], si
        jmp     .intend         ; ... and jump to end of interrupt

It’s not much more complicated than the 1-bit digital audio case. It’s also another case, like the CGA composite mode, where programming this special effect involves just sending data in its natural format to a target associated with the effect’s system. The complete code is here.
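The handler above relies on the recording being quiet enough that every pulse finishes before the next interrupt arrives. For louder material, a guard along the following lines may help. This is a sketch, not part of the original program: `pwm_out` and `maxpulse` are hypothetical names, and the margin of 8 ticks is an unmeasured guess. It also assumes the 8254's documented behavior that a loaded count of 0 is treated as 65536, which would otherwise turn the very lowest sample values into one long pulse.

```nasm
; Hypothetical clamped variant of the PWM output step, for NASM.
        cpu     8086

counter  equ    (0x1234DC / 16000) & 0xFFFE ; PIT ticks between interrupts (74)
maxpulse equ    counter - 8             ; assumed safety margin, not measured

; pwm_out: AL = 8-bit unsigned sample. Clobbers AX.
pwm_out:
        shr     al, 1                   ; 7-bit pulse length, as in the text
        jnz     .nonzero
        inc     al                      ; avoid 0, which would mean 65536 ticks
.nonzero:
        cmp     al, maxpulse
        jbe     .fits
        mov     al, maxpulse            ; keep the pulse inside one period
.fits:  mov     ah, al
        mov     al, 0xB0                ; channel 2, low/high byte, mode 0
        out     0x43, al
        mov     al, ah
        out     0x42, al                ; low byte = pulse length
        xor     al, al
        out     0x42, al                ; high byte = 0
        ret
```

Clamping flattens loud peaks, so it trades a glitch for some distortion; scaling the samples down ahead of time would avoid both.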

This technique was apparently perfected some time in the early 1990s and licensed to game developers under the name “RealSound”. It was pretty popular; I had at least two games (Star Control 2 and Eric the Unready) that used it. Unfortunately, few players ever actually heard it, because by the early 1990s pretty much every PC gamer had a proper sound card.

Acknowledgements

Almost everything I did here started with information I got from the OSDev Wiki, which provided extensive information about the PIT and also taught me the basics of the PC Speaker. It only hinted at the Pulse Width Modulation technique, but that was enough.

3 thoughts on “Beyond Beep-Boop: Mastering the PC Speaker”

  1. zment

    Hey, I used the 7bit PWM audio example in my video. I just realized that I’d probably need to ask for permission for it. I couldn’t find any contact address, so I’m commenting here. Is it okay if I use your PWM example sound in my video? Here is the video in question https://www.youtube.com/watch?v=BJS0j2B2EiY

    It’s about using the PC Speaker for playing music in a custom “OS” that runs only Tetris.

    1. Michael Martin Post author

      Yep, that is fine; the comment in the YouTube description is enough to match it. Stuff in the Bumbershoot repository is generally available under the 2-clause BSD license, which boils down to “put a line in the credits saying you used it”, so you’re fine.

      I should probably put in an “about the blog” page up top too, now that you mention it.

  2. Fahr

    I just found this and I have to say it’s really good stuff. This will help tremendously with a project I am working on, so thanks!

    One note on something that puzzled me a bit; when I started exporting my own audio from Audacity, the sample was set to stereo. The result worked fine (at 16kHz), but played twice as fast for some reason. After I dropped the stereo to mono, everything worked as expected.

    I also have 1 question; in your explanation you say that the divider is calculated by 0x1234DD / F – which I get. In your examples, however, you do;

    counter equ (0x1234DC / 16000) & 0xFFFE

    That seems to be ALMOST the same thing, but not quite? The original number is 1 lower and then you drop the least significant bit from the result. Both methods appear to work fine in my tests, so I was just curious why you are using this approach.

    Thanks again! I plan on writing up my whole project at some point and I will definitely link back here.
