Monday, August 7, 2017

Reconstructing old game PC speaker music

Back when dinosaurs walked the earth regular PCs did not have sound cards by default. Instead they had a small piezoelectric speaker that could only produce simple beeps. The sound had a distinctive feel and was described with words such as "ear-piercing", "horrible" and "SHUT DOWN THAT INFERNAL RACKET THIS INSTANT OR SO HELP ME GOD".

The biggest limitation of the sound system was that it could only play one constant tone at a time. This is roughly equivalent to playing the piano with one finger and only pressing one key at a time. Which meant that the music in games of the era had to be simple. (Demoscene people could do crazy things with this hardware but it's not relevant for this post so we'll ignore it.)

An interesting challenge, then, is whether you could take a recording of game music of that era, automatically detect the notes that were played, reconstruct the melody and play it back with modern audio devices. It seems like a fairly simple problem and indeed there are ready made solutions for detecting the loudest note in a given block of audio data. This works fairly well but has one major problem. Music changes from one note to another seamlessly and if you just chop the audio into constant sized blocks, you get blocks with two different consecutive notes in them. This confuses pitch detectors. In order to split the sound into single note blocks you'd need to know the length of each note and you can't determine that unless you have detected the pitches.

This circular problem could probably be solved with some sort of an incremental refinement search or having a detector for blocks with note changes. We're not going to do that. Let's look at the actual waveform instead.
This shows that the original signal consists of square waves, which makes this specific pitch detector a lot simpler to write. All we need to do is to detect when the signal transitions between the "up" and "down" values. This is called a zero-crossing detector. When we add the duration of one "up" and the following "down" segment we have the duration of one full duty cycle. The frequency being played is the inverse of this value.

With this algorithm we can get an almost cycle-accurate reconstruction of the original sound data. The problem is that it takes a lot of space so we need to merge consecutive cycles if they are close enough to each other. This requires a bit of tolerance and guesswork since the original analog components were not of the highest quality so they have noticeable jitter in note lengths. With some polishes and postprocessing you get an end result that goes something like this. Enjoy.

No comments:

Post a Comment