Encoded Audio Capture the Flag
by Mike Pfeiffer (DJ Pfeif)
Our music radio show encoded text into a broadcast audio stream as part of a Capture the Flag event at the annual hacking convention (Shell On The Border 3) during the weekend of New Year's Eve 2024.
In 2014, my team and I started a radio show on a local community FM radio station. The programming committee was nice enough to let us broadcast drum and bass music weekly, which was a departure from their normal, and usually more accessible, media format. If you haven't heard of this genre of music, it's fast electronic dance music, considered by many people to be awful.
However, there are those of us who love it enough to broadcast it regularly, get nerdy with it, and ask the question: does hacking belong in music? To members of the hacking/making community, this underground music has a very appealing DIY backbone that has cemented it as our favorite hacking soundtrack.
After a few years of broadcasting regularly on the air, we changed the name of the show from the overtly obvious "Drum & Bass with DJ Pfeif" to something that reflected some of the developing themes in the show. It is now called Hack The Planet, and if you just moaned, then you're in the right mental space. It's a bit tongue-in-cheek, and the corny facade weirdly masks our attempts to bring something more sophisticated to the fun of what we do every time we broadcast.
The main theme of the show is the drum and bass music. But almost everything else in the show is centered on the theme of hacking, from our recreations of famous broadcast intrusions (for example, Max Headroom and Ztohoven) to celebrations of famous phone phreaking and malware (for example, blue boxes and MEMZ).
"2600" (both the magazine and the frequency) is featured in several places throughout the show as Easter eggs. We do a pretty good job of providing some good hacking/phreaking history if you know where to look. The hacking theme blossomed when we leaned into our online video stream, originally on Facebook and now on Twitch.
We picked up a regular following by podcasting all our weekly shows and making everything as free as possible, which is how shyft found us. He and the fs2600 crew have been hosting a hacking convention called Shell On The Border for the past couple of years in Fort Smith, Arkansas (BYOCTF.com). He reached out and asked us to perform Hack The Planet live during his event. shyft and his team pumped our Twitch stream live to his amazing, self-built arcade/hacking arena during Shell On The Border's Capture the Flag (CTF) event.
At shyft's request, we integrated the CTF into the radio show by placing flags throughout the performance, which we ecstatically developed. In a typical CTF event, hackers hack to find preloaded flags hidden in cool places like deep within code, or encoded into computer madness, or possibly loaded into the Master Boot Record (MBR). Shell On The Border has a unique twist where hackers earn points by capturing flags, which they can redeem to develop and submit their own flags to the local community, thereby continuing and expanding the fun for the duration of the event.
Not being on-site to do some real-time hacking, we included five flags in our Twitch stream for conference participants to find. The first was hidden in a honeypot within the chat, found using pseudo-shell commands. The second and third flags were found in chat games centered on the rules of hacking and phone phreaking. The fourth flag was encoded in an image posted online. And the final flag was encoded in a sound that we played live during the show.
While this isn't a new technique, we thought it was fun and appropriate to the theme of the show and convention. Our process of developing the fifth flag is described below.
If you've ever used software to edit or work with audio, then you've probably seen a graphical representation of an audio waveform. Programs like Audacity (which is free and open-source) provide a default view of audio in this format. Most widely available audio editing software packages, or Digital Audio Workstations (DAWs), use this time-domain representation to view sound data.
It plots time along the horizontal axis and the overall amplitude of the sound (think volume) at any given moment on the vertical axis. DAWs sometimes have an alternate way to view the audio data: instead of displaying amplitude along the vertical axis, they show frequency, and amplitude is then represented by color changes on the screen.
For example, the louder a frequency is at a given moment, the brighter that point appears. This frequency-domain view, called a spectrogram, is what you'd need to use to see the flag that we encoded into Hack The Planet's audio stream.
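If you want to try this yourself without a full DAW, a few lines of Python will do it. The following is just a sketch, not the tooling we used on the show; it assumes a hypothetical 16-bit mono WAV file named flag.wav and uses NumPy and matplotlib:

# A minimal sketch: display a spectrogram of a WAV file.
# "flag.wav" is a hypothetical file name; assumes 16-bit mono audio.
import wave
import numpy as np
import matplotlib.pyplot as plt

with wave.open("flag.wav", "rb") as wav:
    rate = wav.getframerate()
    frames = wav.readframes(wav.getnframes())

samples = np.frombuffer(frames, dtype=np.int16)

plt.specgram(samples, NFFT=1024, Fs=rate, cmap="inferno")
plt.xlabel("Time (seconds)")
plt.ylabel("Frequency (Hz)")
plt.show()

Anything encoded the way we describe below shows up in that plot as bright traces against a darker, quieter background.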
As a side note, this technique is also useful for seeing certain types of secret information encoded into digital audio files. You never know what kind of data might be lurking in your audio, like audio watermarks for DRM tracking.
Here's a quick reminder of the physics of sound.
Frequency is the measure of how many things occur in a given time period. I would venture that most hackers have a good understanding of frequency when it comes to processing speeds. In sound and music, we measure the number of vibrations of a sound wave per second, using the familiar unit hertz (Hz). It's the same unit of measure as the speed of our CPUs. But while our CPUs are measured in gigahertz (GHz), we measure audible sound in the range of 20 hertz to 20 kilohertz. Below 20 Hz, our brains process the sound as a sequence of individual noises instead of a continuous tone; above 20 kHz, our ears' sensory organs can't respond sufficiently, and if you're like me, you can't (and don't want to) hear really high-frequency noises above ~16 kHz anyway.
For reference, a mosquito's wings buzz at around 600 Hz, and really good bass frequencies fall below 100 Hz. Remember, Hack The Planet plays drum and bass music, so we love those deep bass frequencies!
Our goal was to play a sound over the music that, when viewed as a spectrogram, would display as readable text (the flag). We started by creating an image with a white background and black text, using Inkscape, a free and open-source vector graphics program.
When converting the image to sound, we treat the image as if it were a spectrogram in the first place. Sound is encoded as the image is read from left to right, and the height of a black pixel represents a specific frequency. White space is ignored, and black pixels are converted into oscillation data. The frequency of the sound waves depends on the vertical position of the black text: lower text in the image translates to lower frequencies, and higher black pixels translate to higher-pitched sounds.
A black line running from the lower-left corner to the upper-right of a converted image would sound like a tone rising over time. We can manually adjust the duration of the output sound file, stretching it to span anywhere from a fraction of a second to minutes.
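In code, that vertical mapping boils down to a simple linear formula. This is only a sketch of the idea; the 200 Hz floor is an assumption rather than the exact value we used, and the 8 kHz ceiling matches what we describe below:

# Map a pixel row to a frequency. Row 0 is the top of the image,
# so invert the position so that lower rows give lower pitches.
MIN_FREQ = 200.0     # assumed lower bound in Hz
MAX_FREQ = 8000.0    # we kept the flag below roughly 8 kHz

def row_to_frequency(row, image_height):
    position = 1.0 - row / (image_height - 1)   # 1.0 at the top, 0.0 at the bottom
    return MIN_FREQ + position * (MAX_FREQ - MIN_FREQ)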
We kept the important part of the message mostly in the lower half of the image so that, when represented as sound, it stays in the lower part of the audio spectrum, which is more pleasant to hear than ridiculously high-pitched squelches and whines. I'm sure most people won't find the converted audio that we used in the live show pleasant or musical, but at least it wasn't ear-piercing. There are some neat examples of people using this technique in their own commercially available songs; it's pretty cool to marry listenable music with secret data.
We converted the image into sound wave files using modified Python code developed by Sam (www.hackster.io/sam1902/encode-image-in-sound-with-python-f46a3f). Sam provides some cool examples and code at that link.
The general process is as follows: the image is converted into an array of numbers representing black and white pixels. Then the columns of pixel data are converted into oscillation data and extended for a length of time determined by the overall duration of the output file and the width of the image. The frequency spectrum is quantized into ranges determined by the user and the height of the image; we chose to keep our frequencies below 8000 Hz. Finally, the samples are assembled with Python's wave library and written out as an audio file. We took that file and loaded it into our digital turntables to be played on the air.
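To make those steps concrete, here is a heavily condensed sketch of that kind of pipeline. It is not Sam's code and not exactly what we ran; it assumes Pillow and NumPy, a hypothetical black-on-white input image flag.png, and a ten-second output file:

# Sketch of the image-to-sound pipeline described above.
# "flag.png" and "flag.wav" are hypothetical file names.
import wave
import numpy as np
from PIL import Image

SAMPLE_RATE = 44100
DURATION = 10.0                        # seconds of output audio
MIN_FREQ, MAX_FREQ = 200.0, 8000.0     # keep the flag below 8 kHz

# 1. Convert the image into an array of "black" pixels.
img = Image.open("flag.png").convert("L")
pixels = np.array(img) < 128           # True where a pixel is dark
height, width = pixels.shape

# 2. Each image column becomes a short slice of audio; each black pixel
#    in that column contributes a sine wave at its row's frequency.
samples_per_column = int(SAMPLE_RATE * DURATION / width)
freqs = MAX_FREQ - np.arange(height) / (height - 1) * (MAX_FREQ - MIN_FREQ)

chunks = []
for col in range(width):
    start = col * samples_per_column
    t = (start + np.arange(samples_per_column)) / SAMPLE_RATE
    chunk = np.zeros(samples_per_column)
    for row in np.nonzero(pixels[:, col])[0]:
        chunk += np.sin(2 * np.pi * freqs[row] * t)
    chunks.append(chunk)

# 3. Normalize and write 16-bit PCM with Python's wave library.
audio = np.concatenate(chunks)
peak = np.max(np.abs(audio))
if peak > 0:
    audio /= peak
pcm = (audio * 32767).astype(np.int16)

with wave.open("flag.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)                # 16-bit samples
    out.setframerate(SAMPLE_RATE)
    out.writeframes(pcm.tobytes())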
During the live performance of Hack The Planet at Shell On The Border 3, we waited until the music fell to a relatively quiet section, when we knew the drum and bass wouldn't act as overly aggressive noise against the encoded flag sounds. We appreciated the inversion of perception here: the music became the noise, and what would normally be perceived as noise became the main feature.
The encoded message sounds like a series of chirps and beeps spanning about ten seconds. If you listen closely, you can hear patterns in the sounds, like curves in the image being represented as sweeps in frequency. As the DJ, I gave a verbal announcement over the air that a flag was incoming so that hackers could tune in to the audio stream. After I had played the message, I let listeners know that it would happen again later in the show, hoping that someone would get ready to record the sound for decoding and subsequently earn some hacking points!
That's where Audacity or a similar DAW would help record and then visualize the sound. It's even possible to take the digital audio recording of the performance and translate it back into an image with the music-as-noise coloring the output image, which is what we did as a reminder of how much fun that show was for us!
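For the curious, that reverse trip can also be scripted. The sketch below is again just an illustration (recording.wav and decoded.png are hypothetical file names): it computes a spectrogram with SciPy and saves it as a grayscale image, flipped so low frequencies sit at the bottom like the original picture.

# Sketch: turn a recording of the broadcast back into a picture.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from PIL import Image

rate, samples = wavfile.read("recording.wav")   # hypothetical recording
if samples.ndim > 1:
    samples = samples[:, 0]                     # keep one channel if stereo

freqs, times, power = spectrogram(samples, fs=rate, nperseg=1024)
power = power[freqs <= 8000]                    # the flag lives below 8 kHz

# Brighter pixels = louder frequencies; flip so low frequencies end up at the bottom.
img = np.log1p(power)
img = (img / img.max() * 255).astype(np.uint8)
Image.fromarray(np.flipud(img)).save("decoded.png")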
To view an example of this technique using some familiar text, visit: djpfeif.com/an-image-encoded-in-sound
You can find the original recording of the radio show, Hack The Planet, Episode 473 at: djpfeif.com/2023/12/31/hack-the-planet-473-on-12-30-23-sotb.
The audio flag can be heard at around the 54:00 mark.
Details about Hack The Planet and DJ Pfeif are at djpfeif.com.