Retsaot is Toaster, Reversed: Quick 'n Dirty Firmware Reversing [Archive]

Matasano

April 30th, 2008, 04:21

I recently worked on a project that involved embedded systems and reverse engineering. This sort of territory can be a little hairy the first few times out. I ran into some interesting challenges and discoveries along the way which I thought might be worth writing a little bit about. I can’t tell you what the target was. But, it was important. And, we beat the crap out of it. So instead, I’ll tell you what I wish it was: a networked 4-slot toaster.

Now… to make things interesting; Early on, I’d discovered a vulnerability in the toaster that allowed any attacker to load their own firmware on the device. Ouch! My toast! My beautiful toast!

In order to drive home the risk (mostly to the vendor) of the firmware loading vulnerability, I was asked by my customer (also the vendor’s customer) to demonstrate the attack by actually loading malicious firmware onto the device and getting it to run.

Mind you, the request to prove this is actually pretty sane. I had little knowledge of the boot loader, or even of the firmware image format. I couldn’t say for sure that there wasn’t a code-signing feature, which would prevent the toaster from loading any image that wasn’t cryptographically signed by the vendor. That would have rendered the firmware loading attack impotent. To make things worse, the vendor was being pretty light on details. Can’t say I blame them.

So… I was tasked with demonstrating that the vendor’s firmware:

Could be reverse engineered
Modified
Loaded back onto the device
By an attacker

Think “embedded rootkit ("http://en.wikipedia.org/wiki/Rootkits")”. Embedded rootkits are sort of a holy grail: it’s easy to load a rootkit on a PC, but tricky to get them onto a wireless router or switch (or toaster). The reward for doing it, though, is that nobody ever thinks to check their router or toaster for rootkits. Initially, my bar was just going to be to demonstrate I’d actually changed some aspect of the program with my own firmware patch and move on to more penetration testing.

.

Some embedded projects are easier than others. Sometimes you get lucky. You find a debug shell, a file-system, or a serial console. A lot of times, devices that look like black boxes have JTAG ports, to which you can attach a debugger. Devices built in the last 10 years tend to have GDB stubs, so you can target them with cross-debuggers. Some even have firewire, through which you can DMA in and out of system memory.

This wasn’t any of those. Without reversing the firmware, I had no way to execute code on the device. For the purposes of penetration testing, I really couldn’t even see what was actually happening on my target. I hate that. All I had to work with was the firmware image file itself, offline, and some observable external behaviors of the device.

So… if I was going to make any of my objectives happen, I was basically starting purely from static analysis. Back in 2005, Tom wrote about a similar set of steps ("http://www.matasano.com/log/147/why-i-love-vulnerability-analysis-in-2005/") he took given similar starting point. Without realizing it this time out, I actually followed them almost to the letter. There were a few points where our paths diverged slightly because of differences in our target and approaches.

.

This is my story. Details have been fuzzed to protect the guilty toaster.

1. In Which I Get The Image

I start out with a couple megs worth of firmware image file. This file is sent to the device when you do a firmware upgrade the official way. You can usually find firmware images on the vendor’s download site, or on the CDs that came in the box.

2. In Which I Scan The Image

I feed it to deezee. Deezee is part of Matasano’s homegrown “Black Bag” toolkit. What deezee does is search through a binary file for compressed data by looking for zlib signatures, and then extracting anything it finds. Its crazy how often this seems to work on unknown binary files. As it turns out, zlib is the industry standard compression format, even for toasters.

Sure enough, deezee finds not one, but 3 distinct blobs of compressed data packed in the file. Hmm… really… deezee should tell me where it finds these things I think. Lets fix that now ("http://www.matasano.com/download/deezee.patch"). Much better.

3. In Which I Read

With addresses for compressed chunks in hand, I open up the file in my hex editor.

Deezee found the first compressed segment around 384 bytes from the beginning of the file. Here’s where technique comes into the picture.

Deezee found three “hits” in the image. I assume they aren’t just sprinkled haphazardly throughout the image; there should be some kind of header on them. I want to know what that header looks like. Maybe it’s used for things besides compressed blobs? The way I’m going to do that is, I’m going to compare the bytes surrounding the compressed blobs for all three hits.

So, comparing the preceding 384 bytes of that and the other two hits, I see several similarities:

A 4-byte signature. I can tell because it’s always the same four bytes, in the same place relative to the blob.
An ASCIIZ string, apparently describing the firmware version, padded to 128 bytes. You see “padded strings” all the time in binaries; what they are is a C-struct with a “char version[128]”.
A 16-byte ASCIIZ string of numbers describing just the version, again padded with NULs.
4 bytes which, when unpacked as an integer, happens to match the size of the compressed chunk. (If your hex editor won’t tell you the big and little endian 32 bit value of four highlighted bytes, get a new hex editor). This is the trick that cracks most image formats: you look for 16 or 32 bit fields that look like lengths, and try to reconcile them to the file and its features.
4 bytes which… hmm… could be CRC32 checksum? This is a bit of a shot in the dark, but it’s easy enough to check:

Code:

$ cat app.bin | ruby -e &#92;

'require "zlib"; puts Zlib.crc32( STDIN.read ).to_s(16)'

a76ea2ad

And… they match! I’ll need to keep this checksum in mind when I try modifying the file. Bodes well for no code signing!

Why did I try a CRC32 checksum first? Well… first off I assumed the field was 32 bits long. And CRC’s (Cyclic Redundancy Check) ("http://en.wikipedia.org/wiki/Crc32")are used a lot in file formats that encapsulate other files. As you can see above, I used Zlib’s crc32 function to check my file chunk. Could just easily have been OpenSSL’s since it too is a standard Ruby library. Two shining examples of how common CRC32 is.

4. In Which Other Headers Are Found

Now that I know what to look for, I find out that this header also prefixes some other chunks. Ones that aren’t compressed and so weren’t noticed by deezee. I also confirm that CRC32 header field for each chunk and it’s used on the other’s too. Five total, including the compressed ones. The last one is a big chunk of base64 which looks like it might be an encryption signature. Hmm… might be used for code signing somehow? Looks like I still need to confirm or discount this possibility.

5. In Which I Check For Metadata

Go back and look again at the original chunks of compressed data. The zlib headers in the file included the original filenames which give me some clues as to what each is. Going on filenames, I’m thinking I’ve got phase one and two boot-loaders and the third chunk is the actual app. I’m interested in the application. The Unix binutils ‘file’ command doesn’t tell me anything useful about any of them, but it’s always worth a shot.

6. Behold: Low Hanging Fruit

Strings the application file using GNU strings with ‘-t x’ so I get nice hex offsets for the where the strings are found. Lots of interesting stuff:

Uh… well there’s this:
http://www.matasano.com/download/vxcap.pngNice ASCII art. Just a wild guess, but I think maybe this is VxWorks?
The regular string ‘vxworks’ shows up in a lot more places too.
“GNU ld version 2.9-mips3264-010729”. Ok so it’s MIPS. This is also good news. It means that somewhere out there, there’s probably a GNU binutils distribution that can understand this file format.
Lots of what looks like function names close together. When do function names occur in compiled output? Two reasons: debugging code, or a symbol table. A symbol table would be a score; I make a note of that for later.
Lots of assertion messages with references to source code line numbers xxx.c(###). Always handy. Even if you don’t have symbols for functions, with an hour of data entry in your disassembler you can usually fake them based on debug and assertion strings. You want to take the time to do this. By the time you’ve guessed even 10% of the symbols in an image, you’re usually going to be able to comprehend 90% of what the image does, without reading any assembly.
The strings you see when the device boots up.
Lots of error, exception, and status messages such and one typically finds in a program. Sometimes very handy as we’ll soon see.

7. In Which Firmware Is Patched

I’ve got a pretty good idea of what the program is and how it’s rolled up in the firmware image file.

My job right now is to figure out whether I can change the firmware and load it onto the toaster. So I make a simple change to it: there’s a string which gets displayed when the system boots up. Using my hex editor, I change it to something noticeably different, being careful to keep it the same size as the original. Then I re-compress it at the same level as the original, re-roll it into the header format, and change the CRC32 checksum.

I load it onto my target using the bug I found… and… sweet satisfaction! If there’s any code signing here, it doesn’t work. That spooky looking signature at the end is for something else. Knowing this, It’s well worth the effort to keep reversing the image.

8. In Which A Loader Offset Is Sought

Now, the binutils ‘file’ command didnt recognize the compressed image, nor did it recognize either of the apparent “boot-loaders”. If it was a well-known format like ELF or COFF, it would have.

Unfortunately, the executable headers are what tell us how to load the whole program into memory. Without a known executable format, I’ll need to find out the load offsets myself. If I don’t, when I load the program into a disassembler, none of the data and instruction offsets will make sense. You can read a binary image with broken offsets, but it’s not fun.

Headers are usually located at the beginning of the file. But I’m not seeing any strong indications of any sort of header information at all in this one. My guess: an older version of VxWorks, predating VxWorks ELF. On the bright side, firmware images aren’t usually relocatable. Unlike a Windows PE program, which is literally edited and patched up by the linker at runtime, firmware tends to be loaded at a specific address.

I could try reversing either or both of the bootloaders to find that address, but… I might just run into the same problem walking all the way back up to the first boot-loader. To be honest, I’m not really interested in unlocking the secrets of the toaster boot loader. If I was trying to jailbreak the toaster to heat unauthorized off-brand Pop Tarts, maybe I would be. Exercise for the reader.

9. In Which I Infer A Loader Offset

I’ve got another idea. Lets look closer at some of these strings and their surrounding data. I scroll around in my hex editor around the sections with strings until I’ve found what I’m looking for.

http://www.matasano.com/download/color_dump1.png

What we’re looking for is a list of strings preceded by their address table. At this point, I really don’t care what the strings are, just that they are recognizable text. The addresses prefixing the strings in the table are 32-bit big endian numbers pointing to the ASCIIZ strings. Only parts of the addresses match up to real addresses in the file, though, and this is key.The first entry (blue) points to 0x104DEDD8. The string actually lives at 0x004D9DD8 in the file. Subtract the two and you get 0x10005000 - the load offset for this segment.

If I’m unsure of my assumptions here, I check some of the other entries in the table (red, green, and brown). Same pattern holds. We’ve got a winner.

10. In Which I Sanity Check The Offset

Search for 0x10005000 and 0x5000 near the beginning of the file. I want to confirm my assumptions about the lack of a header. Nothing comes up. I’m still very certain this is my winner. The lack of a search hit here suggests again that the loader decides where to load everything, though this program’s compiler knew where that would be when it was compiled too.

11. In Which I Dissassemble

We are now at step 11 and it is time to load the file in a disassembler.

We use IDA Pro ("http://www.hex-rays.com/idapro/"). IDA wants to know the architecture of the file. We choose a processor type of “mipsb”; that’s MIPS, which we learned from scanning the image format for strings, and running in big-endian mode.

How do we know it’s big endian? Big endian is a safe guess for non-X86 architectures, but in this case it’s more than a guess: the addresses and lengths in the file were big endian. What if we hadn’t seen a string identifying it as MIPS? I’d probably take a fragment of the file and feed it through “objdump -d” multiple times, specifying MIPS, PowerPC, ARM, and SPARC, in that order. You know you have a match when the instructions sort of make sense.

I let it load as one big segment. Once loaded in IDA, I go to “Edit -> Segments -> Rebase Segment” and enter the address I came up with in step #9. Rebasing tells the disassembler what the load offset is.

Here I have to tell you about Thomas’ irrational fear of IDA’s rebasing. According to Tom this is based on a bad experience rebasing a 10 megabyte firmware image, waiting a day for it to finish, and only then noticing he got the address wrong. Here’s a handy trick Thomas uses instead of rebasing: feed the file to binutils “objcopy”. Objcopy is a best-kept-secret for reversing: you can feed it a file of type “binary”, and tell it to spit out an ELF file, specifying the header values. Several other tools suck at handling raw files, but rock at handling ELF. Why is his fear irrational? Well… I just disable IDA’s auto-analyze feature before I rebase. Then I do some manual poking around first to make sure I’m on the right track before turning it back on again. But… nobody tell Thomas this! He always comes up with crazy cool tricks using other tools when he gets pissed off at IDA.

12. In Which Work Out A Symbol Table

Based on where I found strings in the image, I have a pretty good idea of the general region of memory where program data lives. Remember those function names I filed for later? Now is later.

I take the addresses of a few function name strings I find close together and search the file for the four bytes making up their (newly rebased) addresses. I’m consistently getting search hits near the end of my probable data segment, all in the same general area. Look around there and start converting every 4-byte aligned chunk starting with 0x104 to an offset. I start seeing offsets to strings of what looks like function names at 16-byte intervals. The string offsets are right next to offsets pointing way back into the code area and some pointing into the data area. I look a little closer at the hex dump of this area:

http://www.matasano.com/download/color_dump_2_with_idastruct.png

Blue is what points to the “name” of the symbol, red points to the actual thing in memory. The last column is the “type” of thing. Green things (0x500) are functions, and purple (0x700) are data. There’s also some oddballs strewn about with a type of 0x900. These point way out of bounds past the end of my rebased file. Could be a segment I don’t know about from this file, or it could just be something else created at run-time. I don’t stress about this, stick with what I know for sure right now.

I take this pattern and translate it into a structure for IDA, then find where the table begins and ends based on the pattern. This is an array, so that’s how I define it in IDA. The last element of the array is 16 bytes of 0x00 helping me (and IDA) see where it ends.

13. In Which I Import The Table Into IDA

Things start getting recognized quickly by IDA’s auto-analysis once I tell it about all these symbol offsets. But I also want IDA to know the names of everything in the symbol table.

I write a quick Ruby script ("http://www.matasano.com/download/vx_sym2idc.rb") to write an IDC script to do this work for me. IDC is the scripting language built into IDA; in the 21st century, most people write their IDA tools in Python or Ruby, but it’s sometimes faster to write short scripts that use IDC as a half-way format.Run ruby… run IDC… take a look at the results. Awesome! Just about everything was identified in this symbol table! Furthermore, everything was identified in the same segment off of one load address.

Just to make sure about that missing header I check my final results from the symbol table. Indeed, two of the function symbols point right at 0x10005000, the beginning of the program. One’s called ‘sysInit’ and the other ‘start’. There can’t be a header there, it’s a function. Though I can tell as much from the name “start”, a little googling and research ("http://spacegrant.colorado.edu/~dixonc/vxworks/docs/vxworks/guide/c-config7.html#86422") tells me “sysInit” is what VxWorks usually uses as its program entry point.

.

Besides proving the firmware loading risk, I had what was shaping up as a target rich source of vulnerabilities on my hands. I really wanted a look at the code running on the device for the purposes of additional vulnerability testing. Now that I had readable, cross-referenced disassembly, I had a lot more to work with for finding and confirming other vulnerabilities both already found, or that I might find as I moved forward.

As Tom put it, “hilarity ensued”. Truly, vulnerability research is just as good in ‘08 as it was in ‘05. Unfortunately, I can’t talk specific details from there, but I wanted to share with our readers the road I took to this point.

Getting a full reversing of an embedded system like this is pretty satisfying and can open up lots of possibilities. The DD-WRT ("http://www.dd-wrt.com"), OpenWRT ("http://www.openwrt.org"), and NLSU2-Linux Unslung ("http://www.nslu2-linux.org/wiki/Unslung/HomePage") crowds have gone nuts with this sort of thing with pretty ("http://www.dd-wrt.com/wiki/Supported_Devices") impressive ("http://wiki.openwrt.org/TableOfHardware") results putting Linux distros on just about anything they can get their hands on.

I’d love to hear about some of your own embedded system reversing experiences.

http://www.matasano.com/log/1047/toast-spells-tsaot-in-reverse/