# MDB, CTF, DWARF, and other angelic things

So what's this all about then? Debugging. I've written a lot of C, I still write a lot of C and I sure as hell end up debugging a lot of C. One thing that pisses me off is when I've got a core file, but I've no idea about the exact version or build of the ELF binary that produced it. The bottom line is that I still need to find the failure. Luckily, I've got mdb.

mdb is the Illumos Modular Debugger. It was originally designed to make debugging kernel issues easier, but it turns out it has some serious mojo up in user-space as well. Unfortunately, we'll also find it has some significant shortcomings.

First and foremost, mdb is not a source debugger. It doesn't know anything about your code. You won't find back traces with line numbers or source code listings. What's more, you won't even find your stack variables exposed. How is this still useful? Well, when all you've got is a core, useful is as useful does... and it turns out it has some pretty damn fine extensibility.

### Compressed Type Format

Before we dive into the awesomeness of mdb, we're going to be pretty damn disappointed if we didn't build our binaries right. My hope is that in the very near future, the Illumos linker will do all of this magically for you, but for now we have to roll up our sleeves and get CTF into our binaries.

CTF (or Compressed Type Format) is basically definitions of data types and functions that other tools (like mdb and DTrace) can understand. While I won't go into DTrace here, if you put CTF in your binaries, you'll win big over on the DTrace front also.

If CTF sounds a bit like the information provided in DWARF debugging sections, it is because it is a subset. Why didn't the tools builders just use DWARF? I've decided that it's just because they're assholes. Okay, maybe not, but I don't have a real reason and that one sounds both funny and quite believable.

We're going to put CTF into our binaries from the DWARF itself using the ctfconvert and ctfmerge tools. In the process, we'll be careful to leave the DWARF debugging information intact so that if we have the opportunity to pop into a source debugger (like gdb or dbx) we'll have all the fancy line numbers and variables from our stack frames.

In order to explore further, I need a program to debug. Instead of using a large and complex program, I'm going to write a "small" one that has a few bugs and look at using these tools to diagnose the issues. Every program you ever build on an Illumos-based distribution should link against libumem. I've written of finding memory leaks with libumem before.

The program will read all of the words in the dictionary and store them in a hash table alongside some metadata about their length and the number of capital vs. lowercase letters in them. It's simple and pointless, but it resembles real and useful code in that it allocates memory and puts it in a data structure.
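To make the per-word metadata concrete, here is a minimal sketch of what that bookkeeping could look like. The field names and the word_meta_fill helper are my own guesses for illustration; the only name taken from the real program is the type struct word_meta, and its actual layout may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical layout: only the type name comes from the real program. */
struct word_meta {
    size_t length;     /* total characters in the word */
    size_t uppercase;  /* number of capital letters */
    size_t lowercase;  /* number of lowercase letters */
};

/* Hypothetical helper: fill in *meta for the NUL-terminated string word. */
static void
word_meta_fill(const char *word, struct word_meta *meta)
{
    assert(word != NULL && meta != NULL);

    meta->length = 0;
    meta->uppercase = 0;
    meta->lowercase = 0;

    /* Walk as unsigned char so the <ctype.h> calls are well defined. */
    for (const unsigned char *p = (const unsigned char *)word; *p != '\0'; p++) {
        meta->length++;
        if (isupper(*p))
            meta->uppercase++;
        else if (islower(*p))
            meta->lowercase++;
    }
}
```

In a program like the one described, each of these structs would be heap-allocated and stored as the hash table value for its word, which is exactly the kind of allocation that can later be leaked or freed twice.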

For our hash table implementation, I'll use the excellent one from Concurrency Kit.

First our Makefile:

Next our program:

Now, we compile it. ctfconvert will take the DWARF sections in the object files and convert them into CTF sections (rewriting the object file). Remember the -g, or else the original DWARF section will be removed in the process. ctfmerge does the same thing but for the final target: it takes the CTF sections in all the object files and merges them into a single CTF section in the target binary by rewriting it (and yes, you need -g in this step too, or else bye-bye DWARF section).

Next we run it (with UMEM_DEBUG enabled) and the bugs will come nibbling at us:

Remember the goal here is to be able to diagnose the failure without the binary myprog. So, let's get started in mdb, check why we crashed and get a stack trace.

Now, you'll notice we don't have line numbers here. We've got code... machine code... real code. We see that we died trying to free 0x179cf90 in a free() call made from 0x42 bytes into the free_words function. We can disassemble that easily enough:

So, now it is clear that this free call is attempting to free the hash value, not the key. We might have been able to find this out from libumem by asking what it knows about the 0x179cf90 pointer we tried to free:

This may seem confusing at first because it actually looks like our stack trace. You might think, "well duh, I just freed it there, so of course." But we didn't free it; we crashed because we failed to free it. This stack is umem telling us that the buffer we attempted to free was previously freed at the above stack trace. This is clearly a double free.

Since we know that this allocation should have been of type struct word_meta, we can attempt to print it. And this is where CTF comes in... mdb doesn't have the binary, it doesn't have the source code, yet it knows about the types.

As expected, it has been 0xdeadbeef'd from the previous free. Double free confirmed. This rather obvious bug (in word_get_meta) is left as an exercise to the reader.
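If the 0xdeadbeef trick is new to you, here is a toy sketch of the idea: a debugging allocator scribbles a recognizable pattern over a buffer when it is freed, so a stale pointer or a second free stands out. This is emphatically not how libumem is implemented. The toy_buf type, the toy_* functions and the in-band freed flag are all invented for illustration, and the memory is deliberately kept around after toy_free so it stays safe to inspect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_FREED_PATTERN 0xdeadbeefU
#define TOY_PAYLOAD_WORDS 4

/* Invented for illustration; this is not libumem's buffer layout. */
struct toy_buf {
    int      freed;                       /* nonzero once toy_free() has run */
    uint32_t payload[TOY_PAYLOAD_WORDS];  /* stands in for the caller's data */
};

/* Allocate a zeroed toy buffer; returns NULL on allocation failure. */
static struct toy_buf *
toy_alloc(void)
{
    return calloc(1, sizeof(struct toy_buf));
}

/*
 * Mark the buffer freed and fill its payload with the dead pattern.
 * Returns 0 on the first free and -1 on a double free. The memory is
 * intentionally NOT released here, so the pattern can be examined.
 */
static int
toy_free(struct toy_buf *b)
{
    assert(b != NULL);

    if (b->freed)
        return -1;  /* second free of the same buffer: report it */

    for (size_t i = 0; i < TOY_PAYLOAD_WORDS; i++)
        b->payload[i] = TOY_FREED_PATTERN;
    b->freed = 1;
    return 0;
}

/* Actually hand the memory back to the system allocator. */
static void
toy_destroy(struct toy_buf *b)
{
    free(b);
}
```

The first toy_free succeeds and leaves every payload word reading 0xdeadbeef; the second returns -1. That is the same shape as what we just saw: a struct full of 0xdeadbeef is the fingerprint of a buffer that has already been through free once.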

We have some memory leaks in this... which I'll tackle in the next post. The libck.so makefile target should be foreshadowing for the advanced programmer.