Practical Binary Analysis: Book Review

A computer program is like a memo. Often, a vague memo.

You are the boss. You want a pile of files to be put away. There are a lot of them and they are all mixed up. You could do it yourself, but instead you instruct someone else to do it. So you write a memo to an employee that says “put the files away” and sis-bam-boom you’re all set.

Or are you?

It turns out that the files fall into four categories, each associated with a different set of file cabinets. All the files have labels with words and numbers on them. One set is filed alphabetically. Two sets are filed by the number that is on the label. The fourth set is stacked in an outbox, in any order, to be processed by someone else at a later time.

So, if you just say “file these,” a lot of things can go wrong, unless the person who gets the memo knows what to do.

Your memo could read “Find the person who knows how to file stuff. Then have that person file these files properly.” In a computer language, that is a little like loading a library, then using instructions from that library to carry out a task in a specific way.

```
# file.py
# alphanumeric_filing is a made-up library: the "person who knows how to file stuff"
import alphanumeric_filing as fileit

def main(files):
    fileit.file(files)
```

… or code to that effect.

The point is, telling your underlings to do something only works if the underlings do it correctly. This is also true with computers, where a line of code seems to be instructing the processor, way down deep in the hardware, to do a certain thing, but it may or may not be actually doing that thing.

Computers are famous for producing strange and unexpected (to the non-expert) results when they calculate. For example, a computer cannot store the base 10 log of 2 exactly. The true value, 0.30102999566398..., goes on forever, and the machine keeps only about 16 significant digits of it because of the way numbers are represented and manipulated inside the machine. So when you ask for that value in a computer program, you are making an incorrect assumption if you think you are getting the exact number.
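The rounding is easy to see for yourself. Here is a minimal sketch in Python, assuming the usual double-precision floats; the helper name `stored_value` is my own invention for illustration, not something from the book.

```python
import math
from decimal import Decimal


def stored_value(x: float) -> Decimal:
    """Return the exact decimal expansion of the binary number the machine holds."""
    return Decimal(x)


approx = math.log10(2)

# What Python shows you: a short, rounded display of the number.
print(approx)

# What is actually stored: a binary fraction, written out exactly in decimal.
print(stored_value(approx))

# The classic surprise: 0.1 and 0.2 cannot be stored exactly either,
# so their sum is not quite the stored value of 0.3.
print(0.1 + 0.2 == 0.3)
```

The second line printed is much longer than the first, because it spells out the binary approximation digit by digit instead of the tidy number you thought you asked for.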

That is a known mathematical phenomenon and not of consequence to most people or most programs, but there may be other issues that are more important and even less predictable. Like the filing problem mentioned above, there may be assumptions built into the deeper-level operation of a computer program or script once it is compiled or interpreted, linked with the other software it works with, and deployed on this or that machine. This is the difference between working at the level of written code, meaning uncompiled programs or scripts, and working at the level of the binary code created by the software that compiles or interprets your program.
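A rough way to peek at that gap without leaving Python is the standard `dis` module, which shows the bytecode the interpreter actually runs. This is only an analogy of my own: Python bytecode is not the machine-level binary the book deals with, and the function names here are made up for illustration. Still, it shows that what gets executed is not quite what was typed.

```python
import dis


def seconds_per_day():
    # As written, this asks for two multiplications every time it is called.
    return 60 * 60 * 24


def opcode_names(func):
    """List the names of the low-level instructions compiled from func."""
    return [ins.opname for ins in dis.get_instructions(func)]


# The instructions the interpreter will really execute.
print(opcode_names(seconds_per_day))

# The constants stored alongside them. On the CPython versions I know of,
# the compiler has already done the arithmetic and stored the answer.
print(seconds_per_day.__code__.co_consts)
```

The memo said "multiply these three numbers." The underling quietly worked out the answer ahead of time and filed that instead. Harmless here, but it is exactly the kind of thing you cannot see by reading the source alone.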

Practical Binary Analysis: Build Your Own Linux Tools for Binary Instrumentation, Analysis, and Disassembly by Dennis Andriesse (with a foreword by Herbert Bos) is a guide to looking into the binaries created by application-making software. This is a set of approaches to delve into the space in which malware may lurk, bugs may flourish almost invisibly, and inefficiencies may wreak their slow and ponderous havoc.

The book teaches how to:

• Parse ELF and PE binaries and build a binary loader with libbfd
• Use data-flow analysis techniques like program tracing, slicing, and reaching definitions analysis to reason about runtime flow of your programs
• Modify ELF binaries with techniques like parasitic code injection and hex editing
• Build custom disassembly tools with Capstone
• Use binary instrumentation to circumvent anti-analysis tricks commonly used by malware
• Apply taint analysis to detect control hijacking and data leak attacks
• Use symbolic execution to build automatic exploitation tools

The most important way in which this book will change your life is that it will allow you to analyze binaries more automatically and with less manual work.
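To give a flavor of the first item on that list, here is a minimal sketch that reads a few fields from the start of an ELF file. It is not the book's approach (the book builds its loader with libbfd); it uses only Python's standard `struct` module and the published ELF header layout. The function name and the returned dictionary keys are my own choices.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_BYTE_ORDER = {1: "<", 2: ">"}  # little-endian, big-endian
ELF_TYPES = {1: "relocatable", 2: "executable",
             3: "shared object / PIE", 4: "core dump"}


def parse_elf_header(data: bytes) -> dict:
    """Pull the class, type, machine number, and entry point from an ELF header."""
    if len(data) < 24 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")

    ei_class, ei_data = data[4], data[5]
    if ei_class not in ELF_CLASSES or ei_data not in ELF_BYTE_ORDER:
        raise ValueError("unrecognized ELF class or byte order")
    order = ELF_BYTE_ORDER[ei_data]

    # e_type, e_machine, and e_version follow the 16-byte identification block.
    e_type, e_machine, _e_version = struct.unpack_from(order + "HHI", data, 16)

    # The entry point address is 8 bytes wide in 64-bit files, 4 bytes in 32-bit files.
    entry_format = order + ("Q" if ei_class == 2 else "I")
    if len(data) < 24 + struct.calcsize(entry_format):
        raise ValueError("header is truncated")
    (e_entry,) = struct.unpack_from(entry_format, data, 24)

    return {
        "class": ELF_CLASSES[ei_class],
        "type": ELF_TYPES.get(e_type, "other"),
        "machine": e_machine,
        "entry": e_entry,
    }
```

On a Linux machine, something like `parse_elf_header(open("/bin/ls", "rb").read(64))` should return a small dictionary describing that program, assuming the file exists at that path.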

I cannot truly evaluate this book because it is way beyond, or maybe way below (as in down deep in the abyss of computer science, where I assume small gnomes still write all the assembler language), my understanding of things. But the book has gained great praise from others, and it is a brand new edition.

Dennis Andriesse has a Ph.D. in system and network security and uses binary analysis daily in his research. He is one of the main contributors to PathArmor, a Control-Flow Integrity system that defends against control-flow hijacking attacks such as ROP. Andriesse was also one of the attack developers involved in the takedown of the GameOver Zeus P2P botnet.
