Skip to main content

Angr: A Multi-Architecture Binary Analysis Toolkit

This blog is quoted from several angr blogs and documentations, click here and here.

angr is a multi-architecture binary analysis toolkit, with the capability to perform dynamic symbolic execution (like Mayhem, KLEE, etc.) and various static analyses on binaries.
We've tried to make using angr as pain-free as possible - our goal is to create a user-friendly binary analysis suite, allowing a user to simply start up iPython and easily perform intensive binary analyses with a couple of commands. That being said, binary analysis is complex, which makes angr complex. This documentation is an attempt to help out with that, providing narrative explanation and exploration of angr and its design.
Several challenges must be overcome to programmatically analyze a binary. They are, roughly:
  • Loading a binary into the analysis program.
  • Translating a binary into an intermediate representation (IR).
  • Performing the actual analysis. This could be:
    • A partial or full-program static analysis (i.e., dependency analysis, program slicing).
    • A symbolic exploration of the program's state space (i.e., "Can we execute it until we find an overflow?").
    • Some combination of the above (i.e., "Let's execute only program slices that lead to a memory write, to find an overflow.")
angr has components that meet all of these challenges. 
angr internals
angr is the popular framework for analyzing binary programs, from embedded firmware, to hardcore CTF challenges, all from the comfort of Python. angr’s roots lie in the Valgrind VEX instrumentation framework, meaning it benefits from the multi-architecture support and community maintenance. However, we live in a big world full of crazy things that aren’t Intel or ARM-based Linux machines.
angr now supports extensions to each of its core components: the loader, architecture database, lifter, execution engine, and simulated OS layer. We will be exploring each in turn, with the goal of bringing the complete suite of powerful angr analyses to bear on a totally new class of program that it was not designed to before.
In order to not overcomplicate things, and make the core ideas clear, we’re going to start with something conceptually simple.
Sorry, that BrainFuck thing was not a joke. In this guide, we’re going to build the most insanely overkill BrainFuck analysis platform ever constructed. By the time you’re done here, you’ll be able to totally obliterate any of the Brainfuck crack-me programs that I hear may even actually exist.
First, let’s go over the components themselves, and how they fit together.
The angr lifecycle
If you’ve used angr before, you’ve probably done this: (blatantly stolen from angr-doc’s fauxware example)
Figure 1. the angr lifecycle
CLE, the loader 
The first thing that happens when you create an angr project is angr has to figure out what the heck you just told it to load. For this, it turns to the loader, CLE (CLE Loads Everythig) to come up with an educated guess, extract the executable code and data from whatever format it’s in, take a guess as what architecture it’s for, and create a representation of the program’s memory map as if the real loader had been used. CLE supports a set of “backends” that service various formats, such as ELF, PE, and CGC. For the common cases, this means loading an ELF, which brings with it the complicated mess of header parsing, library resolution, and strange memory layouts you both require and expect. It also supports the exact opposite of this, pure binary blobs, with a backend that just takes the bytes and puts them in the right place in memory. The result is a Loader object, which has the memory of the main program itself (Loader.main_object) and any libraries. 
Archinfo, the architecture DB
During CLE’s loading, it takes a guess as to what architecture the program is for. This is usually via either a header (as in ELFs) or some simple heuristic. Either way, it makes a guess, and uses it to fetch an Arch object from the archinfopackage corresponding to it. This contains a map of the register file, bit width, usual endian-ness, and so on. Literally everything else relies on this, as you can imagine.
SimEngine, the simulated executer
Next, angr will locate an execution engine capable of dealing with the code it just loaded. Engines are responsible for interpreting the code in some meaningful way. Fundamentally, they take a program’s state– a snapshot of the registers, memory, and so on– do some thing to it, usually a basic block’s worth of instructions, and produce a set of successors, coresponding to all the possible program states that can be reached by executing the current block. When branches are encountered, they collect constraints on the state which capture the conditions needed to take each path of the branch. In aggregate, this is what gives angr its reasoning power.
PyVEX, the lifter
angr’s default engine, SimEngineVEX, supports many architectures, simply because it doesn’t run on their machine code directly. It uses an intermediate representation, known as VEX, which machine code is translated (lifted) into. As an alternative to creating your own engine for a new architecture, if it is similar enough to a “normal” PC architecture, the faster solution is to simply create a Lifter for it, allowing SimEngineVEX to take care of the rest. We will explore both Lifters and Engines in this guide.
Claripy, the solver
Every action an engine performs, even something as simple as incrementing the program counter, is not necessarily an operation on a concrete value. The value could instead be a complicated expression, that when computed on, should actually result in an even bigger expression. Creating, composing, and eventually solving these is Claripy’s job. Claripy uses a SMT-solver, currently Microsoft’s Z3, to do all of this heavy-lifting. Thankfully, we won’t need to delve into that in this series, as SMT-solving is some serious black magic.
SimOS, the rest of the nasty bits
If we just view the engine’s work on a program from the states it provides, we’re going to have a lot of work to do to get anything useful out. Where is stdin? What the heck do I do with files? Network? Are you kidding? These higher-level abstractions are provided by the OS, and don’t exist at the bare machine level. Therefore, SimOS’ job is to provide all of that to angr, so that it can be reasoned about without all the pain of interpreting just what the fake hardware would do. Based on a guess from CLE, a SimOS is created (ex. SimLinux), which defines the OS-specific embellishments on the initial state of the program, all its system calls, and convenient symbolic summaries of what syscalls and common library functions do, known as SimProcedures. These make angr dramatically faster and more compatible, as symbolically executing libc itself is, to say the least, insanely painful.
angr, the real deal
Finally, with a Loader, an Engine, an Arch, and a SimOS, we can get to work! All of this is packaged into a Project, and offered to the higher-level analyses, such as Control-flow Graph reconstruction, program slicing, and path-based reasoning.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 

Architectures

Our Arch: BrainFuck.

We’re going to implement BrainFuck in angr, because it’s one of the simplest architectures that exists, but yet is far enough from the “real” architectures angr already supports to show off its flexibility.
BrainFuck is an esoteric programming language created by Urban Muller to be simple in concept, but really painful to actually use.
BrainFuck implements a Turing machine-like abstraction, in which a infinite(ish) tape of symbols contains the program, and another tape of “cells”, holds the program’s state (memory). Each cell is an unsigned byte, and the cell being referred to by instructions is chosen by the current value of a “pointer”. BrainFuck’s instruction pointer starts at 0, and the program ends when it moves past the last symbols. The data pointer starts at cell 0.
BrainFuck has only 8 instructions:
> - Move the pointer to the right (increment)
< - Move the pointer to the left (decrement)
+ - Increment the cell under the pointer
- - Decrement the cell under the pointer
[ - If the value at the pointer is zero, skip forward to the matching ]
] - If the value at the pointer is non-zero, skip backward to the matching [
. - Output (print to stdout) the cell at the pointer
, - Input (get character at stdin) to the cell at ptr

Defining our architecture

From the description above, we notice a few things:
  • This is a “Harvard” architecture, data and memory are separate.
  • We have two real registers here: A pointer ptr, and the usual instruction pointer ip.
  • Memory accesses in BF are all in terms of a single byte. There’s no endianness to worry about. However, the width of ip and ptr are not defined.
  • We have to do something about input and output.
This last point is worth some discussion. In traditional architectures, this is handled by GPIOs, or some complicated mess of peripherals driven by the OS. We have none of that, we just want bytes in and bytes out. We’ll need to help angr out a bit with this one; there are a few possible ways to implement this, but we’re going to explore one that pretends there are mythical system calls to get our bytes in and out. In a “normal” architecture, this means there’s a syscall number in a register somewhere. We’re going to pretend that exists too.

archinfo

archinfo is the class in the angr suite that holds all of this information. To create a new arch, you simply make a subclass of archinfo.Arch, and define your registers, their aliases, some info about bit widths and endianess, and so on.
Now, let’s lay down some code.
First, the boring stuff:
from archinfo import register_arch
 
class ArchBF(Arch):
    def __init__(self, endness="Iend_LE"):
        super(ArchBF, self).__init__('Iend_LE')
We’ll call this arch little endian, since we have to say something, and in this case it doesn’t matter.
Next, some simple metadata:
        self.vex_arch = None
        self.name = "BF"
Names are usually all-caps. As I mentioned above, the bit-width here corresponds to the address space and register widths, and we don’t have one defined, so I picked 64. VEX doesn’t support this arch, so vex_arch is None.
Now here’s the register file:
        self.registers["ip"] = (0, 1)
        self.registers["ptr"] = (1, 1)
        self.registers["inout"] = (2, 1)
        self.registers["ip_at_syscall"] = (3, 1)
I mentioned the ‘inout’ register, which is our syscall number when picking input vs output. However, we have another fake register ip_at_syscall, which is used by angr to track syscall-related return behavior. Don’t worry about it, just put it here.
You can also assign aliases, like pc for ip.
        self.register_names[self.registers['ip'][0]] = 'pc'
        self.register_names[self.registers['ip'][1]] = 'ip'
        self.ip_offset = self.registers["ip"][0]
Various kinds of reasoning need to where the ip is rather explicitly. We set that here too.
Finally, we need to tell archinfo about our new arch:
register_arch(['bf|brainfuck'], 64, 'any', ArchBF)
The first argument is a list of regular expressions that match the name of our architecture. (Note, as of this writing, you can assume input is lowercase). Next, the bit-width of our arch, which is 64. The third argument is the endness, which can either be “Iend_LE”, “Iend_BE”, or “any”. (these constants come from VEX, if you’re curious) ‘any’ means this Arch will match for either endness.
This is used by archinfo.arch_from_id() to look up the Arch for a given set of parameters. Given the various circumstances under which this is needed, we deliberately make this super flexible, and encourage you to make your mappings flexible too.
That’s it!
This doesn’t do a whole lot yet, but it’s an important first step.
We’ll build on this in the next part, where we get angr to load BF programs into its simulated memory.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Loads


Now we’re going to focus on the first actual step in the process of analyzing a program: Figuring out what it even is, and loading it into our system’s memory somehow.

CLE: CLE Loads Everything

The angr suite uses CLE to load binaries. It serves as a logical implementation of the Linux loader first and foremost, but supports other OSes and formats through a series of “backends”.
CLE is given, by angr, a path to the “main binary”. This is the argument to angr.Project(). It’s CLE’s job to open it, figure out what it is, figure out what architecture it is (unless the user overrides), figure out where in memory it’s supposed to go, and load any dependencies, such as shared libraries, that it may depend on.

Shortcut: What if my architecture uses ELF files?

You don’t need to do anything; CLE will load your binary just fine. However, you will need to tell angr which sorts of ELF files map to the correct SimOS environment model, described in section 6. These are tracked by the content of their EI_OSABI field.
If you have pyelftools installed, you can identify this simply like this:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 ff 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            Standalone App
  ABI Version:                       0
To register this with our MSP430 environment model, we would simply do:
register_simos('Standalone App', SimMSP430)
Were your binaries made by orcs instead? Read on.

Defining the BF backend

For BrainFuck, we really just need to read the program in, and put it in memory, rather straight-up, at address 0. You could get away with using the Blob loader backend directly to do this, but we’re going to make it a little more explicit and demonstrative and define our own based on it.
First, the boring stuff:
from archinfo import arch_from_id
import re
import logging
l = logging.getLogger("cle.bf")
__all__ = ('BF',)
 
class BF(Blob):
    def __init__(self, path, offset=0, *args, **kwargs):
 
        super(BF, self).__init__(path, *args,
                arch=arch_from_id("bf"),
                offset=offset,
                entry_point=0,
                **kwargs)
Normally, to use the Blob loader, you must specify an entry point and arch manually. We want to be able to just use angr.Project() on a BF program and have it work, so we subclass the Blob loader, and give it this information.
Next, we need to tell CLE when this loader will work on the given file, so that it can pick the right backend. Technically, by many definitions of BF, you can have other non-instruction characters in a file and still have it be valid. For the ease of demonstration, let’s keep it simple and support the “non-compatible” BF syntax of only the instructions and newlines.
    def is_compatible(stream):
        bf_re = re.compile('[+\-<>.,\[\]\n]+')
        stream.seek(0)
        stuff = stream.read(0x1000)
        if bf_re.match(stuff):
            return True
        return False
Don’t forget to seek the stream to 0!! Some other is_compatible, or the rest of CLE, is going to use it later. As they used to say when I was a kid, “Be kind, rewind” :)
Last but not least, we need to tell CLE about our backend.
register_backend("bf", BF)

That’s it?? That’s it.
If you want to see a more complex, but not too-complex example on something a bit less contrived, check out CLE’s CGC backend. CGC binaries are deliberately simplified ELF-like files, that were designed to make analysis easy, so they make a nice template for crazier formats.
Next time, we get to talk about lifters! Get that protein powder ready.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Vex and Gymrat

In order for angr to perform any sort of analysis on binary code, we need to first translate, or lift, this code into an intermediate representation (IR) that angr uses, called VEX.
VEX is the IR used by the Valgrind analysis tools. angr uses the libvex library also used by Valgrind, etc. to lift code, and uses its pyvex package to provide a pythonic interface to the resulting IR objects.
However, libvex and Valgrind were tailor-made for doing what they do best: analyzing lots of desktop-ish programs. What if we want to do something super non-traditional? Like Brainfuck? Or even something a bit more reasonable like MSP430.
That’s where the gymrat framework included with newer versions of pyvex comes in. Gymrat’s goal is to make lifting just about anything easy, by moving the focus from messing with parsing and bits and what not, to simply and quickly specifying what the instructions actually do, which is magically translated into VEX.
Building your workout plan
Before you jump into lifting, you’re going to need some sort of plan on how to structure your lifter, to make the process easier, and to make auditing the result less painful.
Know your body
The most important part of this planning process is becoming familiar with your chosen architecture, and particularly its instructions. We touched on archinfo in a previous part of this tutorial, and we assume you have already built the archinfo class for your architecture, with all of the register maps, and so on. In this section, we will be using the BF and MSP430 examples introduced earlier to demonstrate how we designed the lifters, and why.
Your first step should be to find an Instruction Set Architecture (ISA) document, containing, at least, the binary formats for the instructions, and hopefully a precise description of their effects of the processor.
A few questions to ask yourself while reading:
  • How are the instructions formatted? Are there a few formats that cover all possible instructions? Or is each instruction different? Is there the notion of an “opcode”?
  • How are arguments to the instructions specified? registers, memory address, intermediats, offsets, etc
  • What are the primary side-effects of instructions? (e.g., flags)
Let’s consider our example of MSP430. See here for one of many references. MSP430 instructions take one of three types, having zero, one or two operands. One operand instructions take the form src = src (op) src. Two-operand instructions take the form dst = src (op) dst. Zero-operand instructions are conditional jumps, and merely have a condition code, and a 10-bit immediate destination address offset. Each format has its own notion of “opcode”.
MSP430 supports a wide variety of possible sources and destinations, based on addressing mode bits, and special register values, contained in each instruction. Operands can be the usual register contents, or can be combined with an immediate 16-bit extension word. Instructions also support handling data of different sizes, either an 8-bit byte, or a 16-bit word, based on a flag. Instructions can set one of four flags (Carry, Negative, Zero, and Overflow), although the behavior of these is far from unifrom.
This means, in summary, that there is some logic that’s common to all instructions, and some common to each type. There are, of course edge cases, but all of this can be specified neatly using gymrat.
Know your equipment
Here we will introduce briefly the primary classes used to write a lifter using gymrat. All of the following are contained in pyvex.util:
GymratLifter
This is the actual lifter class used by pyvex. You will need to make a subclass of this, and provide a property instrscontaining a list of possible instruction classes. GymratLifters are provided with a block of code to lift in their constructor, and when lift() is called, will iterate through the code, matching instruction classes to the bytes, and populating an IRSB object (IR Super Block) with the appropriate VEX instructions. This IRSB gets returned eventually to angr, and used for its analyses. By default, GymratLifter will try using every instruction contained in instrs until one succeeds. Don’t forget to call pyvex.lift.register() to tell pyvex that your new lifter exists.
Type
In the binary world, a “type” here merely denodes how many bits wide a value is, and how it is interpreted (int, float, etc) This class uses “type imagination”, don’t worry about what sizes it supports, it will make them up for you. Simply use Type.int_16 for a 16-bit integer, or even Type.int_9 if you really want to (cLEMENCy you say? Yeah, we can do that.) You’ll see these mentioned around as the argument named ty.
Instruction
You should create a class for every instruction in your architecture, which should be subclasses of Instruction. Instructions receive the bitstream given to the lifter, and attempt to match it with a format string (self.bin_format), which both identifies that this is the correct instruction, and parses the various operands and flags. Format strings are specified similar to how many ISA documents will; for example, a 2-operand instruction, with fixed bits of 1101, and 2x2 bits of mode flags, could look like 1101ssssmmddddMM. The instruction would only match if it started with 1101, and each similarly-lettered bit would be extracted into a dictionary keyed by the letter.
The Instruction class has a number of methods designed by overriden by its subclasses, to modify behavior for each instruction or instruction type. Here’s a brief summary:
  • parse: Called by the lifter to try and match the instruction. Returns a dictionary of parsed bits on success, or does something else (raise) on failure You may want to extend this to implement changes in how data is parsed, based on previous parsed values (e.g., get an extra word if a flag is set)
  • match_instruction: Optionally implement this to match the instruction based on a bit format symbol; for example, you could use o as your opcode, and match it here. Return something on success, raise on failure.
  • lift: Called by the lifter after the instruction is matched. By default, it simply calls all of the following functions in order, but you can override this to change this or add your own.
  • mark_instruction_start: Should be called at the beginning of lifting, creates the VEX IMark instruction of the correct length.
  • fetch_operands: Implement this to specify how operands are fetched. You’ll probably want to use get() and load()below.
  • compute_result: This is where the meat of your instruction goes. Compute the actual result, and return a VexValue of the result. You will make heavy use of the VexValue syntax helpers here; for example, a normal add could simply be return src + dst You should also commit your result using put or `store, unless you chose to do that somewhere else.
  • compute_flags: Compute and store the flags affected by the instruction. Gets the same arguments as compute_result, plus the addition of the computed result, to make flag expressions easier.
Instruction also contains a few important methods meant to be called by its subclasses to implement the above methods.
  • get(reg_num, ty) Get register from a physical machine register into a temporary value we can do operations on.
  • load(addr, ty): Similar to the above, but loads from a given address in memory
  • put(val, reg_num): Puts a given temporary value into a physical register.
  • store(val, addr): Store a given value at an address in memory
  • jump(when, where): Conditionally jump to a given location
  • constant(int_val, ty): Creates a temporary values from an integer constant.
(Note: there is also ccall(); If you have something really messed up you don’t think you can express correctly, such as something that needs extensive runtime information, you may need a ccall, but try to avoid it if you can. See the python docs for info.)
VexValue
What are all these ‘temprary values’? How do I actually specify what instructions do? That’s the magic of VexValue. In VEX, you cannot do operations directly on registers or memory. They must be first loaded into a temporary variable, operated on, and then written back to the registers or memory. We wanted the lifter author to think as little about this as possible, so VexValue makes this whole process a snap.
A VexValue can be created in two different ways: by loading it out of the machine’s state using get() or put(), or by creating a constant value with constant(). You can then do normal python operations to them like any other value! VexValues have a set Type when they are created; you can cast to a new type using the cast_to(ty) method. You can even fetch bits using python’s slice and index notation!
Here’s an example, the xor instruction from our MSP430 lifter. Of course you have to xor, but what about the types? What’s the VEX operation for xor? Weird expressions for the flags?
Nah.
   defcompute_result(self, src, dst):
       returnsrc ^dst
Or something boolean:
   defcarry(self, src, dst, ret):
       returnret !=0
It’s pretty magic.
Use the proper form
As in exercise, using the proper form when lifting is better for your health, and just makes things work better. Its time to put the two sections above together and make your lifter’s design. A good lifter design, like any other piece of software, must minimize the amount of repetative code, while still being readable. In particular, we’d like to make the structure of our lifter as close to that of the documentation, to allow for better manual auditing.
Let’s walk through the design of our MSP430 lifter. We’ll come back to the BF example later; it’s too simple for this discussion.
As mentioned above, there are a lot of common tasks all MSP430 instructions must do, such as resolving the operands and addressing modes, grabbing the immediate extension words, and the write-back of the results of operations. These are defined in MSP430Instruction, a direct subclass of Instruction. We’ll also define how the Status Register (flags, in compute_flags) works, and how the four flags are packed inside when it is updated. Because the three types have their own opcode, we define match_instruction to check the o symbol here. As a final step, how values are committed to the machine’s state is dependant on the addressing mode (writing to a register, vs indexing into memory, etc), and is handled in this class as well; we expect compute_result to return the value to be written out, or None if that instruction doesn’t commit.
We will then define a class for each of the three types. These will set the bin_format property, as well as overriding fetch_operands to resolve the source and/or destination registers/immediates/etc, and simply return src and dst, which are passed to the instructionscompute_result` methods.
Finally, we will create a class for each instruction, subclassing the appropriate type, and providing only the opcode (to be matched in match_instruction), the compute_result function, which returns the value to be committed, and the computation of any flags the instruction modifies.
Time to get swole!
While we aimed these features to spare the user from thinking about an IR as much as possible (did you notice we told you almost nothing about how the IR actually works?), there’s no magical formula for getting totally shredded, or for lifting every architecture. CPU architectures, like human bodies, are different, and have their own quirks, so the best thing we can do is give you really in-depth examples.
Our fully commented example, which lifts MSP430 binary code into VEX, can be found here. You can also find the, much simpler, BF lifter here.
Built a really rad lifter? Let us know on Slack
Next time, we get to talk about execution engines! Better get fueled up.




Comments

Popular posts from this blog

Introduction to Meltdown and Escaping the Chrome Sandbox

R untime isolation and sandboxed environments are central to modern application security, but the most commonly used ones may not be as secure as we hope. Overview The general idea of isolated or sandboxed environments is to give a program a limited scope in which to operate. Instead of allowing a given program to use any of a machine’s resources, physical or virtual, you restrict its environment such that it can only access aspects of the system that the sandbox designer has decided are available for use by the program. This is not unlike putting your child in a literal sandbox with high walls – they are free to do whatever they want with all the sand, toys, and tools inside, but cannot interact with the environment outside. Isolation principles are in play at pretty much every aspect of modern computing. For example, last week a classmate wrote a blog on WannaCry, an exploit in Windows SMB older and unpatched versions of Windows. Without going into the detail...

Information Side Channel

By Elaine Cole and Jarek Millburg An information side channel can be used to gain information about the system or data that it processes. A side-channel attack identifies a physical or micro-architectural signal that leaks such desired information and monitors and analyzes that signal as the system operates. While there are many different types of information side channels and even more ways to maliciously exploit them, this blog explores a recent publication that leverages information side channels within IoT devices to aid crime scene investigators in real-time. In this blog, we provide an overview of the general attack procedure, and explore two of the many forms of side channel attacks. Side Channel Attack General Procedure While there are many different forms of side channels, at a high level, a side channel attack requires the following: 1. identify a side channel:  The attacker must first identify  a physical or micro-architectural signal that leaks...