Runtime isolation and sandboxed environments are central to modern
application security, but the most commonly used ones may not be as secure as
we hope.
Overview
The general idea of isolated or sandboxed environments is to give
a program a limited scope in which to operate. Instead of allowing a given
program to use any of a machine’s resources, physical or virtual, you restrict
its environment such that it can only access aspects of the system that the
sandbox designer has decided are available for use by the program. This is not
unlike putting your child in a literal sandbox with high walls – they are free
to do whatever they want with all the sand, toys, and tools inside, but cannot
interact with the environment outside.
Isolation principles are at play in pretty much every aspect of modern computing. For example, last week a classmate wrote a blog on WannaCry, an exploit targeting the SMB implementation in older, unpatched versions of Windows. Without going into the details of that attack (for the curious, reference the prior blog post about it here), SMB provides access to shared files, printers, etc. on a network, and the WannaCry exploit allows the malware to transfer itself to other systems, infecting more and more machines. This is effectively an isolation exploit: if the SMB implementation were bug-free, anything using that protocol (including WannaCry) would only have access to (i.e. be isolated to) the printers, files, etc. that SMB was designed to expose, and only in the ways it was designed to expose them. Instead, a specially crafted message sent to a device running the unpatched SMB implementation allowed the malware to spread itself, breaking the isolation between systems on the network in a way not explicitly allowed by the protocol being used (SMB).
This blog will now discuss some vulnerabilities relating to process isolation, first looking at Meltdown and then at a recently patched vulnerability in Chrome that allowed escaping its sandbox. Even though Spectre is often talked about alongside Meltdown, I have chosen not to include it in this blog because, while it also breaks isolation between programs, it is a side channel attack on speculative execution, both of which are topics for another lecture and blog, whereas Meltdown is more directly related to memory isolation.
Meltdown
The Meltdown exploit effectively destroys one of the most important isolation layers in a computer system – the isolation between OS memory and user memory. It does this by cleverly exploiting a race condition between privilege checking and instruction execution, relying on the side effects of out-of-order (OoO) execution, which is what makes the attack possible in the first place. While the attack itself is a side channel attack, it is directly related to isolation because, as mentioned, it effectively removes the isolation between a user's memory space and the operating system's memory space.
Basically, modern processors execute instructions out of order to optimize performance. Often the "program order" (the actual order of instructions arriving at the processor) would cause the processor to stall while it waits for something to complete, for instance waiting on data to arrive from memory. Instead, processors reorder execution so that, at the end, everything appears to have occurred in order even though in reality it did not. It turns out that these out-of-order operations affect the cache, and it also turns out that the reordering can cause a privileged access to be performed on behalf of unprivileged code before the privilege check completes. When this happens, the processor discards the results of the unauthorized access, so at the end of execution that unprivileged access to privileged information appears to have never happened. However, since the out-of-order execution still affects the cache, a side channel attack lets you observe which cache lines were touched and thereby recover the data. I will not dive into the specifics of the side channel attack since that is a topic for another lecture and blog, but we can look at the following example to see how we can get access to things we shouldn't.
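What follows is a minimal sketch in the spirit of the toy example from the Meltdown whitepaper; access(), probe_array, and data are illustrative names (access() simply touches the given address, data is the value being leaked, and probe_array has one 4096-byte-spaced entry per possible byte value).

    raise_exception();                 // control flow should return to the kernel here
    // Architecturally, the next line is never reached. Out-of-order execution,
    // however, may already have performed the access, leaving a cache footprint
    // that depends on the value of `data`.
    access(probe_array[data * 4096]);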
In the above example, an exception is raised, which is supposed to return control flow to the kernel. However, since the array access does not depend on the exception-raising instruction, OoO execution on the processor may have caused the array access to execute already, and thus its target is loaded into the cache. Even though the CPU discards the instruction's results, the cache effects remain, and the aforementioned side channel attack can recover which entry of the array was touched. Now imagine that the line that raises the exception, rather than always raising one, raised an exception in response to a failed privilege check. The array access still does not depend on that check, so it may still be executed out of order, and thus privileged information can be read without authorization.
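As a variation on that sketch (again with illustrative names), the exception can come from the hardware privilege check on the load itself rather than from an explicit raise:

    // The load faults because kernel_address points into kernel memory that is
    // not accessible from user mode, but out-of-order execution may still
    // forward the loaded byte to the dependent array access before the fault
    // is actually delivered.
    char secret = *(volatile char*) kernel_address;  // privilege check fails here
    access(probe_array[secret * 4096]);              // cache footprint encodes secret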
Hopefully this has illustrated how the Meltdown attack breaks down the barrier between user and kernel memory space, and how, cleverly exploited, it allows unprivileged access to almost any information in memory. Much of this information comes from the Meltdown Whitepaper as well as the Meltdown Wikipedia page.
Escaping the Chrome Sandbox
The rest of this blog will focus on one relatively recently found and fixed vulnerability that allowed escaping Chrome's renderer sandbox. The issue was fixed in the 72.0.3626 release on January 29, 2019, so the last release in which this vulnerability was present was 71.0.3578.98. Much of the information for the exploit comes from this article from the Project Zero team at Google.
This exploit is pretty convoluted, but the essence of it is that you can get access to inter-process communication (IPC) interfaces from Javascript, and that IPC allows the unprivileged renderer process to interact with sandboxed files via the privileged browser process. A way to perform a use-after-free attack was discovered: you ask the privileged browser process to perform a file write, it agrees, but before it completes that write you destroy the object that was handling it.
The reason this is possible has to do with the details of Mojo, the IPC system used by the Chromium engine. First, the researchers discovered that they could get ahold of an instance of a class called RenderFrameImpl and set its member variable enabled_bindings_ to BINDINGS_POLICY_MOJO_WEB_UI, since that causes Mojo bindings to be enabled in Javascript. The function responsible can be seen here:
void RenderFrameImpl::DidCreateScriptContext(v8::Local<v8::Context> context,
                                             int world_id) {
  if ((enabled_bindings_ & BINDINGS_POLICY_MOJO_WEB_UI) && IsMainFrame() &&
      world_id == ISOLATED_WORLD_ID_GLOBAL) {
    // We only allow these bindings to be installed when creating the main
    // world context of the main frame.
    blink::WebContextFeatures::EnableMojoJS(context, true);
  }

  for (auto& observer : observers_)
    observer.DidCreateScriptContext(context, world_id);
}
The other half of the bug is in FileWriterImpl::Write, the browser-process implementation of the file write IPC:

void FileWriterImpl::Write(uint64_t position, blink::mojom::BlobPtr blob,
                           WriteCallback callback) {
  blob_context_->GetBlobDataFromBlobPtr(
      std::move(blob),
      base::BindOnce(&FileWriterImpl::DoWrite, base::Unretained(this),
                     std::move(callback), position));
}
The implication of the call to base::Unretained(this) in the above function is that passing a Blob instance to the FileWriter interface we got access to earlier gives the renderer process a window of time while the browser process is fetching the blob data, before it calls back into the FileWriterImpl. Because base::Unretained does nothing to keep the object alive, we can destroy the FileWriter during that window and cause the browser process to access memory that has already been freed.
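To make the lifetime hazard concrete, here is a small, self-contained C++ sketch (all names are hypothetical and not from Chromium) of what a base::Unretained-style binding amounts to: a pending callback holds a raw, non-owning pointer, and the object is destroyed before the callback runs.

    #include <functional>
    #include <iostream>
    #include <memory>

    // Stand-in for FileWriterImpl: owns some state that the deferred work touches.
    struct FileWriterLike {
      int fd = 42;
      void DoWrite() { std::cout << "writing via fd " << fd << "\n"; }
    };

    int main() {
      auto writer = std::make_unique<FileWriterLike>();

      // Analogous to base::BindOnce(&FileWriterImpl::DoWrite,
      // base::Unretained(this), ...): the callback captures a raw pointer and
      // nothing keeps the object alive until the callback actually runs.
      std::function<void()> pending = [raw = writer.get()] { raw->DoWrite(); };

      writer.reset();  // the renderer destroys the object while the "browser"
                       // is still busy fetching the blob data
      pending();       // use-after-free: the raw pointer now dangles
      return 0;
    }

In Chromium, binding a weak or reference-counted pointer instead would either skip the callback or keep the object alive; base::Unretained deliberately opts out of both, which is exactly what the exploit relies on.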
At this point, I will briefly introduce the separation of the browser and renderer processes in Chromium. As we can see in the diagram below, every renderer process is separated from the main browser process and interacts with it via Mojo IPC. Additionally, the renderer processes are unprivileged, while the main browser process is privileged. This is why the unsafe callback in the IPC matters: it lets the unprivileged renderer process cause something to happen inside the privileged browser process.
After finding the use-after-free, the researchers realized that rather than simply freeing the FileWriter's memory, they could relatively easily replace it by registering a blob with the registerFromStream method, letting them plant a fake object whose contents are addresses crafted to work with the final stage of the attack, described next. From there, they noticed that the ASLR (address space layout randomization) implementation on the system they were using (Windows) causes any library loaded in the renderer process to be loaded at the same address in the browser process. Knowing this, they could identify gadgets to chain together and take note of their memory addresses. Then, noticing that there is no limit on virtual address space usage in the Chrome browser process, they create a file with which to later "spray" the browser process address space (around 3.5 to 4 TB is the effective size presented in the article), consisting of what is effectively a huge number of shared memory mappings containing pointers to the gadgets they intend to use. Doing this allows the attack to bypass ASLR, because the browser's address space is now flooded with mappings containing the addresses they want, so a guessed pointer is very likely to land on attacker-controlled data.
Once this file has been created, the attack is launched. At this point, you trigger the replacement of the freed object that will be accessed in a callback provided by the browser, and spray the browser process address space, causing the malicious gadget chain/payload to be executed. Though convoluted, this attack demonstrates a way to effectively escape the unprivileged renderer process and run arbitrary code in the privileged browser process of Chrome, at least in versions of Chrome prior to the patches that fixed the vulnerability.
The cited article mentions that the two key things that allowed this attack were:
- No inter-process randomisation on Windows (which is also a limitation on MacOS/iOS), which enabled locating valid code addresses in the target process without an information leak.
- No limitations on address-space usage in the Chrome Browser Process, which enabled predicting valid data addresses in the heap-spray.
Since they don't have control over OS-level process randomization, I imagine that the fix for this bug revolved around limiting address-space usage in the Chrome Browser Process. By reducing its maximum size from unlimited, it becomes much more difficult (potentially infeasible) to predict a valid data address using the heap spraying technique described above.
While this specific vulnerability is no longer present in up-to-date versions of Chromium, it goes to show that although we rely on isolation at multiple levels of modern computing, we should not simply take it for granted – even when it comes from Google.