Runtime isolation and sandboxed environments are central to modern
application security, but the most commonly used ones may not be as secure as
we hope.
Overview
The general idea of isolated or sandboxed environments is to give
a program a limited scope in which to operate. Instead of allowing a given
program to use any of a machine’s resources, physical or virtual, you restrict
its environment such that it can only access aspects of the system that the
sandbox designer has decided are available for use by the program. This is not
unlike putting your child in a literal sandbox with high walls – they are free
to do whatever they want with all the sand, toys, and tools inside, but cannot
interact with the environment outside.
Isolation principles are at play in pretty much every aspect of modern computing. For example, last week a classmate wrote a blog on WannaCry, an exploit targeting the SMB implementation in older, unpatched versions of Windows. Without going into the details of that attack (for the curious, reference the prior blog post about it here), SMB provides access to shared files, printers, etc. on a network, and the WannaCry exploit allows the malware to transfer itself to other systems, infecting more and more machines. This is effectively an isolation exploit: if the SMB implementation were bug-free, anything using that protocol (including WannaCry) would only have access to (i.e. be isolated to) the printers, files, etc. that SMB was designed to expose, and only in the ways it was designed to expose them. Instead, a specially crafted message sent to a device running the unpatched SMB implementation allowed the malware to spread itself, breaking the isolation between systems on the network in a way not explicitly allowed by the protocol being used (SMB).
This blog will now discuss some vulnerabilities relating to process isolation, first looking at Meltdown and then at a recently patched vulnerability in Chrome that allowed escaping its sandbox. Even though Spectre is often talked about alongside Meltdown, I have chosen not to include it in this blog because, while it also breaks isolation between programs, it is a side channel attack on speculative execution, both of which are topics for another lecture and blog, whereas Meltdown is more directly related to memory isolation.
Meltdown
The Meltdown exploit effectively destroys one of the most important isolation layers in a computer system – the isolation between OS memory and user memory. It does this by cleverly exploiting a race condition between privilege checking and instruction execution, relying on the side effects of out-of-order (OoO) execution, which is what makes the attack possible in the first place. While the attack itself is a side channel attack, it is directly related to isolation because, as mentioned, it effectively removes the isolation between a user's memory space and the operating system's memory space.
Basically, modern processors execute instructions out of order to optimize performance. Often the "program order" (the actual order of instructions arriving at the processor) would cause the processor to stall while it waits for something to complete, for instance waiting on data to arrive from memory. Instead, processors reorder execution so that, at the end, everything appears to have occurred in order even though in reality it did not. It turns out that these out-of-order operations affect the cache, and it also turns out that the reordering can cause a privileged access to be performed on behalf of unprivileged code before the privilege check completes. When this happens, the processor discards the results of the unauthorized access, so at the end of execution that unprivileged access to privileged information appears to have never happened. However, since the out-of-order execution still affects the cache, a side channel attack lets you observe which cache lines were touched and thereby recover the data. I will not dive into the specifics of the side channel attack since that is a topic for another lecture and blog, but we can look at the following example to see how we can get access to things we shouldn't.
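What follows is a minimal sketch in the spirit of the toy example from the Meltdown whitepaper; access(), probe_array, and data are illustrative names (access() simply touches the given address, data is the value being leaked, and probe_array has one 4096-byte-spaced entry per possible byte value).

    raise_exception();                 // control flow should return to the kernel here
    // Architecturally, the next line is never reached. Out-of-order execution,
    // however, may already have performed the access, leaving a cache footprint
    // that depends on the value of `data`.
    access(probe_array[data * 4096]);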
In the above example, an exception is raised, which is supposed to return control flow to the kernel. However, since the array access does not depend on the exception-raising instruction, OoO execution on the processor may have caused the array access to execute already, and thus its target is loaded into the cache. Even though the CPU discards the instruction's results, the cache effects remain, and the aforementioned side channel attack can recover which entry of the array was touched. Now imagine that the line that raises the exception, rather than always raising one, raised an exception in response to a failed privilege check. The array access still does not depend on that check, so it may still be executed out of order, and thus privileged information can be read without authorization.
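As a variation on that sketch (again with illustrative names), the exception can come from the hardware privilege check on the load itself rather than from an explicit raise:

    // The load faults because kernel_address points into kernel memory that is
    // not accessible from user mode, but out-of-order execution may still
    // forward the loaded byte to the dependent array access before the fault
    // is actually delivered.
    char secret = *(volatile char*) kernel_address;  // privilege check fails here
    access(probe_array[secret * 4096]);              // cache footprint encodes secret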
Hopefully this has illustrated how the Meltdown attack breaks down the barrier between user and kernel memory space, and how, cleverly exploited, it allows unprivileged access to almost any information in memory. Much of this information comes from the Meltdown Whitepaper as well as the Meltdown Wikipedia page.
Escaping the Chrome Sandbox
The rest of this blog will focus on one relatively recently found and fixed vulnerability that allowed escaping Chrome's renderer sandbox. The issue was fixed in the 72.0.3626 release on January 29, 2019, so the last release in which this vulnerability was present was 71.0.3578.98. Much of the information for the exploit comes from this article from the Project Zero team at Google.
This exploit is pretty convoluted, but the essence of it is that you can get access to inter-process communication (IPC) interfaces from Javascript, and that IPC allows the unprivileged renderer process to interact with sandboxed files via the privileged browser process. A way to perform a use-after-free attack was discovered: you ask the privileged browser process to perform a file write, it agrees, but before it completes that write you destroy the object that was handling it.
The reason this is possible has to do with the details of Mojo, the IPC system used by the Chromium engine. First, the researchers discovered that they could get ahold of an instance of a class called RenderFrameImpl and set its member variable enabled_bindings_ to BINDINGS_POLICY_MOJO_WEB_UI, since that causes Mojo bindings to be enabled in Javascript. The function responsible can be seen here:
void RenderFrameImpl::DidCreateScriptContext(v8::Local<v8::Context> context,
                                             int world_id) {
  if ((enabled_bindings_ & BINDINGS_POLICY_MOJO_WEB_UI) && IsMainFrame() &&
      world_id == ISOLATED_WORLD_ID_GLOBAL) {
    // We only allow these bindings to be installed when creating the main
    // world context of the main frame.
    blink::WebContextFeatures::EnableMojoJS(context, true);
  }

  for (auto& observer : observers_)
    observer.DidCreateScriptContext(context, world_id);
}
The other half of the bug is in FileWriterImpl::Write, the browser-process implementation of the file write IPC:

void FileWriterImpl::Write(uint64_t position, blink::mojom::BlobPtr blob,
                           WriteCallback callback) {
  blob_context_->GetBlobDataFromBlobPtr(
      std::move(blob),
      base::BindOnce(&FileWriterImpl::DoWrite, base::Unretained(this),
                     std::move(callback), position));
}
The implication of the call to base::Unretained(this) in the above function is that passing a Blob instance to the FileWriter interface we got access to earlier gives the renderer process a window of time while the browser process is fetching the blob data, before it calls back into the FileWriterImpl. Because base::Unretained does nothing to keep the object alive, we can destroy the FileWriter during that window and cause the browser process to access memory that has already been freed.
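To make the lifetime hazard concrete, here is a small, self-contained C++ sketch (all names are hypothetical and not from Chromium) of what a base::Unretained-style binding amounts to: a pending callback holds a raw, non-owning pointer, and the object is destroyed before the callback runs.

    #include <functional>
    #include <iostream>
    #include <memory>

    // Stand-in for FileWriterImpl: owns some state that the deferred work touches.
    struct FileWriterLike {
      int fd = 42;
      void DoWrite() { std::cout << "writing via fd " << fd << "\n"; }
    };

    int main() {
      auto writer = std::make_unique<FileWriterLike>();

      // Analogous to base::BindOnce(&FileWriterImpl::DoWrite,
      // base::Unretained(this), ...): the callback captures a raw pointer and
      // nothing keeps the object alive until the callback actually runs.
      std::function<void()> pending = [raw = writer.get()] { raw->DoWrite(); };

      writer.reset();  // the renderer destroys the object while the "browser"
                       // is still busy fetching the blob data
      pending();       // use-after-free: the raw pointer now dangles
      return 0;
    }

In Chromium, binding a weak or reference-counted pointer instead would either skip the callback or keep the object alive; base::Unretained deliberately opts out of both, which is exactly what the exploit relies on.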
At this point, I will briefly introduce the separation of the browser and renderer processes in Chromium. As we can see in the diagram below, every renderer process is separated from the main browser process and interacts with it via Mojo IPC. Additionally, the renderer processes are unprivileged, while the main browser process is privileged. This is why the unsafe callback in the IPC matters: it lets the unprivileged renderer process cause something to happen inside the privileged browser process.
After finding the use-after-free, the researchers realized that rather than simply freeing the FileWriter's memory, they could relatively easily replace it by registering a blob with the registerFromStream method, letting them plant a fake object whose contents are addresses crafted to work with the final stage of the attack, described next. From there, they noticed that the ASLR (address space layout randomization) implementation on the system they were using (Windows) causes any library loaded in the renderer process to be loaded at the same address in the browser process. Knowing this, they could identify gadgets to chain together and take note of their memory addresses. Then, noticing that there is no limit on virtual address space usage in the Chrome browser process, they create a file with which to later "spray" the browser process address space (around 3.5 to 4 TB is the effective size presented in the article), consisting of what is effectively a huge number of shared memory mappings containing pointers to the gadgets they intend to use. Doing this allows the attack to bypass ASLR, because the browser's address space is now flooded with mappings containing the addresses they want, so a guessed pointer is very likely to land on attacker-controlled data.
Once this file has been created, the attack is launched. At this point, you trigger the replacement of the freed object that will be accessed in a callback provided by the browser, and spray the browser process address space, causing the malicious gadget chain/payload to be executed. Though convoluted, this attack demonstrates a way to effectively escape the unprivileged renderer process and run arbitrary code in the privileged browser process of Chrome, at least in versions of Chrome prior to the patches that fixed the vulnerability.
The cited article mentions that the two key things that allowed this attack were:
- No inter-process randomisation on Windows (which is also a limitation on MacOS/iOS), which enabled locating valid code addresses in the target process without an information leak.
- No limitations on address-space usage in the Chrome Browser Process, which enabled predicting valid data addresses in the heap-spray.
Since they don't have control over OS-level process randomization, I imagine that the fix for this bug revolved around limiting address-space usage in the Chrome Browser Process. By reducing its maximum size from unlimited, it becomes much more difficult (potentially infeasible) to predict a valid data address using the heap spraying technique described above.
While this specific vulnerability is no longer present in up-to-date versions of Chromium, it goes to show that although we rely on isolation at multiple levels of modern computing, we should not simply take it for granted – even when it comes from Google.