Information Side Channel

By Elaine Cole and Jarek Millburg

An information side channel can be used to gain information about a system or the data that it processes. A side-channel attack identifies a physical or micro-architectural signal that leaks the desired information, then monitors and analyzes that signal as the system operates. While there are many different types of information side channels, and even more ways to maliciously exploit them, this blog explores a recent publication that leverages information side channels within IoT devices to aid crime scene investigators in real time.

In this blog, we provide an overview of the general attack procedure, and explore two of the many forms of side channel attacks.

Side Channel Attack General Procedure

While there are many different forms of side channels, at a high level, a side channel attack requires the following:

1. Identify a side channel:

The attacker must first identify a physical or micro-architectural signal that leaks desired information about the system or data that it processes. Common examples of side channels include radio signals, execution time, and even power usage.

2. Monitor and analyze the side channel:

Having found a useful side channel, the attacker now must track the signal as the system operates in order to gain valued information.

Example: Coffee Shop Side Channels

There are numerous forms of side channel attacks possible on various computational devices. Here’s just one hypothetical scenario showing the very real threat of side channel attacks:

Figure 1: Coffee Shop Example

Alice is an investment banker who makes several stock trades and writes sensitive emails using an encrypted connection to her employer’s server. Alice always sits at the table in the back of the shop, keeping her back against the wall, because she does not want others to observe what she is doing. Unfortunately, the coffee shop she frequents is known to be visited by investment bankers, and three attackers are waiting for her. At the next table, Eve is recording electromagnetic (EM) emanations from Alice’s laptop using an antenna hidden in a briefcase. Evan has installed a microphone under the table Alice usually sits at to collect sound emanations from her laptop. Ethan has attached a power meter, disguised as a battery charger, to the wall socket where Alice’s laptop is plugged in.

This shows us a couple of different things. For starters, there are many ways you can be targeted by side channel attacks, and they can be carried out around you with no real tells for you to pick up on. In the example, Alice sits in the back, facing everyone, to keep those around her from seeing what she is doing. As we see, however, they have other ways of getting information about what she is doing without ever viewing her screen or keystrokes. Having her back to the wall is no longer sufficient: as our technology expands, so does the number of information side channels, and so do the dangers.

Use Case: Side Channel Power Analysis

The publication used heavily in this section is Gamaarachchi and Ganegoda’s “Power analysis based side channel attack” (see References).

Here we explore one type of side channel previously mentioned: power. Following our previous example, Alice is using a device for encryption, and Ethan is executing what is known as a plain text attack to discover the key used.

In electronic devices such as the one that Alice is using, the instantaneous power consumption is dependent on the data that is being processed in that device as well as the operation performed by that device. By analyzing the power consumed by a device when it is doing encryption or decryption, the key can be deduced.

In the plain text attack we describe below, Alice’s device is running AES-128 encryption, and Ethan is using correlation power analysis to determine the key used in the AES cipher.

Correlation Power Analysis (CPA):

CPA is an algorithm used to do power analysis. There are four steps to a CPA attack: (1) pick a model for the victim device’s power consumption at a specific time during a cryptographic operation, (2) measure power consumption while the device is executing cryptographic operations, (3) work on smaller subkeys of the desired secret key using Pearson’s correlation coefficient, and (4) combine the best subkey guesses to obtain the full secret key.

To model the power consumption, we use a Hamming Distance model. Given two binary numbers, the Hamming Distance is the number of different bits between the two, and the Hamming Weight is the number of 1s in a given binary number. By finding a point in the encryption algorithm where the victim (Alice) changes a variable value (say, from x to y), we can estimate that the power consumption is proportional to the Hamming Distance between x and y. To calculate the Hamming Distance, we use the XOR operator, and if we assume that Alice is replacing a variable value of zero (x = 0) with some other value y, then the Hamming Distance model is simplified to only the Hamming Weight.
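The two quantities above are easy to compute. The snippet below is a minimal sketch; the function names are our own, chosen for illustration. It also checks the simplification mentioned above: when x = 0, the Hamming Distance to y equals the Hamming Weight of y.

```python
def hamming_weight(value: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ (computed via XOR)."""
    return hamming_weight(x ^ y)


x, y = 0b10110100, 0b10011100
print(hamming_distance(x, y))                       # the two values differ in 2 bit positions
print(hamming_distance(0, y) == hamming_weight(y))  # True: with x = 0 the model is just the weight
```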

Given the basic AES-128 encryption algorithm, we choose to model the power consumption right after the first round’s SubBytes() operation, which is implemented with the S-box lookup table. We now collect traces of the device’s power consumption as described below.
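As a rough sketch of this modelling step: in AES-128 the plaintext byte is XORed with the key byte before the first SubBytes(), so the value leaving the S-box is sbox[plaintext XOR key]. The code below builds the standard AES S-box from its definition (inverse in GF(2^8) followed by the affine transform) instead of hard-coding the table, and estimates the power as the Hamming Weight of the S-box output. The helper names are our own, and the Hamming Weight simplification assumes the overwritten value was zero.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """AES S-box: field inverse, then the affine transform with constant 0x63."""
    sbox = []
    for a in range(256):
        b = gf_inv(a)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def power_hypothesis(plaintext_byte: int, key_guess: int) -> int:
    """Modelled power: Hamming Weight of the first-round S-box output."""
    return bin(SBOX[plaintext_byte ^ key_guess]).count("1")


print(hex(SBOX[0x53]))               # 0xed, the worked example in the AES standard
print(power_hypothesis(0x00, 0x53))  # 6, since 0xED has six 1 bits
```

Calling power_hypothesis for every known plaintext byte and every one of the 256 possible key-byte guesses gives the table of estimates that is later correlated against the measured traces.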

Data Acquisition:

Plain text is given as input to the device, and the device runs the encryption to output cipher text. The key used in this system is unknown to the outside. Using an oscilloscope, which measures voltage, we can deduce the power that the device is using with the setup shown in figure 2. Specifically, we want to measure the power usage when the device is executing cryptographic operations.

The setup works as follows: we put a resistor in the path that connects the microcontroller or device to ground and, using the oscilloscope, measure the voltage drop across the resistor. This voltage is given by the equation V = IR (Ohm’s Law), where V is voltage, I is current, and R is the resistance. Since R is known, we can deduce the current I = V/R and use the power formula P = VI to get the power measurement.
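The arithmetic can be sketched in a few lines. The resistor value, supply voltage, and readings below are hypothetical numbers chosen for illustration. We also assume that the voltage used in P = VI is the device’s supply voltage, which is not stated explicitly above.

```python
def device_power(v_drop: float, r_shunt: float, v_supply: float) -> float:
    """Estimate instantaneous power from the voltage drop across a ground resistor.

    v_drop   -- voltage measured across the resistor by the oscilloscope (volts)
    r_shunt  -- known resistance of the ground resistor (ohms)
    v_supply -- supply voltage of the device (volts); an assumption of this sketch
    """
    current = v_drop / r_shunt   # Ohm's Law rearranged: I = V / R
    return v_supply * current    # power formula: P = V * I


# Hypothetical oscilloscope samples (volts) across a 10 ohm resistor, 3.3 V supply.
samples = [0.050, 0.062, 0.047]
trace = [device_power(v, r_shunt=10.0, v_supply=3.3) for v in samples]
print(trace)
```

Because the resistance and supply voltage are constants, the power trace is just a scaled copy of the measured voltage. Pearson’s correlation coefficient is unaffected by such positive scaling, so the raw voltage samples can be used as the trace directly.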

Figure 2: Measuring power using ground resistor

Taking measurements, we should have D power traces t, each of which will have T data points.

Pearson’s Correlation Coefficient:

With both our model of power consumption and our traces in hand, we now use Pearson’s correlation coefficient to compare the model’s power estimates against the measured traces.

Assuming there are I different candidate values for a subkey (a small part of the larger secret key), h(d,i) is our power estimate for a given trace d and subkey guess i. With this, we can compare our model and measurements for each guess i and time j by finding how t and h correlate over the D traces, using the equation in figure 3 below:

Figure 3: correlation equation

This equation is derived from Pearson’s Correlation Coefficient, which describes how closely two random variables are linearly related.
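For readers without access to the figure, the standard sample form of Pearson’s correlation coefficient can be written with the symbols used in this section as follows. This is our own transcription under assumed notation, not a copy of figure 3: h_{d,i} is the estimate h(d,i), t_{d,j} is data point j of trace d, and the bars denote means taken over the D traces.

```latex
r_{i,j} =
  \frac{\sum_{d=1}^{D} \left(h_{d,i} - \bar{h}_{i}\right)\left(t_{d,j} - \bar{t}_{j}\right)}
       {\sqrt{\sum_{d=1}^{D} \left(h_{d,i} - \bar{h}_{i}\right)^{2}
              \,\sum_{d=1}^{D} \left(t_{d,j} - \bar{t}_{j}\right)^{2}}}
```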

Subkeys:

Finally, we use the values derived from the equation in figure 3 to select the subkey guess that most closely matches our traces. Because Pearson’s correlation coefficient always lies in the range [-1, 1], for each subkey guess i we take the largest value of r from figure 3’s equation over all time points, ignoring the sign. Among these per-guess maxima, we then find the overall highest value of r. The corresponding guess i is our best guess for the subkey, having correlated most closely with the traces.
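The selection step can be sketched with NumPy as follows. The data here is synthetic and exists only to exercise the code: we plant a leak for one hypothetical guess at one hypothetical time point and check that the procedure recovers it. The array shapes and function names are our own assumptions.

```python
import numpy as np


def correlation_matrix(h: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pearson correlation between every guess and every time point.

    h -- model estimates, shape (D, I): one row per trace, one column per guess
    t -- measured traces, shape (D, T): one row per trace, one column per time point
    Returns r with shape (I, T).
    """
    h_centered = h - h.mean(axis=0)
    t_centered = t - t.mean(axis=0)
    numerator = h_centered.T @ t_centered
    denominator = np.sqrt(
        np.outer((h_centered ** 2).sum(axis=0), (t_centered ** 2).sum(axis=0))
    )
    return numerator / denominator


def best_subkey(r: np.ndarray) -> int:
    """Guess whose largest |r| over all time points is the overall maximum."""
    peak_per_guess = np.abs(r).max(axis=1)  # ignore the sign, as described above
    return int(peak_per_guess.argmax())


# Synthetic demo: 200 traces, 256 guesses, 50 time points (all hypothetical).
rng = np.random.default_rng(0)
D, I, T = 200, 256, 50
h = rng.integers(0, 9, size=(D, I)).astype(float)  # Hamming Weights of a byte: 0..8
t = rng.normal(0.0, 1.0, size=(D, T))              # measurement noise
true_guess, leak_time = 0x2B, 17
t[:, leak_time] += h[:, true_guess]                # plant the leak

r = correlation_matrix(h, t)
print(best_subkey(r) == true_guess)
```

Repeating this for each of the 16 key bytes of AES-128 and concatenating the winners corresponds to step (4) of the CPA procedure.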

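Use Case: Side Channel With Electromagnetic Analysis For Investigation of IoT Devices

The publication referenced in this section is Sayakkara, Le-Khac, and Scanlon’s “Leveraging Electromagnetic Side-Channel Analysis for the Investigation of IoT Devices” (see References).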

Simple IoT devices, which lack the computational resources to run heavy encryption algorithms, tend to be programmed to perform a repetitive task continuously. Some of these tasks are of forensic interest. These include reading data from a specific on-board sensor, such as a microphone; writing data to an on-board storage device, such as an SD card; and executing a command received remotely through the network. Identifying, in real time, what operation an IoT device is performing at the moment it is seized could prove important. For example, if the device is wiping its SD card in response to a remote command, the investigators need to know immediately so they can power off the device rather than wait for any further live analysis.

Setup:

Our goal is to train and test a machine learning model that can classify simple IoT firmware with increasing complexity. An Arduino device was used for the experiment as its simpler processor matches the resource profile of a lower-end IoT device. For classification, ten Arduino programs were selected that repeatedly perform a task inside an infinite loop. Figure 4 illustrates an example Arduino program used as a classification target.

Figure 4: An example Arduino program which performs a time complexity O(n) task repetitively inside an infinite loop which is used as a classification target.

As can be seen, the program consists of an infinite loop designed to represent a repetitive task of an IoT device with a time complexity of O(n). Each subtask the device is performing is represented by individual for loops. It is assumed that a malicious modification to the device is performed by adding a new subtask to the program or by removing an existing subtask from the program.

Data Acquisition:

To collect the EM trace samples for each program, the Arduino was programmed with each one separately and left to run with an H-loop antenna placed approximately 1 cm above the device’s microcontroller. The HackRF software-defined radio was tuned to the target device’s information-leaking frequency of 288 MHz and sampled data at a rate of 20 MHz.

Each acquired EM trace was approximately 25 milliseconds long. With ten programs to detect, 600 EM traces were acquired per class, which resulted in 177 GB of data for the overall 6,000 EM traces. Figure 5 illustrates the power spectral density (PSD) of the EM emissions of four such programs subject to the experiment.

Figure 5: Power spectral density (PSD) of the EM emissions from four different Arduino programs which were used for classification.

Data Preprocessing:

From the EM traces of each program class, we extracted 10-millisecond segments and converted them to the frequency domain using the Fast Fourier Transform (FFT). We then built a feature vector of 1,000 features by dividing each Fourier transform vector into 1,000 buckets. We noticed that averaging the values within a bucket pushed the most significant frequency component down toward the noise floor, whereas ideally that component would be the bucket’s representative element. We therefore selected the maximum value within each bucket, rather than the average, to build the feature vector.
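The preprocessing can be sketched with NumPy as shown below. This is an illustration under our own assumptions rather than the exact code used in the experiment: we assume each segment is an array of complex I/Q samples, we use the FFT magnitude as the spectrum, and the test signal is a synthetic 3 MHz tone in noise.

```python
import numpy as np


def em_feature_vector(segment: np.ndarray, n_buckets: int = 1000) -> np.ndarray:
    """Turn one time-domain segment into a max-per-bucket spectral feature vector."""
    spectrum = np.abs(np.fft.fft(segment))         # magnitude spectrum (assumption)
    buckets = np.array_split(spectrum, n_buckets)  # contiguous frequency buckets
    # Keep the strongest component of each bucket; averaging would dilute
    # a single strong bin with the many noise bins around it.
    return np.array([bucket.max() for bucket in buckets])


sample_rate = 20_000_000               # 20 MHz sampling rate, as in the experiment
n_samples = sample_rate * 10 // 1000   # 10 ms segment -> 200,000 samples

# Synthetic segment: a 3 MHz complex tone buried in noise (hypothetical signal).
rng = np.random.default_rng(0)
n = np.arange(n_samples)
tone = np.exp(2j * np.pi * 3_000_000 * n / sample_rate)
noise = rng.normal(0, 1, n_samples) + 1j * rng.normal(0, 1, n_samples)
features = em_feature_vector(tone + noise)

print(features.shape)             # (1000,)
print(int(features.argmax()))     # bucket that contains the 3 MHz tone
```

With 200,000 FFT bins and 1,000 buckets, each bucket covers 200 bins, so the 3 MHz tone (bin 30,000) falls in bucket 150.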

Classification:

We then designed a neural network with two hidden layers: the first contained 10 nodes and the second contained 3 nodes. The input layer took 1,000 features and the output layer contained 10 nodes, one per program. With 600 samples for each class, a total of 6,000 samples were used to train and test the model to detect the ten Arduino programs running on the target device.
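To make the layer sizes concrete, here is a small NumPy sketch of a network with this shape. It shows only an untrained forward pass. The activation functions (ReLU in the hidden layers, softmax at the output) and the random initialization are our own assumptions, and training is not shown.

```python
import numpy as np

LAYER_SIZES = [1000, 10, 3, 10]  # input, hidden 1, hidden 2, output


def init_params(sizes, rng):
    """One (weights, biases) pair per layer, with small random weights."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x: np.ndarray, params) -> np.ndarray:
    """Forward pass: ReLU hidden layers, softmax output (both assumptions)."""
    activation = x
    for layer, (weights, biases) in enumerate(params):
        z = activation @ weights + biases
        if layer < len(params) - 1:
            activation = np.maximum(z, 0.0)
        else:
            z = z - z.max(axis=1, keepdims=True)  # for numerical stability
            exp_z = np.exp(z)
            activation = exp_z / exp_z.sum(axis=1, keepdims=True)
    return activation


rng = np.random.default_rng(0)
params = init_params(LAYER_SIZES, rng)
n_params = sum(w.size + b.size for w, b in params)

batch = rng.random((4, 1000))  # four hypothetical feature vectors
probabilities = forward(batch, params)
print(n_params)                # 10083 trainable parameters
print(probabilities.shape)     # (4, 10): one score per program class
```

The narrow 3-node layer means the whole network has only about ten thousand parameters, almost all of them in the first layer.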

Results:

Figure 6 illustrates the confusion matrix of the classification results. The programs subject to the experiment are labelled from 0 to 9 in the figure. As can be seen, the classifier detected the majority of the Arduino programs accurately. Under 10-fold cross-validation, the classifier achieved a mean classification accuracy of 90%, with an error margin of 11% at a 95% confidence interval. Given that it is nearly impossible to identify the software activities of an IoT device without significant support from the manufacturer, the accuracy achieved through EM side-channel analysis (EM-SCA) can be a significant benefit to an investigator seeking insight into the device.

Figure 6: Confusion matrix of the neural network classifier to detect ten different Arduino programs which are labelled from 0 to 9.

References:

H. Gamaarachchi, H. Ganegoda, “Power analysis based side channel attack”, CoRR, 2018, [online] Available: http://arxiv.org/abs/1801.00932.

A. Sayakkara, N. Le-Khac, M. Scanlon, “Leveraging Electromagnetic Side-Channel Analysis for the Investigation of IoT Devices”, 2019, [online] Available: https://arxiv.org/abs/1904.02089.

WashU Bear Shell Daily
