With the widespread use of computers and embedded systems in our personal and professional lives, the digital world demands a high degree of security. New cryptographic methods and improved security measures are continually being developed to keep data safe from attacks. However, even with these technologies, new attacks continue to emerge at an alarming rate [1]. While software security continues to evolve, the way software is implemented on physical computing systems can itself be a source of sensitive information leakage. Leakage of this nature can arise in a number of different ways, but it usually involves the workings of the hardware behind the system. Attacks that exploit such leakage are known as side-channel attacks; those that exploit the sounds a device emits are known as acoustic side-channel attacks.
An early example of an acoustic side-channel attack involved capturing and analyzing the sounds produced by keyboards and keypads, from which attackers were able to determine which keys were being pressed [2]. This attack alone highlights the potential for sensitive information leakage, since attacks of this nature could be used to compromise personal information such as passwords or PINs. Another example of an acoustic side-channel attack involved monitoring the acoustic signature of a computer during cryptographic operations to break RSA encryption and extract full 4096-bit RSA decryption keys from laptop computers [3]. More recently, similar attacks have been demonstrated on other cryptographic algorithms, including DES and ECDSA [4,5]. All of these attacks are made possible by some of the most common electrical components in any computing system: capacitors and electrical coils, both of which emit unique sound profiles when current passes through them [6]. While these components are the basis for most acoustic side-channel attacks, other attacks have demonstrated stealing information from printers, including 3D printers, by analyzing the sounds the printer's mechanical components emit during a print job [7,8].
The success of an acoustic side-channel attack relies on the attacker's ability to capture the audio output of a target machine and to interpret the sound profile in the context of the attack. As such, these attacks often rely on pattern matching algorithms of some sort, and with advances in machine learning the potential for these kinds of attacks has grown significantly [9]. In particular, the use of neural networks in these attacks has become a growing area of interest. One attack using this methodology employs audio recording devices to decipher the content of LCD screens based on their unique acoustic profiles [10].
The idea of using side-channel attacks to remotely detect the content of a screen is nothing new. Originally demonstrated by Wim van Eck on cathode ray tubes [11], electromagnetic (EM) attacks of this type have been around for decades [12]. More recently, EM-based side-channel attacks have been demonstrated on newer LCD screens as well [13]. Other attacks have leveraged reflections to create visual side-channel attacks on flat screens [14]. However, the attack discussed here relies solely on the acoustic signatures of LCD screens to decipher what is being displayed.
In this type of attack, the goal is to capture all of the necessary acoustic information, ideally using common audio recording devices. This is exactly what Genkin et al. examine through a variety of potential attacks on LCDs [10].
To understand the attack, it helps to first understand the basics of how an LCD works. These displays consist of rows upon rows of tightly packed pixels, wherein each pixel contains tiny red, green and blue sub-pixels. Together, the brightness levels of the sub-pixels in each pixel combine to create the image shown on the screen at any given time. The values of these pixels update anywhere from several dozen to several hundred times per second; this is known as the refresh rate. During a refresh cycle, every pixel on the screen is updated in sequence. Through this method, the graphics output from a computer is able to change the content displayed on the screen. A great slow-motion video showing this process can be found here: Link
For the purpose of this attack, the authors of the paper first analyzed the acoustic output of very basic "zebra" stripe images, shown in Fig. 1 below:
Here, the authors used a Soyo DYLM2086 display (A), a Brüel & Kjaer 4190 microphone and preamplifier (B), a Brüel & Kjaer 2610 amplifier (C), a Creative E-MU 0404 USB sound card (D) and a laptop (E) to create spectrograms representing the acoustic signatures of the content displayed on screen. An example of the output corresponding to different zebra stripe patterns (with differing periods) is shown in Fig. 2 below:
Figure 2. Acoustic emanation of zebra stripe patterns
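To build some intuition for why zebra patterns make a convenient test image, consider what happens if rows are assumed to be refreshed one after another at a uniform rate: a pattern that repeats every few rows would then modulate the display's power draw at a predictable frequency. The Python sketch below works through that arithmetic. The 60 Hz refresh rate, the 1024-row panel and the helper names are illustrative assumptions rather than values from the paper, and blanking intervals are ignored.

```python
def zebra_rows(total_rows, period_rows):
    """Brightness per row: first half of each period black (0), second half white (1)."""
    half = period_rows // 2
    return [0 if (row % period_rows) < half else 1 for row in range(total_rows)]


def expected_tone_hz(refresh_hz, total_rows, period_rows):
    """Stripe periods drawn per second, assuming rows are refreshed at a uniform rate."""
    return refresh_hz * total_rows / period_rows


def measured_tone_hz(rows, refresh_hz):
    """Count black-to-white transitions in one frame and scale by frames per second."""
    rising_edges = sum(1 for a, b in zip(rows, rows[1:]) if a == 0 and b == 1)
    return rising_edges * refresh_hz


# Hypothetical display parameters, chosen only for illustration.
REFRESH_HZ = 60
TOTAL_ROWS = 1024

for period in (8, 16, 32):
    frame = zebra_rows(TOTAL_ROWS, period)
    print(period,
          expected_tone_hz(REFRESH_HZ, TOTAL_ROWS, period),
          measured_tone_hz(frame, REFRESH_HZ))
```

Under these assumptions, halving the stripe period doubles the expected frequency, which is one reason differing periods would show up as distinct lines in a spectrogram.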
To determine the source of the acoustic output, the researchers pinpointed a particular quadrant of the LCD's power supply board, shown in Fig. 3 below, which appears to contain a cluster of capacitors.
Isolating the acoustic output to the power supply board is of particular significance to this attack, because the current drawn from the power supply depends on the content being displayed: each pixel has a different power signature for different color and brightness combinations. As such, the exact rendering of an image has a unique overall "power signature" over the course of a refresh cycle. This power signature corresponds to a unique electrical load on the components of the power supply board. As discussed previously, such components emit sound when current passes through them, and this is what opens the way for the side-channel attack. Of note, the exact frequency emitted as a series of pixels is refreshed depends on the exact circuitry used. In other words, it would be incorrect to assume that pixels in different locations with the same color and brightness would always produce the same acoustic signature, since small differences in wiring, including minute differences in conductivity or even the length of wire from one component to another, could influence the power draw. As such, it is easier to examine the overall output during an entire refresh cycle. The authors of the paper decided to do just that and focused on the overall acoustic signatures of larger display contents. With this in mind, the goal is to create unique digital signatures, or "traces", for the entire refresh cycles of particular displayed content.
To achieve this, the authors developed an algorithm that samples the output over many refresh cycles in order to improve the quality of the resulting trace. Of particular note, the algorithm accounts for tiny changes in the refresh rate as well as "abnormal cycles", since a static sampling method with a period of exactly one refresh cycle would lead to overly noisy data. The authors accomplished this by using Pearson correlation to align each recorded chunk to the position it should occupy within one cycle. They then optimized the number of samples in each recording to maximize chunk alignment, in essence looking for the cleanest possible signal from which to create a unique audio trace (digital signature). This matters because a clean signal with minimal noise is essential for the machine learning algorithms used later in the attack to identify the screen contents.
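The alignment idea can be illustrated with a much simplified Python sketch. This is not the authors' algorithm: it assumes the recording has already been cut into chunks of exactly one cycle each, aligns every chunk to the first one by trying all circular shifts, drops chunks that correlate poorly (a stand-in for "abnormal cycles"), and averages the rest. The synthetic signal, noise level, threshold and function names are all assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rotate(chunk, shift):
    """Circularly shift a chunk to the left by `shift` samples."""
    shift %= len(chunk)
    return chunk[shift:] + chunk[:shift]


def align(reference, chunk):
    """Return the rotation of `chunk` that best matches `reference`, and its correlation."""
    best = max((rotate(chunk, s) for s in range(len(chunk))),
               key=lambda candidate: pearson(reference, candidate))
    return best, pearson(reference, best)


def average_trace(chunks, min_corr=0.7):
    """Align all chunks to the first one, drop poor matches, and average the rest."""
    reference = chunks[0]
    kept = []
    for chunk in chunks:
        aligned, corr = align(reference, chunk)
        if corr >= min_corr:  # hypothetical threshold for rejecting abnormal cycles
            kept.append(aligned)
    return [sum(column) / len(kept) for column in zip(*kept)]


# Synthetic demo: one clean "refresh cycle" observed 40 times with noise and unknown offsets.
random.seed(0)
SAMPLES_PER_CYCLE = 32
clean = [math.sin(2 * math.pi * i / SAMPLES_PER_CYCLE)
         + 0.5 * math.sin(4 * math.pi * i / SAMPLES_PER_CYCLE)
         for i in range(SAMPLES_PER_CYCLE)]


def noisy(cycle, sigma=0.2):
    return [value + random.gauss(0, sigma) for value in cycle]


chunks = [noisy(clean)] + [noisy(rotate(clean, random.randrange(SAMPLES_PER_CYCLE)))
                           for _ in range(39)]
trace = average_trace(chunks)
similarity = pearson(trace, clean)
print(f"correlation between averaged trace and clean cycle: {similarity:.3f}")
```

Averaging only helps once the chunks line up; without the alignment step, the randomly shifted cycles would largely cancel each other out.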
To carry out this attack, some method of recording the acoustic output of an LCD screen must be employed. The authors demonstrated that this can be done with several common pieces of equipment, including a phone, a webcam, and a virtual assistant device with recording capabilities. The specific device does not seem to matter, as long as it can record sounds at the high end of the audible range and at some frequencies above it. Of note, because the recorded frequencies are so high, there is less natural interference in this range, which generally allows for a cleaner recording than one might expect. In fact, the majority of interference at these frequencies would likely come from other electronic devices that emit inaudible sound in the same range.
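Whether a given recorder can see such high frequencies comes down to its sampling rate: a tone is only representable if it lies below half the sampling rate (the Nyquist limit). The Python sketch below checks that condition and uses the Goertzel algorithm to measure the energy at a single frequency in a synthetic recording. The 20 kHz tone and the sampling rates are illustrative assumptions, not measurements from the paper.

```python
import math


def nyquist_hz(sample_rate_hz):
    """Highest frequency a recording at this sampling rate can represent."""
    return sample_rate_hz / 2


def can_capture(sample_rate_hz, tone_hz):
    return tone_hz < nyquist_hz(sample_rate_hz)


def goertzel_power(samples, sample_rate_hz, target_hz):
    """Signal power at the DFT bin nearest to target_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate_hz)
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


# Hypothetical example: a 20 kHz tone recorded for 10 ms at 48 kHz.
SAMPLE_RATE_HZ = 48_000
TONE_HZ = 20_000
tone = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE_HZ) for i in range(480)]

for rate in (32_000, 44_100, 48_000):
    print(rate, "Hz sampling can capture the tone:", can_capture(rate, TONE_HZ))

power_at_tone = goertzel_power(tone, SAMPLE_RATE_HZ, TONE_HZ)
power_elsewhere = goertzel_power(tone, SAMPLE_RATE_HZ, 5_000)
print("energy is concentrated at the tone:", power_at_tone > abs(power_elsewhere))
```

Goertzel is used here only because it needs no external libraries; an FFT-based spectrogram would serve the same purpose when many frequencies are of interest.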
For the attack itself, a large amount of training data must first be collected in order to train a machine learning model. The authors used convolutional neural networks (CNNs) because of the inherent time dependence of the sampled data. Using this method, they demonstrated attacks that detect inputs on an on-screen keyboard, decipher rudimentary on-screen text with varying degrees of success, and distinguish between popular websites loaded on a screen.
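To give a feel for what a convolutional layer does with a one-dimensional trace, here is a minimal pure-Python sketch of a single filter followed by ReLU and global max pooling. It is not the authors' network: the hand-picked edge-detecting kernel, the toy traces and the function names are illustrative assumptions, and a real CNN would learn many such kernels from training data.

```python
def conv1d(signal, kernel):
    """Valid-mode sliding dot product (what deep learning libraries call convolution)."""
    width = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(width))
            for i in range(len(signal) - width + 1)]


def relu(values):
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    return max(values)


def feature(trace, kernel):
    """Strongest response of one filter anywhere along the trace."""
    return global_max_pool(relu(conv1d(trace, kernel)))


# Hand-picked kernel that responds to a rising edge (assumption, not a learned filter).
RISING_EDGE = [-1, -1, 1, 1]

early_step = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]  # edge near the start of the trace
late_step = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]   # same edge, later in the trace
flat = [0.5] * 10                             # no edge at all

for name, trace in (("early", early_step), ("late", late_step), ("flat", flat)):
    print(name, feature(trace, RISING_EDGE))
```

The filter responds equally to the edge wherever it occurs in the trace, which hints at why convolutional models cope well with patterns whose exact position within a refresh cycle varies.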
This attack is of particular significance because of how widely it could be applied to steal sensitive data. Deciphering previously unknown content from an arbitrary screen (even if the particular screen model is known) has its limitations, since the method relies on machine learning models that require large amounts of training data from very similar instances. Still, it is not difficult to envision attacks on systems where the screen content is more predictable. An example of this would be touch-screen ATMs. On this kind of system, it would be rather straightforward to design an attack that steals PINs using relatively inexpensive acoustic recording equipment, such as a smartphone. Indeed, since acoustic side-channel attacks have become an area of concern for many devices that deal with sensitive information, solutions to this problem have been proposed. One particularly promising approach is signal jamming, wherein noise is generated at the same acoustic frequencies as those emitted by the electronic components in a system in order to create a noise mask [15].
Figure 3. Internal components of the LCD display: (A) LCD panel, (B) digital logic board, (C) power supply board
[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6539189/ 
[2] https://www.berkeley.edu/news/media/releases/2005/09/14_key.shtml 
[3] https://m.tau.ac.il/~tromer/acoustic/
[4] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.5990&rep=rep1&type=pdf
[5] https://eprint.iacr.org/2016/230.pdf
[6] http://www.kemet.com/Lists/TechnicalArticles/Attachments/62/2007%20CARTS%20-%20Reduced%20Microphonics%20and%20Sound%20Emissions.pdf 
[7] https://spqr.eecs.umich.edu/courses/cs660sp11/papers/printers.pdf 
[8] http://aicps.eng.uci.edu/papers/iccps2016-3-d-printer-security-alfaruque.pdf 
[9] https://link.springer.com/article/10.1007/s13389-019-00212-8 
[10] https://www.cs.tau.ac.il/~tromer/synesthesia/synesthesia.pdf 
[11] https://cryptome.org/emr.pdf 
[12] https://www.cl.cam.ac.uk/~mgk25/pet2004-fpd.pdf
[13] https://dl.acm.org/citation.cfm?id=3234690 
[14] https://ieeexplore.ieee.org/abstract/document/4531151 
[15] https://link.springer.com/article/10.1007/s10207-019-00449-8
