Two Hardware Security Design Flaws Affect Billions of Computers

In recent days, several sources (listed below) have reported on two security design flaws in computer hardware. Both involve undesirable interactions between processor speculative execution and memory protection, and their implications are still emerging. With speculative execution, a processor core uses heuristics to guess the next step for execution. Programs execute faster when the guess is correct. When speculation picks an incorrect direction, a core should hide any learned information from user-level software. With these newly disclosed flaws, incorrect outcomes from speculation are properly hidden from the architectural state but can still be leaked through timing-based side channels. That is, a devious program can coerce the processor to speculatively access memory it shouldn’t and then time later cache accesses to infer some bits of secret information. These side-channel attacks can be repeated many times to leak information at a rate that depends on the specifics of the attack. See the end of this post for a sketch of a side-channel attack.

The first bug, dubbed Meltdown, appears to affect most Intel x86-64 processors, but not processors from AMD. It involves a flaw in speculation that lets a user-level program read kernel pages that are mapped into its page table but marked as requiring escalated privilege. Patches are in progress for major operating systems. Unfortunately, these patches can hurt performance, and the size of the impact depends on how frequently a program makes system calls. This bug is important to all affected Intel systems, as leaking the contents of kernel memory is unacceptable.

The second flaw, really a class of flaws, is dubbed Spectre and is reported to affect x86-64 and ARM processors from many vendors. It is rare, and perhaps unprecedented, that a design flaw appears in multiple architectures rather than in one or several implementations of a single architecture. The flaw allows a user program to read another user program’s memory using side channels that involve speculative branches and similar mechanisms. The authors of the Spectre paper claim that Spectre is more difficult to fix but also harder to exploit, in part because attacks must be tailored to specific processor implementations. This class of flaws is most important to computer systems running user-level programs that are potentially hostile to each other, as with infrastructure-as-a-service cloud servers.

As a deeper understanding emerges, the computer science technical community may wish to reflect on how to prevent future bugs like these in the cyber infrastructure on which we all depend. How do we make more deliberate trade-offs between performance and security? Can formal methods help, or is the attack surface too complex? Should a functional architectural specification be augmented with a specification of the security architecture that somehow identifies the risks through an abstraction of micro-architectural side channels?

In one effort, the CCC, through its Cybersecurity Task Force, is in the initial planning stages of an embedded security workshop for researchers to set strategic directions and goals to better design security into systems rather than attempt to bolt security on after the fact.

More generally, building more robust cyber-infrastructure will likely require government investment, like DARPA’s CRASH Program (https://www.darpa.mil/program/clean-slate-design-of-resilient-adaptive-secure-hosts), because incentives and consequences are different for society than for individual companies.

References:

New York Times: https://www.nytimes.com/2018/01/03/business/computer-flaws.html

Meltdown paper: https://meltdownattack.com/meltdown.pdf

Spectre paper: https://spectreattack.com/spectre.pdf

A blog separating the two bugs: https://danielmiessler.com/blog/simple-explanation-difference-meltdown-spectre/

Google Blog: https://security.googleblog.com/2018/01/todays-cpu-vulnerability-what-you-need.html and https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Industry News Sources: https://arstechnica.com/gadgets/2018/01/whats-behind-the-intel-design-flaw-forcing-numerous-patches/ and https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/

Sketch of a Side Channel Attack:

Step 0: Prime architectural state, e.g., branch predictor and caches.

Step 1: Coerce processor into speculatively loading address FOO into variable y.

The variable y never gets committed to “architectural state.”
FOO may be memory that an application is not supposed to access either from page table protections or even software bounds checks.
Hence y may hold a secret.

Step 2: In another speculative access, use the speculatively loaded value y to index into a different array BAR[] that the program has legitimate access to.

Step 3: Time loads to each element of BAR[] to determine which one is now cached. The index of the fast (cached) element equals the value y. That’s the data leak.

Step 4: Repeat many times to obtain information at some bandwidth.
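
The steps above can be illustrated with a small simulation. This is only a toy model under loud assumptions, not a working exploit: the cache is a Python set, the latencies are made-up constants, and the speculative window is a method whose result is simply discarded. Every name here (ToyCore, speculative_gadget, timed_load, leak_byte, leak, HIT_CYCLES, MISS_CYCLES) is hypothetical and chosen for illustration. What it tries to show is that a value which never reaches architectural state can still be recovered from which cache line became fast.

```python
# Toy simulation of the side-channel sketch. Not a real attack: the "cache"
# is a set and the "timings" are invented constants (assumptions).

HIT_CYCLES = 10    # assumed latency of a cached load
MISS_CYCLES = 200  # assumed latency of an uncached load


class ToyCore:
    """Models one core: protected memory plus a cache over the lines of BAR[]."""

    def __init__(self, secret_memory: bytes):
        self._secret_memory = secret_memory  # FOO: not architecturally readable
        self._cached_lines = set()           # microarchitectural state

    def flush(self) -> None:
        """Step 0: prime the cache by evicting every line of BAR[]."""
        self._cached_lines.clear()

    def speculative_gadget(self, foo_addr: int) -> None:
        """Steps 1-2: a mis-speculated path that is later squashed."""
        y = self._secret_memory[foo_addr]  # Step 1: speculative load of FOO
        self._cached_lines.add(y)          # Step 2: touching BAR[y] fills a line
        # The squash discards y, so nothing is returned to the caller,
        # but the cache footprint left by BAR[y] survives.
        return None

    def timed_load(self, index: int) -> int:
        """Load BAR[index] and report its simulated latency."""
        latency = HIT_CYCLES if index in self._cached_lines else MISS_CYCLES
        self._cached_lines.add(index)  # a load also brings the line into the cache
        return latency


def leak_byte(core: ToyCore, foo_addr: int) -> int:
    """Steps 0-3 for a single byte of the secret."""
    core.flush()
    core.speculative_gadget(foo_addr)
    timings = [core.timed_load(i) for i in range(256)]
    # Step 3: the one fast index is the secret value y.
    return min(range(256), key=timings.__getitem__)


def leak(core: ToyCore, length: int) -> bytes:
    """Step 4: repeat for each address to recover the secret byte by byte."""
    return bytes(leak_byte(core, addr) for addr in range(length))
```

For example, leak(ToyCore(b"hi"), 2) returns b"hi" even though speculative_gadget never returns the secret. On real hardware each of these steps is noisy and implementation-specific, which is part of why such attacks must be repeated and tuned.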