The mention of Spectre and Meltdown is enough to send chills down any InfoSec spine. Many of the vulnerabilities in these families deal with speculative execution, and how a processor might leak data while executing code speculatively. This week AMD has pre-empted the security community by detailing a potential security concern regarding Predictive Store Forwarding, a new feature of its Zen 3 microarchitecture designed to improve code performance by predicting dependencies between loads and stores. AMD is careful to point out that most users will not need to take any action, as the risk of a breach in general consumer use is low, and no known code is vulnerable.

Predictions Create Predilections for Data

Modern processors use a number of clever techniques to improve performance. A number of those techniques come under the heading of ‘speculation’ – at a high level, when a processor runs code like a simple true/false branch, rather than wait for the result of that true/false check to come in from memory, it will start executing ahead speculatively. In the simplest picture it starts on both branches at once, although in practice most designs predict the likelier branch and run only that one. When the true/false result comes back from memory, the work from the right branch is kept, and the rest is destroyed. Modern processors also predict memory addresses in repetitive loops, or values in a sequence, by learning from code that has already been processed. For example, if your loop increments a load address by 1024 bytes every iteration, by the 100th iteration the processor has learned where it expects the next load to come from. It’s all rather clever, and enables a lot of performance.
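To make the 1024-byte loop example concrete, here is a minimal toy model of a stride predictor in Python. It is only a sketch of the general idea: the class name, the confidence threshold, and the base address are all assumptions made for illustration, and real hardware prefetchers and predictors are far more elaborate.

```python
class StridePredictor:
    """Toy model: predicts the next load address once a stride has repeated."""

    def __init__(self, confidence_needed=2):
        self.last_addr = None          # most recent address actually loaded
        self.stride = None             # most recently observed stride
        self.confidence = 0            # how many times that stride has repeated
        self.confidence_needed = confidence_needed  # hypothetical threshold

    def predict(self):
        """Return the predicted next address, or None if not yet confident."""
        if self.last_addr is None or self.confidence < self.confidence_needed:
            return None
        return self.last_addr + self.stride

    def observe(self, addr):
        """Update the learned stride with the address that was really loaded."""
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride:
                self.confidence += 1
            else:
                # New pattern: remember it and start building confidence again.
                self.stride = stride
                self.confidence = 0
        self.last_addr = addr


def run_loop(iterations, base=0x10000, step=1024):
    """Count how many loads in a fixed-stride loop were predicted correctly."""
    predictor = StridePredictor()
    correct = 0
    for i in range(iterations):
        addr = base + i * step
        if predictor.predict() == addr:
            correct += 1
        predictor.observe(addr)
    return correct
```

With these made-up parameters the model needs a handful of iterations to settle on the stride, after which every load in the loop is predicted correctly.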

The downside of these techniques, aside from the extra power consumption needed to execute multiple branches, is the fact that data is in flow from both the correct branch and the incorrect branch at once. That incorrect branch could be accessing data it isn’t meant to access and storing it in caches, where it can be read or accessed by different threads. A malicious attacker could deliberately cause the incorrect branch to access such data. The concept has lots of layers and is a lot more complicated than I’ve presented here, but in any event, speculation for the sake of performance without consideration for security can lead to fast but leaky data.

For the most part, the whole industry, including AMD, Intel, and Arm, has been susceptible to these sorts of side-channel attacks. While Meltdown-style attacks are more isolated to Intel microarchitectures, Spectre-type attacks are industry-wide, and have the potential to leak user memory even in browser-like scenarios.

Predictive Store Forwarding

AMD’s document this week is a security analysis of its new Predictive Store Forwarding (PSF) feature inside Zen 3. PSF builds on store-to-load forwarding, where a load that reads an address just written by an earlier store receives its data directly from that store, by identifying execution patterns and commonalities in repeated store/load code. PSF lets the thread speculate on the result of the next store-to-load forward before it has confirmed that the load really depends on that store. If the prediction turns out to be right, then we haven’t needed to wait, and the prediction/speculation has done its job and enabled extra performance.
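As a rough illustration of the store-to-load forwarding that PSF builds on, the sketch below models a store queue in Python: a load that matches a pending store takes its data straight from the queue instead of from memory. The `StoreQueue` class and its methods are hypothetical names for this toy model, not a description of AMD's hardware. PSF's contribution, as described above, is to guess that such a match exists before the addresses have been compared.

```python
class StoreQueue:
    """Toy store queue: stores wait here before being committed to memory."""

    def __init__(self, memory):
        self.memory = memory   # dict of address -> value (committed state)
        self.pending = []      # (address, value) pairs, oldest first

    def store(self, addr, value):
        """Queue a store; memory itself is not updated yet."""
        self.pending.append((addr, value))

    def load(self, addr):
        """Return (value, forwarded).

        The youngest pending store to the same address wins; only if there
        is none does the load fall back to committed memory.
        """
        for st_addr, value in reversed(self.pending):
            if st_addr == addr:
                return value, True
        return self.memory.get(addr, 0), False

    def drain(self):
        """Commit all pending stores to memory in program order."""
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()
```

A load issued right after a store to the same address is served from the queue, which is the latency saving that forwarding (and, speculatively, PSF) is after.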

AMD has identified that its PSF feature could be vulnerable in two ways.

First, the pattern of the store-to-load forwarding could change unexpectedly. If the store/load pair is based on a fixed dependency pattern (such as a fixed data stride length using an external multiplier), the PSF feature learns that pattern and continues. If that dependency suddenly changes, or becomes effectively random, the PSF feature will continue to speculate until it has learned the new dependency pattern. As it continues to speculate during this time, it has the potential to draw unneeded data into the caches, which can be probed by external threads, or to change the access time to that sensitive data for external threads, which can be monitored.
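The window in which a trained predictor keeps speculating after the dependency changes can be sketched with a small saturating counter. This is a toy model under invented assumptions (the counter width, the threshold, and the `simulate` helper are all hypothetical); AMD's document does not describe the predictor at this level of detail.

```python
class PsfPredictor:
    """Toy saturating-counter predictor for a single store/load pair."""

    def __init__(self, threshold=2, maximum=3):
        self.counter = 0
        self.threshold = threshold   # hypothetical: predict once counter >= 2
        self.maximum = maximum       # hypothetical: counter saturates at 3

    def predicts_dependency(self):
        return self.counter >= self.threshold

    def train(self, was_dependent):
        """Strengthen on a real dependency, weaken otherwise."""
        if was_dependent:
            self.counter = min(self.maximum, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def simulate(pairs):
    """Run a sequence of (store_addr, load_addr) pairs through the predictor.

    Returns the iteration indices where the predictor forwarded store data
    to a load that did not actually depend on that store.
    """
    predictor = PsfPredictor()
    bad_forwards = []
    for i, (store_addr, load_addr) in enumerate(pairs):
        dependent = store_addr == load_addr
        if predictor.predicts_dependency() and not dependent:
            bad_forwards.append(i)
        predictor.train(dependent)
    return bad_forwards
```

Feeding it five dependent pairs followed by five independent ones shows a short run of incorrect forwards right after the pattern changes, before the counter decays. That run is the period in which the wrong data is being used speculatively.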

Second, PSF can be vulnerable through memory alignment / aliasing of predictions with dependencies. The PSF is designed to track data based on only a portion of the memory address alignment. As a result, when the store-to-load speculation occurs with an alignment, if a dependency is in the mix of that speculation and the dependency ends up not aligning with the predicted values, this might result in incorrect speculation. The data is still valid for a speculation that won’t be used, but therein lies the issue – that data might be sensitive or outside the memory bounds of the thread in question.
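Aliasing of this kind follows from any predictor that looks at only part of an address. The sketch below is a toy model: the choice of the low 8 bits is an arbitrary assumption for illustration, and the class name is hypothetical. It shows two unrelated addresses landing on the same predictor entry, so training done for one is applied to the other.

```python
INDEX_BITS = 8  # hypothetical: this toy predictor only sees the low 8 bits


def table_index(address):
    """Reduce a full address to the partial bits the predictor tracks."""
    return address & ((1 << INDEX_BITS) - 1)


class AliasingPredictor:
    """Toy predictor that remembers trained entries by partial address only."""

    def __init__(self):
        self.trained = set()

    def train(self, address):
        self.trained.add(table_index(address))

    def predicts_dependency(self, address):
        # Any address sharing the same low bits hits the same entry.
        return table_index(address) in self.trained
```

Here, training on `0x1234` also causes a prediction for `0x5634`, because both reduce to the same index `0x34`, while `0x1235` is unaffected.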

Limitations

PSF only occurs within a single thread – how PSF learns where the next store/load pair should be is individual to each thread. This means that an attack of this nature relies on the underlying code causing the PSF speculation to venture into unintended memory, and cannot be exploited directly by an incoming thread, even on the same core. This might sound as if it is somewhat unattackable; however, if you have ever used a code simulator in a web browser, then your code is running in the same thread as the browser.

PSF training is also limited by context – a number of thread-related values (CPL, ASID, PCID, CR3, SMM) define the context, and if any one of these is changed, the PSF flushes what it has learned and starts anew, as an effectively new context has been created. Context switching also occurs with system calls, flushing the data as well.
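The flush-on-context-change rule can be sketched as a predictor tagged with the five values the article lists. The field names mirror CPL, ASID, PCID, CR3 and SMM, but everything else here (the classes, the `pair_id` key, the training counter) is an assumption made for illustration rather than a description of the hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """The values that together define a PSF context in this toy model."""
    cpl: int
    asid: int
    pcid: int
    cr3: int
    smm: bool


class ContextTaggedPredictor:
    """Toy predictor whose learned state is valid for one context only."""

    def __init__(self, context):
        self.context = context
        self.learned = {}   # hypothetical pair_id -> training count

    def switch_to(self, context):
        """Flush everything learned if any context field differs."""
        if context != self.context:
            self.learned.clear()
            self.context = context

    def train(self, pair_id):
        self.learned[pair_id] = self.learned.get(pair_id, 0) + 1

    def is_trained(self, pair_id):
        return pair_id in self.learned
```

Switching to an identical context keeps the training, while changing any single field, for example CPL going from 3 to 0 on a system call, discards it.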

AMD states that exploiting PSF requires the store-to-load pairs to be close together in the instruction stream. The PSF is also trained through successive correct branch predictions – a complete misprediction can cause a pipeline flush between the store and the load, removing any potentially harmful data.

Effect on Consumers, Users, and Enterprise

AMD (and its security partners) has identified that the impact of PSF exploitation is similar to Speculative Store Bypass (Spectre v4), and a security concern arises when code implements a security control that can be bypassed. This might occur if a program hosts untrusted code that can influence how other code speculates – AMD cites a web browser as one way such an attack might be delivered, similar to other Spectre-type vulnerabilities.

Despite being similar to other Spectre attacks, AMD’s security analysis states that an attacker would have to effectively train the PSF of a thread with malicious code in the same thread context. This is somewhat difficult to do natively, but could be caused through elevated security accesses. That being said, PSF does not occur across separate address spaces enabled through current hardware mechanisms, such as Secure Encrypted Virtualization. The PSF data is flushed if an invalid data access occurs.

For the enterprise market, AMD is stating that the security risk is mitigated through hardware-based address space isolation. Should an entity have no means of address space isolation in its deployment, PSF can be disabled through setting either MSR 48h bit 2 or MSR 48h bit 7 to 1. The only products that would be affected as of today are Ryzen 5000 CPUs and EPYC Milan 7003 CPUs.
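The bit arithmetic behind that mitigation is simple enough to sketch. The MSR number and bit positions come from the text above; the labels SSBD (bit 2) and PSFD (bit 7) reflect my understanding of AMD's naming and should be checked against the whitepaper, and the helper functions are hypothetical. Actually reading or writing an MSR needs privileged access (firmware, kernel, or an OS-provided interface), which this sketch deliberately leaves out.

```python
SPEC_CTRL_MSR = 0x48   # "MSR 48h" from the article
SSBD_BIT = 2           # bit 2 (assumed label: Speculative Store Bypass Disable)
PSFD_BIT = 7           # bit 7 (assumed label: Predictive Store Forwarding Disable)


def disable_psf(current_value, use_ssbd=False):
    """Return the MSR value with one of the two PSF-disabling bits set.

    Other bits in current_value are preserved, since the register also
    controls unrelated speculation features.
    """
    bit = SSBD_BIT if use_ssbd else PSFD_BIT
    return current_value | (1 << bit)


def psf_disabled(value):
    """True if either of the two bits that disable PSF is set."""
    return bool(value & ((1 << SSBD_BIT) | (1 << PSFD_BIT)))
```

Setting bit 7 on a register that currently reads `0x1` yields `0x81`, and setting bit 2 instead yields `0x5`; in both cases the existing bit 0 is left alone.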

AMD is currently not aware of any code in the wild that could be vulnerable to this sort of attack. The security risk is rated as low, and AMD says that most end-user customers will not see any security risk from leaving the feature enabled, which will still be the default going forward.

The full security analysis document, along with a suggested mitigation for enterprise, can be found at this link.

63 Comments

  • Makaveli - Thursday, April 8, 2021 - link

    I'm on x570 with a 5800X on AGESA 1.2.0.1 Patch A and have zero issue on my machine.

    No WHEA BSOD
    No USB issues
    No CO issue

    Are you even on a AM4 platform? I'm speaking from first hand experience not something I read in a forum.
  • Silver5urfer - Thursday, April 8, 2021 - link

    I'm not on AM4 platform, I'm planning to buy a machine and was reading on what are the things to look out. Having this fix on ASUS, I see. I'm going to wait to hear more on this once the 1.2.0.2 rolls out for all the problems associated with the Ryzen platform.
  • Silver5urfer - Thursday, April 8, 2021 - link

    Looks like MSI, ASUS just released 1.2.0.2 BIOS update to their boards, ASUS 1.2.0.1A and this says the same thing of Fix USB connectivity issues. As for MSI, it says improved USB compatibility.
  • Makaveli - Thursday, April 8, 2021 - link

    Those are Beta bios, I will stay on 1.2.0.1A until those are out of beta status.
  • schizoide - Thursday, April 8, 2021 - link

    Good call. 1.2.0.1A was originally tagged beta and I didn't upgrade until they removed that tag.
  • Makaveli - Thursday, April 8, 2021 - link

    Same for me my AM4 machine can be considering production since I work from home on it. I never load beta's bioses on it. I'll leave that for other people to test.
  • schizoide - Thursday, April 8, 2021 - link

    I would feel perfectly comfortable buying Zen3 right now. The USB issue was the last major problem with the platform. As of mid-February, I was feeling pretty down on AMD, but they did fix the problems. That does not in any way excuse their poor QA allowing those bugs to slip through in the first place!
  • TheJian - Sunday, April 11, 2021 - link

    I don't get it, you're speaking from an authority of ONE, on behalf of a group of millions? So you are 1 better than a guy NOT on said board/cpu and reading forums of thousands or more maybe using the same hardware? :) Ok then.

    If I read a forum where there are 1000's of owners of board X/cpu Y, I already know far more than a guy talking about his SINGLE board/cpu experience even if I don't own it. :) It is silly to think your personal experience is greater than thousands using the same combo in forums. The whole point of them is to get massive experience through others correct? Silly to act as an authority when you are a sample of ONE yourself. One guy not having issues x,y, or z, doesn't mean ALL BOARDS/CPU ARE GOOD. You seem to think your ONE experience means everyone MUST be having a jolly time too, and forget all the complaints in forums (who they?)...Uh, maybe not. To you, forums are useless? Whatever...

    IE, I used to buy 20pk's of gpus as a PC biz. AS an example, take a 20pk of Matrox cards for 91TMX monitors (all cad/solidworks/ProE type crap would be used). I tested all 20 on that monitor as it pushed them to the limits. ~2 out of every pack wouldn't run without wavy screens at high refresh rates, but you'd never know it on a lower grade monitor with one card (at that time you couldn't get a 21in for less than ~$600, and the 91TXM was $1100 or so). But if you read my post in forums (posted that in actual matrox forums, not comment sections on articles), you'd know to test for that on your single card anyway. I read forums on stuff before purchasing (like the OP), just to AVOID products that have more issues than I can stomach. It's not rocket science here, more info=good, especially when done BEFORE buying. DUH. Buy first, and possibly swear later. READ forums first (ok, reviews too, but again they barely test a board/gpu etc), and avoid 95% of BS by buying the CORRECT part with the least issues. You don't find that part by listening to ONE guy in a comment section, you find that on a forum FULL of users of whatever you're eyeballing today. I'm not saying your opinion/experience doesn't count, just that you are making WAY too much out of that single data point vs. the OP who is possibly reading thousands of data points on the same hardware. Silly to blow off forums where tons of info is easily gathered on whatever you're about to buy.

    I've never owned a dell, hp, etc box (apple //e was the last "PC" our family BOUGHT, instead of built). But I've troubleshot the crap out of all of them via forums much of the time and many times over the phone on a box I can't touch. So again...pfft... I used to read them for each launch just to find which boards etc to never sell myself in my biz.
  • Silver5urfer - Monday, April 12, 2021 - link

    Wtf are you talking about ? AMD ackowledged the stupid issues on the USB3.0 silently in a fucking subreddit after months of failures by the customers. Next the WHEA errors and crashes were also seen in subreddit and OCN both places. AMD offering RMA for the fucking latest TSMC 7N bleeding edge CPUs flatout just by showing p95 errors.

    Yeah who blowed off forums ? Not everyone comes and says I have this X issue on forums, most of them go to reddit, and when I checked OCN FORUMS. People were saying USB bullshit happening,

    I'm here to fucking buy Hardware for "my money" would like to research what to expect when buying and not be a blind fool taking all that AMD is da best hype nonsense. Ryzen 5000 released in OCT2020. Over 6months and still AGESA Is not rock stable. That doesn't inspire confidence at all. On top USB issues take out the RGB controller software when they are plugged to the USB header on the X570 chipset. Go to the same forums and tell what the fuck is going on rather than writing a useless essay.
  • thigobr - Thursday, April 8, 2021 - link

    As Silver5urfer said not really fixed. I did RMA my second defect 5950X because it's unstable at default settings... And I tried all BIOSes since AGESA 1.1.8.0... First CPU core #0 was unstable boosting at default settings (PBO disabled). Second CPU, already a replacement from RMA, core #5 and #12 were both unstable! All I needed to do is start a DirectX11 game and boom! Computer crash!

    You would say: impossible, two problematic CPUs in a row?! Yes!

    After the second CPU I found hard to believe and started suspecting other component on my machine even though it was stable before with a 3700X. I went ahead and bought a 3rd CPU, a 5800X. I am still using this 5800X on the exact same machine typing this response, exact same software/Windows install, completely stable for more than a week now since I RMAed the second defect 5950X...
    There's a thread on the forum about CPU failing Corecycler you can get more details there
