An Unusual Launch Cycle: OEMs now, Individual Units Later

The launch of Bristol Ridge APUs for desktop is taking a slightly different strategy to previous AMD launches. Typically we expect to see CPUs/APUs and OEM systems with that hardware launched on the day of the announcement, with stock of the hardware getting to shelves over the next few weeks. In order to do this, AMD needs to work with all the OEMs (HP, Lenovo, Dell) and platform partners (ASUS, GIGABYTE, MSI, ASRock) and potentially the memory manufacturers (Crucial, Kingston, G.Skill, ADATA, etc) to synchronize a launch with expected hardware, platform control and settings.

This time around, AMD has focused on the OEMs first, with all-in-one PCs and desktop systems being their focus. Typically the big OEMs develop their own PCBs and manage the full gamut of support, as well as being mindful of firmware that can be a work in progress up until the launch date. This allows the launch to be focused on a few models of complete experience systems, rather than the comparative free-for-all with custom build machines. Typically one might argue that the standard motherboard designers take longer to design their product, as it becomes their brand on offer, whereas HP/Lenovo sells the system as a brand, so not every stage has to be promoted, advertised and polished in the same way.

Of course, from an enthusiast perspective, I would prefer everything to come out on day one, and a deep dissection into the platform. But because Bristol Ridge is sharing a platform with the upcoming new microarchitecture, Zen, AMD has to balance the wishes of OEMs along with product expectations. As a result, the base announcement from AMD was somewhat of a brief overview, and we delayed writing this piece until we were able to source certain nuggets of information which make sense when individual units (and motherboards) are on sale for DIY users, as well as some insights into what Zen might offer.

But by focusing on OEMs first, it makes it more difficult for us to source review units! Watch this space, we’re working on it.

The CPU Roadmap

A lot of the recent talk regarding AMD’s future in the desktop CPU space has revolved around its next-generation CPU architecture called Zen. In August, AMD opened up to a significant part of the underlying Zen microarchitecture, detailing a micro-op cache, a layered memory hierarchy, dual schedulers and other information. Nonetheless Zen is initially aiming for the high-end desktop (HEDT) market, and AMD has always stated that Zen will share the AM4 platform with new mainstream CPUs, under the Bristol Ridge and Stoney Ridge names, initially based on an updated Excavator microarchitecture.

AMD’s roadmap seems to be the following:

The latest AMD announcements are for that mainstream segment, but we can see that AMD is moving from a three-socket configuration of AM3, FM2+ and AM1 into a singular AM4 platform from top to bottom, with the budget element perhaps being more embedded focused. This has positives and negatives associated with it, which is part of the reason why AMD is staggering the release of Bristol Ridge and the 7th Generation APUs between OEMs and PIBs.

The positive from the unified problem is that AMD’s OEM customers can have a one size fits all solution that spans from the budget to the premium, which makes OEM designs easier to translate from a high powered platform to a budget system. The downside is variety and compatibility – if a vendor designs a platform purely for a budget system, and has fewer safeguards, then a user cannot simply put in the most powerful CPU/APU available. Luckily we are told that all AM4 systems should be dual channel, which migrates away from the Carrizo/Carrizo-L problem we had in notebooks late last year.

AMD 7th Gen Bristol Ridge and AM4: The CPUs, Overclocking The Integrated GPU
Comments Locked

122 Comments

View All Comments

  • Alexvrb - Sunday, September 25, 2016 - link

    Geekbench is trash at comparing across different architectures. It makes steaming piles look good. Only using SSE (first gen, ancient) on x86 processors would certainly be a part of the puzzle regarding Geekbench results. Thanks, Patrick.

    Not to take anything away from Apple's cores. I wouldn't be surprised that they have better performance per WATT than Skylake. Perf/watt is kind of a big deal for mobile, and Apple (though I don't care for them as a company) builds very efficient processor cores. With A10 using a big.LITTLE implementation of some variety, they stand to gain even more efficiency. But in terms of raw performance? Never rely on Geekbench unless maybe you're comparing an A9 Apple chip to an A10 or something. MAYBE.
  • ddriver - Monday, September 26, 2016 - link

    Hey, it is not me who uses crap like geekbench and sunspider to measure performnace, it is sites like AT ;)
  • BurntMyBacon - Monday, September 26, 2016 - link

    @ddriver: "Hey, it is not me who uses crap like geekbench and sunspider to measure performnace, it is sites like AT ;)"

    LOL. My gut reaction was to call you out on blame shifting until I realized ... You are correct. There hasn't exactly been a lot of benchmark comparison between ARM and x86. Of course, there isn't much out there with which to compare either so ...
  • patrickjp93 - Monday, September 26, 2016 - link

    Linpack and SAP. Both are massive benchmark suites that will give you the honest to God truth, and the truth is ARM is still 10 years behind.
  • patrickjp93 - Monday, September 26, 2016 - link

    They use it in context and admit the benchmarks are not equally optimized across architectures.
  • patrickjp93 - Monday, September 26, 2016 - link

    It doesn't even use SSE. It uses x86_64 and x87 scalar float instructions. It doesn't even give you MMX or SSE. That's how biased it is.
  • patrickjp93 - Monday, September 26, 2016 - link

    Just because you write code simply enough using good modern form and properly align your data and make functions and loops small enough to be easily optimized does not mean GCC doesn't choke. Mike Acton gave a great lecture at CPPCon 2014 showing various examples where GCC, Clang, and MVCC choke.

    Define very good.

    Define detailed analysis. Under what workloads? Is it more efficient for throughput or latency (because I guarantee it can't be both)?

    Yes, Geekbench uses purely scalar code on x86 platforms. It's ludicrously pathetic.

    It's 8x over scalar, and that's where it matters, and it can even be better than that because of loop Muop decreases which allow the loops to fit into the detector buffers which can erase the prefetch and WB stages until the end of the loop.

    No, they're not more powerful. A Pentium IV is still more powerful than the Helio X35 or Exynos 8890.

    No, those are select benchmarks that are more network bound than CPU bound and are meaningless for the claims people are trying to make based on them.
  • BurntMyBacon - Monday, September 26, 2016 - link

    @ddriver: "I've been using GCC mostly, and in most of the cases after doing explicit vectorization I found no perf benefits, analyzing assembly afterwards revealed that the compiled has done a very good job at vectorizing wherever possible."

    It's not just about vectorizing. I haven't taken a look at Geekbench code, but it is pretty easy to under-utilize processor resources. Designing workloads to fit within a processors cache for repetitive operations is a common way to optimize. It does, however, leave a processor with a larger cache underutilized for the purposes of the workload. Similar examples can be found for wide vs narrow architectures and memory architectures feeding the processor. Even practical workloads can be done various ways that are much more or less suitable to a given platform. Compression / Encoding methods are some examples here.
  • BurntMyBacon - Monday, September 26, 2016 - link

    @patrickjp93: "Yes you can get 5x the performance by optimizing. Geekbench only handles 1 datem at a time on Intel hardware vs. the 8 you can do with AVX and AVX2. Assuming you don't choke on bandwidth, you can get an 8x speedup."

    If you have processor with a large enough cache to keep a workload almost entirely in cache and another with far less cache that has to access main memory repetitively to do the job, the difference can be an order of magnitude or more. Admittedly, the type of workload that is small enough to fit in any processor cache isn't common, but I've seen cases of it in benchmarks and (less commonly in) scientific applications.
  • patrickjp93 - Tuesday, September 27, 2016 - link

    Heh, they're usually based on Monte Carlo simulations if they can.

Log in

Don't have an account? Sign up now