AMD Slims Down Compute With Radeon Pro W7900 Dual Slot For AI Inference

by Ryan Smith on June 2, 2024 11:05 PM EST

2 Comments | Add A Comment

2 Comments

While the bulk of AMD’s Computex presentation was on CPUs and their Instinct lineup of dedicated AI accelerators, the company also has a small product refresh for the professional graphics and workstation AI crowd. AMD is releasing a dual-slot version of their high-end Radeon Pro W7900 card – aptly named the W7900 Dual Slot – with the intent being to improve compute density in workstations by making it possible to install 4 of the cards inside a single chassis.

The release of a dual-slot version of the card comes after the original Radeon Pro W7900 was the first time AMD went with a larger, triple-slot form factor for their flagship workstation card. With the W7000 generation bringing an all-around increase in power consumption, pushing the W7900 to 295 Watts, AMD originally opted to release a larger card for improved acoustics. However this came at the cost of compute density, as most systems could only fit 2 of the thicker cards. As a result, AMD is opting to release a dual-slot version of the hardware as well, to offer a more competitive product for high-density workstation systems – particularly those doing local AI inference.

AMD Radeon Pro Specification Comparison
	AMD Radeon Pro W7900DS	AMD Radeon Pro W7900	AMD Radeon Pro W7800	AMD Radeon Pro W6800
ALUs	12288 (96 CUs)		8960 (70 CUs)	3840 (60 CUs)
ROPs	192		128	96
Boost Clock	2.495GHz		2.495GHz	2.32HHz
Peak Throughput (FP32)	61.3 TFLOPS		45.2 TFLOPS	17.8 TFLOPS
Memory Clock	18 Gbps GDDR6		18 Gbps GDDR6	16 Gbps GDDR6
Memory Bus Width	384-bit		256-bit	256-bit
Memiry Bandwidth	864GB/sec		576GB/sec	512GB/sec
VRAM	48GB		32GB	32GB
ECC	Yes (DRAM)		Yes (DRAM)	Yes (DRAM)
Infinity Cache	96MB		64MB	128MB
Total Board Power	295W		260W	250W
Manufacturing Process	GCD: TSMC 5nm MCD: TSMC 6nm		GCD: TSMC 5nm MCD: TSMC 6nm	TSMC 7nm
Architecture	RDNA3		RDNA3	RDNA2
GPU	Navi 31		Navi 31	Navi 21
Form Factor	Dual Slot Blower	Triple Slot Blower	Dual Slot Blower	Dual Slot Blower
Launch Date	06/2024	Q2'2023	Q2'2023	06/2021
Launch Price (MSRP)	$3499	$3999	$2499	$2249

Other than the narrower cooler, the Radeon Pro W7900DS is for all intents and purposes identical to the original W7900, with the same Navi 31 GPU being driven to the same clockspeeds, and the overall board being run to the same 295 Total Board Power (TBP) limit. This is paired with the same 18Gbps GDDR6 as before, giving the card 48GB of VRAM.

Officially, AMD doesn’t have a noise specification for these cards. But you can expect that the W7900DS will be louder than its triple-slot senior. By all appearances, AMD is just using the cooler from the W7800, which was a dual-slot card from the start, so that cooler is being tasked with handling another 35W of heat dissipation.

As the W7800 was also AMD’s fastest dual-slot card up until now, it’s an apt point of comparison for compute density. With its full-fat Navi 31 GPU, the W7900DS will offer about 36% more compute/pixel throughput than its sibling/predecessor. So it’s a not-insubstantial improvement for the very specific niche AMD has in mind for the card.

And like so many other things being announced at Computex this year, that niche is AI. While AMD offers PCIe versions of their Instinct MI210 accelerators, those cards are geared at servers, with fully-passive coolers to match. So workstation-level compute is largely picked up by AMD’s Radeon Pro workstation cards, which are intended to go into a traditional PC chassis and use active cooling (blowers). In this case, AMD is specifically going after local inference workloads, as that’s what the Radeon hardware and its significant VRAM pool are best suited for.

The Radeon Pro W7900 Dual Slot will drop on June 19^th. Notably, AMD is introducing the card at a slightly lower price tag than they launched the original W7900 at last year, with the W7900DS hitting retail shelves at $3499, down from the W7900’s original $3999 price tag.

ROCm 6.1 For Radeons Coming as Well

Alongside the release of the W7900DS, AMD is also promoting the upcoming Radeon release of ROCm 6.1, their software stack for GPU computing. While baseline ROCm 6.1 was introduced back in April, the Windows version of AMD’s software stack is still a trailing (and feature limited) release. So that is slated to finally get bumped up to a ROCm 6.1 release on June 19^th, the same day the W7900DS launches.

ROCm 6.1 for Radeons is slated to bring a couple of major changes/improvements to the stack, particularly when it comes to expanding the scope of available features. Notably, AMD will finally be shipping Windows Subsystem for Linux 2 (WSL2) support, albeit at a beta level, allowing Windows users to access the much richer feature set and software ecosystem of ROCm under Linux. This release will also incorporate improved support for multi-GPU configurations, perfect timing for the launch of the Radeon Pro W7900DS.

Finally, ROCm 6.1 sees TensorFlow integrated into the ROCm software stack as a first-class citizen. While this matter involves more complexities than can be summarized in a simple news story, native TensorFlow support under Windows was previously blocked by a lack of a Windows version of AMD’s MIOpen machine learning library. Combined with WSL2 support, developers will have two ways to access TensorFlow on Windows systems going forward.

Gallery: AMD Radeon Pro W7900DS Press Deck

PRINT THIS ARTICLE

Post Your Comment
Please log in or sign up to comment.

Comments Locked

2 Comments

View All Comments

erinadreno - Monday, June 3, 2024 - link
It's either a slot worth of metal costs $500 or that GPU was not selling well
abufrejoval - Monday, June 3, 2024 - link
I'm not sure if "server" vs "workstation" really does apply here.

My guess is that it's really professional high-density servers vs. low-cost bitbarns, which do not care about noise, but want to extract denser inference workloads using entry level workstation boards and chassis in facilities that used to hold crypto workloads.

Not necessarily the highest quality or highest ethics inference workloads, either.

Pretty sure that scale up across PCIe wouldn't work so it's more about fitting more sub-card size inference workloads into volume and budget, while ROCm means some significant adaptation software effort vs. anything from team green and that implies a deployment at scale to pay for it.

If I was in intelligence, I'd have a closer look at who is buying this stuff and who they are reselling it to.

AMD Slims Down Compute With Radeon Pro W7900 Dual Slot For AI Inference

ROCm 6.1 For Radeons Coming as Well

Post Your Comment

2 Comments

View All Comments

erinadreno - Monday, June 3, 2024 - link

abufrejoval - Monday, June 3, 2024 - link

Log in

Don't have an account? Sign up now