Not much good has happened for either party since AMD purchased ATI. New chips from both sides of the fence have been late, run hot, and underperformed compared to the competition. Meanwhile, the combined company has posted staggering financial losses, causing many folks to wonder whether AMD could continue to hold up its end of the bargain as junior partner in the PC market’s twin duopolies, for CPUs and graphics chips.
AMD certainly has its fair share of well-wishers, as underdogs often do. And a great many of them have been waiting with anticipation—you can almost hear them vibrating with excitement—for the Radeon HD 4800 series. The buzz has been building for weeks now. For the first time in quite a while, AMD would seem to have an unequivocal winner on its hands in this new GPU.
Our first peek at Radeon HD 4850 performance surely did nothing to quell the excitement. As I said then, the Radeon HD 4850 kicks more ass than a pair of donkeys in an MMA cage match. But that was only half of the story. What the Radeon HD 4870 tells us is that those donkeys are all out of bubble gum.
Uhm, or something like that. Keep reading to see what the Radeon HD 4800 series is all about.
The RV770 GPU
Work on the chip code-named RV770 began two and a half years ago. AMD’s design teams were, unusually, dispersed across six offices around the globe. Their common goal was to take the core elements of the underperforming R600 graphics processor and turn them into a much more efficient GPU. To make that happen, the engineers worked carefully on reducing the size of the various logic blocks on the chip without cutting out functionality. More efficient use of chip area allowed them to pack in more of everything, raising the peak capacity of the GPU in many ways. At the same time, they focused on making sure the GPU could more fully realize its potential by keeping key resources well fed and better managing the flow of data through the chip.
The fruit of their labors is a graphics processor whose elements look familiar, but whose performance and efficiency are revelations. Let’s have a look at a 10,000-foot overview of the chip, and then we’ll consider what makes it different.
A block diagram of the RV770 GPU. Source: AMD.
Some portions of the diagram above are too small to make out at first glance, I know. We’ll be looking at them in more detail in the following pages. The first thing you’ll want to notice here, though, is the number of processors in the shader array, which is something of a surprise compared to early rumors. The RV770 has 10 SIMD cores, as you can see, and each of them contains 16 stream processor units. You may not be able to see it above, but each of those SP units is a superscalar processing block comprised of five ALUs. Add it all up, and the RV770 has a grand total of 800 ALUs onboard, which AMD advertises as 800 “stream processors.” Whatever you call them, that’s a tremendous amount of computing power—well beyond the 320 SPs in the RV670 GPU powering the Radeon HD 3800 series. In fact, this is the first teraflop-capable GPU, with a theoretical peak of a cool one teraflop in the Radeon HD 4850 and up to 1.2 teraflops in the Radeon HD 4870. Nvidia’s much larger GeForce GTX 280 falls just shy of the teraflop mark.
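If you want to check those peak-throughput figures yourself, the back-of-the-envelope math is simple. This sketch assumes, as AMD's peak numbers do, that each ALU can retire one multiply-add (two flops) per cycle, and uses the stock core clocks of the two cards (625MHz for the 4850, 750MHz for the 4870):

```python
# Peak shader throughput for the RV770, from the unit counts in the diagram.
simds = 10           # SIMD cores
sp_per_simd = 16     # stream processor units per SIMD
alus_per_sp = 5      # superscalar ALUs per SP unit
alus = simds * sp_per_simd * alus_per_sp
print(alus)          # 800 "stream processors"

flops_per_alu = 2    # one multiply-add per ALU per clock

def peak_tflops(clock_ghz):
    return alus * flops_per_alu * clock_ghz / 1000.0

print(peak_tflops(0.625))   # Radeon HD 4850: 1.0 TFLOPS
print(peak_tflops(0.750))   # Radeon HD 4870: 1.2 TFLOPS
```

The same arithmetic puts the 320-ALU RV670 at well under half a teraflop, which shows just how far the shader array has grown in one generation.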
The blue blocks to the right of the SIMDs are texture units. The RV770’s texture units are now aligned with SIMDs, so that adding more shader power equates to adding more texturing power, as is the case with Nvidia’s recent GPUs. Accordingly, the RV770 has 10 texture units, capable of addressing and filtering up to 40 texels per clock, more than double the capacity of the RV670.
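Those per-clock texturing figures fall straight out of the unit counts. A quick sketch, with the four-texels-per-unit rate implied by the 10-units-to-40-texels math above, and the 4850's 625MHz stock clock used for the peak rate:

```python
# Per-clock texel throughput scales directly with the number of texture units.
texels_per_unit = 4                  # texels addressed/filtered per unit per clock
rv770_texels = 10 * texels_per_unit  # 10 texture units -> 40 texels per clock
rv670_texels = 16                    # RV670's per-clock rate, for comparison

print(rv770_texels)                  # 40
print(rv770_texels / rv670_texels)   # 2.5x -- "more than double"

# At the Radeon HD 4850's 625MHz core clock, that peaks at:
print(rv770_texels * 625e6 / 1e9)    # 25.0 billion texels per second
```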
Across the bottom of the diagram, you can see the GPU’s four render back-ends, each of which is associated with a 64-bit memory interface. Like a bad tattoo, the four back-ends and 256 bits of total memory connectivity are telltale class indicators: this is decidedly a mid-range GPU. Yet the individual render back-ends on RV770 are vastly more powerful than their predecessors, and the memory controllers have one heck of a trick up their sleeves in the form of support for GDDR5 memory, which delivers substantially more bandwidth per pin.
Despite all of the changes, the RV770 shares the same basic feature set with the RV670 that came before it, including support for Microsoft’s DirectX 10.1 standard. The big news items this time around are (sometimes major) refinements, including formidable increases in texturing capacity, shader power, and memory bandwidth, along with efficiency improvements throughout the design.
Like the RV670 before it, the RV770 is fabricated at TSMC on a 55nm process, which packs its roughly 956 million transistors into a die that’s about 16 mm per side, for a total area of roughly 260 mm². The chip has grown from the RV670, but not as much as one might expect given its increases in capacity. The RV670 weighed in at an estimated 666 million transistors and was 192 mm².
Of course, AMD’s new GPU is positively dwarfed by Nvidia’s GT200, a 577 mm² behemoth made up of 1.4 billion transistors. But the more relevant comparisons may be to Nvidia’s mid-range GPUs. The first of those GPUs, of course, is the G92, a 65nm chip that’s behind everything from the GeForce 8800 GT to the GeForce 9800 GTX. That chip measured out, with our shaky ruler, to more or less 18mm per side, or 324 mm². (Nvidia doesn’t give out official die size specs anymore, so we’re reduced to this.) The second competing GPU from Nvidia is a brand-new entrant, the 55nm die shrink of the G92 that drives the newly announced GeForce 9800 GTX+. The GTX+ chip has the same basic transistor count of 754 million, but, well, have a look. The pictures below were all taken with the camera in the same position, so they should be pretty much to scale.
The die-shrunk G92 at 55nm aboard the GeForce 9800 GTX+
Yeah, so apparently I have rotation issues. These things should not be difficult, I know. Hopefully you can still get a sense of comparative size. By my measurements, interestingly enough, the 55nm GTX+ chip looks to be about 16 mm per side and thus roughly 260 mm², just like the RV770. That’s despite the gap in transistor counts between the RV770 and G92, but then Nvidia and AMD seem to count transistors differently, among a multitude of other variables at work here.
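For what it's worth, you can put a rough number on that density gap. Treat this loosely: the die areas are our own ruler-based estimates, and the transistor counts are vendor-reported figures that Nvidia and AMD may not tally the same way.

```python
# Rough transistor-density comparison, using vendor-reported transistor
# counts and our ruler-based ~260 mm^2 die-area estimate for both chips.
die_area_mm2 = 260.0

rv770_density = 956e6 / die_area_mm2 / 1e6   # ~3.7M transistors per mm^2
g92b_density = 754e6 / die_area_mm2 / 1e6    # ~2.9M transistors per mm^2

print(rv770_density)
print(g92b_density)
```

On paper, then, AMD is packing about a quarter more transistors into the same footprint, though the counting-methodology caveat applies.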
The pictures below will give you a closer look at the chip’s die itself. The second one even locates some of the more important logic blocks.
A picture of the RV770 die. Source: AMD.
The RV770 die’s functional units highlighted. Source: AMD.
As you can see, the RV770’s memory interface and I/O blocks form a ring around the periphery of the chip, while the SIMD cores and texture units take up the bulk of the area in the middle. The SIMDs and the texture units are in line with one another.
What’s in the cards
Initially, the Radeon HD 4800 series will come in two forms, powder and rock. Err, I mean, 4850 and 4870. By now, you may already be familiar with the 4850, which has been selling online for a number of days.
Here’s a look at our review sample from Sapphire. The stock clock on the 4850 is 625MHz, and that clock governs pretty much the whole chip, including the shader core. These cards come with 512MB of GDDR3 memory running at 993MHz, for an effective 1986MT/s. AMD pegs the max thermal/power rating (or TDP) of this card at 110W. As a result, the 4850 needs only a single six-pin aux power connector to stay happy.
Early on, AMD suggested the 4850 would sell for about $199 at online vendors, and so far, street prices seem to jibe with that, by and large.
And here we have the big daddy, the Radeon HD 4870. This card’s much beefier cooler takes up two slots and sends hot exhaust air out of the back of the case. The bigger cooler and dual six-pin power connections are necessary given the 4870’s 160W TDP.
Cards like this one from VisionTek should start selling online today at around $299. That’s another hundred bucks over the 4850, but then you’re getting a lot more card. The 4870’s core clock is 750MHz, and even more importantly, it’s paired up with 512MB of GDDR5 memory. The base clock on that memory is 900MHz, but it transfers data at a rate of 3600MT/s, which means the 4870’s peak memory bandwidth is nearly twice that of the 4850.
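A quick calculation bears out that "nearly twice" claim. Peak bandwidth is just the effective transfer rate times the 256-bit bus width: the 4850's GDDR3 moves two transfers per clock (993MHz yielding 1986MT/s), while the 4870's GDDR5 moves four (900MHz yielding 3600MT/s).

```python
# Peak memory bandwidth = transfer rate (MT/s) x bus width (bits) / 8, in GB/s.
def bandwidth_gb_s(mt_per_s, bus_bits=256):
    return mt_per_s * 1e6 * bus_bits / 8 / 1e9

hd4850 = bandwidth_gb_s(1986)   # 993MHz GDDR3, two transfers per clock
hd4870 = bandwidth_gb_s(3600)   # 900MHz GDDR5, four transfers per clock

print(hd4850)            # ~63.6 GB/s
print(hd4870)            # 115.2 GB/s
print(hd4870 / hd4850)   # ~1.8x -- "nearly twice"
```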
Both the 4870 and the 4850 come with dual CrossFire connectors along the top edge of the card, and both can participate in CrossFireX multi-GPU configurations with two, three, or four cards daisy-chained together.
The folks at Nvidia aren’t likely to give up their dominance at the $199 sweet spot of the video card market without a fight. In response to the release of the Radeon HD 4850, they’ve taken several steps to remain competitive. Most of those steps involve price cuts. Stock-clocked versions of the GeForce 9800 GTX have dropped to $199 to match the 4850. Meanwhile, you have higher clocked cards like this one:
This “XXX Edition” card from XFX comes with core and shader clocks of 738 and 1836MHz, respectively, up from 675/1688MHz stock, along with 1144MHz memory. XFX bundles this card with a copy of Call of Duty 4 for $239 at Newegg, along with a $10.00 mail-in rebate, which gives you maybe better-than-even odds of getting a check for ten bucks at some point down the line, if you’re into games of chance.
Cards like this “XXX Edition” will serve as a bridge of sorts for Nvidia’s further answer to the Radeon HD 4850 in the form of the GeForce 9800 GTX+. Those cards will be based on the 55nm die shrink of the G92 GPU, and they’ll share the XXX Edition’s 738MHz core and 1836MHz shader clocks, although their memory will be slightly slower at 1100MHz. Nvidia expects GTX+ cards to be available in decent quantities by July 16 at around $229.
For all intents and purposes, of course, these two cards should be more or less equivalent, including in performance. The GTX+ shares the 9800 GTX’s dual-slot cooler and layout, as well. As a result, and because of time constraints, we’ve chosen to include only the XXX Edition in most of our testing. The exception is the place where the 55nm chip is likely to make the biggest difference: in power draw and the related categories of heat and noise. We’ve tested the 9800 GTX+ separately in those cases.
Nvidia has also decided to sweeten the pot a little bit by supplying us with drivers that endow the GeForce 9800 GTX and GTX 200-series cards with support for GPU-accelerated physics via the PhysX API. You’ll see early results from those drivers in our 3DMark Vantage performance numbers.