Sizing up the GPUs
I suppose I've already given away the game on performance by talking about the reasons why AMD decided to aim for lower prices on the eve of the Radeon HD 2400 and 2600 launch, but I still think the topic deserves some closer examination. Why is the R600 family underperforming? The answers have to do with some of the guesses AMD and Nvidia made about GPU usage models when they first set out to design these GPUs several years ago. AMD guessed differently than Nvidia about what mix of resources would be best to have onboard, and those guesses are embodied in the RV630 and RV610, as well as in the original R600.

These differences between AMD and Nvidia boil down to a few key metrics, which we can summarize and then measure with some simple tests in 3DMark. We'll start with a table that shows theoretical peak throughput numbers.

| | Peak pixel fill rate (Gpixels/s) | Peak texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s) | Peak shader throughput (GFLOPS) |
|---|---|---|---|---|
| GeForce 8400 GS | 3.6 | 3.6 | 6.4 | 43.2 |
| GeForce 8500 GT | 3.6 | 3.6 | 12.8 | 43.2 |
| GeForce 8600 GT 540M | 4.3 | 8.6 | 22.4 | 114.2 |
| GeForce 8600 GT 620M | 5.0 | 9.9 | 25.6 | 130.1 |
| GeForce 8600 GTS | 5.4 | 10.8 | 32.0 | 139.2 |
| Radeon HD 2400 Pro | 2.1 | 2.1 | 6.4 | 42.0 |
| Radeon HD 2400 XT | 2.8 | 2.8 | 12.8 | 56.0 |
| Radeon HD 2600 Pro | 2.4 | 4.8 | 16.0 | 144.0 |
| Radeon HD 2600 XT GDDR3 | 3.2 | 6.4 | 25.6 | 192.0 |
| Radeon HD 2600 XT GDDR4 | 3.2 | 6.4 | 35.2 | 192.0 |

Let's start with the right-most column, shader throughput. These numbers are theoretical peaks for the programmable shader cores and exclude fixed-function units such as interpolators. Generally, they show what happens if all of a GPU's stream processors are occupied at once with the ideal instruction mix: usually lots of multiply-add (MADD) instructions, because each one yields two floating-point operations per clock cycle. The obvious outcome is that the Radeon HD 2600 cards have a tremendous amount of peak shader throughput, with the 2600 XT easily surpassing the 8600 GT and even the 8600 GTS.
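A peak figure like these is just ALU count times shader clock times floating-point operations per ALU per clock. The sketch below reproduces two entries from the table, assuming the commonly cited configurations (not stated in this article): 32 stream processors at a 1.45 GHz shader clock for the 8600 GTS, and 120 stream processors at 800 MHz for the 2600 XT. Passing 2 instead of 3 flops per clock for the GeForce models the case where its extra MUL can't be co-issued alongside the MADD.

```python
def peak_shader_gflops(alus: int, shader_clock_ghz: float, flops_per_alu_per_clock: int) -> float:
    """Theoretical peak: every ALU retires its best-case op mix on every clock."""
    return alus * shader_clock_ghz * flops_per_alu_per_clock


# ASSUMED configurations (commonly cited specs, not taken from the article):
# GeForce 8600 GTS: 32 stream processors at 1.45 GHz,
#   MADD (2 flops) plus a co-issued MUL (1 flop) = 3 flops per clock.
gts = peak_shader_gflops(32, 1.45, 3)
# Same chip counting the MADD only (no co-issued MUL).
gts_no_mul = peak_shader_gflops(32, 1.45, 2)
# Radeon HD 2600 XT: 120 stream processors at 0.8 GHz, one MADD (2 flops) each.
xt = peak_shader_gflops(120, 0.8, 2)

print(f"GeForce 8600 GTS: {gts:.1f} GFLOPS ({gts_no_mul:.1f} without the MUL)")
print(f"Radeon HD 2600 XT: {xt:.1f} GFLOPS")
```

Dropping the MUL removes one of three flops per clock, which is why the GeForce peaks shrink by exactly a third.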

These numbers may even understate the Radeons' advantage, because they assume the GeForce 8 GPUs can co-issue a MADD and a MUL in a single clock cycle, something that's only possible in certain situations. If you discount this MUL, the GeForce chips' peak throughput drops by a third, so the 8600 GTS peaks at 93 GFLOPS and the 8600 GT 620M peaks at 87 GFLOPS. Of course, the Nvidia camp has counterpoints to make, not least the difficulty of consistently keeping all five of the ALUs in each of the Radeons' superscalar execution units supplied with work. The compiler in AMD's drivers must sniff out dependencies ahead of time and schedule around them in order to keep those ALUs occupied. This issue will always be a challenge for the R600 and its relatives, but I am largely persuaded it won't be a serious hindrance, in part because of the results of the tests we did here and in part because of the sheer amount of parallel processing power in these chips.
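To make the scheduling problem concrete, here is a toy greedy list scheduler that packs operations into five-slot bundles. An operation may only issue once everything it depends on has issued in an earlier bundle. This is a hypothetical illustration, not AMD's shader compiler: it treats all five slots as interchangeable and ignores register-port and slot restrictions that a real compiler must respect. The function names and the "effective GFLOPS" estimate (peak times slot utilization) are assumptions made for the example.

```python
from typing import Dict, List, Set


def schedule_bundles(deps: Dict[str, List[str]], width: int = 5) -> List[List[str]]:
    """Greedy list scheduling into VLIW-style bundles of `width` slots.

    `deps` maps each op (in program order) to the ops whose results it needs.
    Dependent ops can never share a bundle, because an op becomes ready only
    after all of its inputs were issued in earlier bundles.
    """
    issued: Set[str] = set()
    remaining = list(deps)
    bundles: List[List[str]] = []
    while remaining:
        bundle: List[str] = []
        for op in remaining:
            if len(bundle) == width:
                break
            if all(d in issued for d in deps[op]):
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or unknown dependency")
        issued.update(bundle)
        remaining = [op for op in remaining if op not in issued]
        bundles.append(bundle)
    return bundles


def slot_utilization(bundles: List[List[str]], width: int = 5) -> float:
    """Fraction of ALU slots holding real work across all bundles."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# A serial chain: each op needs the previous result, so one slot per bundle.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}
# Ten independent ops pack into two full bundles.
independent = {f"op{i}": [] for i in range(10)}

for name, prog in (("chain", chain), ("independent", independent)):
    u = slot_utilization(schedule_bundles(prog))
    # Crude estimate: scale the 2600 XT's 192 GFLOPS peak by slot utilization.
    print(f"{name}: {u:.0%} of slots used, ~{192.0 * u:.0f} of 192 peak GFLOPS")
```

Dependency-heavy code leaves most slots empty, while plentiful independent work (such as the many pixels or vertices a GPU processes at once) fills them, which is the basis for the optimism about the sheer amount of parallelism available.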

I'm also persuaded by our 3DMark shader test results, which tend to confirm the RV630's shader prowess.

The Radeon HD 2600 XT beats out its ostensible direct competitor, the GeForce 8600 GT, in every test but the complex vertex shader one, and it's close there. More notably, the 2600 XT outright creams even the 8600 GTS in the pixel shader and Perlin noise tests. (3DMark's vertex shader tests sometimes seem not to max out shader throughput; the GeForce 8800 GTX has produced scores similar to the 8600 GTS in these tests, for whatever reason.) The long and the short of it is that the RV630 has quite a bit of shader power compared to the G84. The tiny RV610 also outdoes the G86 in the pixel shader, particles, and Perlin noise tests, but the gap is less pronounced there, as our theoretical throughput numbers suggested might be the case.

Look what happens when we consider theoretical peak pixel throughput and texturing, though. The Radeon HD 2600 XT tops out at 3.2 Gpixels/s of fill rate and 6.4 Gtexels/s of texture filtering capacity, while the GeForce 8600 GT 620M is substantially more capable, with peaks of 5 Gpixels/s and 9.9 Gtexels/s. The 2600 XT's only strength here is memory bandwidth: the GDDR4 version maxes out at 35.2 GB/s, more than the 8600 GTS at 32 GB/s or the 8600 GT 620M at only 25.6 GB/s. Here's what happens when we measure the more notable of these metrics, multitextured fill rate, in a simple synthetic test.

The 2600 XT comes in just behind the 8600 GT and well back from the 8600 GTS. Importantly, the 2600 XT is achieving something close to its theoretical peak throughput, likely thanks to its superior memory bandwidth. The 8600 GT and GTS, meanwhile, are keeping some power in reserve; they don't reach their peaks in this simple test. Both have additional filtering capacity they might use in the right situation, such as the higher-quality filtering we like to use in games, where textures can be fetched and cached in blocks. We found that the R600 tended not to scale as well as the G80 with higher degrees of anisotropy.
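The theoretical peaks in this comparison are simple products: fill rate is ROPs times core clock, texel filtering rate is filtering units times core clock, and memory bandwidth is bus width in bytes times effective transfer rate. The sketch below reproduces the table's figures for two cards, assuming commonly cited configurations that this article does not state: 4 ROPs and 8 filtering units at 800 MHz with a 128-bit bus for the 2600 XT, and 8 ROPs and 16 filtering units at 620 MHz with a 128-bit bus for the 8600 GT 620M. The memory data rates are likewise assumed.

```python
def fill_rate_gpix(rops: int, core_ghz: float) -> float:
    """Peak pixel fill rate: one pixel per ROP per core clock."""
    return rops * core_ghz


def texel_rate_gtex(filter_units: int, core_ghz: float) -> float:
    """Peak (bilinear) texel filtering rate: one texel per filtering unit per clock."""
    return filter_units * core_ghz


def bandwidth_gbs(bus_bits: int, effective_gtps: float) -> float:
    """Peak memory bandwidth: bytes per transfer times billions of transfers per second."""
    return bus_bits / 8 * effective_gtps


# ASSUMED configurations: (ROPs, filtering units, core GHz, bus bits, effective GT/s)
configs = {
    "Radeon HD 2600 XT GDDR4": (4, 8, 0.800, 128, 2.2),
    "GeForce 8600 GT 620M": (8, 16, 0.620, 128, 1.6),
}

for name, (rops, tmus, clk, bus, rate) in configs.items():
    print(f"{name}: {fill_rate_gpix(rops, clk):.1f} Gpixels/s, "
          f"{texel_rate_gtex(tmus, clk):.1f} Gtexels/s, "
          f"{bandwidth_gbs(bus, rate):.1f} GB/s")
```

The GeForce's advantage in the first two columns comes from having twice the ROPs and filtering units, which more than offsets its lower core clock; the Radeon's bandwidth edge comes entirely from faster GDDR4 on the same bus width.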

Finally, we have the question of antialiasing performance, which would traditionally be connected with pixel fill rate and the capacity of a GPU's render back-ends, or ROPs. For instance, have a look at this AMD diagram of one of the R600's render back-ends.


Logical block diagram of an R600 render back-end. Source: AMD.

The logic that handles the resolve step for multisampled antialiasing is shown here where it traditionally resides in a modern GPU, but there's a catch. That diagram is something of a fib, like AMD's insinuations that the R600 had UVD. In truth, the resolve step is programmable because it isn't handled in custom logic at all: in the R600 family, MSAA resolve is handled in the shader core. AMD says it has included "a fast path between the render back-ends and the shader hardware" to allow the shaders to handle the resolve, and it rightly argues that this provision can lead to higher image quality when combined with custom-programmed filters. Trouble is, this arrangement can also lead to lower performance. Dedicated logic tends to do jobs like traditional MSAA resolve quite well.
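A minimal sketch of the trade-off, assuming single-channel color and a tiny image: a traditional resolve simply averages each pixel's own subsamples, while a programmable resolve can apply any filter the shader author likes. The `tent_like_resolve` function is a hypothetical custom filter invented for this illustration (it is not AMD's actual filter); it also blends in subsamples from the four edge-adjacent pixels, which can smooth edges further but reads up to five times as many samples per output pixel, work that lands on the shader core.

```python
from typing import List, Sequence

# image[y][x] holds the N color subsamples (one channel) stored for that pixel.
Image = List[List[Sequence[float]]]


def box_resolve(image: Image) -> List[List[float]]:
    """Traditional MSAA resolve: average each pixel's own subsamples."""
    return [[sum(px) / len(px) for px in row] for row in image]


def tent_like_resolve(image: Image, neighbor_weight: float = 0.25) -> List[List[float]]:
    """HYPOTHETICAL custom filter: own subsamples at weight 1.0, plus the
    subsamples of the four edge-adjacent pixels at `neighbor_weight`."""
    h, w = len(image), len(image[0])
    taps = ((0, 0, 1.0), (-1, 0, neighbor_weight), (1, 0, neighbor_weight),
            (0, -1, neighbor_weight), (0, 1, neighbor_weight))
    out: List[List[float]] = []
    for y in range(h):
        row: List[float] = []
        for x in range(w):
            total, weight_sum = 0.0, 0.0
            for dy, dx, wgt in taps:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:  # skip taps off the image edge
                    for s in image[ny][nx]:
                        total += s * wgt
                        weight_sum += wgt
            row.append(total / weight_sum)
        out.append(row)
    return out


# A hard edge: a fully covered white pixel next to a black one, 4X MSAA.
edge = [[[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]]
print(box_resolve(edge))
print(tent_like_resolve(edge))
```

The box filter leaves the edge hard, while the wider filter softens it, at the cost of extra sample reads and arithmetic that fixed-function resolve hardware would not have to spend shader time on.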

To give you some context, consider a claim AMD itself has made. The Radeon X1800 and X1900 series GPUs did filtering of 64-bit HDR-format textures in their shader cores, because their texture filtering units couldn't handle those datatypes. When AMD introduced the R600, whose filtering units can process 64-bit textures, it claimed a 7X speedup in HDR texture filtering performance. Of course, you won't "feel" this one aspect of overall performance as a 7X speedup in a game, but that was the claim. If moving one job out of the shader core and into dedicated hardware can pay off that handsomely, moving MSAA resolve in the opposite direction is likely to carry a real cost.

For a better sense of the impact of the RV610/RV630's lack of MSAA resolve hardware, have a look at this table, which shows 3DMark performance for our contenders with and without 4X multisampled AA.

| | 3DMark06, no AA | 3DMark06, 4X AA | Performance penalty |
|---|---|---|---|
| GeForce 8500 GT | 2189 | 1637 | 25.2% |
| GeForce 8600 GT | 4938 | 3814 | 22.8% |
| GeForce 8600 GTS | 5740 | 4512 | 21.4% |
| Radeon HD 2400 XT | 2229 | 1512 | 32.2% |
| Radeon HD 2600 Pro | 3378 | 2279 | 32.5% |
| Radeon HD 2600 XT | 4888 | 3432 | 29.8% |

In the move to 4X AA, the Radeon HDs suffer a penalty roughly seven percentage points larger than their GeForce counterparts do. Worse yet, the 2600 XT nearly ties the GeForce 8600 GT without AA (4888 to 4938), but it falls behind 3432 to 3814 with 4X AA enabled.
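The penalty column is the share of the no-AA score lost with 4X AA, and the comparison above is a difference of percentages, that is, percentage points rather than a ratio. This short sketch recomputes both from the 3DMark06 scores in the table; the Radeon/GeForce pairings are chosen here for illustration.

```python
# 3DMark06 scores (no AA, 4X AA), taken from the table above.
scores = {
    "GeForce 8500 GT": (2189, 1637),
    "GeForce 8600 GT": (4938, 3814),
    "GeForce 8600 GTS": (5740, 4512),
    "Radeon HD 2400 XT": (2229, 1512),
    "Radeon HD 2600 Pro": (3378, 2279),
    "Radeon HD 2600 XT": (4888, 3432),
}


def aa_penalty_pct(no_aa: int, with_aa: int) -> float:
    """Percentage of the no-AA score lost when 4X MSAA is enabled."""
    return 100.0 * (no_aa - with_aa) / no_aa


penalty = {card: aa_penalty_pct(*s) for card, s in scores.items()}
for card, p in penalty.items():
    print(f"{card}: {p:.1f}%")

# Extra penalty in percentage points for illustrative Radeon/GeForce pairings.
for radeon, geforce in (("Radeon HD 2400 XT", "GeForce 8500 GT"),
                        ("Radeon HD 2600 XT", "GeForce 8600 GT")):
    print(f"{radeon} vs {geforce}: {penalty[radeon] - penalty[geforce]:+.1f} points")
```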

The big story here is a simple one. AMD has biased its GPUs' on-chip resources, particularly in the R600 and RV630, toward delivering vast amounts of shader power at the expense of texturing capacity and pixel throughput, especially when multisampled AA comes into the picture. Nvidia's GeForce 8 chips strike a different balance.
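One rough way to put a number on that balance is to divide each chip's peak shader throughput by its peak texel filtering rate, using the figures from the theoretical-peak table. The "FLOPS per texel" ratio is an illustrative metric chosen for this sketch, not an industry-standard figure, and it says nothing about how real games mix shader and texture work.

```python
# (peak texel rate in Gtexels/s, peak shader throughput in GFLOPS) from the table above
chips = {
    "GeForce 8600 GT 620M": (9.9, 130.1),
    "GeForce 8600 GTS": (10.8, 139.2),
    "Radeon HD 2600 Pro": (4.8, 144.0),
    "Radeon HD 2600 XT": (6.4, 192.0),
}


def flops_per_texel(texel_rate: float, gflops: float) -> float:
    """Peak shader operations available per filtered texel (illustrative ratio)."""
    return gflops / texel_rate


ratio = {name: flops_per_texel(*figures) for name, figures in chips.items()}
for name, r in ratio.items():
    print(f"{name}: {r:.1f} FLOPS per texel")
```

By this measure the RV630-based cards offer roughly 30 FLOPS per texel against about 13 for the G84-based GeForces, more than twice the shader work per texture fetch.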

The question of memory bandwidth is a little more complicated, because it raises the issue of intentions. Had AMD followed through on its plans to sell the 2600 XT at $199 and kept its initial price structure intact, AMD and Nvidia would have been matched up almost exactly at several price points and pretty close across the board. As things now stand, AMD offers quite a bit more memory bandwidth at each price point. Of course, that also means AMD is probably paying more to make the cards at each price point.

Will AMD's gamble on shader power yet pay off? Time will tell, but I doubt the GPU usage model will change sufficiently in the life of these products. That statement's hardly a gamble given the life cycles of GPUs these days, but I'm getting way ahead of myself once again. We should probably look at some results from today's games before speculating any further.

Copyright ©1999-2009 The Tech Report. All rights reserved.