Sizing up the new guy
We've already talked some about the 9600 GT's theoretical capabilities. Here's a quick table to show how it compares with a broader range of today's video cards, including the juiced-up Diamond Radeon HD 3850 512MB card we're testing. I've included numbers for the Palit card at its higher clock speeds, as well.

Card                      Peak pixel   Peak bilinear     Peak bilinear FP16  Peak memory  Peak shader
                          fill rate    texel filtering   texel filtering     bandwidth    arithmetic
                          (Gpixels/s)  rate (Gtexels/s)  rate (Gtexels/s)    (GB/s)       (GFLOPS)
GeForce 9600 GT           10.4         20.8              10.4                57.6         312
Palit GeForce 9600 GT     11.2         22.4              11.2                64.0         336
GeForce 8800 GT           9.6          33.6              16.8                57.6         504
GeForce 8800 GTS          10.0         12.0              12.0                64.0         346
GeForce 8800 GTS 512      10.4         41.6              20.8                62.1         624
GeForce 8800 GTX          13.8         18.4              18.4                86.4         518
GeForce 8800 Ultra        14.7         19.6              19.6                103.7        576
Radeon HD 2900 XT         11.9         11.9              11.9                105.6        475
Radeon HD 3850            10.7         10.7              10.7                53.1         429
Diamond Radeon HD 3850    11.6         11.6              11.6                57.6         464
Radeon HD 3870            12.4         12.4              12.4                72.0         496
Radeon HD 3870 X2         26.4         26.4              26.4                115.2        1056
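
For anyone who wants to see where peak numbers like these come from, here's a rough Python sketch that rebuilds the 9600 GT's row from unit counts and clock speeds. The specs plugged in below (16 ROPs, 32 texture units, 64 stream processors, a 650MHz core, 1625MHz shaders, and 1800 MT/s memory on a 256-bit bus) are assumed reference values that aren't stated on this page. The three-flops-per-clock count and the half-rate FP16 filtering are also assumptions, chosen because they agree with the table. The names `GpuSpec` and `peak_rates` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    name: str
    core_mhz: float        # core clock, drives ROPs and texture units
    shader_mhz: float      # shader clock, drives the stream processors
    mem_mtps: float        # effective memory transfer rate in MT/s
    bus_bits: int          # memory interface width
    rops: int              # pixels written per core clock
    tex_units: int         # bilinear texels filtered per core clock
    stream_procs: int
    flops_per_sp: int = 3  # ASSUMPTION: MAD + MUL counted per clock


def peak_rates(gpu: GpuSpec) -> dict:
    """Return theoretical peaks in the same units as the table."""
    bilinear = gpu.tex_units * gpu.core_mhz / 1000
    return {
        "pixel_fill_gpix": gpu.rops * gpu.core_mhz / 1000,
        "bilinear_gtex": bilinear,
        # ASSUMPTION: FP16 filtering runs at half the bilinear rate here.
        "fp16_gtex": bilinear / 2,
        "bandwidth_gbs": gpu.bus_bits / 8 * gpu.mem_mtps / 1000,
        "shader_gflops": gpu.stream_procs * gpu.flops_per_sp
        * gpu.shader_mhz / 1000,
    }


# ASSUMED reference specs, not taken from this page.
GEFORCE_9600_GT = GpuSpec(
    name="GeForce 9600 GT",
    core_mhz=650,
    shader_mhz=1625,
    mem_mtps=1800,
    bus_bits=256,
    rops=16,
    tex_units=32,
    stream_procs=64,
)

if __name__ == "__main__":
    for key, value in peak_rates(GEFORCE_9600_GT).items():
        print(f"{key}: {value:.1f}")
```

The same function should work for the Palit card by swapping in its higher clocks. Other GPUs in the table would need their own unit counts and, in some cases, a different FP16 ratio.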

Now the question is: how do these theoretical numbers translate into real performance? For that, we can start with some basic synthetic tests of GPU throughput.

The single-textured fill rate test is typically limited by memory bandwidth, which helps explain why the Palit 9600 GT beats out our stock GeForce 8800 GT. The multitextured test is more generally limited by the GPU's texturing capabilities, and in this case, the 8800 GT pulls well away from its upstart sibling. The 9600 GT easily outdoes the Radeon HD 3850 and 3870, though, which is right in line with what we'd expect.

3DMark's two simple pixel shader tests show the 9600 GT at the back of the pack, again as we'd expect. Simply put, shader arithmetic is the place where Nvidia has compromised most in this design. Whether or not that will really limit performance in today's games is an intriguing question. We shall see.

Among the GeForce 8 cards, these vertex shader tests appear to track more closely with shader clock speeds than with the total shader power of the card. I don't think that's anything worth worrying about.

However, have a look at the difference in scores between the Radeon HD 3850 and 3870 in the simple vertex shader test. This is not a fluke; I re-tested several times to be sure. The 3850 is just faster in the simple vertex shader test, at least until you get multiple GPUs involved. After consulting with AMD, I believe the most likely explanation for the 3870's low performance here is its use of GDDR4 memory. GDDR4 memory has a transaction granularity of 64 bits, while GDDR3's is half that. In certain cases, that may cause GDDR4 memory to deliver lower performance per clock, especially if the access patterns don't play well with its longer burst length. Although this effect is most pronounced here, we saw its impact in several of our game tests, as well, where the Radeon HD 3850 turned out to be faster than the 3870, despite having slightly lower GPU and memory clock frequencies.
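
To make the granularity argument a little more concrete, here's a toy Python model of how coarser transactions can waste bandwidth on small accesses. This is only a sketch under simplifying assumptions: it treats every request as rounded up to a whole number of transactions and ignores everything else a real memory controller does. The function name `transfer_efficiency` is hypothetical, and the 4-byte and 8-byte granularities are simply the 32-bit and 64-bit figures from the paragraph above.

```python
def transfer_efficiency(request_bytes: int, granularity_bytes: int) -> float:
    """Fraction of transferred bytes that the requester actually wanted.

    TOY MODEL: each request is rounded up to a whole number of
    fixed-size transactions; nothing else about the memory is modeled.
    """
    # Integer ceiling division: number of transactions needed.
    transactions = -(-request_bytes // granularity_bytes)
    moved = transactions * granularity_bytes
    return request_bytes / moved


if __name__ == "__main__":
    for size in (4, 12, 16):
        fine = transfer_efficiency(size, 4)    # 32-bit granularity
        coarse = transfer_efficiency(size, 8)  # 64-bit granularity
        print(f"{size}-byte request: fine {fine:.2f}, coarse {coarse:.2f}")
```

In this model, requests that are multiples of the larger granularity lose nothing, while small or odd-sized requests move extra bytes that go unused. That's the kind of access pattern where a higher-clocked memory with coarser transactions could, in principle, come out behind.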

Copyright ©1999-2009 The Tech Report. All rights reserved.