Sizing up the new guy
We've already talked some about the 9600 GT's theoretical capabilities. Here's a quick table to show how it compares with a broader range of today's video cards, including the juiced-up Diamond Radeon HD 3850 512MB card we're testing. I've included numbers for the Palit card at its higher clock speeds, as well.
| | Peak pixel fill rate (Gpixels/s) | Peak bilinear texel filtering rate (Gtexels/s) | Peak bilinear FP16 texel filtering rate (Gtexels/s) | Peak memory bandwidth (GB/s) | Peak shader arithmetic (GFLOPS) |
|---|---|---|---|---|---|
| GeForce 9600 GT | 10.4 | 20.8 | 10.4 | 57.6 | 312 |
| Palit GeForce 9600 GT | 11.2 | 22.4 | 11.2 | 64.0 | 336 |
| GeForce 8800 GT | 9.6 | 33.6 | 16.8 | 57.6 | 504 |
| GeForce 8800 GTS | 10.0 | 12.0 | 12.0 | 64.0 | 346 |
| GeForce 8800 GTS 512 | 10.4 | 41.6 | 20.8 | 62.1 | 624 |
| GeForce 8800 GTX | 13.8 | 18.4 | 18.4 | 86.4 | 518 |
| GeForce 8800 Ultra | 14.7 | 19.6 | 19.6 | 103.7 | 576 |
| Radeon HD 2900 XT | 11.9 | 11.9 | 11.9 | 105.6 | 475 |
| Radeon HD 3850 | 10.7 | 10.7 | 10.7 | 53.1 | 429 |
| Diamond Radeon HD 3850 | 11.6 | 11.6 | 11.6 | 57.6 | 464 |
| Radeon HD 3870 | 12.4 | 12.4 | 12.4 | 72.0 | 496 |
| Radeon HD 3870 X2 | 26.4 | 26.4 | 26.4 | 115.2 | 1056 |
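Each of these peak figures is just a unit count multiplied by a clock. The sketch below rebuilds the two 9600 GT rows that way. The unit counts (16 ROPs, 32 bilinear texels per clock, 64 stream processors, a 256-bit bus) and the stock clocks are our assumptions from the chip's published specs, not from this article. The Palit clocks (700 MHz core, 1750 MHz shader, 2000 MT/s memory) were back-calculated from the table. We also assume the usual convention of counting three flops per stream processor per clock (a MAD plus a MUL) and FP16 filtering at half rate. The `CardSpec` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class CardSpec:
    """Assumed unit counts and clocks (not taken from the article)."""
    name: str
    core_mhz: float        # clock for ROPs and texture units
    shader_mhz: float      # separate shader clock domain
    rops: int              # pixels written per core clock
    texels_per_clock: int  # bilinear texels filtered per core clock
    fp16_divisor: int      # 2 = FP16 filtering runs at half the INT8 rate
    stream_procs: int      # scalar shader ALUs
    flops_per_clock: int   # 3 = one MAD (2 flops) + one MUL per ALU
    bus_bits: int          # memory interface width
    mem_mts: float         # effective memory data rate, megatransfers/s

    def pixel_fill(self) -> float:  # Gpixels/s
        return self.rops * self.core_mhz / 1000

    def texel_rate(self) -> float:  # Gtexels/s
        return self.texels_per_clock * self.core_mhz / 1000

    def fp16_texel_rate(self) -> float:  # Gtexels/s
        return self.texel_rate() / self.fp16_divisor

    def bandwidth(self) -> float:  # GB/s = bytes per transfer x transfers/s
        return self.bus_bits / 8 * self.mem_mts / 1000

    def gflops(self) -> float:
        return self.stream_procs * self.flops_per_clock * self.shader_mhz / 1000


STOCK = CardSpec("GeForce 9600 GT", 650, 1625, 16, 32, 2, 64, 3, 256, 1800)
PALIT = CardSpec("Palit GeForce 9600 GT", 700, 1750, 16, 32, 2, 64, 3, 256, 2000)

for card in (STOCK, PALIT):
    print(f"{card.name}: {card.pixel_fill():.1f} Gpix/s, "
          f"{card.texel_rate():.1f} Gtex/s, {card.fp16_texel_rate():.1f} FP16 Gtex/s, "
          f"{card.bandwidth():.1f} GB/s, {card.gflops():.0f} GFLOPS")
```

Under these assumptions the printed values line up with the first two rows of the table. The same arithmetic, with different unit counts and clocks, produces the other rows.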
Now the question is: how do these theoretical numbers translate into real performance? For that, we can start with some basic synthetic tests of GPU throughput.


The single-textured fill rate test is typically limited by memory bandwidth, which helps explain why the Palit 9600 GT beats out our stock GeForce 8800 GT. The multitextured test is more generally limited by the GPU's texturing capabilities, and in this case, the 8800 GT pulls well away from its upstart sibling. The 9600 GT easily outdoes the Radeon HD 3850 and 3870, though, which is right in line with what we'd expect.
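One way to see why the two tests rank the cards differently is to treat each theoretical peak as a ceiling and take whichever limit runs out first. The sketch below is a deliberately crude model, not how 3DMark or the hardware actually behaves. The peak numbers come from the table. The 8 bytes of memory traffic per single-textured pixel (a 32-bit color write plus 32-bit Z traffic) is an assumption. For multitexturing we assume texture filtering is the only limit. The function names are hypothetical.

```python
# Peak rates copied from the table: (Gpixels/s, Gtexels/s, GB/s)
PEAKS = {
    "GeForce 9600 GT": (10.4, 20.8, 57.6),
    "Palit GeForce 9600 GT": (11.2, 22.4, 64.0),
    "GeForce 8800 GT": (9.6, 33.6, 57.6),
    "Radeon HD 3850": (10.7, 10.7, 53.1),
    "Radeon HD 3870": (12.4, 12.4, 72.0),
}

BYTES_PER_PIXEL = 8  # assumption: 4-byte color write + 4 bytes of Z traffic


def single_textured_bound(name: str) -> float:
    """Gpixels/s ceiling: ROP throughput or memory bandwidth, whichever is lower."""
    pixels, _, bandwidth = PEAKS[name]
    return min(pixels, bandwidth / BYTES_PER_PIXEL)


def multitextured_bound(name: str) -> float:
    """Gtexels/s ceiling, assuming texture filtering is the only limit."""
    _, texels, _ = PEAKS[name]
    return texels


for name in ("GeForce 9600 GT", "Palit GeForce 9600 GT", "GeForce 8800 GT"):
    print(f"{name}: single-textured ceiling {single_textured_bound(name):.1f} Gpixels/s")
for name in PEAKS:
    print(f"{name}: multitextured ceiling {multitextured_bound(name):.1f} Gtexels/s")
```

In this model the bandwidth term caps all three GeForce cards below their ROP limits. The Palit card's extra memory clock therefore lifts it above the 8800 GT (8.0 vs. 7.2). The multitextured ceilings follow the texel rates instead: the 8800 GT leads, and the 9600 GT sits well above both Radeons.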

3DMark's two simple pixel shader tests show the 9600 GT at the back of the pack, again as we'd expect. Simply put, shader arithmetic is the place where Nvidia has compromised most in this design. Whether or not that will really limit performance in today's games is an intriguing question. We shall see.


Among the GeForce 8 cards, these vertex shader tests appear to track more closely with shader clock speeds than with the total shader power of the card. I don't think that's anything worth worrying about.
However, have a look at the difference in scores between the Radeon HD 3850 and 3870 in the simple vertex shader test. This is not a fluke; I re-tested several times to be sure. The 3850 is just faster in the simple vertex shader test, at least until you get multiple GPUs involved. After consulting with AMD, I believe the most likely explanation for the 3870's low performance here is its use of GDDR4 memory. GDDR4 memory has a transaction granularity of 64 bits, while GDDR3's is half that. In certain cases, that may cause GDDR4 memory to deliver lower performance per clock, especially if the access patterns don't play well with its longer burst length. Although this effect is most pronounced here, we saw its impact in several of our game tests, as well, where the Radeon HD 3850 turned out to be faster than the 3870, despite having slightly slower GPU and memory clock frequencies.
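A toy model helps show how coarser transaction granularity can erase a raw bandwidth advantage. The sketch below takes the 32-bit (GDDR3) and 64-bit (GDDR4) granularities from the paragraph above and the peak bandwidths from the table. The access pattern is hypothetical: every request needs only 32 useful bits, and each one costs whole transactions. The helper names are also hypothetical. This is not a measurement of the actual vertex fetch traffic in 3DMark.

```python
import math

GDDR3_GRANULARITY_BITS = 32  # per the text
GDDR4_GRANULARITY_BITS = 64  # per the text: twice GDDR3's


def fetch_efficiency(request_bits: int, granularity_bits: int) -> float:
    """Fraction of transferred bits that the request actually needed."""
    transactions = math.ceil(request_bits / granularity_bits)
    return request_bits / (transactions * granularity_bits)


def useful_bandwidth(peak_gbs: float, request_bits: int, granularity_bits: int) -> float:
    """Peak bandwidth scaled by the wasted part of each transaction (crude)."""
    return peak_gbs * fetch_efficiency(request_bits, granularity_bits)


REQUEST_BITS = 32  # hypothetical small, scattered access

for name, peak, gran in (("Radeon HD 3850", 53.1, GDDR3_GRANULARITY_BITS),
                         ("Radeon HD 3870", 72.0, GDDR4_GRANULARITY_BITS)):
    eff = fetch_efficiency(REQUEST_BITS, gran)
    print(f"{name}: {eff:.0%} of fetched data used, "
          f"~{useful_bandwidth(peak, REQUEST_BITS, gran):.1f} GB/s useful")
```

With this pattern the 3870 wastes half of every GDDR4 transaction. Its 72.0 GB/s peak falls to roughly 36 GB/s of useful data, below the 3850's 53.1 GB/s. With requests that fill whole 64-bit transactions, the penalty disappears. That is consistent with the effect appearing only under certain access patterns.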

