Pushin' pixels — continued
NVIDIA is obviously aware of the negative vibe created by GeForce FX performance problems in some DirectX 9 applications, and the company is working to improve both the FX lineup's image and its performance. One of the keys to that effort is the new Detonator 50 driver, which should be available for download starting today.
The Detonator 50 series includes a new run-time compiler intended to produce code better optimized for the GeForce FX architecture, which seems to be especially sensitive to the way code is structured. The compiler embedded in a graphics driver translates API calls into machine code, as you can see in the simple block diagram on the right. All modern GPU drivers have such compilers, and they are especially important for programmable GPUs. NVIDIA claims its new compiler in Detonator 50 should be much more proficient at generating code friendly to the FX architecture.
Graphics driver-based compilers present several opportunities for optimization that are especially relevant in the case of NVIDIA's NV3x chip, which appears to need extra help sometimes. One of the FX's primary weaknesses is performance in DirectX 9, because the FX hardware doesn't seem to map very well to the requirements of the API. (Much of the 3DMark03 controversy and the hubbub over Half-Life 2 performance can be traced to this fact.) NVIDIA very carefully alludes to this situation in its whitepaper on its new compiler:
"Delivering industry-leading graphics solutions entails a broad set of challenges and even some fortune-telling. Hardware designers not only must continually push the performance and functionality forward, but also anticipate the future direction for the major software application programming interfaces (APIs). Even with attention to every detail, coupling a new architecture with the long list of emerging application requirements from the various APIs can be daunting. When a new GPU is released, its new architecture may not suit the latest software programming techniques for one API, yet it may be ideally suited for the programming techniques of another."
Hence the GeForce FX's apparent prowess in the OpenGL-based DOOM 3, and its relative weakness in the DX9-based Half-Life 2. Microsoft seems to have taken a different direction with some portions of DirectX 9 than NVIDIA anticipated. OpenGL is easier for the FX chips, because NVIDIA can map API calls to its hardware more directly by creating its own extensions to OpenGL.
NVIDIA claims its new compiler can help bridge the gap in situations of API-hardware mismatch by automatically optimizing the machine code it produces. These optimizations come in several forms. One method is friendlier instruction ordering. When I spoke with NVIDIA's Chief Scientist, David Kirk, a few weeks back, he indicated better instruction ordering is more important than datatype selection or any other optimization for the FX chips. NVIDIA offers one very simple example of instruction reordering for the FX: going from interleaved math and texture ops to serial operations of the same type (from math-texture-math-texture to math-math-texture-texture).
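To make the math-texture reordering idea concrete, here is a minimal sketch of how a compiler pass might group instructions of the same type without breaking data dependencies. This is not NVIDIA's algorithm, which hasn't been published. The `Instr` tuple and the `group_by_kind` function are hypothetical names, and the sketch assumes each register is written only once, so read-after-write is the only dependency to respect.

```python
from collections import namedtuple

# Hypothetical instruction record: kind is "math" or "tex",
# dest is the register written, srcs are the registers read.
Instr = namedtuple("Instr", "kind dest srcs")

def group_by_kind(program):
    """Greedily reorder so runs of the same kind stay together.

    Assumes every register is written exactly once, so an instruction
    is ready as soon as all of its in-program sources have been produced.
    """
    defined = {i.dest for i in program}   # registers produced by the program
    written = set()                       # registers produced so far
    remaining = list(program)
    out = []
    last_kind = None
    while remaining:
        ready = [i for i in remaining
                 if all(s in written or s not in defined for s in i.srcs)]
        # Prefer an instruction of the same kind as the previous one.
        same = [i for i in ready if i.kind == last_kind]
        pick = same[0] if same else ready[0]
        out.append(pick)
        written.add(pick.dest)
        remaining.remove(pick)
        last_kind = pick.kind
    return out

# Interleaved math-texture-math-texture, all independent.
interleaved = [
    Instr("math", "r0", ("v0", "c0")),
    Instr("tex",  "r1", ("t0",)),
    Instr("math", "r2", ("v1", "c1")),
    Instr("tex",  "r3", ("t1",)),
]
print([i.kind for i in group_by_kind(interleaved)])

# Same program plus a math op that needs the first texture result.
dependent = interleaved + [Instr("math", "r4", ("r1", "r2"))]
print([i.dest for i in group_by_kind(dependent)])
```

The first call produces the math-math-texture-texture ordering from NVIDIA's example. In the second, the final math op has to wait for its texture fetch, so it stays at the end; a real compiler would have to weigh many more constraints than this greedy pass does.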
The compiler can also translate shorter pixel shader programs that require multiple passes into longer shader programs that require fewer passes. Because NV3x chips can process exceptionally long pixel shader programs, this adjustment makes lots of sense. Also, compilers can work to minimize register use (another rumored FX sticking point). The end result of such changes is pixel shader programs executed in fewer clock cycles. Done correctly, instruction reordering and shader optimization should improve performance without changing the value produced by the calculation; that is, image quality shouldn't be affected.
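Register use is easy to illustrate with a toy example. The sketch below counts how many temporary registers are live at once in a straight-line program, then shows that two orderings of the same sum of products need different numbers of temporaries. The function name `peak_live_temps` and the tuple-based program format are hypothetical, and the sketch again assumes each temporary is written once. It says nothing about how the NV3x actually allocates registers.

```python
def peak_live_temps(program, result):
    """Return the largest number of temporaries live after any instruction.

    program is a list of (dest, srcs) tuples; result names the temporary
    that must survive past the end of the program.
    """
    defined_at = {dest: i for i, (dest, _) in enumerate(program)}
    # A temporary nobody reads dies where it is defined.
    last_use = dict(defined_at)
    for i, (_, srcs) in enumerate(program):
        for s in srcs:
            if s in defined_at:       # shader inputs are not temporaries
                last_use[s] = i
    last_use[result] = len(program)   # the result is live on exit
    peak = 0
    for i in range(len(program)):
        live = sum(1 for r in defined_at
                   if defined_at[r] <= i < last_use[r])
        peak = max(peak, live)
    return peak

# (a*b) + (c*d) + (e*f), with all three products computed first.
products_first = [
    ("t0", ("a", "b")),
    ("t1", ("c", "d")),
    ("t2", ("e", "f")),
    ("t3", ("t0", "t1")),
    ("t4", ("t3", "t2")),
]

# Same math, but the first two products are summed before the third.
sum_early = [
    ("t0", ("a", "b")),
    ("t1", ("c", "d")),
    ("t3", ("t0", "t1")),
    ("t2", ("e", "f")),
    ("t4", ("t3", "t2")),
]

print(peak_live_temps(products_first, "t4"))  # 3
print(peak_live_temps(sum_early, "t4"))       # 2
```

Both orderings compute the same value, so this kind of change shouldn't touch image quality. It only changes how many temporaries are in flight at the worst point.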
However, NVIDIA still hasn't sworn off optimizations that reduce color precision and thus image quality. We know that the FX chips seem to perform much better with lower precision datatypes: best with integer, and better with 16-bit floating point than with 32 bits per color channel. NVIDIA's compiler could conceivably translate higher-precision calculations into lower precision if it decides more precision is unneeded. According to Dr. Kirk, the standard for Pixel Shader 2.0 calculations isn't 16 or 24 or 32 bits of precision; it's matching the output generated by Microsoft's reference rasterizer. NVIDIA's brief on the new compiler says it won't reduce image quality, but color precision isn't discussed. Of course, reductions in color precision can wreak havoc on shader output, especially when datatype selection is poor, so the compiler had best be very careful about making such changes. Some Pixel Shader 2.0 programs might make the transition to integer math gracefully, and others may not. Predicting such outcomes probably isn't easy to do in all cases, even with a relatively intelligent compiler algorithm.
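A quick numerical sketch shows why some calculations would survive a drop in precision and others wouldn't. The helpers below round a value to 16-bit floating point using Python's standard `struct` module, and to a 12-bit fixed-point format. The fixed-point layout here (range of roughly -2 to 2, with 10 fractional bits) is an assumption about FX12 for illustration, and the helper names `to_fp16` and `to_fx12` are hypothetical.

```python
import struct

def to_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fx12(x, frac_bits=10):
    """Round x to a 12-bit fixed-point value and clamp out-of-range input.

    Assumed layout: signed, 10 fractional bits, range [-2, 2).
    """
    scale = 1 << frac_bits
    lo, hi = -2 * scale, 2 * scale - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

# A color-like value in [0, 1] survives either format reasonably well.
color = 0.1
print(to_fp16(color), to_fx12(color))

# A large value, such as a coordinate into a big texture, does not.
coord = 1000.3
print(to_fp16(coord))   # fractional part is rounded to the nearest 0.5
print(to_fx12(coord))   # clamped to just under 2.0
```

Values that stay in the 0-to-1 range, as most color math does, lose only a little. Values with a large magnitude lose their fractional part in 16-bit floating point and get clamped outright in the fixed-point format, which is the sort of case a compiler would have to detect before lowering precision on its own.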
I suspect color precision and datatypes are more important to NV3x performance than NVIDIA is letting on, and I also suspect the fact that DirectX 9's Pixel Shader 2.0 doesn't expose NVIDIA's integer FX12 pixel shaders directly will be an abiding problem for the NV3x chips. (If we knew more about the NV3x's actual internal structure, we might be able to project better how all of this will likely play out.) ATI probably made the smarter compromise by simply converting all pixel shader calculations to 24 bits of floating-point data per color channel, because its R300-series chips, by all indications, have more peak floating-point processing power than the NV3x series. However, NVIDIA's hardware does offer higher peak color precision and a flexible, CPU-like set of datatypes for pixel shader calculations.
On to the testing...
Please read this next bit carefully. We have plans to test the image quality produced by the 52.16 drivers in some detail, but unfortunately, we weren't able to do so for this article due to time constraints. For that, I apologize. We have tested the GeForce FX 5950 Ultra and 5900 Ultra with 52.16 drivers in Pixel Shader 2.0 programs. We will address image quality properly and extensively in a future article.
Now, on to the benchmarks. As we did in our Radeon 9800 XT review, we have tested the 5950 Ultra almost exclusively in fill rate- and memory bandwidth-limited situations, at very high resolutions and with 4X antialiasing and 8X anisotropic filtering enabled. Some of the newer games are limited by pixel shader power, and higher resolutions will push the pixel shaders, too.

