They're baaaaack
Even so, the GeForce FX series was relatively slow at pixel shader programs and was behind the curve in some respects. ATI's R3x0-series chips had more pixel pipelines, better antialiasing, and didn't require as much tuning in order to achieve optimal performance. More importantly, NVIDIA itself had lost some of its luster as the would-be Intel of the graphics world. The company clearly didn't enjoy being in second place, and sometimes became evasive or combative about its technology and the issues surrounding it.
But as of today, that's all ancient history. NVIDIA is back with a new chip, the NV40, produced by a new, crystal-clear set of design principles. The first NV40-based product is the GeForce 6800 Ultra. I've been playing with one for the past few days here in Damage Labs, and I'm pleased to report that it's really, really good. For a better understanding of how and why, let's look at some of the basic design principles that guided NV40 development.
- Massive parallelism: Processing graphics is about drawing pixels, an inherently parallelizable task. The NV40 has sixteen real, honest-to-goodness pixel pipelines: "no funny business," as the company put it in one briefing. By contrast, NV30 and its high-end derivatives had a four-pipe design with two texture units per pipe that could, in special cases involving Z and stencil operations, process eight pixels per clock. The NV40 has sixteen pixel pipes with one texture unit per pipe, and in special cases, it can produce 32 pixels per clock. To feed these pipes, the NV40 has six vertex shader units, as well.

An overview of the NV40 architecture. Source: NVIDIA

All told, NV40 weighs in at 222 million transistors, roughly double the count of an ATI Radeon 9800 GPU and well more than even the largest desktop microprocessor. To give you some context, the most complex desktop CPU is Intel's Pentium 4 Prescott at "only" 125 million transistors. Somewhat surprisingly, the NV40 chip is fabricated by IBM on a 0.13-micron fabrication process, not by traditional NVIDIA partner TSMC.
By going with a 0.13-micron fab process and sixteen pipes, NVIDIA is obviously banking on its chip architecture, not advances in manufacturing techniques and higher clock speeds, to provide next-generation performance.
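To make the per-clock numbers above a little more concrete, here's a rough sketch that tallies pixel, texel, and Z/stencil throughput for the two pipe layouts described in the text. The `PipeConfig` class, its field names, and the `z_stencil_factor` multiplier are my own illustrative inventions, not anything from NVIDIA, and the 100MHz clock in the demo is an arbitrary placeholder rather than a real product clock speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeConfig:
    """Hypothetical description of a GPU's pixel pipeline layout."""
    name: str
    pixel_pipes: int          # number of pixel pipelines
    tex_units_per_pipe: int   # texture units in each pipeline
    z_stencil_factor: int     # assumption: special-case rate as a multiple of pipes

    def pixels_per_clock(self) -> int:
        # One pixel per pipeline per clock in the general case.
        return self.pixel_pipes

    def texels_per_clock(self) -> int:
        # Every texture unit can contribute one texel per clock.
        return self.pixel_pipes * self.tex_units_per_pipe

    def z_stencil_pixels_per_clock(self) -> int:
        # Special case from the text: Z/stencil work runs at a higher rate.
        return self.pixel_pipes * self.z_stencil_factor


def peak_mpixels_per_sec(cfg: PipeConfig, clock_mhz: float) -> float:
    """Theoretical peak fill rate in megapixels/s at a given core clock."""
    return cfg.pixels_per_clock() * clock_mhz


# Pipe counts as described in the text: NV30 is 4x2, NV40 is 16x1.
NV30 = PipeConfig("NV30", pixel_pipes=4, tex_units_per_pipe=2, z_stencil_factor=2)
NV40 = PipeConfig("NV40", pixel_pipes=16, tex_units_per_pipe=1, z_stencil_factor=2)

if __name__ == "__main__":
    placeholder_clock_mhz = 100.0  # arbitrary value, chosen only for comparison
    for cfg in (NV30, NV40):
        print(
            cfg.name,
            cfg.pixels_per_clock(),
            cfg.texels_per_clock(),
            cfg.z_stencil_pixels_per_clock(),
            peak_mpixels_per_sec(cfg, placeholder_clock_mhz),
        )
```

At any equal clock speed, the sixteen-pipe layout comes out with four times the general-case pixel rate of the four-pipe design, which is the sense in which the architecture, rather than clock speed, carries the load here. These are theoretical peaks only; real-world fill rates depend on memory bandwidth and much else.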
- Scalability: With sixteen parallel pixel pipes comes scalability, and NVIDIA intends to exploit this characteristic of NV40 by developing a top-to-bottom lineup of products derived from this high-end GPU. They will all share the same features and differ primarily in performance. You can guess how: the lower-end products will have fewer pixel pipes and fewer vertex shader units.
Contrast that plan with the reality of NV3x, which NVIDIA admits was difficult to scale from top to bottom. The high-end GeForce FX chips had four pixel pipes with two texture units each (a 4x2 design), while the mid-range chips were a 4x1 design. Even more oddly, the low-end GeForce FX 5200 was rumored to be an amalgamation of NV3x pixel shaders and fixed-function GeForce2-class technology.
NVIDIA has disavowed the "cascading architectures" approach where older technology generations trickle down to fill the lower rungs of the product line. Developers should soon be able to write applications and games with confidence that the latest features will be supported, in a meaningful way, with decent performance, on a low-end video card.

A single, superscalar pixel shader unit. Source: NVIDIA

- More general computational power: The NV40 is a more capable general-purpose computing engine than any graphics chip that came before it. The chip supports pixel shader and vertex shader versions 3.0, as defined in Microsoft's DirectX 9 spec, with support for long instruction programs, looping, branching, and dynamic flow control. Also, NV40 can process data internally with 32 bits of floating-point precision per color channel (red, green, blue, and alpha) with no performance penalty. Combined with the other features of 3.0 shaders, this additional precision should allow developers to employ more advanced rendering techniques with fewer compromises and workarounds.
- More performance per unit of transistors: Although GPUs are gaining more general programmability, this trend stands in tension with the usual mission of graphics chips, which has been to accelerate graphics functions through custom logic. NVIDIA has attempted to strike a better balance in NV40 between general computing power and custom graphics logic, with the aim of achieving more efficiency and higher overall performance. As a result, NV40's various functional units are quite flexible, but judiciously include logic to accelerate common graphics functions.
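For a feel of what the 32 bits of floating-point precision per color channel mentioned above buys you, here's a small sketch that works out the storage per pixel and compares rounding error at two precisions. The 16-bit half-precision format is used purely as a comparison point of my own choosing, and the helper names are hypothetical. It leans on Python's standard `struct` module to round values to each format, so treat it as an illustration of the arithmetic, not a model of how NV40 handles data internally.

```python
import struct

# The four color channels named in the text.
CHANNELS = ("red", "green", "blue", "alpha")


def bits_per_pixel(bits_per_channel: int) -> int:
    """Total bits needed for one pixel at a given per-channel precision."""
    return bits_per_channel * len(CHANNELS)


def round_trip(value: float, fmt: str) -> float:
    """Round a Python float to the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def rounding_error(value: float, fmt: str) -> float:
    """Absolute error introduced by storing value in the given format."""
    return abs(round_trip(value, fmt) - value)


HALF = "<e"    # 16-bit floating point (comparison point, an assumption)
SINGLE = "<f"  # 32-bit floating point, as described in the text

if __name__ == "__main__":
    sample = 1.0 / 3.0  # an arbitrary value that neither format stores exactly
    print("bits per pixel at 32 bits/channel:", bits_per_pixel(32))
    print("16-bit rounding error:", rounding_error(sample, HALF))
    print("32-bit rounding error:", rounding_error(sample, SINGLE))
```

The error at 32 bits is several orders of magnitude smaller, which matters when a long shader program chains many operations together and small errors have room to pile up.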
