NVIDIA’s GeForce 6200 with TurboCache

Scott Wasson

A WHILE BACK, we reviewed the GeForce 6200, and we were a bit perplexed. The GeForce 6200 was a success on most fronts. It brought GeForce 6-class features to graphics cards in the $129 range, including DirectX 9 with Shader Model 3.0—no small feat. Performance was quite good, and the product made sense overall, save for one thing: the 6200 was based on the NV43 GPU, the same graphics chip used in GeForce 6600 cards, but with much of its rendering power disabled. Graphics companies sell tons of low-end GPUs. Why would NVIDIA manufacture a bunch of relatively large NV43 graphics chips only to cut them down to half their rendering power?

The answer, it turns out, is pretty simple: that version of the GeForce 6200 was just a stop-gap measure, not the real thing. The real GeForce 6200 is based on a new and much smaller chip, the NV44, with an intriguing new technology, dubbed TurboCache, that allows graphics cards to use system memory in combination with a smaller amount of local graphics RAM to deliver decent low-end performance. Read on for our take on NVIDIA’s new low-end GPU.

The NV44 GPU
The GeForce 6200 with TurboCache is indeed a scaled-down version of the NV4x architecture that powers the entire line of GeForce 6 graphics cards, but it’s not just that. NVIDIA has reworked the memory management portions of the NV4x pipeline in order to allow this new NV44 GPU to use system memory for rendering. Here’s a block diagram of the NV44.


Block diagram of the NV44 GPU. Source: NVIDIA.

The NV44 is a new and substantially smaller chip than its predecessors. It packs three vertex shader engines, just like the NV43, but has only four pixel shader pipelines. Those pixel shader units handle both programmable pixel shading and texturing, and they are linked to a pair of raster operators, or ROPs, by a fragment crossbar. This crossbar acts like a load-balancer, sending pixel fragments to available ROPs as needed. The ROP then writes the pixel result to memory. Thus, the NV44 is limited to writing two pixels per clock, but it’s able to keep the ROPs working by feeding them from four pixel pipes. This architecture is a little bit different from the usual arrangement, in which each pixel pipeline is directly connected to the raster operators, but it’s proven very effective in the GeForce 6600 line.

The ROP subsystem is the place where NVIDIA has saved the most transistors on the GeForce 6200. In addition to cutting the number of ROPs down to two (from 16 in the NV40 and four in the NV43), NVIDIA’s engineers have scaled back some of the ROP pipeline’s capabilities in the NV44, removing support for color compression, Z compression, and OpenEXR 16-bit floating-point blending and filtering. The removal of color compression will primarily affect performance with multisampled antialiasing, while the loss of Z compression will cost some performance overall. OpenEXR blending and filtering is nice to have, but this thing won’t be fast enough to handle 16-bit FP blending in real-time graphics, anyhow.

All told, NVIDIA was able to reduce the NV44 to a comparatively tiny 75 million (or so) transistors—far fewer than the estimated 222 million transistors in the GeForce 6800 Ultra.


The NV44 GPU

More importantly, the NV44’s die size is much smaller than that of other NV4x chips. By my measurements, the chip is 10mm by 11mm, or 110mm². The NV43, for comparison, is about 156mm². Both chips are manufactured by TSMC on its 110nm fab process, so the die size difference is the result of the trimmed-down ROPs and pixel shader pipes.

 

TurboCache and turbo lag
As for TurboCache, it isn’t really about forced induction or ceramic impellers. TurboCache is about using a combination of fast local RAM and system memory for rendering. In this case, NVIDIA says the local frame buffer acts as a high-speed, software-managed cache, while main memory is allocated dynamically as needed for graphics use. GeForce 6200 cards with TurboCache will come with either one or two pieces of memory onboard. A single DRAM, typically a 16MB chip, will offer a 32-bit path to memory, while two chips will make for a 64-bit path. The memory chips will run at 350MHz, or a 700MHz data rate with DDR, so that a 32-bit config will yield 2.8GB/s of local frame buffer bandwidth, and a 64-bit config will have 5.6GB/s to memory. Although system memory can be used freely for rendering tasks, scan-out for video will always happen from the local frame buffer.
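
The math behind those local frame buffer numbers is simple enough to check by hand: bus width in bytes times the effective DDR data rate. Here’s a minimal sketch of the arithmetic in Python, purely as an illustration of where the 2.8GB/s and 5.6GB/s figures come from (the function and its name are mine, not NVIDIA’s):

```python
# Back-of-the-envelope peak local frame buffer bandwidth for the TurboCache
# configs described above: a 350MHz memory clock, doubled to a 700MHz
# effective data rate by DDR signaling, behind a 32-bit or 64-bit path.

def local_bandwidth_gbps(bus_width_bits, data_rate_mhz=700):
    """Peak bandwidth in GB/s for a given bus width and DDR data rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mhz * 1_000_000 / 1_000_000_000

print(local_bandwidth_gbps(32))  # one 16MB DRAM, 32-bit path  -> 2.8 GB/s
print(local_bandwidth_gbps(64))  # two DRAMs, 64-bit path      -> 5.6 GB/s
```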

The NV44 can do texturing directly from main memory, as most graphics cards have done since the introduction of AGP texturing way back when. The novel thing with NV44 is the ability to write directly to a texture in system memory. Programmable pixel shading typically creates lots of renderable surfaces in a scene, and NVIDIA has concentrated its efforts on making NV44 able to render directly to system memory “at 100% efficiency,” as they put it. The yellow bits in the block diagram above are the ones modified to make direct rendering to system memory a possibility. As you can see, NVIDIA has added a memory management unit that allows two-way access to system memory from both the pixel shader pipes and the ROP pipelines.

NVIDIA’s drivers for the 6200 will allocate as much as 128MB of system memory for graphics on an as-needed basis in a 512MB system. The limits are lower with less RAM, and it’s possible that future drivers will allocate up to 256MB of RAM in systems with plenty of memory.

Taken together, the local video RAM plus system RAM should produce copious amounts of available memory bandwidth. For instance, in a system with dual channels of DDR400 memory, a 6200 with a 32-bit path to memory would have a total of 9.2GB/s of theoretical peak memory bandwidth (2.8GB/s local plus 6.4GB/s system). A 64-bit version could have as much as 12GB/s available to it. By contrast, a Radeon X300 card with a 128-bit memory subsystem would have 6.4GB/s of peak theoretical bandwidth.
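
Those totals, for what it’s worth, are just the local figures added to the peak bandwidth of dual-channel DDR400 system memory. A quick, purely illustrative sketch of the sums (my arithmetic, not an NVIDIA formula):

```python
# Peak theoretical bandwidth available to a TurboCache card in a system with
# dual channels of DDR400 memory (128 bits wide at a 400MHz data rate).
# These are simple sums; real-world throughput will be considerably lower.

system_ddr400 = (128 / 8) * 400 * 1_000_000 / 1_000_000_000   # 6.4 GB/s

print(system_ddr400 + 2.8)   # 32-bit/16MB TurboCache card ->  9.2 GB/s
print(system_ddr400 + 5.6)   # 64-bit/32MB TurboCache card -> 12.0 GB/s
```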

Unfortunately, going out to system memory introduces latency, or a delay between the time data is requested and the time it begins to arrive. If one thinks of the system as a network, going from the GPU to local RAM is a single “hop,” while going from the GPU to system RAM in a Pentium 4 system is two hops: from the GPU over PCI Express to the north bridge, and from the north bridge to system RAM. Worse yet, the trip from the GPU to system RAM is three hops on an Athlon 64 system: from the GPU over PCI Express to the chipset, from the chipset to the CPU over HyperTransport, and from the CPU’s memory controller to system RAM. Each additional chip-to-chip hop introduces longer delays.

We can illustrate the difference between a single-hop memory access and a two-hop access by looking at the results from one of our recent processor reviews. In this case, the Pentium 4 is doing a two-hop access to memory (from CPU to north bridge over the system bus, and from the north bridge into RAM) while the Athlon 64’s integrated memory controller allows a one-hop access to RAM. The Pentium 4’s memory access latencies are nearly twice those of the Athlon 64.

This latency penalty is one reason why main system memory hasn’t been a good place for 3D graphics solutions to store data, and it’s also why the GeForce 6200 will come with one or two faster local DRAM chips onboard.

NVIDIA claims the 6200 is designed to mask this latency, and that the nature of the graphics pipeline helps it do so. They say graphics involves lots of independent memory accesses and parallel work, so with adequate buffering to keep all of that work “in flight” on the GPU, the 6200 should perform reasonably well. The parts of the NV44’s pipelines that handle calculations are unchanged from other NV4x chips, save for the noted changes to the ROPs, but the MMU is new. Presumably, the data paths between the MMU and other parts of the chip include a fair amount of buffering.

Interestingly enough, NVIDIA says real-world bandwidth between the system RAM and the GPU will be limited by the platform. They claim the Intel 900-series chipsets can achieve about 3GB/s of throughput from memory to the GPU, and only 1GB/s of throughput in the opposite direction—well below the 4GB/s of bandwidth in each direction promised by PCI Express x16. Although AMD64 systems will by nature have higher GPU-to-memory access latencies, NVIDIA says “faster K8 chipsets” will achieve more bandwidth to the GPU than Intel’s 900-series platform.

 

The case for using system RAM
Obviously, the point of TurboCache is to allow higher-performance graphics at lower prices. NVIDIA points out all sorts of graphics-geeky reasons why the time is right for TurboCache to exist. I’ll try to sum them up in three points. First, of course, is the advent of PCI Express and the additional bandwidth it provides, especially the bandwidth coming back from the GPU into main memory, which was problematic with AGP. Second, games are now using more memory than ever for renderable surfaces, all of which require storage space. 128MB is commonly required for best quality by some newer games, and 256MB is the next milestone. The ability to allocate a portion of system RAM for graphics should help enable cheaper graphics solutions to handle these more demanding applications. Third, although memory space requirements are growing, NVIDIA says programmable shading is reducing the need for memory bandwidth. Since real-world bandwidth and latency are interrelated, this point is important, even though we’ve been throwing around some relatively big numbers for total peak theoretical memory bandwidth in a 6200 with TurboCache config. Here’s a quick summary NVIDIA provided to illustrate how programmable shaders might cut the need for memory bandwidth versus traditional multipass rendering.

Of course, this is a very extreme example, but it makes the point. Programmable shading is reducing bandwidth pressure for color and Z writes (this is also the reason why the NV44 can live with only two ROPs). Examples like this one will be more pertinent as programmable shading becomes more prevalent.

NVIDIA says TurboCache offers better overall system performance than an integrated chipset graphics solution, and that makes sense, given the fact that the 6200 has a local frame buffer to handle non-3D display tasks. The company also claims TurboCache is ideal for laptops, because fewer RAM chips onboard will allow for reduced power consumption.

In fact, the folks at NVIDIA believe in the TurboCache scheme so strongly that they have no apparent plans for a future GeForce 6200 solution that doesn’t use it. The current, NV43-derived GeForce 6200 will eventually die off. One place where TurboCache will not prevail is on AGP systems, where the GeForce FX line will continue as NVIDIA’s low-end solution.

We have been hearing for a while now about a new virtual memory hierarchy coming for graphics, possibly along with Microsoft’s Longhorn OS. Obviously, TurboCache is a big step in that direction, although it’s not quite the whole enchilada. When asked about the possibility of seeing TurboCache used on higher-end cards as a means of augmenting their larger pools of local RAM, NVIDIA said to “stay tuned.” They also were quick to talk down the prospects for ATI’s competing HyperMemory technology, claiming it will only be the equivalent of AGP texturing and nothing more. We’ll have to see about that.

The selling of TurboCache
One of the more sensitive issues NVIDIA will have to address will be how to sell the GeForce 6200 with TurboCache. Feisty consumers may not like the sound of a graphics card with only 16MB or 32MB of memory, so NVIDIA and its partners will need to sell these cards confidently without being deceptive. NVIDIA offered us a look at a product box mock-up with some carefully worded language about the GeForce 6200 graphics card “supporting 128MB.” The fine print on the box includes a quick description of the TurboCache memory sharing scheme and notes that 512MB of memory is required to get “full 128MB support.” All of this is well and good, but the actual amount of local memory on the board is completely omitted in NVIDIA’s example. Likewise, NVIDIA’s first drivers for the 6200 with TurboCache simply show a 128MB graphics device, with no notation of the amount of memory actually on the card.

I understand the need to sell this solution to wary consumers in an appropriate manner, but I expect NVIDIA and its partners will have to give a little here. The performance difference between a one-chip/16MB and two-chip/32MB 6200 with TurboCache is notable, and folks will want to be able to determine which version of the card they’re getting. NVIDIA partners may also choose to introduce cards with denser 32MB DRAM chips, and they’ll want to tell the world about the extra local frame buffer space.

 

The cards
We have a pair of 6200 cards to review. The first has a single 16MB DRAM chip onboard with a 32-bit path to RAM, and the second has two chips, for 32MB and a 64-bit data path.


The GeForce 6200 with TurboCache


A single RAM chip peeking out from beneath the passive heat sink

The only cosmetic difference between the two cards is the second RAM chip populating a pad on the back side of the 32MB card.

Test notes
The tests on the following pages were intended to stress the TurboCache scheme. We wanted to see how the 6200 with TurboCache would perform with lots of accesses to main memory, so we chose to test new games at higher quality settings with lots of big textures. We also wanted to test some of the theories about shader effects reducing bandwidth pressure, so we tested some scenarios where shader effects are rather intensive. In short, we tested the GeForce 6200 with TurboCache much like a mid-range graphics card. We believe that’s a fair thing to do, in part because the latest games tend to look better at their highest quality settings at lower display resolutions than they do at higher resolutions and lower quality settings.

We’ve included an expected rival to the 6200 with TurboCache, the Radeon X300 with 128MB of local frame buffer memory behind a 128-bit data path. We’ve also included a graphics solution from ATI that’s very similar to the 6200 with TurboCache: the Radeon Xpress 200 IGP. Like the 6200, the Radeon Xpress 200’s integrated graphics solution uses 16MB or 32MB of local memory in addition to system memory. The IGP can interleave accesses to system memory and local memory in order to get more total effective bandwidth.

However, we have to acknowledge that the Radeon Xpress 200 is overmatched here. It has only two pixel pipelines running at up to 350MHz core clock speeds (actual configs may vary), and it has no real vertex engines, relying on the CPU to do most of that work. Also, the motherboard we’re testing has only 16MB of local memory onboard, so it’s less than optimal. We configured that board to use coarse-grained interleaving between local RAM and system RAM in order to achieve a total of 144MB of memory allocated to graphics. ATI recommends fine-grained interleaving for optimal performance, but that would limit us to 32MB of total graphics RAM—not enough.

Still, the Radeon Xpress 200 should serve as an able stand-in for chipset-based graphics solutions, because it is the very best chipset-based graphics processor on the market today. We’ve already seen the Xpress 200 IGP outperform the GMA 900 graphics processor in Intel’s 915G chipset. We should be able to get some sense from these results of how a TurboCache solution compares to a chipset-based IGP.

Finally, we have used more conservative memory timings in our test system than usual, because we wanted to be realistic in testing TurboCache performance. Most consumer PCs are sold with low-grade memory that runs at relaxed timings. We’ve chosen 2.5-3-3 at 400MHz as a reasonably representative set of timings. Running 2-2-2 RAM would have given the 6200 with TurboCache an advantage over the typical system in which it will likely be deployed.

 

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged. All graphics driver image quality settings were left at their defaults, with the exception that vertical refresh sync (vsync) was always disabled and geometry instancing was enabled on the X300 card.

Our test systems were configured like so:

                        | NVIDIA-based system | ATI-based system
Processor               | Athlon 64 3500+ 2.2GHz (90nm) | Athlon 64 3500+ 2.2GHz (90nm)
System bus              | 1GHz HyperTransport | 1GHz HyperTransport
Motherboard             | NVIDIA reference | ATI reference
BIOS revision           | 4.70 | B10
North bridge            | nForce4 Ultra | Radeon Xpress 200
South bridge            | | Radeon Xpress 200
Chipset drivers         | ForceWare 6.31 beta | 10/31/04 beta
Memory size             | 1GB (2 DIMMs) | 1GB (2 DIMMs)
Memory type             | OCZ PC3200 EL DDR SDRAM at 400MHz | OCZ PC3200 EL DDR SDRAM at 400MHz
CAS latency (CL)        | 2.5 | 2.5
RAS to CAS delay (tRCD) | 3 | 3
RAS precharge (tRP)     | 3 | 3
Cycle time (tRAS)       | 10 | 10
Hard drive              | Maxtor MaXLine III 250GB SATA 150
Audio                   | Integrated | Integrated
Graphics 1              | GeForce 6200 with TurboCache (32b/16MB) PCI-E with ForceWare 71.20 drivers | Radeon Xpress 200 IGP at 350MHz with 16MB LFB and 8.07-04109a-018757E drivers
Graphics 2              | GeForce 6200 with TurboCache (64b/32MB) PCI-E with ForceWare 71.20 drivers |
Graphics 3              | | Radeon X300 128MB PCI-E with Catalyst 4.12 drivers
OS                      | Microsoft Windows XP Professional
OS updates              | Service Pack 2, DirectX 9.0c

Thanks to OCZ for providing us with memory for our testing. If you’re looking to tweak out your system to the max and maybe overclock it a little, OCZ’s RAM is definitely worth considering.

Also, all of our test systems were powered by OCZ PowerStream power supply units. The PowerStream was one of our Editor’s Choice winners in our latest PSU round-up.

The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate.

We used the following versions of our test applications:

The tests and methods we employed are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

 

Pixel filling power
Here’s how the GeForce 6200 with TurboCache matches up against the competition in terms of some key theoretical specs. Of course, these numbers don’t account for the TurboCache scheme.

                                        | Core clock (MHz) | Pixel pipelines | Peak pixel fill rate (Mpixels/s) | Texture units per pixel pipeline | Peak texel fill rate (Mtexels/s) | Memory clock (MHz) | Memory bus width (bits) | Peak memory bandwidth (GB/s)
GeForce 6200 with TurboCache (32b/16MB) | 350 | 4* | 700 | 1 | 1400 | 700 | 32 | 2.8*
GeForce 6200 with TurboCache (64b/32MB) | 350 | 4* | 700 | 1 | 1400 | 700 | 64 | 5.6*
GeForce 6200                            | 300 | 4 | 1200 | 1 | 1200 | TBD | 128 | TBD
Radeon X300                             | 325 | 4 | 1300 | 1 | 1300 | 400 | 128 | 6.4
Radeon X600 Pro                         | 400 | 4 | 1600 | 1 | 1600 | 600 | 128 | 9.6
GeForce FX 5700 Ultra                   | 475 | 4 | 1900 | 1 | 1900 | 900 | 128 | 14.4
Radeon 9600 XT                          | 500 | 4 | 2000 | 1 | 2000 | 600 | 128 | 9.6
Radeon X600 XT                          | 500 | 4 | 2000 | 1 | 2000 | 740 | 128 | 11.8
GeForce 6600                            | 300 | 8* | 1200 | 1 | 2400 | TBD | 128 | TBD

I’ve listed only the local memory bandwidth of the GeForce 6200 cards with TurboCache. Total memory bandwidth will be somewhere north of the numbers you see above. Beyond that, the 6200’s disconnect between pixel shaders and ROPs makes the chart above tricky to interpret, as well. The peak throughput numbers are theoretically correct for both pixel and texel fill rates, but we’ve slapped an asterisk on the pipeline count to keep things honest.
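
To spell out what’s behind those asterisked figures, the peak numbers in the chart are just clock-times-units products. Here’s a quick, purely illustrative sketch of how the 6200 with TurboCache entries fall out of the two-ROP, four-pipe arrangement described earlier (variable names are mine):

```python
# Peak theoretical throughput for the 6200 with TurboCache, per the chart above.
# Pixel fill is limited by the two ROPs behind the fragment crossbar, while
# texel fill comes from the four pixel/texture pipes.

core_clock_mhz = 350
rops = 2
texture_units = 4

print(core_clock_mhz * rops)            # 700 Mpixels/s peak pixel fill rate
print(core_clock_mhz * texture_units)   # 1400 Mtexels/s peak texel fill rate
```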

In 3DMark03’s synthetic fill rate tests, the GeForce 6200 with a 64-bit/32MB TurboCache lives up to its specs. The 6200 card with 32-bit/16MB TurboCache, however, can’t keep up in the single-texturing test, likely because of the performance penalty associated with going out to system RAM.

 

Doom 3
We’ll kick off our gaming benchmarks with Doom 3. Our first Doom 3 test uses a gameplay demo we recorded inside the Delta Labs complex, and it represents the sorts of graphics loads you’ll find in most of the game’s single-player levels. We’ve tested with Doom 3’s High Quality mode, which turns on 8X anisotropic filtering by default.

Not bad! The 6200 with 32-bit/16MB TurboCache keeps pace with the Radeon X300, and the 6200 with TurboCache 64-bit/32MB beats the X300 soundly. We’ve omitted the Radeon Xpress 200 from these results because of its inability to render Doom 3’s specular lighting properly.

This next demo was recorded in order to test a specific effect in Doom 3: that cool-looking “heat haze” effect that you see whenever a demon hurls a fireball at you. We figured this effect would be fairly shader intensive, so we wanted to test it separately from the rest of the game.

Picking up the shader effects doesn’t change the relative performance of these cards much. The 6200 card with 64-bit/32MB TurboCache is still on top, and the other two cards are still nearly tied.

 

Half-Life 2
Our new Half-Life 2 demo features some outdoor combat, some indoor combat, and a run down the river in an airboat.

The 6200 card with a 64-bit/32MB TurboCache can’t quite match the Radeon X300, but it’s close. The 32-bit/16MB card struggles even more, but it’s still faster than the Radeon Xpress 200 with 16MB of its own local RAM.

Unreal Tournament 2004
Our UT2004 demo shows me smacking down some bots in an Onslaught game.

Relative performance in UT2004 is about like it is in Half-Life 2, but the 6200 with 64-bit/32MB TurboCache manages to outdo the X300 at higher resolutions.

 

Far Cry
The Pier level in Far Cry is an outdoor area with dense vegetation, and it makes good use of geometry instancing to populate the jungle with foliage.

Like our Doom 3 “heat haze” demo, the Volcano level in Far Cry includes lots of pixel shader warping and shimmering.

Far Cry shows us the same basic performance mix we saw in UT2004 and Half-Life 2. The GeForce 6200 with 64-bit/32MB TurboCache meets or beats the Radeon X300. The 32-bit/16MB TurboCache card is quite a bit slower, but it’s still faster than the fastest integrated graphics solution around.

 

3DMark03

3DMark03 shows us a couple of interesting things. The game tests confirm the general performance picture we’ve already seen, and the shader tests show how capable the GeForce 6200’s vertex and pixel shaders are.

 
Power consumption
With each of the graphics cards installed and running, I used a watt meter to measure the power draw of our test systems. The monitor was plugged into a separate power source. The cards were tested at idle in the Windows desktop and under load while running our Doom 3 “heat haze” demo at 1280×1024.

The TurboCache scheme really does seem to save some power. The 6200 with 32-bit/16MB TurboCache pulls about five fewer watts at idle than the 64-bit/32MB version, and both are more efficient at idle than the Radeon X300. Under load, the 6200 with 64-bit/32MB TurboCache does consume a little more power than the X300, though.

 
Conclusions
The GeForce 6200 with TurboCache breaks some conventions, but it seems to make sense. This solution’s performance isn’t going to bowl anybody over, but the GeForce 6200 with a 64-bit/32MB TurboCache is fast enough to match a Radeon X300 with a full 128MB of local memory onboard. That’s solid enough performance to convince me that this solution works, and I’d certainly prefer it to most of the other graphics solutions in the $79-99 range that NVIDIA is targeting.

And, jeez, I wish my laptop PC had this instead of 32MB of total video memory. Ugh.

The relatively weak performance of the GeForce 6200 with 32-bit/16MB TurboCache demonstrates that the TurboCache scheme isn’t magic. NVIDIA isn’t quite as able to mask system memory access latencies as one might hope, and bandwidth pressure isn’t dramatically relieved by pixel shaders in today’s games. The 32-bit/16MB TurboCache card may be superior to integrated chipset solutions, but I’m not convinced its performance is superior in a way that matters. The 64-bit/32MB TurboCache card really is the one to have. I’m curious to see whether a 64-bit/64MB config with a pair of 32MB DRAMs would substantially outperform the 64-bit/32MB card we tested. However you cut it, though, the TurboCache scheme or something like it appears to be the future of low-end graphics.

That said, NVIDIA deserves some praise for the GeForce 6200 GPU independent of the caching scheme. This graphics chip offers near-feature-parity with NVIDIA’s high-end GPUs, and it brings an amazing feature set to the sub-$100 portion of the graphics market, including Shader Model 3.0 and three real vertex shader engines. The fact that this thing runs Doom 3 at over 70 frames per second in High Quality mode at 640×480 impresses the heck out of me.

Then again, the fact that it does so with only 32MB of local memory is a sign of true innovation.
