Nvidia's GeForce GTX 460 graphics processor

We’ve been following the story of the Fermi architecture for the better part of a year now, since Nvidia first tipped its hand about plans for a new generation of DirectX 11-class GPUs. Fermi’s story has been one of the more intriguing developments over that span of time, because it involves great ambitions and the strains that go with attempting to achieve them. Nvidia wanted its new top-of-the-line GPU to serve multiple markets, both traditional high-end graphics cards and the nascent market for GPUs as parallel computing engines. Not only that, but Fermi was to be unprecedentedly capable in both domains, with a novel and robust programming model for GPU computing and a first-of-its-kind parallel architecture for geometry processing in graphics.

Naturally, that rich feature set made for a large and complex GPU, and such things can be deadly in the chip business—especially when a transition to a new architecture is mated with an immature chip fabrication process, as was the case here. Time passed, and the first Fermi-based chip, the GF100, became bogged down with delays. Rumors flew about a classic set of problems: manufacturing issues, silicon re-spins, and difficult trade-offs between power consumption and performance. Eventually, as you know, the GF100 arrived in the GeForce GTX 470 and 480 graphics cards, which turned out to be reasonably solid but not much faster than the then-six-month-old Radeon HD 5870—which is based on a much smaller, cheaper-to-produce chip.

Whoops.

The GF100, though, has a lot of extra fat in it that’s unnecessary for, well, video cards. We wondered at that time, several months ago, whether a leaner version of the Fermi architecture might not be a tougher competitor. If you’ll indulge me, I’ll quote myself here:

We’re curious to see how good a graphics chip this generation of Nvidia’s technology could make when it’s stripped of all the extra fat needed to serve other markets: the extensive double-precision support, ECC, fairly large caches, and perhaps two or three of its raster units. You don’t need any of those things to play games—or even to transcode video on a GPU. A leaner, meaner mid-range variant of the Fermi architecture might make a much more attractive graphics card, especially if Nvidia can get some of the apparent chip-level issues worked out and reach some higher clock speeds.

Sounds good, no? Well, I’m pleased to report that nearly all of that has come to pass in the form of a GPU known as the GF104. What’s more, the first graphics cards based on it, to be sold as the GeForce GTX 460 768MB and 1GB, are aimed directly at the weak spot in the Radeon’s armor: the $199-229 price range.

A new Fermi: GF104
The GF104 GPU is undoubtedly based on the same generation of technology as the GF100 before it, but to thrust them both under the umbrella of the same architecture almost feels misleading. In truth, the GF104 has been pretty radically rebalanced in terms of the number and type of functional units onboard, clearly with an eye toward more efficient graphics performance. We’ll illustrate that point with a high-level functional block diagram of the GPU. If you’d like to compare against the GF100, a diagram and our discussion of that GPU are right here.

Block diagram of the GF104. Source: Nvidia.

These diagrams are becoming increasingly hard to read as the unit counts on GPUs mushroom. Starting with the largest elements, you can see that there are only two GPCs, or graphics processing clusters, in the GF104. The GF100 has four. As a result, the number of SMs, or shader multiprocessors, is down to eight. Again, GF100 has twice as many. The immediately obvious result of these cuts is that GF104 has half as many raster and polymorph engines as the GF100, which means its potential for polygon throughput is substantially reduced. That’s very much an expected change, and not necessarily a major loss at this point in time.

Another immediately obvious change is a reduction in the number of memory controllers flanking the GPCs. The GF104 has four memory controllers and associated ROP partitions, while the GF100 has six. What you can’t tell from the diagram is that, apparently, 128KB of L2 cache is also associated with each memory controller/ROP group. With four such groups, the GF104 features 512KB of L2 cache, down from 768KB on the GF100. The local memory pools on the GF104 are different in another way, too: the ECC protection for these memories has been removed, since it’s essentially unneeded in a consumer product, especially a graphics card.
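For the arithmetically inclined, here’s a rough sketch of how those memory partitions add up. The 128KB-per-partition L2 figure comes from the paragraph above; the eight ROPs and 64 bits of bus width per partition are our own inference from the totals quoted elsewhere in this article (32 ROPs and 256 bits across four partitions), and the function name is made up for illustration.

```python
# Per-partition resources. The L2 figure is quoted in the article; the ROP and
# bus-width figures are inferred from the chip totals and assumed uniform.
L2_KB_PER_PARTITION = 128
ROPS_PER_PARTITION = 8         # 32 ROPs / 4 partitions on GF104
BUS_BITS_PER_PARTITION = 64    # 256 bits / 4 partitions on GF104


def memory_subsystem(partitions: int) -> dict:
    """Scale the per-partition resources by the number of active partitions."""
    return {
        "l2_kb": partitions * L2_KB_PER_PARTITION,
        "rops": partitions * ROPS_PER_PARTITION,
        "bus_bits": partitions * BUS_BITS_PER_PARTITION,
    }


for name, partitions in [("GF104", 4), ("GF100", 6)]:
    print(name, memory_subsystem(partitions))
```

The same scaling explains what happens when a partition is switched off: with three active partitions, the sketch gives 384KB of L2, 24 ROPs, and a 192-bit path to memory.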

Our description so far may lead you to think the GF104 is simply a GF100 that’s been sawed in half, but that’s not the case. To understand the other changes, we need to zoom in on one of those SM units and take a closer look.

Block diagram of an SM in the GF104. Source: Nvidia.

Each SM in the GF104 is a little “fatter” than the GF100’s. You can count 48 “CUDA cores” in the diagram above, if you’re so inclined. That’s an increase from 32 in the GF100. We’re not really inclined to call those shader arithmetic logic units (ALUs) “cores,” though. The SM itself probably deserves that honor.

While we’re being picky, what you should really see in that diagram is a collection of five different execution units: three 16-wide vector execution units, one 16-wide load/store unit, and an eight-wide special function unit, or SFU. By contrast, the GF100’s SM has two 16-wide execution units, one 16-wide load/store unit, and a four-wide SFU block. The GF104 SM’s four dispatch units represent a doubling from the GF100, although the number of schedulers per SM remains the same.

The end result of these modifications is an SM with considerably more processing power: 50% more ALUs for general shader processing and double the number of SFUs to handle interpolation and transcendentals—both especially important mathematical operations for graphics. The doubling of instruction dispatch bandwidth should help keep the additional 16-wide ALU block occupied with warps—groups of 32 parallel threads or pixels in Nvidia’s lexicon—to process.
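If you’d like to check our math on the SM changes, a small sketch follows. The unit counts are the ones described above; the dictionary layout and function name are just illustrative choices, not anything from Nvidia.

```python
VECTOR_WIDTH = 16  # lanes in each vector execution unit, per the article

# name: (vector ALU blocks per SM, SFUs per SM, SMs on the full chip)
SM_CONFIGS = {
    "GF100": (2, 4, 16),
    "GF104": (3, 8, 8),
}


def sm_summary(name: str) -> dict:
    """Return ALUs per SM, SFUs per SM, and total ALUs for a full chip."""
    blocks, sfus, sm_count = SM_CONFIGS[name]
    alus_per_sm = blocks * VECTOR_WIDTH
    return {
        "alus_per_sm": alus_per_sm,
        "sfus_per_sm": sfus,
        "total_alus": alus_per_sm * sm_count,
    }


gf100, gf104 = sm_summary("GF100"), sm_summary("GF104")
print(gf100)
print(gf104)
print("ALU gain per SM:", gf104["alus_per_sm"] / gf100["alus_per_sm"] - 1)
print("SFU ratio per SM:", gf104["sfus_per_sm"] / gf100["sfus_per_sm"])
```

Note that the per-SM gains don’t make up for having half as many SMs: the full GF104 ends up with 384 ALUs to the GF100’s 512.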

One place where the GF104’s SM is less capable is double-precision math, a facility important to some types of GPU computing but essentially useless for real-time graphics. Nvidia has retained double-precision support for the sake of compatibility, but only one of those 16-wide ALU blocks is DP-capable, and it processes double-precision math at one quarter the usual speed. All told, that means the GF104 is just 1/12 its regular speed for double-precision.
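The 1/12 figure is just two fractions multiplied together, which a couple of lines of Python can confirm. The variable names here are ours.

```python
from fractions import Fraction

alu_blocks_per_sm = 3                        # 16-wide vector units in a GF104 SM
dp_capable_blocks = 1                        # only one of them handles double precision
dp_speed_on_that_block = Fraction(1, 4)      # it runs DP at one quarter speed

# Share of the SM that can do DP, times how fast that share runs DP.
dp_rate = Fraction(dp_capable_blocks, alu_blocks_per_sm) * dp_speed_on_that_block
print(dp_rate)  # 1/12
```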

Another big graphics-related change is the doubling of the number of texture units in the SM to eight. That goes along nicely with the increase in interpolation capacity in the SFUs, and it grants the GF104 a more texturing-intensive personality than its elder sibling.

Boil down all of the increases here and decreases there versus the GF100, and you begin to get a picture of the GF104 as a chip with a rather different balance of internal graphics hardware—one that arguably better matches the demands of today’s games.

	ROP pixels/clock	Textures filtered/clock	Shader ALUs	Triangles/clock	Memory interface width (bits)
GF100	48	64	512	4	384
GF104	32	64	384	2	256
Cypress	32	80	1600	1	256

The GF104 is a smaller chip aimed at a broader market than GF100, of course, so some compromises were necessary. What’s interesting is where those compromises were made. ROP throughput (which determines pixel fill rate and anti-aliasing power) and memory interface width are each reduced by a third, and shader ALU count is reduced by a quarter. The triangle throughput for rasterization (and tessellation, via the polymorph engines) is cut in half. Yet texturing capacity holds steady, with no reduction at all. When you consider that Nvidia’s shader ALUs run at twice the frequency of the rest of the chip and are typically more efficient than AMD’s, the GF104’s balance begins to look quite a bit like AMD’s Cypress, in fact.
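To see exactly where the cuts landed, one can work the reductions straight from the table above. This is only a sketch using the table’s per-clock figures; the key names are our own shorthand. By these numbers, the shader ALU count works out to a one-quarter cut, while ROP throughput and memory interface width each drop by a third.

```python
from fractions import Fraction

# Per-clock figures copied from the table above.
SPECS = {
    "GF100": {"rop_pixels": 48, "texels": 64, "alus": 512, "triangles": 4, "bus_bits": 384},
    "GF104": {"rop_pixels": 32, "texels": 64, "alus": 384, "triangles": 2, "bus_bits": 256},
}


def reduction(metric: str) -> Fraction:
    """Fraction of the GF100's capacity that the GF104 gives up for one metric."""
    big, small = SPECS["GF100"][metric], SPECS["GF104"][metric]
    return Fraction(big - small, big)


for metric in SPECS["GF100"]:
    print(f"{metric}: reduced by {reduction(metric)}")
```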

That said, Nvidia is unquestionably following its own playbook here. A couple of generations back, the firm reshaped its enormous G80 GPU into a leaner, meaner variant. In the process, it went from a 384-bit memory interface to 256 bits, from 24 ROP units to 16, and from four texture units per SM to eight. The resulting G92 GPU performed nearly as well as the G80 in many games, and it became a long-running success story.

Sizing ‘er up

	Estimated transistor count (Millions)	Approximate die size (mm²)	Fabrication process node
G92b	754	256	55-nm TSMC
GT200	1400	576*	65-nm TSMC
GT200b	1400	470*	55-nm TSMC
GF104	1950	320*	40-nm TSMC
GF100	3000	529*	40-nm TSMC
RV770	956	256	55-nm TSMC
Juniper	1040	166	40-nm TSMC
Cypress	2150	334	40-nm TSMC

Chip die sizes are interesting because they tell us something about how much it costs to produce a chip and about the efficiency of its architecture and design. We’d like to compare the GF104 to its stablemates and competitors to get a better sense of things. Uniquely among the major players in PC semiconductors, though, Nvidia refuses to divulge die sizes for its chips. That’s a quirky thing, in a very Nvidia sort of way, since finding out a chip’s die size isn’t especially difficult. Heck, I’d have measured the GF104 myself by now if my X-Acto knife blade could wedge in just a little further under the metal cap that covers it. I’ll get there eventually.

In the meantime, we have our highly scientific “find the most widely reported number that looks right to you” method of obtaining Nvidia die sizes. This information could be wrong, especially in the case of a new chip like the GF104, but it’s probably not far off. I’ve added asterisks to the table on the right for die sizes gathered from around the web.
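For a rough feel for what those numbers imply, here’s a sketch that divides estimated transistor count by approximate die area for the 40-nm chips in the table. Keep in mind that the Nvidia die sizes are the asterisked, web-sourced figures, so the resulting densities are only as good as those estimates.

```python
# name: (estimated transistors in millions, approximate die size in mm^2)
# Values copied from the table above; Nvidia die sizes are unofficial estimates.
CHIPS_40NM = {
    "GF104": (1950, 320),
    "GF100": (3000, 529),
    "Juniper": (1040, 166),
    "Cypress": (2150, 334),
}


def density(name: str) -> float:
    """Millions of transistors per square millimeter."""
    transistors_m, area_mm2 = CHIPS_40NM[name]
    return transistors_m / area_mm2


for name in CHIPS_40NM:
    print(f"{name}: {density(name):.2f} Mtransistors/mm^2")
```

If the rumored 320 mm² figure holds, the GF104 lands a bit above the GF100 in density and a little below Cypress.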

To give you more of a sense of things, the pictures below show chips where possible and “integrated heat spreaders”—that is, metal caps—where necessary. The quarter is there as a size reference and is not an FCC requirement for video cards. Based on its estimated transistor count (which comes from Nvidia) and process node, the size of the chip package, and its rumored die area culled from here, the GF104 looks to be very close in total area to Cypress, though more oblong in shape. Whether the GF104 can reach the same performance heights as Cypress does aboard the Radeon HD 5870 is an open question. Its mission in the GeForce GTX 460 is quite a bit more modest.

Juniper

RV770

Cypress

The GF104’s metal cap

The 55-nm G92b

The 65-nm GT200 under its metal cap

The GF100’s metal cap

The cards
The first GF104-based graphics card, the GeForce GTX 460, will come in two flavors. Both will use a scaled-back GF104 chip with one of its eight SMs disabled, leaving a total of 336 ALUs and 56 texels per clock of filtering capacity. The two share common clock speeds: a 675MHz core, 1350MHz shader ALUs, and 900MHz (3600 MT/s) GDDR5 memory.

The tastier flavor of the GTX 460 is the 1GB version, which has 32 ROPs, 512KB of L2 cache, and a 256-bit path to memory. The card’s max power requirement, or TDP, is 160W, and it requires two 6-pin auxiliary power inputs. Nvidia says this card will sell for $229.

The more accessible version is the 768MB card. Since it’s down one memory interface/ROP partition, it has 24 ROPs, a 192-bit memory path, and 384KB of L2 cache. This version has a slightly lower 150W TDP, but it also requires dual 6-pin power inputs. Accepting this card’s smaller memory size, lower bandwidth, lesser ROP throughput, and smaller cache will save you 30 bucks off the list price, since it should sell for $199.
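The theoretical peak rates for the two cards follow directly from those unit counts and clocks. Here’s a sketch of the arithmetic; the function and key names are ours, and the shader figure assumes two floating-point operations per ALU per clock, which is the usual convention for counting a multiply-add.

```python
def peak_rates(rops, texels_per_clock, alus, bus_bits,
               core_mhz=675, shader_mhz=1350, mem_mtps=3600):
    """Theoretical peaks from unit counts and clocks (defaults: GTX 460 clocks)."""
    return {
        "pixel_fill_gpixels": rops * core_mhz / 1000,
        "filtering_gtexels": texels_per_clock * core_mhz / 1000,
        # Bus width in bytes times transfers per second.
        "bandwidth_gbs": bus_bits / 8 * mem_mtps / 1000,
        # Assumption: 2 flops per ALU per clock (one multiply-add).
        "shader_gflops": alus * 2 * shader_mhz / 1000,
    }


gtx460_768mb = peak_rates(rops=24, texels_per_clock=56, alus=336, bus_bits=192)
gtx460_1gb = peak_rates(rops=32, texels_per_clock=56, alus=336, bus_bits=256)
print("GTX 460 768MB:", gtx460_768mb)
print("GTX 460 1GB:  ", gtx460_1gb)
```

The two cards share the same shader and texturing peaks; only the ROP rate and memory bandwidth differ, since those scale with the number of active memory partitions.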

We’d really prefer that Nvidia had used its magical powers of branding here to set a bright and shining line between these two products. You’re giving up a lot more than 256MB of memory by going with the 768MB version, and we understand that “GeForce GTX 455” is available. You know there will be folks who pick up a GTX 460 768MB without realizing it’s a lesser product. The GTX 460 768MB isn’t bad, but such confusion isn’t good for consumers.

Pictured above are a couple of GeForce GTX 460 cards. The one on the left is Nvidia’s reference design, and the one on the right comes from Zotac. (The reference card is a 768MB version, but the reference 1GB card looks just the same.) Nvidia’s rendition of the GTX 460 has a pair of DVI outputs and a mini-HDMI connector. Zotac’s offering is simply better, with full-sized HDMI and DisplayPort outputs, along with dual DVI ports.

Here’s a surprise: the GTX 460 supports bitstream audio over HDMI for Dolby True HD and DTS-HD Master Audio. I believe that’s a first for Nvidia graphics cards, and it should make the GTX 460 a nice candidate for an HTPC system.

Notice that these two cards have different coolers. The reference design uses a fan that has a Zalman-esque heatsink beneath, while Zotac’s card employs a blower. Nvidia expects some board vendors to use its reference cooler, but obviously, others will not. We have no problem in theory with a blower like the one Zotac uses; lots of high-end graphics cards move air efficiently and quietly with blowers. However, the reference cards’ fans are very noticeably (and measurably) quieter. We think it’s possible this particular blower on our Zotac card may have a bad bearing or something, because it did tend to rattle a bit at times, but we only have the one card to judge.

The GTX 460 sports a very compact board design, with a total length of only 8.25″. As you can see from the pictures above, that’s quite a bit shorter than some cards in this price class, most notably the Radeon HD 5830. Since AMD didn’t produce a reference PCB design for the 5830, most manufacturers have based their cards on the Radeon HD 5870 PCB. That makes the 5830 a relatively lengthy card for this class, and it could present fit problems in smaller cases. Even though it’s based on a much larger GPU, the GTX 460 is no longer than the Radeon HD 5770.

The competitive landscape
Speaking of the competition, we should probably map out the landscape before we move on. Heck, we’ve gotten email reminders from both AMD and Nvidia during the past few days to help us sort out the situation. Both firms gave us current expected pricing on their product lineups, so we can share that with you. We also dug up the best available prices on Newegg as of late last week.

We’ve had to add another column to the table below in order to deal with some bizarre behavior by Nvidia, its partners, and online retailers. You can’t just pull up a listing on Newegg and check out the prices of GeForce cards. Instead, you must “click to see price in cart.” Once you’ve done so, you’ll see both the price you’ll pay for that individual product and a potential net price based on a mail-in rebate offer. I hate mail-in rebates; it’s a shady practice that depends on many customers not getting paid. Nvidia has apparently gone all in on the rebate thing, though. You can barely buy a GeForce at its purported list price without going that route, so we’ve added a column to show the net price after rebate.

	Pixel fill rate (Gpixels/s)	Filtering rate (Gtexels/s)	Memory bandwidth (GB/s)	Mfr’s expected list price	Street price	Net after MIR
GeForce GTX 460 768MB	16.2	37.8	86.4	$199	–	–
GeForce GTX 460 1GB	21.6	37.8	115.2	$229	–	–
GeForce GTX 465	19.4	26.7	102.6	$249	$279	$249
GeForce GTX 470	24.3	34.0	133.9	$329	$349	$329
GeForce GTX 480	33.6	42.0	177.4	$499	$499	$459
Radeon HD 5770	13.6	34.0	76.8	$149	$149	–
Radeon HD 5830	12.8	44.8	128.0	$199	$199	–
Radeon HD 5850	23.2	52.2	128.0	$299	$289	–
Radeon HD 5870	27.2	68.0	153.6	$399	$389	–
Radeon HD 5870 2GB	27.2	68.0	153.6	$499	$499	–

We can make a few observations based on these prices and specs.

Zotac’s GeForce GTX 465

The GeForce GTX 465 was introduced just over a month ago—heck, this is our first test of the thing—but it has little reason to exist now that the GTX 460 is here. Nvidia tells us it has cut the price on the GTX 465 to $249 so the two products can coexist, but I think that’s marketing code for, “We’re clearing out our remaining inventory.”

XFX’s Radeon HD 5830

We weren’t especially taken with the Radeon HD 5830 when it debuted at between $240 and $269. Now that it’s solidly down to $199, though, it’s not such a raw deal anymore. Might even be a decent one! Obviously, the 5830 is the closest competition for the GTX 460, priced exactly opposite the 768MB variant.

Other competitors worth watching include the Radeon HD 5770, which packs an awful lot of bang for the buck at $149, and the Radeon HD 5850 at $289. If the GPU market becomes cutthroat competitive again, I could see AMD lowering prices on the 5850 to match the GTX 460. Eventually. Maybe.

Beyond that, AMD and Nvidia had several things to say about EyeX and Physfinity and… eh, I forget. Truth is, there are reasons to choose one brand of GPU over another, but both have their merits. For all of their complexity and variance, today’s GPUs really are made to do essentially the same things. I’m not saying one should buy a video card based solely on price and performance. There’s sticker color to be considered, after all. But the DX11 offerings from both major players are awfully similar these days in terms of graphics feature sets and image quality.

Test notes
Somewhat unusually, all but one of the cards we’re testing run at the base clock speed specified for the product by the GPU maker. Oftentimes, board makers will range beyond that base clock a little bit to make their products more distinctive, but that’s happening less often these days for various reasons. The one exception in the group today is Asus’ ENGTX260 TOP SP216, whose core and shader clocks are 650 and 1400MHz, respectively, and whose memory speed is 2300 MT/s. The GTX 260 displayed uncommon range during its lifespan, adding an additional SP cluster and getting de facto higher clock speeds on shipping products over time. The Asus card we’ve included represents the GTX 260’s highest point, near the end of its run.

Similarly, the Radeon HD 4870 we’ve tested is the later version with 1GB of memory.

Many of our performance tests are scripted and repeatable, but for a couple of games, Battlefield: Bad Company 2 and Metro 2033, we used the FRAPS utility to record frame rates while playing a 60-second sequence from the game. Although capturing frame rates while playing isn’t precisely repeatable, we tried to make each run as similar as possible to all of the others. We raised our sample size, testing each FRAPS sequence five times per video card, in order to counteract any variability. We’ve included second-by-second frame rate results from FRAPS for those games, and in that case, you’re seeing the results from a single, representative pass through the test sequence.

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and we’ve reported the median result.
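In case the “median of at least three runs” bit isn’t clear, here’s a minimal sketch. The frame rates below are made-up placeholders, not results from our testing.

```python
from statistics import median

# Hypothetical average-FPS results from three runs of one test (not real data).
runs = [58.4, 61.2, 59.7]

# The median is the middle value once the runs are sorted, so a single
# unusually fast or slow run doesn't drag the reported number around.
reported = median(runs)
print(reported)  # 59.7
```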

Our test systems were configured like so:

Processor	Core i7-965 Extreme 3.2GHz
Motherboard	Gigabyte EX58-UD5
North bridge	X58 IOH
South bridge	ICH10R
Memory size	12GB (6 DIMMs)
Memory type	Corsair Dominator CMD12GX3M6A1600C8 DDR3 SDRAM at 1600MHz
Memory timings	8-8-8-24 2T
Chipset drivers	INF update 9.1.1.1025; Rapid Storage Technology 9.6.0.1014
Audio	Integrated ICH10R/ALC889A with Realtek R2.49 drivers
Graphics	Radeon HD 4870 1GB with Catalyst 10.6 drivers
	Gigabyte Radeon HD 5770 1GB with Catalyst 10.6 drivers
	XFX Radeon HD 5830 1GB with Catalyst 10.6 drivers
	Radeon HD 5850 1GB with Catalyst 10.6 drivers
	Asus Radeon HD 5870 1GB with Catalyst 10.6 drivers
	Asus ENGTX260 TOP SP216 GeForce GTX 260 896MB with ForceWare 258.80 drivers
	GeForce GTX 460 768MB with ForceWare 258.80 drivers
	Zotac GeForce GTX 460 1GB with ForceWare 258.80 drivers
	Zotac GeForce GTX 465 1GB with ForceWare 258.80 drivers
	GeForce GTX 470 1280MB with ForceWare 258.80 drivers
Hard drive	WD Caviar SE16 320GB SATA
Power supply	PC Power & Cooling Silencer 750 Watt
OS	Windows 7 Ultimate x64 Edition; DirectX runtime update June 2010

Thanks to Intel, Corsair, Gigabyte, and PC Power & Cooling for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, XFX, Asus, Sapphire, Zotac, and Gigabyte supplied the graphics cards for testing, as well.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

We used the following test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Running the numbers

	Peak pixel fill rate (Gpixels/s)	Peak bilinear INT8 texel filtering rate (Gtexels/s; FP16 is half rate)	Peak memory bandwidth (GB/s)	Peak shader arithmetic (GFLOPS)
GeForce GTS 250	12.3	49.3	71.9	484
GeForce GTX 260 (216 SPs)	18.2	46.8	128.8	605
GeForce GTX 275	17.7	50.6	127.0	674
GeForce GTX 285	21.4	53.6	166.4	744
GeForce GTX 460 768MB	16.2	37.8	86.4	907
GeForce GTX 460 1GB	21.6	37.8	115.2	907
GeForce GTX 465	19.4	26.7	102.6	855
GeForce GTX 470	24.3	34.0	133.9	1089
GeForce GTX 480	33.6	42.0	177.4	1345
Radeon HD 4850	11.2	28.0	63.6	1120
Radeon HD 4870	12.0	30.0	115.2	1200
Radeon HD 4890	14.4	36.0	124.8	1440
Radeon HD 5770	13.6	34.0	76.8	1360
Radeon HD 5830	12.8	44.8	128.0	1792
Radeon HD 5850	23.2	52.2	128.0	2088
Radeon HD 5870	27.2	68.0	153.6	2720

The numbers above represent theoretical peaks for the GPUs in question. Delivered performance, as we’ll see, is often lower. These numbers are interesting, though, in various ways. For instance, the GeForce GTX 465 trails the GTX 460 1GB in every category of note.
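That last claim is easy to verify from the table. The sketch below copies the relevant rows and checks them column by column; the helper name is ours.

```python
# (pixel fill Gpixels/s, filtering Gtexels/s, bandwidth GB/s, shader GFLOPS),
# copied from the table above.
PEAKS = {
    "GeForce GTX 460 768MB": (16.2, 37.8, 86.4, 907),
    "GeForce GTX 460 1GB": (21.6, 37.8, 115.2, 907),
    "GeForce GTX 465": (19.4, 26.7, 102.6, 855),
}


def trails_everywhere(card: str, rival: str) -> bool:
    """True if `card` has a lower theoretical peak than `rival` in every column."""
    return all(a < b for a, b in zip(PEAKS[card], PEAKS[rival]))


print(trails_everywhere("GeForce GTX 465", "GeForce GTX 460 1GB"))
print(trails_everywhere("GeForce GTX 465", "GeForce GTX 460 768MB"))
```

The GTX 465 doesn’t trail the 768MB card across the board, though: its pixel fill rate and memory bandwidth are higher on paper.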

We’ve grown increasingly dissatisfied with the texture fill rate tool in 3DMark Vantage, so I’ve reached back into the cupboard and pulled out an old favorite, D3D RightMark, to test texture filtering performance.

Unlike 3DMark, this tool lets us test a range of filtering types, not just texture sampling rates. Unfortunately, D3D RightMark won’t test FP16 texture formats, but integer texture formats are still pretty widely used in games. I’ve plotted a range of results below, and to make things more readable, I’ve broken out a couple of filtering types into bar charts, as well.

The GTX 460 is true to its specs, outperforming the GTX 465 and nearly matching the GTX 470 in these texture filtering tests. Interestingly, the Radeon HD 5830 is substantially faster than the GTX 460 when doing bilinear filtering, but the GTX 460 becomes relatively stronger as the filtering quality ramps up. The crossover point looks to be 4X anisotropic filtering. Go beyond that, and the GTX 460 is clearly faster. Newer Radeons do have slightly higher filtering quality than recent GeForces, but the difference is hard to detect.

As I’ve noted before, the Unigine Heaven demo’s “extreme” tessellation mode isn’t a very smart use of DirectX 11 tessellation, with too many triangles and little corresponding improvement in image quality. I think that makes it a poor representation of graphics workloads in future games and thus a poor benchmark of overall GPU performance.

Pushing through all of those polygons does have its uses, though. This demo should help us tease out the differences in triangle throughput between these GPUs. To do so, we’ve tested at the relatively low resolution of 1680×1050, with 4X anisotropic filtering and no antialiasing. Shaders were set to “high” and tessellation to “extreme.”

Here’s the one spot where the GeForce GTX 465 has an edge on the 460. Since it’s based on a cut-down GF100, the GTX 465 can process more polygons per clock than the GTX 460. Even so, the GTX 460 768MB is 25% faster than its direct rival, the Radeon HD 5830, and beats out the much more expensive Radeon HD 5870, as well.

Aliens vs. Predator
The new AvP game uses several DirectX 11 features to improve image quality and performance, including tessellation, advanced shadow sampling, and DX11-enhanced multisampled anti-aliasing. Naturally, we were pleased when the game’s developers put together an easily scriptable benchmark tool. This benchmark cycles through a range of scenes in the game, including one spot where a horde of tessellated aliens comes crawling down the floor, ceiling, and walls of a corridor.

For these tests, we turned up all of the image quality options to the max, with two exceptions. We held the line at 2X antialiasing and 8X anisotropic filtering simply to keep frame rates in a playable range with most of these graphics cards. The use of DX11 effects ruled out the use of older, DX10-class video cards, so we’ve excluded them here.

The 5830 and GTX 460 768MB are neck and neck, with no notable separation between them. The GTX 460 1GB and GTX 465 are locked in effective parity, as well.

Just Cause 2
I’ve already sunk more hours than I’d care to admit into this open-world adventure, and I feel another bout coming on soon. JC2 has some flashy visuals courtesy of DirectX 10, and the sheer scope of the game world is breathtaking, as are the resulting view distances.

Although JC2 includes a couple of visual effects generated by Nvidia’s CUDA GPU-computing API, we’ve left those disabled for our testing. The CUDA effects are only used sparingly in the game, anyhow, and we’d like to keep things even between the different GPU brands. I do think the water simulation looks gorgeous, but I’m not so impressed by the Bokeh filter used for depth-of-field effects.

We tested performance with JC2’s built-in benchmark, using the “Dark Tower” sequence.

Three frames per second separate the GTX 460 768MB and the 5830. I’ll let you decide whether that margin matters. I should note that the GTX 460 twins bracket the GeForce GTX 260 here. Less than two years ago, the GTX 260 cards like this one sold for around $300, so we are seeing a little bit of progress on the price-performance front, even though we seemed to stall for the first half of 2010.

DiRT 2: DX9
This excellent racer packs a scriptable performance test. We tested at DiRT 2’s “ultra” quality presets in both DirectX 9 and DirectX 11. The big difference between the two is that the DX11 mode includes tessellation on the crowd and water. Otherwise, they’re hardly distinguishable.

DiRT 2: DX11

This one is very nearly a clean sweep for the GTX 460 768MB over the Radeon HD 5830, right up to the point where the GTX 460 runs out of memory in DX11 at 2560×1600. Other than that little hiccup, both cards provide playable frame rates at all of the resolutions tested in both DX9 and DX11. Heck, in all but the last graph there, the GTX 460 1GB mixes it up with the Radeon HD 5850.

Battlefield: Bad Company 2
BC2 uses DirectX 11, but according to this interview, DX11 is mainly used to speed up soft shadow filtering. The DirectX 10 rendering path produces the same images.

Since these are all relatively fast graphics cards, we turned up all of the image quality settings in the game. Our test sessions took place in the first 60 seconds of the “Heart of Darkness” level.

Have a look at the lines plotted in that last graph above for the GTX 460 768MB and the 5830. What an incredibly close contest.

Metro 2033
If Bad Company 2 has a rival for the title of best-looking game, it’s gotta be Metro 2033. This game uses DX10 and DX11 to create some of the best visuals on the PC today. You can get essentially the same visuals using either version of DirectX, but with DirectX 11, Metro 2033 offers a couple of additional options: tessellation and a DirectCompute-based depth of field shader. If you have a GeForce card, Metro 2033 will use it to accelerate some types of in-game physics calculations, since it uses the PhysX API. We didn’t enable advanced PhysX effects in our tests, though, since we wanted to do a direct comparison to the new Radeons. See here for more on this game’s exhaustively state-of-the-art technology.

Yes, Virginia, there is a game other than Crysis that requires you to turn down the image quality in order to achieve playable frame rates on a $200 graphics card. Metro 2033 is it. We had to dial back the presets two notches from the top settings and disable the performance-assassinating advanced depth-of-field effect, too.

We did leave tessellation enabled on the DX11 cards. In fact, we considered leaving out the DX10 cards entirely here, since they don’t produce exactly the same visuals. However, tessellation in this game is only used in a few specific ways, and you’ll be hard pressed to see the differences during regular gameplay. Thus, we’ve provisionally included the DX10 cards for comparison, in spite of the fact that they can’t do DX11 tessellation.

The Fermi-based GPUs have the advantage here, so much so that the GTX 460 1GB essentially matches the Radeon HD 5870. The Radeon HD 5830, meanwhile, falls behind both the GeForce GTX 260 and the Radeon HD 4870—although it is doing tessellation and using DX11, while they are not. The more noteworthy outcome may be the fact that the GTX 460 achieves playable frame rates, with a low of 30 FPS, while the 5830 doesn’t quite cut it. That’s the sort of difference one would notice while gaming.

Borderlands
We tested Gearbox’s post-apocalyptic role-playing shooter by using the game’s built-in performance test. We tested with all of the in-game quality options at their max. We couldn’t enable antialiasing, because the game’s Unreal Engine doesn’t support it.

Here’s another case where the GTX 460 1GB tangles with the pricier Radeon HD 5850 and holds its own. The GTX 460 768MB outclasses the Radeon HD 5830, too, although the 5830 still delivers playable frame rates at 1920×1080. At 2560×1600, the 5830 averages below 28 FPS, which is going to feel sluggish. The GTX 460 768MB is in safer territory.

Power consumption
We measured total system power consumption at the wall socket using our fancy new Yokogawa WT210 digital power meter. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. The cards were plugged into a motherboard on an open test bench.

The idle measurements were taken at the Windows desktop with the Aero theme enabled. The cards were tested under load running Left 4 Dead at a 1920×1200 resolution with 4X AA and 16X anisotropic filtering. We test power with Left 4 Dead because we’ve found that this game’s fairly simple shaders tend to cause GPUs to draw quite a bit of power, so we think it’s a solidly representative peak gaming workload.

Overall, the new GeForces look quite decent on power draw. Interestingly, the Radeon HD 5850 draws less power under load than either flavor of GTX 460, yet the slower Radeon HD 5830 draws more. There’s a simple reason for that: the GPU on the 5830 has more units disabled, but it also has a higher clock speed than the 5850. Higher clock frequencies increase power draw, and higher voltages are often required to reach them. With a larger board and higher clocks, the 5830 isn’t particularly efficient.
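The clock-and-voltage point can be illustrated with the usual first-order rule of thumb for CMOS chips, in which dynamic power scales with voltage squared times frequency. The sketch below uses made-up ratios, not measured values for any of these cards, and it ignores leakage power and the effect of disabled units.

```python
def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """First-order CMOS scaling: dynamic power is proportional to V^2 * f."""
    return voltage_ratio ** 2 * freq_ratio


# Hypothetical example: a 10% higher clock that also needs 10% more voltage.
scaled = relative_dynamic_power(freq_ratio=1.10, voltage_ratio=1.10)
print(f"Dynamic power rises by about {100 * (scaled - 1):.0f}%")
```

Under those assumed ratios, a 10% clock bump costs roughly a third more dynamic power, which is why a chip with fewer active units can still end up drawing more.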

The GF104 looks like real progress for Nvidia. The GTX 460 1GB pulls less power at idle and under load than the GTX 465 or the GTX 260, yet it usually matches or outperforms them both.

Noise levels
We measured noise levels on our test system, sitting on an open test bench, using an Extech model 407738 digital sound level meter. The meter was mounted on a tripod approximately 8″ from the test system at a height even with the top of the video card. We used the OSHA-standard weighting and speed for these measurements.

You can think of these noise level measurements much like our system power consumption tests, because the entire system’s noise levels were measured. Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.

These results tell a couple of important stories. First, the difference in sound levels between the GTX 460 768MB and 1GB cards comes from that noisy blower on our Zotac review unit. Nvidia’s reference cooler is much quieter, as are most other video cards. I should say that we also have a GTX 460 1GB reference card, and its noise levels are similar to the 768MB card’s—nice and quiet.

The only thing quieter under load, in fact, is XFX’s custom cooler on its Radeon HD 5830, which is actually a very similar design.

GPU temperatures
We used GPU-Z to log temperatures during our load testing. We had to leave out the GTX 260, though, because it was reporting some obviously incorrect values.

Happily, with relatively low power draw, the reference GTX 460 cooler can remain quiet while keeping GPU temperatures in check.

Conclusions
Boy, is it refreshing to have strong competition at the $200 mark again. The contest between the Radeon HD 5830 and the GeForce GTX 460 768MB is a narrow but sure win for Nvidia. The GTX 460 768MB is noticeably faster in several games, and it’s more power efficient than the 5830, too. Nvidia’s stock cooler for the GTX 460 is blessedly quiet (unlike the Zotac one, sadly), and this much shorter board design should fit into even the most painfully cramped cases from the likes of Dell and HP.

This is a price point at which we want to recommend a video card, a traditional home of good values, but it’s been rough going for much of this year. The GTX 460 768MB isn’t really a revelation, but it is an improvement over the single DX11 option we’ve had to date. Also, the GTX 460 768MB is generally a superior choice to DX10 cards like the GeForce GTX 260 that have been haunting our sense of video card value. We can now be free of those ghosts, thank goodness.

You can decide for yourself whether it’s worth an additional 30 bucks for the GTX 460 1GB over the 768MB version. In light of what happened in our DiRT 2 DX11 tests at 2560×1600, though, you’ll probably want the larger memory size of the 1GB card if you have a four-megapixel display. Of course, in that case, you’ll probably want an even beefier video card or a second GPU. With some of these newer DX10 and DX11 games, I do think there’s a case to be made for a high-end GPU config once again. One thing I would have a hard time justifying, though, is spending $299 for a Radeon HD 5850 when you can pick up a GTX 460 1GB—which is faster in Metro 2033 and Borderlands and competitive enough elsewhere—for $229. I think it’s finally time for AMD to cut its prices. Just dropping the 5850 back to its, ahem, original introductory price of $259 would be a good start.

We’re curious to see how far the GF104 can go when pushed to its logical limits. With all eight SMs enabled and clock speeds raised as feasible (within the usual power and thermal constraints), could the GF104 challenge the Radeon HD 5870 and the GeForce GTX 470? Nvidia claims the GTX 460 has plenty of clock speed headroom, so it seems possible. We need to try overclocking this thing and see how it handles, but we’ve not yet had time. Some quick math suggests a fully enabled GF104 at 750MHz would have a comparable ROP rate and shader power to a GTX 470, with better texturing throughput. We’re very much intrigued to see where this chip goes next.
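For the curious, here is roughly how that quick math works out, as a small Python sketch. The unit counts and clocks are assumptions filled in from the published specs rather than figures quoted on this page: a fully enabled GF104 is taken to have 384 ALUs, 64 texture units, and 32 ROPs, with the shader clock at twice the 750MHz core clock, and the GTX 470 is taken to have 448 ALUs, 56 texture units, and 40 ROPs at 607/1215MHz. These are theoretical peaks only, and the `GpuConfig` class and its method names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Theoretical peak-rate inputs for one GPU configuration (assumed specs)."""
    name: str
    alus: int            # shader ALUs ("CUDA cores")
    texture_units: int   # texels filtered per core clock
    rops: int            # pixels written per core clock
    core_mhz: float
    shader_mhz: float

    def pixel_fill_gpixels(self) -> float:
        # One pixel per ROP per core clock.
        return self.rops * self.core_mhz / 1000.0

    def texel_rate_gtexels(self) -> float:
        # One bilinear texel per texture unit per core clock.
        return self.texture_units * self.core_mhz / 1000.0

    def shader_gflops(self) -> float:
        # One fused multiply-add (two flops) per ALU per shader clock.
        return self.alus * 2 * self.shader_mhz / 1000.0


# Hypothetical part: all eight SMs enabled, 750MHz core, 1500MHz shaders.
full_gf104 = GpuConfig("GF104, 8 SMs @ 750MHz", 384, 64, 32, 750.0, 1500.0)
# Assumed stock GTX 470 configuration.
gtx_470 = GpuConfig("GeForce GTX 470", 448, 56, 40, 607.0, 1215.0)

for gpu in (full_gf104, gtx_470):
    print(
        f"{gpu.name}: "
        f"{gpu.pixel_fill_gpixels():.1f} Gpixels/s, "
        f"{gpu.texel_rate_gtexels():.1f} Gtexels/s, "
        f"{gpu.shader_gflops():.0f} GFLOPS"
    )
```

Under those assumptions, the two chips land within a few percent of each other on pixel fill and peak shader arithmetic, while the hypothetical GF104 holds a sizable lead in texel rate. Peak rates like these say nothing about memory bandwidth or geometry throughput, so treat them as a rough guide at best.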

Scott Wasson
