But does Quad SLI live up to its practically built-in hype? Can running four GPUs in tandem catapult you into a zone of pure extremeness, where new frames flow like water, object edges are feathery smooth, and textures are so perfectly mapped to surfaces that you’re utterly convinced they’re real?
I dunno. I’m just making this stuff up as I go along. But we have tested Quad SLI in order to see what it’s like to play games on a quad-GPU system. We’ve also popped open the metaphorical hood on Quad SLI to see how it works. Along the way, we found a few unexpected things, as well.
Quad SLI basics
You may remember the 7950 GX2 from our review of it a while back. The GX2 has two G71 graphics processors onboard, each with its own 512MB pool of memory. This product is essentially “SLI on a stick,” a dual-GPU solution that plugs into a single PCIe x16 slot. All by itself, the GeForce 7950 GX2 is the fastest “single” video card on the market. Quad SLI involves taking a pair of these cards and running them together in an SLI configuration, bringing the grand totals involved to four G71 GPUs, 96 pixel shader processors, 2GB of GDDR3 memory, 153.6 GB/s of memory bandwidth, and over $1100 in video cards alone.
Gulp.
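For the record, those totals follow straight from the per-GPU numbers. A quick sanity check, using the G71-as-clocked-on-the-GX2 figures from our spec chart later in this review:

```python
# Back-of-the-envelope check of the Quad SLI totals quoted above. Per-GPU
# figures: 24 pixel shader processors, 512MB of memory on a 256-bit bus
# at 1200MHz effective (from this review's own spec table).

GPUS = 4
PIXEL_SHADERS_PER_GPU = 24
MEMORY_PER_GPU_MB = 512
BUS_WIDTH_BITS = 256
EFFECTIVE_MEM_CLOCK_MHZ = 1200

pixel_shaders = GPUS * PIXEL_SHADERS_PER_GPU           # 96
memory_gb = GPUS * MEMORY_PER_GPU_MB / 1024            # 2.0
# Per GPU: 256 bits = 32 bytes per transfer, at 1200 MT/s = 38.4 GB/s.
bandwidth_gb_s = GPUS * (BUS_WIDTH_BITS / 8) * EFFECTIVE_MEM_CLOCK_MHZ / 1000

print(pixel_shaders, memory_gb, bandwidth_gb_s)  # 96 2.0 153.6
```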
We wanted to outfit our test system with this outrageous config, and we already had a BFG Tech 7950 GX2 from our initial GX2 review. Fortunately, MSI offered us its version of the GeForce 7950 GX2 for review, and we were in business.
As you might have guessed, the MSI NX7950GX2 is quite a nice video card. It comes with all of the requisite cables and plug converters, has HDCP support, and ships with CyberLink PowerCinema for movie playback. MSI also includes a copy of the King Kong movie game, just to make a point. We had no problem throwing this card into a Quad SLI configuration with our BFG Tech card. (SLI brand commingling has been kosher for a while now.)
You will need an exceptionally fast CPU and an SLI-ready motherboard for a Quad SLI rig. We chose the Asus P5N32-SLI SE Deluxe for its ability to host an Intel Core 2 Extreme X6800 processor. The other big concern with Quad SLI, naturally, is finding an adequate power supply. We chose an OCZ GameXStream 700W PSU for our test system, which just so happens to be on Nvidia’s list of SLI-certified PSUs. I would give you some more general PSU specification recommendations for Quad SLI systems, but Nvidia seems to prefer pointing folks to its list of certified power supplies for this application.
With this bundle of components, we were able to set up a Quad SLI system with relatively little drama. Unlike some early Quad SLI configurations that came from system builders, going quad with a pair of GX2s requires only one SLI bridge connector between the two cards.
How Quad SLI splits up the work
That single SLI bridge connector between the 7950 GX2s leads to a Quad SLI topology that looks like this:
Although each G71 GPU has room for two SLI links to other GPUs, only two of the four GPUs in a GX2 quad config actually use both links. Internally, each GX2 card has an SLI link between its two GPUs. (The card also has a 48-lane PCI Express switch that links both GPUs to the PCIe interface, noted as “X48 PCI-E” in the diagram.) Externally, the single SLI connector bridges between the “primary” GPUs on each card. This arrangement dispenses with the ring topology found in early Quad SLI configs, an arrangement that always seemed unnecessary to me. At the end of the day, pixels rendered by all of the GPUs have to make it to the lone GPU driving the display, anyhow.
With four GPUs in this topology, Nvidia uses several techniques to split up the work between the graphics processors. These are variations of the methods used in dual-GPU SLI.
As with dual-GPU SLI, the preferred method of GPU load balancing is known as alternate-frame rendering (AFR). In a two-GPU config, that means GPU 0 renders the odd frames and GPU 1 renders the even frames, for an every-other-frame arrangement. With Quad SLI, the “alternate” tag isn’t quite accurate. Frames are split up sequentially between the four GPUs.
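In code terms, the assignment is just a modulo operation. A trivial sketch (my own illustration, not Nvidia's driver logic):

```python
# Sequential frame distribution: in two-GPU AFR, frames alternate; with
# four GPUs, frame n simply goes to GPU n mod 4.

def afr_assign(frame_number, gpu_count):
    """Return the GPU index that renders a given frame."""
    return frame_number % gpu_count

# Two GPUs: the classic every-other-frame arrangement.
two_way = [afr_assign(n, 2) for n in range(6)]   # [0, 1, 0, 1, 0, 1]
# Four GPUs: frames round-robin across all four.
four_way = [afr_assign(n, 4) for n in range(6)]  # [0, 1, 2, 3, 0, 1]
```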
This method tends to scale best in performance because it’s a very logical way to divvy up the workload. Not only does it scale up in terms of fill rate and pixel shading power, but it also divides the vertex-processing burden between the GPUs. Both Nvidia and ATI tend to employ this method when possible in their respective multi-GPU schemes.
AFR isn’t without its drawbacks, though. It’s not compatible with all applications, for one thing, so it’s can’t always be used. More notably, for Quad SLI, AFR requires the use of four frame buffers in order to work. That’s a major, show-stopping problem, because DirectX 9 currently allows a maximum of three such buffers. For DirectX gameswhich comprise the vast majority of PC game titlesQuad SLI’s best load-balancing method isn’t an option. Nvidia does use four-way AFR in some OpenGL titles like Doom 3 and Quake 4, but they have to resort to other methods for most games.
Another potential problem with four-way AFR is simply the amount of latency involved in a four-buffer rendering scheme. The lag between user input and when that change is reflected onscreen could be fairly long: tens of milliseconds, or long enough to be perceptible (and probably annoying) to a gamer. This problem would be most acute when the graphics subsystem is really stressed and the rate at which the GPUs are pumping out frames is relatively low.
I played a fair amount of Quake 4 with our Quad SLI test rig using an AFR graphics mode, and I didn’t detect any noticeable input lag. Now, I’m no professional gamer, and Quake 4 isn’t exactly the fastest-twitch action game around. But I have noticed input lag playing Quake 4 on an LCD display. Our test rig’s fast CRT may have helped here. I wouldn’t be shocked to hear of folks who found four-way AFR too slow for their tastes, especially when combined with a middling-speed LCD monitor or when running a graphically intensive game at high quality settings.
An alternative to AFR that offers broader compatibility and doesn’t suffer from the three-buffer limit is split-frame rendering, where each GPU renders a portion of the frame, subdivided horizontally. The screen can be split into four segments of the same size, or the area apportioned to each GPU can be modified dynamically in response to demand. SFR’s big downside is so-so performance scaling. Some applications work better than others with SFR, but SFR always requires each GPU to process vertex data for the entire frame.
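The dynamic variant amounts to a simple feedback loop. Here's a toy sketch (my own illustration of the idea, not Nvidia's actual balancing logic): the split line moves each frame based on which GPU took longer last time.

```python
# Toy dynamic split-frame load balancer: the horizontal split moves toward
# the faster GPU, so the slower one gets a smaller slice of the next frame.

def rebalance(split, time_top, time_bottom, step=0.05):
    """Nudge the split; 'split' is the fraction of the frame given to the
    top GPU, clamped so neither GPU is ever starved entirely."""
    if time_top > time_bottom:
        split -= step      # top GPU is struggling; shrink its region
    elif time_bottom > time_top:
        split += step      # bottom GPU is struggling; grow the top region
    return min(max(split, 0.1), 0.9)

# The top half had heavy shading last frame, so its share shrinks a bit.
new_split = rebalance(0.5, time_top=20.0, time_bottom=12.0)  # 0.45
```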
Nvidia can also circumvent DX9’s three-buffer limit in AFR-compatible apps by employing a hybrid of AFR and SFR, as shown in the diagram above. The two GPUs on each GeForce 7950 GX2 card use SFR to distribute the load, and the frames rendered are interleaved between the two cards via AFR. In theory, this “AFR of SFR” should scale better than four-way SFR. The current driver profile for Quad SLI uses this method effectively in the game F.E.A.R, as we will see in our performance results shortly.
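Structurally, the hybrid looks something like this sketch (a Python illustration of the scheme's shape only, not actual driver code):

```python
# "AFR of SFR": frames alternate between the two GX2 cards, and each card's
# two GPUs split their assigned frame into top and bottom portions.

def afr_of_sfr(frame_number):
    """Map a frame to (card, {gpu: portion}) under the hybrid scheme."""
    card = frame_number % 2        # AFR between the two cards
    gpu_top = card * 2             # SFR within the chosen card
    gpu_bottom = card * 2 + 1
    return card, {gpu_top: "top half", gpu_bottom: "bottom half"}

# Frame 0 goes to card 0 (GPUs 0 and 1); frame 1 to card 1 (GPUs 2 and 3).
```

With only two frames in flight at once, this arrangement stays under the DirectX three-buffer limit while still using all four GPUs.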
Another means of distributing the load in a multi-GPU system is to deliver high levels of antialiasing by combining the antialiased frames from multiple GPUs into one. The AA sample pattern is varied from one GPU to the next so that the final frame effectively has a larger sample size and a more dispersed sample pattern. The various SLI antialiasing modes haven’t traditionally achieved strong performance scaling, although performance improved somewhat when Nvidia incorporated the ability to pass sample data over the SLI bridge in the G71 and G73 GPUs. (The G70 had to pass this data via PCI Express.) Still, SLI AA has remained a means of making less graphically intense games look prettier rather than a load-balancing technique aimed primarily at performance.
That trajectory continues in Quad SLI with the addition of a new SLI AA mode with a 32X sample size. As the diagram above indicates, each GPU renders the frame at a slight offset using the G71’s highest quality 8xS AA mode, and the four frames are composited to produce the final result. Because the 8xS mode is a mix of edge-oriented multisampling and full-scene supersampling, the resulting frame has elements of both, as a quick look at the sample patterns should illustrate.
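The compositing step amounts to averaging the per-GPU results for each pixel. A toy illustration (my own simplification; the hardware's resolve is more involved than this):

```python
# SLI AA compositing in miniature: each GPU renders the same pixel with a
# slightly jittered sample pattern, and the final pixel is the average of
# the per-GPU results, which behaves like one larger, more dispersed pattern.

def composite(per_gpu_pixels):
    """Average the colour each GPU produced for one pixel (RGB tuples)."""
    n = len(per_gpu_pixels)
    return tuple(sum(channel) / n for channel in zip(*per_gpu_pixels))

# Four GPUs disagree slightly on an edge pixel; the blend smooths the edge.
result = composite([(255, 0, 0), (255, 0, 0), (0, 0, 0), (0, 0, 0)])
# result == (127.5, 0.0, 0.0)
```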
SLI antialiasing modes exposed
I used the exceptionally handy Direct3D FSAA viewer to capture the sample patterns for the antialiasing methods available in SLI, Quad SLI, and ATI’s CrossFire. The top four rows of the table show the 2X, 4X, 6X, and 8xS modes possible with single GPUs, as well. Below those are the SLI AA and CrossFire SuperAA sample patterns.
The yellow squares below represent the space inside of a pixel. The green dots represent texture samples, and the red ones represent coverage samples, used to determine polygon coverage in the multisampled antialiasing algorithms used by both Nvidia and ATI. (If multisampling is unfamiliar to you, I suggest reading this helpful article for an overview of it.)
[Sample pattern images omitted. The table compared the GeForce 7900 GTX, GeForce 7950 GX2 SLI, and Radeon X1950 CrossFire across the 2X, 4X, 6X, and 8xS single-GPU modes and the 8X, 10X, 12X, 14X, 16X, and 32X multi-GPU modes.]
There’s much to discuss in the table above. Let’s start with the differences between the GPUs, the Radeon X1950 XTX and the Nvidia G71. The Radeon hardware is more flexible; its sample patterns are fully programmable, and the Radeon can capture six coverage samples alongside just one texture sample on a single GPU. (Texture samples have a high performance cost, so avoiding them is helpful.) The Radeon’s excellent 6X AA mode also uses a pseudo-random, non-grid-aligned sample pattern that ought to disrupt the eye’s pattern recognition better than the rotated grid patterns used in both brands’ 2X and 4X modes. ATI makes further use of its hardware’s programmable sample patterns in the CrossFire SuperAA modes, where the pseudo-random sample distribution gets taken up a notch.
Since each Radeon in a CrossFire config must capture one texture sample, ATI gives the user a choice between its 8X/10X and 12X/14X modes. In 8X and 12X, the two texture samples are superimposed over one another, so there’s no element of supersampling in the scene; only edges are modified, and none of the interior detail “blurring” that comes from supersampling is present. The 10X and 14X modes offer the same number of samples as their respective peers, but the texture samples are situated at opposite corners of the pixel. These modes compromise crispness for an element of full-scene spatial AA.
Generally, given the choice, I would side with the 8X and 12X sample patterns. Interior details like textures are best handled via anisotropic filtering.
The Nvidia GPU, on the other hand, has largely fixed sample patterns. The 4X pattern, in particular, is the basis for all of the GeForce’s higher sample modes. Even the SLI AA modes use jittered versions of the 4X and 8xS patterns. The 8xS pattern is two copies of the 4X pattern, one above the other, squished into a single pixel. In order to get to sample sizes beyond four with a single GPU, Nvidia resorts to capturing two texture samples, a bandwidth-intensive task.
The G71 GPU’s less capable hardware limits what Nvidia can do with antialiasing. Still, Quad SLI brings with it three new SLI AA modes, distinct from the modes used in dual-GPU SLI. The 32X mode splits up the work as expected, with each GPU contributing an 8xS antialiased image at its own unique subpixel offset. As I mentioned earlier, SLI 32X AA has eight texture samples and 32 coverage samples, for a mix of supersampling and multisampling. In order to fit this pattern into the space of a pixel, Nvidia’s been forced to jitter the pattern only slightly from one GPU to the next. The result is four sets of eight samples that unfortunately don’t vary from one another in position by very much.
The Quad SLI 16X pattern is, also as expected, simply four copies of the base 4X AA pattern, one from each GPU, with an offset. That makes this mode quite distinct from the SLI 16X mode in dual SLI, which involves two overlaid 8xS images. Nvidia has made an interesting choice here, situating the four texture samples tightly around the pixel center. It’s a choice that I tend to like, for the same reasons that I prefer ATI’s 8X and 12X SuperAA modes.
The Quad SLI 8X AA mode is a different kind of animal. Like the SLI 16X mode, its texture samples are grouped tighter around the pixel center than in its dual-GPU counterpart. The really odd thing here, though, is that the pattern is simply two copies of the base 4X AA sample pattern, not four copies of the 2X pattern, which would be what’s expected in a Quad SLI AA mode.
I suspect Nvidia must be doing something different with its load balancing here, a suspicion bolstered by the fact that Quad SLI 8X AA has unexpectedly strong performance. Whatever’s happening, it’s apparently secret sauce to Nvidia. When I asked them what was going on with the Quad SLI 8X mode, they wouldn’t tell me unless I agreed not to disclose it. My best guess is that they are running 4X AA on a pair of GPUs, resolving the resulting images to this 8X pattern, and then doing alternate-frame rendering between GPU pairs. This method would sidestep the DirectX three-buffer limit and potentially scale well.
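For what it's worth, that guess can be sketched structurally, with the usual caveat that this is pure speculation on my part:

```python
# Hypothetical scheme for Quad SLI 8X AA (speculative; Nvidia would not
# confirm how this mode actually works): each frame is rendered by a pair
# of GPUs doing 4X AA at offset sample positions, the pair's two images are
# resolved into one 8-sample result, and frames alternate between pairs.

def quad_8x_assign(frame_number):
    """Return the GPU pair that (hypothetically) renders a given frame."""
    pair = frame_number % 2
    return (pair * 2, pair * 2 + 1)   # each GPU contributes a 4X image

# Only two frames are ever in flight, which would fit comfortably under
# DirectX 9's three-buffer limit while still scaling across four GPUs.
```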
Performance expectations for Quad SLI
Nvidia has been extremely careful in slipping Quad SLI out into the market. They first told the world about it at CES early this year, and then Quad SLI systems began shipping via a select group of system builders like Falcon Northwest. The early quad-capable dual-GPU cards were never sold to the public, and were replaced by the GeForce 7950 GX2. Even after the GX2 arrived, PC enthusiasts who wanted to build their own systems had to wait until this past August to get their hands on official release drivers that would enable Quad SLI.
Even as those drivers were released, Nvidia sought to position Quad SLI as a targeted solution that will appeal to a very narrow segment of the market. Those select few, they argued, would want to run their games at extreme resolutions, like 2560×1600, with high degrees of anisotropic filtering and edge antialiasing. Here is a chart from Nvidia’s Quad SLI reviewer’s guide that lays it out for you.
So they’re being very upfront about the fact that Quad SLI isn’t for everyone. Heck, if you read the reviewer’s guide, we’re probably not really qualified to review this product due to our puny 2048×1536 display! So you can just stop reading now, folks.
If you do choose to soldier on, you will see actual, empirical performance numbers for Quad SLI from a range of resolutions and edge and texture antialiasing settings. In fact, we’ve used our entire test suite from our recent graphics reviews to compare Quad SLI to a broad array of solutions. Some of these results may not show Quad SLI in the best light, but the important thing is to remember how Quad SLI makes you feel. That is, of course, the reason for spending all of that money.
In case you’re not getting the hint, all of this caution and fine-grained product positioning ought to temper your performance expectations for Quad SLI. Nvidia says the CPU overhead of driving four GPUs is what causes Quad SLI problems at lower resolutions and quality levels, and I’m sure that’s one part of the equation. Our test rig’s new Core 2 Extreme X6800 processor is incredibly fast, though, and other factors figure prominently into the mix, as well, not least of which is the three-buffer limit in DirectX 9. Not being able to run in AFR mode has gotta hurt. Then there’s the fact that the individual GPUs on the GeForce 7950 GX2 aren’t the fastest on the market. They’re each about the same speed as a GeForce 7900 GT or GS. Cards like the GeForce 7900 GTX or the Radeon X1950 XTX offer quite a bit more performance per GPU, and they can run in dual-GPU configurations, as well.
Quad SLI may become more attractive when games that use Havok FX’s GPU-based physics API finally arrive. Then, a quad setup could dedicate two or three GPUs to graphics and leave the rest to handle physics acceleration. At present, though, no Havok FX-enabled game titles or video drivers are available.
Of course, we’ve not only tested Quad SLI in our regular graphics performance suite. We’ve also done some extended testing at extremely high antialiasing and anisotropic filtering settings, and we’ve examined image quality, too.
Test notes
We did run into a few snags in our testing. For one, we had to update our Asus P5N32-SLI SE Deluxe’s BIOS in order to resolve a problem. With the original 0204 BIOS, the system reported only 1GB of memory in Windows whenever a pair of 7950 GX2s was installed. This was not a problem with any of our single or dual-GPU configs, but Quad SLI required a BIOS update.
Also, when we tried to run a pair of GeForce 7600 GT cards in SLI, we encountered some odd image artifacts that we couldn’t make go away. The image artifacts didn’t appear to affect performance, so we’ve included results for the GeForce 7600 GT in SLI. If we find a resolution for the problem and performance changes, we’ll update the scores in this article.
Finally, the 3DMark06 test results for the Radeon X1950 XTX CrossFire system were obtained using an Asus P5W DH motherboard, for reasons explained here. Otherwise, we used the test systems as described below.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.
Our test systems were configured like so:
Processor | Core 2 Extreme X6800 2.93GHz | Core 2 Extreme X6800 2.93GHz | Core 2 Extreme X6800 2.93GHz |
System bus | 1066MHz (266MHz quad-pumped) | 1066MHz (266MHz quad-pumped) | 1066MHz (266MHz quad-pumped) |
Motherboard | Asus P5N32-SLI SE Deluxe | Intel D975XBX | Asus P5W DH |
BIOS revision | 0204/0305 | BX97510J.86A.1073.2006.0427.1210 | 0801 |
North bridge | nForce4 SLI X16 Intel Edition | 975X MCH | 975X MCH |
South bridge | nForce4 MCP | ICH7R | ICH7R |
Chipset drivers | ForceWare 6.86 | INF Update 7.2.2.1007, Intel Matrix Storage Manager 5.5.0.1035 | INF Update 7.2.2.1007, Intel Matrix Storage Manager 5.5.0.1035 |
Memory size | 2GB (2 DIMMs) | 2GB (2 DIMMs) | 2GB (2 DIMMs) |
Memory type | Corsair TWIN2X2048-8500C5 DDR2 SDRAM at 800MHz | Corsair TWIN2X2048-8500C5 DDR2 SDRAM at 800MHz | Corsair TWIN2X2048-8500C5 DDR2 SDRAM at 800MHz |
CAS latency (CL) | 4 | 4 | 4 |
RAS to CAS delay (tRCD) | 4 | 4 | 4 |
RAS precharge (tRP) | 4 | 4 | 4 |
Cycle time (tRAS) | 15 | 15 | 15 |
Hard drive | Maxtor DiamondMax 10 250GB SATA 150 | Maxtor DiamondMax 10 250GB SATA 150 | Maxtor DiamondMax 10 250GB SATA 150 |
Audio | Integrated nForce4/ALC850 with Realtek 5.10.0.6150 drivers | Integrated ICH7R/STAC9221D5 with SigmaTel 5.10.5143.0 drivers | Integrated ICH7R/ALC882M with Realtek 5.10.00.5247 drivers |
Graphics | Radeon X1800 GTO 256MB PCI-E with Catalyst 8.282-060802a-035722C-ATI drivers |
Radeon X1900 XTX 512MB PCI-E + Radeon X1900 CrossFire with Catalyst 8.282-060802a-035515C-ATI drivers |
Radeon X1900 XT 256MB PCI-E + Radeon X1900 CrossFire with Catalyst 8.282-060802a-035515C-ATI drivers |
Radeon X1900 GT 256MB PCI-E with Catalyst 8.282-060802a-035722C-ATI drivers |
Radeon X1950 XTX 512MB PCI-E + Radeon X1950 CrossFire with Catalyst 8.282-060802a-03584E-ATI drivers |
||
Radeon X1900 XT 256MB PCI-E with Catalyst 8.282-060802a-03584E-ATI drivers |
|||
Radeon X1900 XTX 512MB PCI-E with Catalyst 8.282-060802a-03584E-ATI drivers |
|||
Radeon X1950 XTX 512MB PCI-E with Catalyst 8.282-060802a-03584E-ATI drivers |
|||
BFG GeForce 7600 GT OC 256MB PCI-E with ForceWare 91.47 drivers |
|||
Dual BFG GeForce 7600 GT OC 256MB PCI-E with ForceWare 91.47 drivers |
|||
XFX GeForce 7900 GS 480M Extreme 256MB PCI-E with ForceWare 91.47 drivers |
|||
Dual XFX GeForce 7900 GS 480M Extreme 256MB PCI-E with ForceWare 91.47 drivers |
|||
GeForce 7900 GT 256MB PCI-E with ForceWare 91.31 drivers |
|||
Dual GeForce 7900 GT 256MB PCI-E with ForceWare 91.31 drivers |
|||
XFX GeForce 7950 GT 570M Extreme 512MB PCI-E with ForceWare 91.47 drivers |
|||
Dual XFX GeForce 7950 GT 570M Extreme 512MB PCI-E with ForceWare 91.47 drivers |
|||
GeForce 7900 GTX 512MB PCI-E with ForceWare 91.31 drivers |
|||
Dual GeForce 7900 GTX 512MB PCI-E with ForceWare 91.31 drivers |
|||
GeForce 7950 GX2 1GB PCI-E with ForceWare 91.31 drivers |
|||
Dual GeForce 7950 GX2 1GB PCI-E with ForceWare 91.47 drivers |
|||
OS | Windows XP Professional (32-bit) | ||
OS updates | Service Pack 2, DirectX 9.0c update (August 2006) |
Thanks to Corsair for providing us with memory for our testing. Their quality, service, and support are easily superior to those of the generic, no-name alternatives.
Our test systems were powered by OCZ GameXStream 700W power supply units. Thanks to OCZ for providing these units for our use in testing.
Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults.
The test systems’ Windows desktops were set at 1280×960 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- Quake 4 1.3 with trdm1 netdemo
- The Elder Scrolls IV: Oblivion 1.1
- Ghost Recon Advanced Warfighter 1.21
- F.E.A.R. 1.07
- Half-Life 2: Episode One with trdem1 demo
- FutureMark 3DMark06 Build 1.02
- FRAPS 2.7.2
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Pixel-filling power
No review of Quad SLI would be complete without putting its staggering performance specs into perspective. To do that for us, we turn to a table of reasonably recent graphics cards and see how Quad SLI compares. In the table below and throughout the rest of the review, by the way, we’re referring to our Quad SLI config by its proper name: “GeForce 7950 GX2 SLI.”
Card | Core clock (MHz) | Pixels/clock | Peak fill rate (Mpixels/s) | Textures/clock | Peak fill rate (Mtexels/s) | Effective memory clock (MHz) | Memory bus width (bits) | Peak memory bandwidth (GB/s) |
Radeon X1650 Pro | 600 | 4 | 2400 | 4 | 2400 | 1400 | 128 | 22.4 |
GeForce 7600 GT | 560 | 8 | 4480 | 12 | 6720 | 1400 | 128 | 22.4 |
All-In-Wonder X1900 | 500 | 16 | 8000 | 16 | 8000 | 960 | 256 | 30.7 |
Radeon X1800 GTO | 500 | 12 | 6000 | 12 | 6000 | 1000 | 256 | 32.0 |
GeForce 7800 GT | 400 | 16 | 6400 | 20 | 8000 | 1000 | 256 | 32.0 |
Radeon X1800 XL | 500 | 16 | 8000 | 16 | 8000 | 1000 | 256 | 32.0 |
GeForce 7800 GTX | 430 | 16 | 6880 | 24 | 10320 | 1200 | 256 | 38.4 |
Radeon X1900 GT | 575 | 12 | 6900 | 12 | 6900 | 1200 | 256 | 38.4 |
GeForce 7900 GS | 450 | 16 | 7200 | 20 | 9000 | 1320 | 256 | 42.2 |
GeForce 7900 GT | 450 | 16 | 7200 | 24 | 10800 | 1320 | 256 | 42.2 |
XFX GeForce 7900 GS 480M | 480 | 16 | 7680 | 20 | 9600 | 1400 | 256 | 44.8 |
GeForce 7950 GT | 550 | 16 | 8800 | 24 | 13200 | 1400 | 256 | 44.8 |
Radeon X1900 XT | 625 | 16 | 10000 | 16 | 10000 | 1450 | 256 | 46.4 |
XFX GeForce 7950 GT 570M | 570 | 16 | 9120 | 24 | 13680 | 1460 | 256 | 46.7 |
Radeon X1800 XT | 625 | 16 | 10000 | 16 | 10000 | 1500 | 256 | 48.0 |
Radeon X1900 XTX | 650 | 16 | 10400 | 16 | 10400 | 1550 | 256 | 49.6 |
GeForce 7900 GTX | 650 | 16 | 10400 | 24 | 15600 | 1600 | 256 | 51.2 |
GeForce 7800 GTX 512 | 550 | 16 | 8800 | 24 | 13200 | 1700 | 256 | 54.4 |
Radeon X1950 XTX | 650 | 16 | 10400 | 16 | 10400 | 2000 | 256 | 64.0 |
GeForce 7950 GX2 | 2 * 500 | 32 | 16000 | 48 | 24000 | 1200 | 2 * 256 | 76.8 |
GeForce 7950 GX2 SLI | 4 * 500 | 64 | 32000 | 96 | 48000 | 1200 | 4 * 256 | 153.6 |
I haven’t included other SLI or CrossFire setups in this table, but you can double up the specs yourself if you’d like to see how they compare. However you slice it, though, a Quad SLI rig has astounding theoretical peak graphics throughput: 48,000 megatexels per second of fill rate and over 150 GB/s of memory bandwidth. Nvidia’s fastest single-GPU solution, the GeForce 7900 GTX, has roughly one-third the potential of Quad SLI.
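If you do want to double up the specs yourself, the table's peak numbers are simple multiplication. A quick check of a couple of rows:

```python
# Recompute a few entries from the spec table as a sanity check.

def peak_fill(core_mhz, units_per_clock):
    """Peak rate in Mpixels/s or Mtexels/s: clock times units per clock."""
    return core_mhz * units_per_clock

def bandwidth_gbps(effective_mhz, bus_bits):
    """Peak memory bandwidth: transfer rate times bus width in bytes."""
    return effective_mhz * (bus_bits / 8) / 1000

# GeForce 7900 GTX: 650MHz, 16 pixels and 24 textures per clock, 1600MHz/256-bit.
assert peak_fill(650, 16) == 10400
assert peak_fill(650, 24) == 15600
assert bandwidth_gbps(1600, 256) == 51.2
# Quad SLI: four 500MHz GPUs, 24 textures per clock each, four 256-bit buses.
assert peak_fill(4 * 500, 24) == 48000
assert bandwidth_gbps(1200, 4 * 256) == 153.6
```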
These simple synthetic tests bear out the raw pixel-pushing capacity of Quad SLI. Nothing else comes close.
Quake 4
In order to make sure we pushed the video cards as hard as possible, we enabled Quake 4’s multiprocessor support before testing.
Quake 4 is friendly ground for Quad SLI, because it’s an AFR-capable application that runs in OpenGL, so four-way AFR isn’t an issue. CPU overhead seems to overwhelm the Quad SLI setup at lower resolutions, but it pulls away from the pack decisively at 1600×1200.
F.E.A.R.
We’ve used FRAPS to play through a sequence in F.E.A.R. in the past, but this time around, we’re using the game’s built-in “test settings” benchmark for a quick, repeatable comparison.
Quad SLI turns in another solid performance in this game. Although four-way AFR isn’t a possibility here, F.E.A.R. scales nicely with the AFR of SFR method. Notice that the performance scaling doesn’t just apply to average frame rates, eitherminimum frame rates for the Quad SLI rig are quite good, too.
Half-Life 2: Episode One
The Source game engine uses an integer data format for its high dynamic range rendering, which allows all of the cards here to combine HDR rendering with 4X antialiasing.
Quad SLI stumbles a bit in Half-Life 2: Episode One, with the quad solution consistently running slower than a single GeForce 7950 GX2. This game just doesn’t fare well with the load-balancing method Nvidia currently uses for it in quad mode.
The Elder Scrolls IV: Oblivion
We tested Oblivion by manually playing through a specific point in the game five times while recording frame rates using the FRAPS utility. Each gameplay sequence lasted 60 seconds. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent and trustworthy results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
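The median-of-five-lows approach looks like this in miniature (the frame rates below are made up for illustration):

```python
# Why we report the median low rather than the minimum or mean: a single
# outlier run doesn't skew the reported figure.
from statistics import median

low_frame_rates = [31, 28, 45, 30, 29]   # hypothetical lows from five runs
reported_low = median(low_frame_rates)   # 30; the 45 outlier is ignored
```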
We set Oblivion’s graphical quality settings to “Ultra High.” The screen resolution was set to 1600×1200 resolution, with HDR lighting enabled. 16X anisotropic filtering was forced on via the cards’ driver control panels.
Oblivion presents the same problems for Quad SLI as Half-Life 2: Episode One. The four-GPU config barely surpasses a single GeForce 7900 GTX here.
Ghost Recon Advanced Warfighter
We tested GRAW with FRAPS, as well. We cranked up all of the quality settings for this game, with the exception of antialiasing. However, GRAW doesn’t allow cards with 256MB of memory to run with its highest texture quality setting, so those cards were all running at the game’s “Medium” texture quality.
At this point, the performance reality of Quad SLI is probably beginning to sink in. It may have lots of potential pixel throughput, but tapping it requires the right combination of display modes, quality settings, and available load-balancing methods.
3DMark06
The Quad SLI system manages to outdo a single GeForce 7950 GX2 in 3DMark, but not by much. The fastest graphics config of all for 3DMark06 is the Radeon X1950 XTX CrossFire.
3DMark’s pixel and vertex processing tests reveal vertex throughput to be one of Quad SLI’s obvious weaknesses. Although it’s the fastest of the pack in the pixel shader test, it finishes second to last in the simple vertex shader test.
Power consumption
We measured total system power consumption at the wall socket using an Extech power analyzer model 380803. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. Remember, out of necessity, we’re using different motherboards for the CrossFire systems. Otherwise, the system components other than the video cards were kept the same.
The idle measurements were taken at the Windows desktop. The cards were tested under load running Oblivion using the game’s Ultra Quality setting at 1600×1200 resolution with 16X anisotropic filtering.
All things considered, power consumption on the Quad SLI rig isn’t that bad. It’s definitely the most power-hungry system at idle, as one would anticipate, but its power use running Oblivion is actually lower than the top ATI CrossFire solutions, and that’s despite the fact that the P5N32-SLI SE Deluxe motherboard in the Quad SLI system seems to draw more power than the one in the CrossFire system.
There are probably a couple of good reasons for the Quad SLI system’s relatively modest power draw under load. The first is that restrained parallelism naturally produces better results on this front than cranking up a chip to near its absolute limits of voltage and clock speed. (The whole voltage squared part of the power equation looms large here.) Second, we’ve already seen that Quad SLI doesn’t perform especially well in Oblivion, and its lower power draw may be a consequence of underutilization. Without many bits toggling, the GPUs probably don’t draw that much power. I didn’t do any comparative testing, but I did see the Quad SLI rig top 400W on the power meter when running Quake 4, a game that makes better use of all four GPUs via four-way AFR.
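The voltage-squared point can be made with a little arithmetic. The figures here are purely illustrative, not measured values for these GPUs:

```python
# Dynamic power scales roughly with frequency times voltage squared
# (P ~ f * V^2), so several modestly clocked chips can beat one chip
# pushed near its limits. All numbers below are made up for illustration.

def relative_power(freq, volts):
    return freq * volts ** 2

one_fast_chip = relative_power(1.3, 1.4)        # clocked near its limits
four_slow_chips = 4 * relative_power(0.5, 1.1)  # wider but gentler

# Four slow chips offer 4 * 0.5 = 2.0x the raw throughput of the single
# 1.3x chip while drawing slightly less power, thanks to the V^2 term.
```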
Noise levels and cooling
We measured noise levels on our test systems, sitting on an open test bench, using an Extech model 407727 digital sound level meter. The meter was mounted on a tripod approximately 14″ from the test system at a height even with the top of the video card. The meter was aimed at the very center of the test systems’ motherboards, so that no airflow from the CPU or video card coolers passed directly over the meter’s microphone. We used the OSHA-standard weighting and speed for these measurements.
You can think of these noise level measurements much like our system power consumption tests, because the entire systems’ noise levels were measured, including CPU and chipset fans. We had temperature-based fan speed controls enabled on the motherboard, just as we would in a working system. We think that’s a fair method of measuring, since (to give one example) running a pair of cards in SLI may cause the motherboard’s coolers to work harder. The motherboard we used for all single-card and SLI configurations was the Asus P5N32-SLI SE Deluxe, which on our open test bench required an auxiliary chipset cooler. The Asus P5W DH Deluxe motherboard we used for CrossFire testing didn’t require a chipset cooler, so those systems were inherently a little bit quieter. In all cases, we used a Zalman CNPS9500 LED to cool the CPU.
Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.
We measured the coolers at idle on the Windows desktop and under load while playing back our Quake 4 nettimedemo. The cards were given plenty of opportunity to heat up while playing back the demo multiple times. Still, in some cases, the coolers did not ramp up to their very highest speeds under load. The Radeon X1800 GTO and Radeon X1900 cards, for instance, could have been louder had they needed to crank up their blowers to top speed. Fortunately, that wasn’t necessary in this case, even after running a game for an extended period of time.
You’ll see two sets of numbers for the GeForce 7950 GT below, one for the XFX cards with their passive cooling and another for the BFG Tech cards, which use the stock Nvidia active cooler. I measured them both for an obvious reason: they were bound to produce very different results.
The GeForce 7950 GX2 is a fairly quiet video card, so doubling up on them doesn’t produce horrible results. Of course, we’ve measured these noise levels on an open test bench. Given the Quad SLI system’s idle power draw, it’s not really a good candidate for a truly quiet system.
Ultra high quality performance
Next, we have a round of tests with the extra-high antialiasing levels available in SLI and CrossFire. For these tests, we cranked up the quality of the cards. On the GeForces, that meant raising the image quality slider in the Nvidia drivers from “Quality” to “High Quality” and turning on supersampled transparency antialiasing. For the Radeons, that meant enabling ATI’s high-quality anisotropic filtering and turning on “Quality” adaptive antialiasing.
Strictly speaking, these settings may not be entirely fair, because of the ways ATI and Nvidia handle certain things. In most cases, ATI is getting the short end of the stick here, because to my eye, their default anisotropic filtering algorithm is roughly equivalent to the best “High Quality” filtering the Nvidia cards can muster, if not better. And Nvidia’s G71 has no capability analogous to the Radeon X1950’s high-quality, angle-independent anisotropic filtering algorithm. My goal here, however, was to achieve the highest image quality for each graphics solution and then test performance. We will look at image quality, particularly in connection with antialiasing, shortly.
ATI partisans may take some comfort in the fact that I’ve consolidated Nvidia’s 16X and ATI’s 14X AA modes into the same series in the graphs below for the sake of simplicity. ATI’s Super AA 14X mode is the closest rival to Nvidia’s SLI AA 16X modes. Even though it has fewer coverage samples, it has a superior sample pattern.
In addition to the SLI AA and SuperAA modes, I’ve included the usual 4X AA modes and Nvidia’s unique 8xS mode in the results below for comparison.
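As I understand it, the SLI AA modes work by having each GPU render the full frame with a slightly offset sub-pixel sample pattern, then blending the resulting frames together. Here's a little Python sketch of the idea; the sample offsets and jitter values are made up for illustration and are not Nvidia's actual patterns. The point is that averaging each GPU's frame is mathematically the same as sampling once with the combined pattern:

```python
# Sketch of SLI AA-style sample combining. The patterns below are
# hypothetical, not Nvidia's real sample positions.

def coverage(samples, edge_x=0.37):
    """Fraction of sub-pixel samples falling left of a vertical edge."""
    hits = sum(1 for (x, y) in samples if x < edge_x)
    return hits / len(samples)

# A made-up 4X rotated-grid pattern for one GPU (offsets within the pixel).
base = [(0.125, 0.375), (0.375, 0.875), (0.625, 0.125), (0.875, 0.625)]

# Each GPU jitters the pattern slightly; blending the four frames
# averages their coverage values, acting like one 16-sample pattern.
gpus = [[((x + dx) % 1.0, (y + dy) % 1.0) for (x, y) in base]
        for (dx, dy) in [(0.00, 0.00), (0.05, 0.05),
                         (0.10, 0.10), (0.15, 0.15)]]

per_gpu = [coverage(g) for g in gpus]
blended = sum(per_gpu) / len(per_gpu)              # the final SLI AA blend
combined = coverage([s for g in gpus for s in g])  # one 16-sample pass

# With equal sample counts per GPU, the two are identical.
assert abs(blended - combined) < 1e-9
```

The averaging only produces a genuinely better result if the per-GPU patterns differ, of course; four GPUs sampling identical positions would blend to the same image as one.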
Quad SLI scales well in Quake 4 generally, so it’s no surprise to see it performing well here. It is a little jarring, though, to see that both the 16X and 32X SLI AA modes produce frame rates too low to be considered playable. Remember, we’re only at 1600×1200 resolution, and yet Quad SLI is running out of steam.
The place where Quad SLI shines, though, is in 8X SLI AA. The GeForce 7900 GTX is actually faster running 8xS and normal AFR load balancing than it is with 8X SLI AA. Quad SLI, on the other hand, performs better in 8X SLI AA. Whatever Nvidia put into the secret sauce for this mode, it certainly seems to work.
The F.E.A.R. and Half-Life 2: Episode One numbers tell a similar story: SLI 8X AA looks to be something of a sweet spot for Quad SLI. Remarkably, Half-Life 2: Episode One actually runs faster in SLI 8X mode than in the regular 4X mode. Some of Nvidia’s marketing materials for Quad SLI have leaned heavily on SLI 8X AA numbers, and now we know why. In some DirectX games, SLI 8X AA proves to be Quad SLI’s most effective method of load balancing.
Another trend worth noting throughout these results is the relative strength of the Radeon X1950 CrossFire rig in Super AA 14X mode. The CrossFire system is able to match or beat Quad SLI’s performance in SLI 16X AA almost across the board. Also, for the most part, none of these systems produce fluidly playable frame rates at 1600×1200 in 14X, 16X, or 32X AA.
Antialiasing image quality – Single-GPU
Now that we’ve tested antialiasing performance, we’ll take a quick look at AA image quality. This first page has images from AA modes available on single-GPU systems as well as multi-GPU. Since the image output for the various G71 chips doesn’t vary in these modes, we only have one sample image for each mode from the GeForce 7 series.
This sample scene from Half-Life 2 may look familiar, because we’ve used it in the past. I like it for several reasons. The fine detail in the antennas on the rooftops clearly illustrates the impact of antialiasing, and the telephone pole and rooftops present some nice examples of high-contrast, near-vertical and near-horizontal edges, always tough cases for AA. Also, the tree leaves in the far left of each image are a good example of alpha transparency, another tough case.
As in the tests on the previous page, I’ve run the cards at very high quality settings here, with supersampled transparency AA and “quality” adaptive AA enabled (which should help with the alpha transparency of the tree leaves). These images were captured with FRAPS, which attempts to capture the correct image result, even though Nvidia doesn’t do some of its AA resolve work until the image is sent to the display. I have enlarged these images to four times their original size, but they are otherwise unretouched. They’re lossless PNG images, so downloading them may take a few seconds.
Until we get to 4X AA, things look pretty rough in our example. To my eye, ATI’s 6X AA mode looks almost as good as Nvidia’s 8xS mode. That’s due in part to ATI’s pseudo-random sample pattern, although its effects are less apparent in a still screenshot than in motion. Notice how even the transparent texture in the tree leaves is being handled well by both GPUs.
Antialiasing image quality – Multi-GPU
Here’s where things get tricky, because the Quad SLI system’s sample patterns diverge from the GeForce 7900 GTX SLI’s. Lean into your monitor and squint, folks.
I’ve long preferred ATI’s antialiasing both in theory and in practice, but in these multi-GPU 8X modes, I have to say I prefer the output from the GeForce-based solutions. Perhaps it’s the multiple texture samples, but the Nvidia GPUs seem to do a better job clarifying the fine detail in this scene. ATI’s 10X mode, which takes its two texture samples from different locations, doesn’t seem to help matters any. Even ATI’s gamma-adjusted blends, which tend to provide smoother color gradients in places like at the edge of the telephone pole, aren’t enough to overcome this deficit. (By the way, Nvidia has a gamma-correct blend option in its drivers, but it doesn’t work with SLI AA modes.)
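For those wondering why gamma-correct blending matters at all, here's a quick Python illustration. Averaging gamma-encoded color values directly, as a non-gamma-corrected AA resolve does, darkens edge gradients; converting to linear light first gives a perceptually smoother blend. The gamma value of 2.2 is a rough stand-in for the true sRGB transfer function:

```python
# Why gamma-correct AA blending produces smoother gradients.
# Gamma 2.2 is used as a simple approximation of sRGB encoding.

GAMMA = 2.2

def to_linear(c):
    """Gamma-encoded value (0..1) -> linear light."""
    return c ** GAMMA

def to_gamma(c):
    """Linear light -> gamma-encoded value (0..1)."""
    return c ** (1.0 / GAMMA)

# A white object edge over a black background, pixel half covered:
white, black = 1.0, 0.0

naive = (white + black) / 2   # averaging encoded values: 0.5, too dark
correct = to_gamma((to_linear(white) + to_linear(black)) / 2)

print(round(naive, 3), round(correct, 3))  # prints: 0.5 0.73
```

The gamma-correct result comes out noticeably brighter, which is why edge gradients like the one on the telephone pole look smoother with ATI's blend (and why Nvidia offers the option, even if it doesn't work with SLI AA).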
The dual and quad 8X SLI AA modes have slightly different underlying sample patterns, but they look the same pixel for pixel to me.
Take your pick from among these high-sample modes. I’m not sure I could choose between them. The Nvidia cards, with their higher texture sampling rates, tend to resolve the fine detail in the antennas on the rooftops better, but the ATI 12X mode’s output may be a truer representation of those extremely thin objects. 32X SLI AA does look very nice (the gradient on the telephone pole’s edge is buttery smooth), but it’s only a minor incremental improvement over the 16X mode.
Quad SLI really does work, and it’s reasonably transparent to the user, just like dual-GPU SLI. At its best, Quad SLI delivers world-beating performance well beyond that of any other multi-GPU solution, as we saw in Quake 4 and F.E.A.R. However, Quad SLI has serious performance drawbacks in some DirectX games. These drawbacks may not affect outright playability much, because Quad SLI is rarely much slower than a single GeForce 7950 GX2 card, which is itself crazy fast. But they do call into question Quad SLI’s value, as shaky a concept as that was in the first place for an $1100 graphics subsystem. The key problem, in my view, is the three-buffer limit in DirectX 9, which cramps Quad SLI’s performance potential.
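Roughly speaking, the buffer problem looks like this toy model in Python. This is my simplification, not Nvidia's driver logic, and the 40 ms frame time is an arbitrary assumption; the point is simply that with alternate-frame rendering, the number of frames allowed in flight caps how many GPUs can be busy at once:

```python
# Toy model of AFR under a frame-buffer cap. Not real driver behavior;
# frame_ms is an assumed per-GPU render time for illustration.

def afr_throughput(gpus, buffers, frame_ms=40.0):
    """Frames per second when GPUs render alternate frames in parallel,
    but no more than `buffers` frames may be in flight at once."""
    active = min(gpus, buffers)        # GPUs beyond the buffer cap sit idle
    return active * (1000.0 / frame_ms)

print(afr_throughput(2, 3))  # 50.0  -> two GPUs fully utilized
print(afr_throughput(4, 3))  # 75.0  -> the fourth GPU contributes nothing
print(afr_throughput(4, 4))  # 100.0 -> what a fourth buffer could allow
```

Under this model, a two-GPU SLI rig never bumps into DirectX 9's three-buffer ceiling, but a four-GPU AFR setup always does, which is consistent with what we saw in our DirectX results.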
One bright spot is Quad SLI’s performance in DirectX games with its SLI 8X antialiasing mode. However it works, it’s a pretty effective method of load balancing. The higher SLI AA modes like 16X and 32X, though, remain largely unusable because of the performance hit they impose. We found them not to be viable in our test apps at 1600×1200 resolution. I can’t imagine dropping down to an even lower res to move from 8X to 16X or 32X AA. The tradeoff isn’t worth it.
The power consumption and noise production of a Quad SLI system aren’t as bad as one might think, either, although they’re certainly no picnic. Surprisingly, the Radeon X1950 CrossFire setup pulled more power under load in Oblivion than our Quad SLI rig. Given that, and based on our experience, building a Quad SLI system isn’t as complex as Nvidia first made it out to be when they were keeping it out of the hands of DIYers. I tend to think the performance issues were a much larger part of that situation than power and cooling, although the G71 GPU did admittedly reduce power consumption versus the G70 used in the first wave of OEM PCs with Quad SLI.
In light of its performance issues, Nvidia is right to position Quad SLI as a niche product for users with monitors capable of 2560×1600 resolution who desperately want to run their games at 8X AA or higher with 16X anisotropic filtering. That’s almost nobody, and almost nobody should shell out the big bucks for this product in its current form.
Trouble is, when you start focusing intently on picky image quality details as Quad SLI’s main selling point, the weaknesses of the G71 GPU become an issue. Nvidia’s texture filtering methods produce too much moire, crawl, and high-frequency noise. ATI’s Radeon X1950 GPUs have superior anisotropic and trilinear texture filtering and are capable of angle-independent anisotropic filtering. They have programmable antialiasing hardware with smarter sample patterns and native gamma-correct blending. They can also do AA in combination with 16-bit FP texture filtering for high dynamic range lighting. The G71 has none of these capabilities. The Radeon X1950 CrossFire system also performed very well in our tests, and it doesn’t have the performance liability in DirectX apps that Quad SLI does. If you want to use extreme graphics rendering power to achieve the absolute best image quality, your best bet right now is a Radeon X1950 CrossFire rig.
I can’t help but think Nvidia could do more than it has to help Quad SLI’s performance. Using all three of DirectX’s buffers for three-way AFR might not employ all of the GPUs in a Quad SLI rig, but it would likely mean better performance in many cases. Nvidia could also probably add a fourth buffer for AFR in its drivers, although it might not get Microsoft’s WHQL stamp of approval. No doubt there are other creative possibilities for Quad SLI rendering modes, as well.
Nvidia, however, seems to be concentrating its efforts elsewhere right now. Rumors of the new G80 GPU abound, and the product seems to be imminent. This GPU will be built for DirectX 10, and I would be shocked if DX10 didn’t also include provisions for more than three frame buffers. The current Quad SLI implementation may have its weaknesses, but some things like its AA modes are intriguing innovations. They may be the foundation for a future Quad SLI implementation that better lives up to its performance potential. And the truth is, no current Quad SLI rig is going to keep you from wanting a G80 when it arrives, if you’re truly extreme.