
CrossFire X explored

Scott Wasson

GPUs, it seems, are everywhere, breeding like rabbits. We see the introduction of a new GPU seemingly every month, and multi-GPU schemes like SLI and CrossFire are omnipresent. We now have multiple GPUs on a single graphics card, hybrid multi-GPU implementations involving integrated graphics, and more-than-two-way incarnations of both SLI and CrossFire.

The most intriguing bit of multi-GPU madness we’ve seen recently may be AMD’s CrossFire X, simply because in this generation, AMD opted to chain together three or four mid-range GPUs in place of creating a separate high-end graphics processor. That’s a bold move, fraught with peril, because multi-GPU schemes can be rather fragile, with iffy compatibility and less-than-ideal performance scaling. Then again, AMD’s decision to rely on CrossFire X to round out the high end of its product lineup has surely helped to concentrate its attention on making the scheme work well. So who knows?

We’ve taken a quick look at AMD’s first drivers for CrossFire X, and we have some interesting things to report. Read on to see what we learned.

Extending CrossFire to X
CrossFire X is, quite simply, an extension of the CrossFire dual-GPU feature to three and four GPUs. The hardware to make such a thing possible has been on the market for some time now, and last week’s release of the Catalyst 8.3 driver revision finally enabled this feature in software, as well. The basic building block of CrossFire X is AMD’s RV670 GPU, which is present in all of the various incarnations of the Radeon HD 3800 series of graphics cards. Getting to three or four GPUs can be achieved using a dizzying number of potential card combinations, which AMD has summarized in this helpful matrix:


Possible card combos for CrossFire X. Source: AMD.

The options are many. You could harness four GPUs together by using a pair of dual-GPU Radeon HD 3870 X2 cards, or given enough PCIe x16 slots, you could achieve a similar result using four Radeon HD 3850s. Cross-breeding is an option, as well, so a Radeon HD 3870 X2 could pair up with a single Radeon HD 3850 in a three-way config. Kinky.


A Radeon HD 3870 matched up with a Radeon HD 3870 X2 on an Intel X38 chipset

The caveat here is that CrossFire X settles on the lowest core GPU clock, memory clock, and video RAM size among the installed cards to determine the operative clock speeds and effective memory size. As a result, a Radeon HD 3870 X2 paired with a Radeon HD 3850 256MB would perform like a trio of Radeon HD 3850 256MB cards. And, of course, that means the effective memory size for the entire GPU phalanx would be 256MB, not 768MB, because memory isn’t shared between GPUs in CrossFire (or in SLI, for that matter).
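The lowest-common-denominator rule can be sketched in a few lines of Python. The card specs below are approximate and purely illustrative:

```python
# Sketch of CrossFire X's lowest-common-denominator rule (hypothetical,
# approximate card specs for illustration): the array runs at the slowest
# member's clocks, and effective VRAM equals the smallest card's, since
# memory isn't shared between GPUs.

def effective_config(cards):
    """Each card is (core_mhz, mem_mhz, vram_mb); returns the operative specs."""
    return (
        min(c[0] for c in cards),   # lowest core clock wins
        min(c[1] for c in cards),   # lowest memory clock wins
        min(c[2] for c in cards),   # smallest VRAM is the effective size
    )

# A 3870 X2 (two GPUs, 512MB each) plus a 3850 256MB behaves like
# three 3850 256MB cards:
cards = [(825, 901, 512), (825, 901, 512), (670, 828, 256)]
print(effective_config(cards))  # (670, 828, 256)
```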

Like its dual-GPU predecessor, CrossFire X works on a fairly broad range of motherboards, including those based on AMD 480, 580, and 7-series chipsets, as well as boards based on many of Intel’s more recent chipsets—among them: the 955, 965, 975, P35, G35, X38, and X48.

CrossFire X’s performance and feature set will be more or less optimal depending on the chipset’s topology and the motherboard’s allocation of PCIe lanes. AMD cites its own 790FX chipset as the optimal configuration, since a 790FX motherboard can dedicate eight lanes of PCIe 2.0 bandwidth to each of four PCIe x16 slots. On the other hand, Intel’s P35 chipset would be less than ideal, since it has 16 lanes of PCIe 1.1 connectivity feeding a single PCIe x16 slot off of the north bridge chip, while the second PCIe x16 slot hangs off of the south bridge and has only four lanes connected. The P35’s lower bandwidth will impose some limitations on CrossFire X: image compositing must be done in hardware (so you’ll definitely need to have those CrossFire bridge connectors attached), and OpenGL support won’t be possible.


CrossFire X is either on or off—the user can’t specify three- or four-way operation

Of course, CrossFire X will impose its own set of limitations, simply due to its nature. As I’ve said, multi-GPU schemes are fragile, and more-than-two-way schemes are even more fragile than dual-GPU arrangements. Most of what I said about this subject in my three-way SLI review applies to CrossFire X, as well. CrossFire X will require game-specific profiles in the video driver to work best, and as a new technology, it has a limited stable of game profiles available. Even with a proper profile available, four GPUs will rarely be anything approaching four times as fast as a single GPU. Performance scaling just isn’t that easy. In many cases, you’d be lucky to see three times the performance of one GPU. Furthermore, AMD hasn’t yet implemented CrossFire X support for OpenGL games in its drivers, and for reasons we’ll discuss in a moment, most DirectX 10 games don’t yet benefit from the presence of a fourth GPU.

On the plus side, thanks to the power-efficient nature of AMD’s RV670 GPU, CrossFire X doesn’t impose the almost unreasonable power supply requirements of Nvidia’s three-way SLI. Our test system’s PC Power & Cooling Silencer 750W PSU, which is both quiet and fairly reasonably priced, had no trouble supplying power to both three- and four-way CrossFire X configs. Heck, a couple of Radeon HD 3870 X2s require four PCIe aux power connectors, just like a pair of Radeon HD 2900 XTs.

CrossFire X has a few other nice attributes that SLI doesn’t share. One of those is the ability to work seamlessly with multiple monitors—no more enabling and disabling multi-GPU mode in order to switch between single-screen gaming and multi-display productivity sessions. We extolled the virtues of this feature in our Radeon HD 3870 X2 review—it’s a feature Nvidia’s SLI can’t match—and AMD now says this capability has been extended to more than two GPUs. Even four GPUs and eight displays ought to work effortlessly, as I understand it, though I’ve not had the chance to try it out myself. The one drawback here is that 3D apps running in a window are only accelerated by a single GPU. AMD says multi-GPU support in windowed mode is on its roadmap, but not yet ready.

Another new perk AMD has added for CrossFire X is the ability to use the Radeon HD series’ custom antialiasing filters in conjunction with the CrossFire “Super AA” mode. Super AA, for the uninitiated, is a GPU load-balancing method in which each GPU renders a different set of sub-pixel samples; those samples are then composited into a highly antialiased final image. The Super AA mode available on the Radeon HD series is a 16X mode. When combined with a wide-tent filter, Super AA can deliver what ATI classifies as “32X” AA. AMD also has an edge-detect custom filter that it claims can achieve up to “42X” AA in combination with Super AA, but that filter isn’t available in Catalyst 8.3.
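The arithmetic behind AMD’s mode names can be reconstructed from the modes this article mentions: 6X is 4X MSAA plus a narrow tent, 8X is 4X plus a wide tent, and 32X is 16X Super AA plus a wide tent. The multipliers below are inferred from those names, not taken from AMD documentation:

```python
# Inferred sample-count arithmetic behind AMD's AA mode names: a narrow
# tent filter adds half again as many effective samples, while a wide tent
# roughly doubles them. These factors are a reconstruction from the mode
# names, not AMD's documented math.

def effective_aa(base_samples, tent=None):
    factors = {None: 1.0, "narrow": 1.5, "wide": 2.0}
    return int(base_samples * factors[tent])

print(effective_aa(4, "narrow"))   # 6X
print(effective_aa(4, "wide"))     # 8X
print(effective_aa(16, "wide"))    # 32X: 16X Super AA + wide tent
```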


Super AA plus wide-tent filter equals 32X AA

Incidentally, Catalyst 8.3 includes a number of other new features and enhancements for both single- and multi-GPU use. We’ve already covered those elsewhere, so I won’t repeat the laundry list of changes here.

The multi-GPU scaling challenge
AMD claims development on CrossFire X drivers has taken a year, and that the total effort amounts to twice that of its initial dual-GPU CrossFire development effort. In order to understand why that is, I spoke briefly with Dave Gotwalt, a 3D Architect at AMD responsible for CrossFire X driver development. Gotwalt identified several specific challenges that complicated CrossFire X development.

One of the biggest challenges, of course, is avoiding CPU bottlenecks, long the bane of multi-GPU solutions. Gotwalt offered a basic reminder that it’s easier to run into CPU limitations with a multi-GPU setup simply because multi-GPU solutions are faster overall. On top of that, he noted, multi-GPU schemes impose some CPU overhead. As a result, removing CPU bottlenecks sometimes helps more with multi-GPU performance than with one GPU.

In this context, I asked about the opportunities for multithreading the driver in order to take advantage of multiple CPU cores. Surprisingly, Gotwalt said that although AMD’s DirectX 9 driver is multithreaded, its DX10 driver is not—neither for a single GPU nor for multiples. Gotwalt explained that multithreading the driver isn’t possible in DX10 because the driver must make callbacks through the DX10 runtime to the OS kernel, and those calls must be made through the main thread. Microsoft, he said, apparently felt most DX10 applications would themselves be multithreaded, and it didn’t want the driver creating yet another thread. (What we’re finding now, however, noted Gotwalt, is that applications aren’t as multithreaded as Microsoft had anticipated.)

With that avenue unavailable to them, AMD had to focus on other areas of potential improvement for mitigating CPU bottlenecks. One of the keys Gotwalt identified is having the driver queue up several command buffers and several frames of data, in order to determine ahead of time what needs to be rendered for the next frame.

Even with such provisions in place, Windows Vista puts limitations on video drivers that sometimes prevent CrossFire X from scaling well. The OS, Gotwalt explained, controls the “flip queue” that holds upcoming frames to be displayed, and by default, the driver can only render as far as three frames ahead of the frame being displayed. Under Vista, both DX9 and DX10 allow the application to adjust this value, so that the driver could get as many as ten frames ahead if the application allowed it. The driver itself, however, has no control over this value. (Gotwalt said Microsoft built this limitation into the OS, interestingly enough, because “a certain graphics vendor—not us” was queuing up many more frames than the apps were accounting for, leading to serious mouse lag. Game developers were complaining, so Microsoft built in a limit.)

For CrossFire X, AMD currently relies solely on a method of GPU load balancing known as alternate frame rendering (AFR), in which each GPU is responsible for rendering a whole frame and frames are distributed to GPUs sequentially. Frame 0 will go to GPU 0, frame 1 to GPU 1, frame 2 to GPU 2, and so on. Because of the three-frame limit on rendering ahead, explained Gotwalt, the fourth GPU in a CrossFire X setup will have no effect in some applications. Gotwalt confirmed that AMD is working on combining split-frame rendering with AFR in order to improve scaling in such applications. He even alluded to another possible technique, but he wasn’t willing to talk about it just yet. Those methods will have to wait for a future Catalyst release.
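A toy model of AFR under the flip-queue cap makes the fourth-GPU problem concrete. This is a simplification for illustration, not AMD’s actual scheduler:

```python
# Sketch of AFR frame distribution under Vista's default three-frame
# render-ahead limit (an illustrative model, not AMD's scheduler):
# frame i goes to GPU i % n, but only frames inside the flip-queue
# window can be in flight, so with the default cap a fourth GPU often
# has nothing to do.

def frames_in_flight(displayed_frame, render_ahead, num_gpus):
    """Map each queued frame to the GPU that would render it under AFR."""
    return {
        f: f % num_gpus
        for f in range(displayed_frame + 1,
                       displayed_frame + 1 + render_ahead)
    }

# With the default cap of 3, only three frames (and three GPUs) are busy:
print(frames_in_flight(displayed_frame=10, render_ahead=3, num_gpus=4))
# {11: 3, 12: 0, 13: 1} -- GPU 2, which rendered frame 10, sits idle next
```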

Another performance challenge Gotwalt pointed to is one of Vista’s resource management practices. In order for an application to access a resource (such as a buffer), the application must “lock” this resource. The fastest type of lock, he said, is a lock-discard, which is useful when one doesn’t care about modifying the current contents of the resource, since a lock-discard simply allocates a new chunk of memory. This sort of lock makes sense for certain types of resources, like vertex buffers. The problem, according to Gotwalt, is that the OS’s implementation of lock-discard is expensive for small buffers. A kernel transition is involved, and the memory manager will only allow a given buffer to be renamed 64 times. After that, the DirectX runtime will require the driver to flush its command buffer, invoking a severe performance penalty. As Gotwalt put it, “We have now just serialized the whole system.” This limitation exists for both DX9 and DX10, but Gotwalt said it isn’t as evident in DX9. DirectX 10 presents more of a problem because its constant buffers are different in nature; they are smaller and can have a higher update frequency than vertex buffers.
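The renaming limit Gotwalt described can be modeled in a few lines. The 64-rename cap comes from the interview; the mechanics here are a simplified illustration, not the actual driver or memory-manager behavior:

```python
# Toy model of lock-discard renaming: each lock-discard hands back a
# fresh copy of the buffer, and once the memory manager exhausts its
# rename slots, the driver must flush its command buffer, serializing
# the system. The 64-rename limit is from the interview; everything
# else is a simplification for illustration.

RENAME_LIMIT = 64

class Resource:
    def __init__(self):
        self.renames = 0

    def lock_discard(self):
        """Returns True if this lock forced a command-buffer flush."""
        if self.renames >= RENAME_LIMIT:
            self.renames = 0          # flush; the rename pool resets
            return True
        self.renames += 1             # cheap path: allocate a fresh copy
        return False

# Frequent small-buffer updates (as with DX10 constant buffers) hit the
# limit over and over:
buf = Resource()
flushes = sum(buf.lock_discard() for _ in range(1000))
print(flushes)  # 15 forced flushes in 1000 lock-discards
```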

As a result, AMD has taken over management of renaming in its drivers. Doing so isn’t a trivial task, Gotwalt pointed out, because one must avoid over-allocating memory. At present, AMD has a constant buffer renaming mechanism in place in Catalyst 8.3, but it involves some amount of manual tweaking, and new applications could potentially cause problems by exhibiting unexpected behavior. However, Gotwalt said AMD has a new, more robust solution coming soon that won’t involve so much tweaking, won’t easily be broken by new applications, and will apply to any resource that is renamed—not just constant buffers, but vertex buffers, textures, and the like.

The final issue Gotwalt described may be the thorniest one for multi-GPU rendering: the problem of persistent resources. In some cases, an application may produce a result that remains valid across several succeeding frames. Gotwalt’s example of such a resource was a shadow map. The GPU renders this map and then uses it as a reference in rendering the final frame. This sort of resource presents a problem because multiple GPUs in CrossFire X don’t share memory. As a result, he said, the driver will have to track when the map was rendered and synchronize its contents between different GPUs. Dependences must be tracked, as well, and the driver may have to replicate both a resource and anything used to create it from one GPU to the next (and the next). This, Gotwalt said, is one reason why profiled AFR ends up being superior to non-profiled AFR: the driver can turn off some of its resource tracking once the application has been profiled.
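The replication bookkeeping can be sketched roughly as follows. The function and its names are hypothetical, purely to illustrate the tracking problem Gotwalt described:

```python
# Illustrative sketch of the persistent-resource problem: a shadow map
# rendered on one GPU isn't visible to the others (no shared memory), so
# an AFR driver must either copy it to every GPU that references it or
# have each GPU re-render it. The function here is hypothetical.

def gpus_needing_copy(rendered_on, frames_used, num_gpus):
    """GPUs that consume the resource but never produced it themselves."""
    consumers = {f % num_gpus for f in frames_used}   # AFR: frame i -> GPU i % n
    return sorted(consumers - {rendered_on})

# A shadow map rendered with frame 0 on GPU 0, then referenced by the
# next three frames, must be replicated to GPUs 1, 2, and 3:
print(gpus_needing_copy(rendered_on=0, frames_used=[1, 2, 3], num_gpus=4))
# [1, 2, 3]
```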

Gotwalt pointed out that “AFR-friendly” applications will simply re-render the necessary data multiple frames in a row. However, he said, the drivers must then be careful not to sync data unnecessarily when the contents of a texture have been re-rendered but haven’t changed.

Curious, I asked Gotwalt whether re-rendering was typically faster than transferring a texture from one GPU to the next. He said yes, in some applications it is, but one must be careful about it. If you’re re-rendering too many resources, you’re not really sharing the workload, and performance won’t scale. In those cases, it’s faster to copy the data from GPU to GPU. Gotwalt claimed they’d found this to be the case in DirectX 10 games, whereas DX9 games were generally better off re-rendering.

Gotwalt attributed this difference more to changes in the usage model in newer games than to the API itself. (Think about the recent proliferation of post-processing effects and motion blur.) DX10 games make more passes on the data and render to textures more, creating a “cascading of resources.” DX10’s ability to render to a buffer via stream out also allows more room for the creation of persistent resources. Obviously, this is a big problem to manage case by case, and Gotwalt admitted as much. He qualified that admission, though, by noting that AMD learns from every game it profiles and tries to incorporate what it learns into its general “compatible AFR” implementation when possible.

Clearly, AMD has put a tremendous amount of sweat and smarts into making CrossFire X work properly and into achieving reasonably good performance scaling with multiple GPUs. The obstacles Gotwalt outlined are by no means trivial, and the AMD driver team’s ability to navigate those obstacles with some success is impressive. Still, some of the challenges they face aren’t going to go away. In fact, the persistent resources problem is only growing thornier and more complex with time. This is one of the major reasons multi-GPU solutions—based on today’s GPU architectures, at least—will probably always be somewhat fragile and very much reliant on driver updates in order to deliver strong performance scaling. There’s reason for optimism here based on the good work that folks at AMD and elsewhere are putting into these problems, but also reason for caution.

Test notes
CrossFire X presents a wealth of possible test configs, but we chose a couple that we thought would be representative of common configurations. For our quad-GPU tests, we used a pair of Radeon HD 3870 X2 cards, and for three GPUs, we used a single Radeon HD 3870 X2 paired with a Radeon HD 3870. Our test motherboard, a Gigabyte GA-X38-DQ6, has two PCIe x16 slots with a full 16 lanes of PCIe 2.0 connectivity routed to each. These hardware combinations should be more or less optimal for CrossFire X in terms of interconnect bandwidth and the like, giving it plenty of opportunity for performance scaling.

Please note that we tested the single and dual-GPU Radeon configs with the Catalyst 8.2 drivers, simply because we didn’t have enough time to re-test everything with Cat 8.3. The one exception is Crysis, where we tested single- and dual-GPU Radeons with AMD’s 8.451-2-080123a drivers, which include many of the same application-specific tweaks that the final Catalyst 8.3 drivers do.

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Processor: Core 2 Extreme X6800 2.93GHz (both systems)
System bus: 1066MHz (266MHz quad-pumped)
Motherboard: Gigabyte GA-X38-DQ6 / XFX nForce 680i SLI
BIOS revision: F7 / P31
North bridge: X38 MCH / nForce 680i SLI SPP
South bridge: ICH9R / nForce 680i SLI MCP
Chipset drivers: INF update 8.3.1.1009 with Matrix Storage Manager 7.8 / ForceWare 15.08
Memory size: 4GB (4 DIMMs) in each system
Memory type: 2 x Corsair TWIN2X20488500C5D DDR2 SDRAM at 800MHz
CAS latency (CL): 4
RAS to CAS delay (tRCD): 4
RAS precharge (tRP): 4
Cycle time (tRAS): 18
Command rate: 2T
Audio: Integrated ICH9R/ALC889A and integrated nForce 680i SLI/ALC850, both with RealTek 6.0.1.5497 drivers
Hard drive: WD Caviar SE16 320GB SATA
OS: Windows Vista Ultimate x86 Edition
OS updates: KB936710, KB938194, KB938979, KB940105, KB945149, DirectX November 2007 Update

Graphics cards tested on the Gigabyte GA-X38-DQ6:
Diamond Radeon HD 3850 512MB PCIe with Catalyst 8.2 drivers
Dual Radeon HD 3850 512MB PCIe with Catalyst 8.2 drivers
Radeon HD 3870 512MB PCIe with Catalyst 8.2 drivers
Dual Radeon HD 3870 512MB PCIe with Catalyst 8.2 drivers
Radeon HD 3870 X2 1GB PCIe with Catalyst 8.2 drivers
Dual Radeon HD 3870 X2 1GB PCIe with Catalyst 8.3 drivers
Radeon HD 3870 X2 1GB PCIe + Radeon HD 3870 512MB PCIe with Catalyst 8.3 drivers

Graphics cards tested on the XFX nForce 680i SLI:
Palit GeForce 9600 GT 512MB PCIe with ForceWare 174.12 drivers
Dual Palit GeForce 9600 GT 512MB PCIe with ForceWare 174.12 drivers
GeForce 8800 GT 512MB PCIe with ForceWare 169.28 drivers
Dual GeForce 8800 GT 512MB PCIe with ForceWare 169.28 drivers
EVGA GeForce 8800 GTS 512MB PCIe with ForceWare 169.28 drivers
GeForce 8800 Ultra 768MB PCIe with ForceWare 169.28 drivers

Thanks to Corsair for providing us with memory for our testing. Their quality, service, and support are easily superior to those of no-name DIMM suppliers.

Our test systems were powered by PC Power & Cooling Silencer 750W power supply units. The Silencer 750W was a runaway Editor’s Choice winner in our epic 11-way power supply roundup, so it seemed like a fitting choice for our test rigs. Thanks to OCZ for providing these units for our use in testing.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

We used the following versions of our test applications:

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Solving for X
As you may know, the RV670 GPU that powers the Radeon HD 3800 series of graphics cards is a pretty solid mid-range graphics processor, but it’s no match for Nvidia’s current higher-end GPUs. The contest is close enough, though, that stacking up three or four RV670s makes for a very potent graphics solution. Here’s how, in theory, our three- and four-way CrossFire X setups compare to most of today’s single-GPU setups—and with two- and three-way SLI setups involving Nvidia’s fastest card, the GeForce 8800 Ultra.

(All figures are theoretical peaks.)

                                  Pixel fill   Bilinear texel   FP16 texel      Memory       Shader
                                  rate         filtering rate   filtering rate  bandwidth    arithmetic
                                  (Gpixels/s)  (Gtexels/s)      (Gtexels/s)     (GB/s)       (GFLOPS)
GeForce 8800 GT                   9.6          33.6             16.8            57.6         504
GeForce 8800 GTS                  10.0         12.0             12.0            64.0         346
GeForce 8800 GTS 512              10.4         41.6             20.8            62.1         624
GeForce 8800 GTX                  13.8         18.4             18.4            86.4         518
GeForce 8800 Ultra                14.7         19.6             19.6            103.7        576
GeForce 8800 Ultra SLI (x2)       29.4         39.2             39.2            207.4        1152
GeForce 8800 Ultra SLI (x3)       44.1         58.8             58.8            311.0        1728
Radeon HD 2900 XT                 11.9         11.9             11.9            105.6        475
Radeon HD 3850                    10.7         10.7             10.7            53.1         429
Radeon HD 3870                    12.4         12.4             12.4            72.0         496
Radeon HD 3870 X2                 26.4         26.4             26.4            115.2        1056
Radeon HD 3870 X2 + 3870 (x3)     37.2         37.2             37.2            172.8        1488
Radeon HD 3870 X2 CrossFire (x4)  52.8         52.8             52.8            230.4        2112

Of course, simply adding together the peak theoretical capabilities of multiple GPUs, as we’ve done here, doesn’t account for any of the multi-GPU performance scaling issues we’ve discussed. But it does give us a sense of where things stand. On this basis, our four-way CrossFire X rig leads all contenders in terms of pixel fill rate and shader arithmetic capacity. Staggeringly, the four-GPU config peaks at over 2.1 teraflops of shader power. Not bad for under a grand! These shader arithmetic numbers are all the more impressive because there’s an argument to be made that the GeForce FLOPS numbers you see above may be inflated by a third, depending on how the GPU is being used.
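The multi-GPU rows in the table are straight sums of single-GPU peaks. The four-way config, for instance, is simply twice the Radeon HD 3870 X2’s single-card numbers:

```python
# The four-way CrossFire X row is just twice the Radeon HD 3870 X2's
# single-card peaks, since two X2s contribute four identical GPUs
# (figures taken from the table above).

x2 = {"fill_gpix": 26.4, "texel_gtex": 26.4, "bw_gbs": 115.2, "gflops": 1056}
four_way = {k: v * 2 for k, v in x2.items()}
print(four_way["gflops"])  # 2112, i.e. over 2.1 teraflops on paper
```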

Overall, the three-way CrossFire X solution matches up well against two GeForce 8800 Ultras in SLI and against (if you do the math) a pair of GeForce 8800 GT cards in SLI, as well.

We can test these theoretical capacities with some precision using synthetic benchmarks. These aren’t a measure of real-world performance, but they do test something close to the actual peak throughput the hardware can achieve.

Things work out about as expected in terms of finishing order for pixel and texel fill rates. We know from history that this pixel fill rate test tends to be limited more by memory bandwidth than by raw GPU pixel output capacity, but the four-way CrossFire X setup manages to outdo the three-way SLI system despite having less peak memory bandwidth. In the multitextured fill rate test, the GPUs reach closer to their theoretical peaks, which is good news for CrossFire X. One surprise of sorts, if you weren’t watching for it, is the multitexturing performance of dual GeForce 8800 GTs in SLI. The G92 GPU has incredible texture filtering prowess with the most commonly used texture formats, although it’s only half as fast with FP16 textures.

CrossFire X largely dominates 3DMark’s simple pixel and vertex shader tests. Let’s see whether it can do the same in real games.

Call of Duty 4: Modern Warfare
We tested Call of Duty 4 by recording a custom demo of a multiplayer gaming session and playing it back using the game’s timedemo capability. Since these are high-end graphics configs we’re testing, we enabled 4X antialiasing and 16X anisotropic filtering and turned up the game’s texture and image quality settings to their limits.

We’ve chosen to test at 1680×1050, 1920×1200, and 2560×1600—resolutions of roughly two, three, and four megapixels—to see how performance scales. I’ve also tested at 1280×1024 with the lower-end graphics cards, since some of them struggled to deliver completely fluid frame rates at 1680×1050.

CrossFire X performance scales quite nicely to three GPUs in CoD4—almost linearly from one GPU to two to three, in fact. That’s sufficient for three 3870s to outperform two GeForce 8800 GTs. With four GPUs, scaling isn’t as impressive, and it’s not quite enough to allow CrossFire X to overcome a pair of GeForce 8800 Ultras.

Half-Life 2: Episode Two
We used a custom-recorded timedemo for this game, as well. We tested Episode Two with the in-game image quality options cranked, with 4X AA and 16X anisotropic filtering. HDR lighting and motion blur were both enabled.

Episode Two is a very different scaling story for CrossFire X and three-way SLI. Adding a third GeForce 8800 Ultra is a performance detriment, while going to three and even four GPUs benefits the Radeon HD 3870s. Here’s the tough question, though: why? Two GeForce 8800 Ultras are faster than four Radeon HD 3870s. The fact that each 3870 GPU is slower means there’s more potential headroom for CrossFire X to scale. That said, the results at lower resolutions seem to indicate AMD is doing a better job than Nvidia of managing three (and four) GPUs without running into CPU bottlenecks or the like.

In this game, we should note, CrossFire X is only needed at 2560×1600 resolution. Below that, at 1920×1200, quite a few single- and dual-GPU solutions are plenty fast. Also, notice that three-way CrossFire X is only a hair faster than two GeForce 9600 GTs in SLI—a much less expensive option.

Crysis
I was a little dubious about the GPU benchmark Crytek supplies with Crysis after our experiences with it when testing three-way SLI. The scripted benchmark does a flyover that covers a lot of ground quickly and appears to stream in lots of data in a short period, possibly making it I/O bound—so I decided to see what I could learn by testing with FRAPS instead. I chose to test in the “Recovery” level, early in the game, using our standard FRAPS testing procedure (five sessions of 60 seconds each). The area where I tested included some forest, a village, a roadside, and some water—a good mix of the game’s usual environments.

Because FRAPS testing is a time-intensive endeavor, I’ve tested the lower-end graphics cards at 1680×1050 and the higher-end cards at 1920×1200, with CrossFire X included in each group.

This is one of those applications where CrossFire X can only make use of three GPUs due to limits on how many frames the driver can render ahead. As a result, four-way CrossFire X performs the same or apparently slightly slower in Crysis. Of course, since we’re playing through the game manually, some variance in the scores is likely. I’d say CrossFire X four-way performs essentially the same as three-way.

Also, CrossFire X is of no benefit in Crysis at 1680×1050 resolution with these quality settings. At 1920×1200, adding a third Radeon HD 3870 GPU does raise average frame rates slightly, but the median low frame rate doesn’t budge. My seat-of-the-pants impression is similar: the game doesn’t play any better with a third GPU.

In order to better tease out the differences between two, three, and four GPUs, I cranked up Crysis to its “very high” quality settings and turned on 4X antialiasing.

None of the graphics solutions produce truly playable performance, but we do see a clear difference between two and three Radeon HD 3870s. Note that, although CrossFire X manages higher average frame rates than two 8800 Ultras, its frame rate minimums are lower. The reality here is that, for practical purposes, having more than two GPUs is no help in Crysis right now.

Unreal Tournament 3
We tested UT3 by playing a deathmatch against some bots and recording frame rates during 60-second gameplay sessions using FRAPS. This method has the advantage of duplicating real gameplay, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent and trustworthy results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.
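The low-frame-rate metric described above works like this. The session numbers here are made up purely for illustration:

```python
# The "low frame rate" metric: take the lowest frame rate recorded in
# each of the five FRAPS sessions, then report the median of those lows
# to blunt the effect of outliers. Session data is fabricated for
# illustration only.

import statistics

sessions = [[41, 38, 55], [44, 29, 50], [47, 35, 52], [40, 33, 49], [45, 36, 51]]
lows = [min(s) for s in sessions]   # one low per 60-second session
print(statistics.median(lows))      # 35
```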

Because UT3 doesn’t natively support multisampled antialiasing, we tested without AA. Instead, we just cranked up the resolution to 2560×1600 and turned up the game’s quality sliders to the max. I also disabled the game’s frame rate cap before testing.

Here’s another case where CrossFire X scales up to three GPUs better than Nvidia does, and this time, it’s enough to put the Radeons over the top. Adding a fourth GPU is no help, and it even seems to hurt performance. In fact, this may be one case where rendering too far ahead causes problems. Four-way CrossFire X didn’t seem to play UT3 particularly smoothly, and I had trouble (more than usual) with placing shock rifle shots, too. Whatever the cause, I was able to be more accurate with three or fewer GPUs.

Power consumption
We measured total system power consumption at the wall socket using an Extech power analyzer model 380803. The monitor was plugged into a separate outlet, so its power draw was not part of our measurement. The cards were plugged into a motherboard on an open test bench.

The idle measurements were taken at the Windows Vista desktop with the Aero theme enabled. The cards were tested under load running UT3 at 2560×1600 resolution, using the same settings we did for performance testing.

Note that the SLI configs were, by necessity, tested on a different motherboard, as noted in our testing methods section.

Amazingly, our three-way CrossFire X system draws only a few more watts than the same system equipped with a single GeForce 8800 Ultra. Very nice. The RV670 GPU is very easy on the watt meter at idle, so even putting four of them in a system isn’t too bad.

When running a game, there’s no escaping the fact that having three or four GPUs onboard will make a PC pull quite a bit of juice—over 500W in the case of our four-way CrossFire X rig—but it still draws less power than our GeForce 8800 Ultra SLI system.

Noise levels
We measured noise levels on our test systems, sitting on an open test bench, using an Extech model 407727 digital sound level meter. The meter was mounted on a tripod approximately 12″ from the test system at a height even with the top of the video card. We used the OSHA-standard weighting and speed for these measurements.

You can think of these noise level measurements much like our system power consumption tests, because the entire systems’ noise levels were measured, including the stock Intel cooler we used to cool the CPU. Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.

Unfortunately—or, rather, quite fortunately—I wasn’t able to reliably measure noise levels for most of these systems at idle. Our test systems keep getting quieter with the addition of new power supply units and new motherboards with passive cooling and the like, as do the video cards themselves. I decided this time around that our test rigs at idle are too close to the sensitivity floor for our sound level meter, so I only measured noise levels under load.

I should mention, though, that our CrossFire X systems were much quieter at idle than the three-way SLI system, whose 1200W power supply generated quite a bit of fan noise. The CrossFire X rigs remained under the ~40 dB level at idle.

The Radeon HD 3870 X2 isn’t horribly loud while gaming, but it is noticeably louder than most other cards these days. When you add another video card to the mix, especially a second 3870 X2, the system becomes even noisier, since two closely situated video cards are both working to expel heat.

Personally, I’m not particularly bothered by the noise levels of the CrossFire X cards’ coolers while gaming. They’re not noisy enough to become a nuisance. Since CrossFire X is fairly quiet at idle, I could probably live with it.

GPU temperatures
Per your requests, I’ve added GPU temperature readings to our results. I captured these using AMD’s Catalyst Control Center and Nvidia’s nTune Monitor, so we’re basically relying on the cards to report their temperatures properly. In the case of multi-GPU configs, well, I only got one number out of CCC. I used the highest of the numbers from the Nvidia monitoring app. These temperatures were recorded while running UT3 in a window.

I’m a little worried about the temperatures we saw out of our single and dual Radeon HD 3870 cards, but the X2 seems to keep GPU temps more in check. As a result, the temperatures we saw reported with our CrossFire X systems were par for the course for today’s GPUs.

A look at 32X SuperAA antialiasing
Given the opportunity to play with 32X antialiasing, I had to take it for a spin. Here’s a quick look at CrossFire X’s 32X AA mode, which combines 16X Super AA load balancing with a wide-tent filter.

Incidentally, I decided to use Half-Life 2: Episode Two for this excursion because AMD’s custom AA filters don’t yet work with DirectX 10 games, even with a single GPU. That’s a pretty major limitation, so keep it in mind.

As you may be aware, the Radeon HD series’ narrow- and wide-tent filters grab subpixel samples from adjacent pixels and then blend all available samples into a final color using a weighted average. Samples taken farther from the pixel center are weighted more lightly, but these tent filters still produce a slight blurring effect. We’ve found them to be pretty effective in the past without causing excessive blurriness, but, well, have a look at this:

[Image comparison: 4X (4X MSAA), 6X (4X MSAA + narrow tent), 8X (4X MSAA + wide tent), and 32X (16X Super AA + wide tent)]

Super AA plus the wide-tent filter banishes the jaggies on the tree trunk, branches, vegetation, everything, but it does so at the expense of sharpness on object silhouettes and texture clarity. I don’t mind so much the effect of the narrow-tent filter with 6X AA, but even that produces softer edges than I’d like. Everything on screen looks oddly cartoonish with the blurring produced by the 32X mode—not good.
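The weighted-average resolve described above can be sketched in a few lines. This is a simplified illustration, not AMD's actual hardware filter: the sample positions, the radius, and the linear tent weight function here are all assumptions for the sake of the example, whereas real hardware uses fixed sample patterns and fixed-point weights.

```python
def tent_weight(dx, dy, radius):
    """Tent weight: 1.0 at the pixel center, falling linearly to 0 at `radius`."""
    d = (dx * dx + dy * dy) ** 0.5
    return max(0.0, 1.0 - d / radius)

def resolve_pixel(samples, radius=1.0):
    """Blend (dx, dy, rgb) subpixel samples -- including samples grabbed
    from adjacent pixels -- into one final color via a weighted average."""
    total_w = 0.0
    acc = [0.0, 0.0, 0.0]
    for dx, dy, color in samples:
        w = tent_weight(dx, dy, radius)  # farther samples weigh less
        total_w += w
        for i in range(3):
            acc[i] += w * color[i]
    return tuple(c / total_w for c in acc)

# Two samples: one at the pixel center, one halfway to the filter edge.
# The off-center sample still contributes, which is where the blur comes from.
blended = resolve_pixel([(0.0, 0.0, (1.0, 0.0, 0.0)),
                         (0.5, 0.0, (0.0, 1.0, 0.0))])
```

Widening the filter radius pulls in more samples from neighboring pixels, which is exactly why the wide-tent modes trade sharpness for smoother edges.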

On top of that, divvying up the antialiasing work remains a poor method of multi-GPU load balancing. With the 32X mode enabled, I saw frame rates of about 14 FPS in Episode Two using our four-way CrossFire X rig. 4X multisampling plus the wide-tent filter ran at 58 FPS, and just 4X AA alone ran at over 70 FPS.

Conclusions
I’m not entirely sure what to make of CrossFire X. On the one hand, AMD has managed to coax some pretty solid performance scaling out of three GPUs—and, in some cases, even four. There’s still opportunity for additional performance enhancements, too, once AMD releases drivers that incorporate the upcoming changes Dave Gotwalt described to us. Not only that, but CrossFire X doesn’t really fall into the “mega-extreme” category reserved for energy drinks, Intel’s Skulltrail, and Nvidia’s three-way SLI. Even with four GPUs, CrossFire X doesn’t have outsized power supply requirements, and it doesn’t draw as much power, produce as much heat, or create as much noise as three-way SLI.

Nor does it cost as much money, because the raw ingredients are cheaper. Right now, you can buy a Radeon HD 3870 X2 online for about $430. Better yet, board makers are now selling higher clocked Radeon HD 3850 512MB cards that perform similarly to the 3870 for as little as $169. Collecting the right mix of video cards to enable three-way or four-way CrossFire X won’t be cheap, but it should be quite a bit more affordable than two or three GeForce 8800 Ultras.

All of these things are good, and CrossFire X’s seamless multi-monitor support is a wonderful thing to have, too.

On the other hand, CrossFire X has the same disadvantages as any current multi-GPU scheme, and those are often multiplied by the presence of three or four GPUs. We’ve talked about the performance scaling challenges involved. The games we tested tended to scale well to at least three GPUs, but not every game will have been profiled by AMD. Too often, brand-new games aren’t properly supported for a period of time after their release—the period during which I’d like to be playing them.

On top of that, we can’t ignore practical questions about the utility of CrossFire X. If you’re really looking to build a three- or four-GPU system, you’d better be planning to connect it to a very high resolution display, like a 30″ widescreen LCD with 2560×1600 resolution, in order to get the most out of it. At lower resolutions, two GPUs are probably more than sufficient for the majority of today’s games. The most prominent game where they’re not, Crysis, doesn’t appear to benefit from CrossFire X at playable frame rates. We even found that two GeForce 8800 GTs in SLI outperform a three-way CrossFire X config in Half-Life 2: Episode Two. According to the cold logic of price-performance ratios, I’d have a hard time passing up a pair of GeForce 9600 GTs or 8800 GTs for a three-way CrossFire X setup, even if CrossFire X could sometimes deliver higher frame rates.

And yet, CrossFire X remains impressive in its way, as a plausible alternative to Nvidia’s pricier solutions involving two or more high-end GPUs. Over the long run, I’m not sold on the concept of lashing together multiple mid-range GPUs as a replacement for a true high-end GPU. We’ll have to see how committed AMD is to this direction. But this isn’t a bad start with more than two GPUs, all things considered.
