Lucid's GPU load-balancer storms IDF
Converts one skeptic to a hopeful
by Scott Wasson — 10:47 AM on August 22, 2008

When a company called Lucid unveiled a web site promising a revolutionary new technology that could deliver near-perfect performance scaling for multiple GPUs, independent of GPU type, we were initially skeptical. Their claims sounded odd and perhaps too good to be true. But not only were they present on the show floor at IDF, they were showing a demo of working silicon. Remarkably enough, it appears they may just be on to something big.

To understand what they're doing, you'll first want to recall that, despite their growing popularity, schemes like SLI and CrossFire that combine multiple graphics cards to achieve higher performance often face serious challenges for performance scaling. Dropping in a second video card may get you nearly double the performance if all goes well, but multi-GPU schemes are fragile, and frequently, performance doesn't scale nearly that well—particularly in games that use advanced but potentially problematic rendering methods. Adding a third or fourth GPU to the mix may not help and can even harm performance.

Part of the problem is the way GPUs are architected; unlike CPUs, they're not capable of sharing a common pool of memory, so graphics firms end up managing inter-GPU coordination manually in their drivers, profiling games and making tweaks on a case-by-case basis.

On top of that, SLI and CrossFire both use relatively simple load-balancing algorithms, the most popular of which is alternate frame rendering (AFR), in which GPU 0 renders frame A, GPU 1 renders frame B, GPU 0 renders frame C, and so on. AFR sometimes works well, but isn't compatible with every application. A common alternative is split-frame rendering (SFR), in which GPU 0 draws the top half of the screen while GPU 1 draws the bottom half. SFR is more broadly compatible, but doesn't redistribute the work required in the earlier stages of the graphics pipeline, which harms performance scaling. There are a few variations on these schemes out there, but they don't get much more sophisticated than that.
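To make the difference between the two schemes concrete, here is a minimal sketch of how each one hands out work. The function names and the even split of scanlines are assumptions for illustration only; real drivers are free to adjust the split point and are not described here.

```python
def afr_assign(frame_index, gpu_count):
    """Alternate frame rendering: whole frames are dealt out round-robin."""
    return frame_index % gpu_count


def sfr_assign(screen_height, gpu_count):
    """Split-frame rendering: each GPU gets one horizontal band of scanlines.

    Returns a list of (top, bottom) row ranges, one per GPU, with bottom
    exclusive. An even split is assumed here purely for illustration.
    """
    band = screen_height // gpu_count
    bands = []
    for gpu in range(gpu_count):
        top = gpu * band
        # The last GPU absorbs any rows left over by integer division.
        bottom = screen_height if gpu == gpu_count - 1 else top + band
        bands.append((top, bottom))
    return bands
```

With two GPUs, afr_assign sends frames 0, 1, 2 to GPUs 0, 1, 0, while sfr_assign(1200, 2) gives GPU 0 the top 600 rows and GPU 1 the bottom 600. Neither scheme looks at what is actually in the frame, which is the gap Lucid says it is targeting.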

By contrast, Lucid's approach is much more complex, though still a bit mysterious at the most basic level, and involves its own custom hardware created for graphics load balancing: the Hydra 100 chip. This chip has several key components, including a RISC processing core that Lucid licensed from a third party, Lucid's own proprietary 48-lane PCI Express switch fabric, and an image compositing engine. In a typical implementation, the Hydra 100 would be connected to a system's north bridge chip via a 16-lane PCIe connection. Two GPUs would then sit behind it, each connected to it via a PCIe x16 link. (The Hydra 100 can also partition its PCIe lanes into a 4x8 config for quad-GPU setups.)
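The lane counts above add up neatly, as this back-of-the-envelope sketch shows. It assumes that all 48 lanes are spent on one x16 upstream link plus the GPU links; that is an inference from the figures quoted here, not something Lucid spelled out.

```python
TOTAL_LANES = 48     # size of the Hydra 100's PCIe switch fabric, per Lucid
UPSTREAM_LANES = 16  # link to the north bridge


def downstream_width(gpu_count, total=TOTAL_LANES, upstream=UPSTREAM_LANES):
    """Lanes available per GPU if the remaining lanes are split evenly.

    The even split is an assumption made for this illustration.
    """
    return (total - upstream) // gpu_count
```

Under that assumption, two GPUs get 16 lanes apiece and four GPUs get 8 apiece, which matches the x16 and 4x8 configurations Lucid described.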

The Hydra 100 then appears to the host OS as a PCIe device, with its own driver. It intercepts calls made to the most common graphics APIs—OpenGL, DirectX 9/10/10.1—and reads in all of the calls required to draw an entire frame of imagery. Lucid's driver and the Hydra 100's RISC logic then collaborate on breaking down all of the work required to produce that frame, dividing the work required into tasks, determining where the bottlenecks will likely be for this particular frame, and assigning the tasks to the available rendering resources (two or more GPUs) in real time—for graphics, that's within the span of milliseconds. The GPUs then complete the work assigned to them and return the results to the Hydra 100 via PCI Express. The Hydra streams in the images from the GPUs, combines them as appropriate via its compositing engine, and streams the results back to the GPU connected to the monitor for display.

As I understand it, because data is streamed from the GPUs into the compositing engine pixel by pixel, and because the compositing engine immediately begins streaming back out the combined result, the effective latency for the compositing step is very low.

Once a frame has been completed, Lucid analyzes the relative performance of its client GPUs for that frame and dynamically adjusts its expectations for the next one. As a result, Lucid President and co-founder Offir Remez told us, the Hydra 100 is capable of effectively load-balancing for asymmetrical GPU configurations, such as a GeForce 8600 GTS and a 9800 GTX. Or, in another potential real-world scenario, Lucid demonstrated its real-time load-balancing running Crysis fluidly while one of the two GPUs involved spent a portion of its power displaying a streaming video.
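Lucid hasn't disclosed how that per-frame adjustment works, so the following is only a generic sketch of feedback-driven load balancing, not the company's algorithm. The function name, the smoothing parameter, and the idea of measuring each GPU's time on its share of the frame are all assumptions made for illustration.

```python
def rebalance(shares, frame_times, smoothing=0.5):
    """Nudge each GPU's share of the next frame toward its measured speed.

    shares      -- fraction of the last frame's work given to each GPU
    frame_times -- seconds each GPU took to finish its share
    smoothing   -- 0 keeps the old shares, 1 jumps straight to the new
                   estimate (hypothetical tuning knob)
    """
    if any(t <= 0 for t in frame_times):
        raise ValueError("frame times must be positive")
    # Work completed per second is a rough estimate of each GPU's speed.
    rates = [s / t for s, t in zip(shares, frame_times)]
    total_rate = sum(rates)
    targets = [r / total_rate for r in rates]
    # Blend old and new so that one unusual frame doesn't swing the split.
    blended = [(1 - smoothing) * s + smoothing * t
               for s, t in zip(shares, targets)]
    norm = sum(blended)
    return [b / norm for b in blended]
```

If two GPUs each took half of a frame and one needed 10 ms while the other needed 30 ms, the estimate says the first is three times faster, so the target split is 75/25; with the default smoothing of 0.5 the next frame would be split 62.5/37.5. A GPU that is suddenly busy with something else, such as video playback, would show up as a longer frame time and lose share in the same way.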

Because Lucid is simply intercepting and then making OpenGL or DirectX calls, the Hydra 100 is purportedly GPU-agnostic, unconcerned and unaware whether it's working with a Radeon, a GeForce or anything else. (In fact, one of the firm's primary financial backers is Intel's capital investment arm.) One limitation is that the GPUs involved must all use the same graphics driver, so mixing a GeForce with a Radeon won't work.

The most intriguing aspect of this scheme is how Lucid actually breaks down a scene and apportions work to the individual GPUs. Remez said the firm has applied for over 50 patents, many of them for its load-balancing algorithms, which are much more fine-grained than AFR, SFR, or the like.

It's difficult to express verbally, but Lucid's demo on the show floor offered a good sense of what's happening. The demo system had two GeForce GTX 260 cards connected to a Hydra 100 and enclosed in a box. A PCI Express cable then attached this test mule to the PCIe x16 slot in an enthusiast-class system (based on an Intel chipset). The whole setup was running Unreal Tournament 3. On one screen, we could see the output from a single GPU, while the other showed the output from either the second GPU or, via a hotkey switch, the final and composited frame. GPU 0 was rendering the entire screen space, but only portions of the screen showed fully textured and shaded surfaces—a patch of the floor, a wall, a column, a sky box—while other bits of the screen were black. GPU 1, meanwhile, rendered the inverse of the image produced by GPU 0. Wiggle the mouse around, and the mix of surfaces handled by each GPU changed frame by frame, creating an odd flickering sensation that left us briefly transfixed. The composited images, however, appeared to be a pixel-perfect rendition of UT3. Looking at the final output, you'd never suspect what's going on beneath the covers.
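What we saw on the two monitors can be mimicked with a toy compositor. This sketch assumes each GPU leaves the pixels it didn't draw pure black and that the two partial frames are exact complements, as they appeared to be in the demo; how the Hydra 100's compositing engine actually decides which pixel to keep was not disclosed.

```python
def composite(frame_a, frame_b, empty=(0, 0, 0)):
    """Merge two partial frames into one.

    Each frame is a list of rows, each row a list of (r, g, b) tuples.
    A pixel equal to `empty` is treated as "not rendered by this GPU",
    which is an assumption made for this illustration.
    """
    merged = []
    for row_a, row_b in zip(frame_a, frame_b):
        merged.append([pa if pa != empty else pb
                       for pa, pb in zip(row_a, row_b)])
    return merged
```

Feed it one frame with the floor and columns drawn and another with the walls and sky box drawn, and the output is the full scene. Because each output pixel depends only on the two input pixels at the same position, the merge can run as the pixels stream in, which fits the low compositing latency described earlier.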

Remez told us Lucid uses a mix of load-balancing algorithms, and he wouldn't reveal too many specifics about how the various algorithms might work. He claims the end result is near-linear performance scaling. That holds even with mismatched GPUs, he said: if the slower of the two runs at only 30% of the speed of the faster one, the total system could still produce nearly 1.3X the performance of the faster card alone.
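The 1.3X figure is simply what perfect load balancing would yield, as a quick calculation shows. The function below is an illustrative upper bound that ignores any overhead from splitting and compositing the frame; it is not a measurement and not Lucid's own formula.

```python
def ideal_speedup(relative_speeds):
    """Best-case speedup over the fastest GPU working alone.

    relative_speeds -- each GPU's throughput in any common unit.
    With perfect balancing the throughputs simply add up.
    """
    fastest = max(relative_speeds)
    return sum(relative_speeds) / fastest
```

For a fast card rated at 1.0 and a slow one at 0.3, the ceiling is (1.0 + 0.3) / 1.0, or about 1.3 times the fast card alone. For two identical cards it is 2.0. "Near-linear" scaling means landing close to that ceiling.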

I asked Remez about potential snags or incompatibilities in the scheme, things that might cause problems, whether it be multisampled antialiasing of edges or some of the cases that cause SLI and CrossFire to stumble. He asserted that MSAA would work properly and emphasized that the API-level, GPU-agnostic approach Lucid takes tends to shield them from application- or hardware-specific compatibility issues.

Lucid has identified a few places where its technology could likely be deployed at first. The most obvious, perhaps, is in place of a simple PCI Express switch chip on a dual-GPU video card like the Radeon HD 4870 X2. Lucid is already talking with board makers about the possibilities there. Another obvious possibility is for the Hydra 100 to find its way onto motherboards, where it could enable peak performance from high-end multi-GPU teams and offer upgraders the possibility of pairing an older, slower video card with a newer, quicker one for better overall performance. The presence of a Hydra 100 could also provide an easy workaround for chipset-specific multi-GPU lockouts. For instance, a Hydra-equipped motherboard based on an Intel chipset would be able to run multiple GeForce GPUs together, even though Nvidia doesn't allow SLI on Intel chipsets. The third place where the Hydra might be deployed is in "pods" or external multi-GPU enclosures for the professional visualization market, similar to the Quadro enclosures Nvidia sells.

Lucid's IDF demos were running on alpha silicon, but the company has just gotten final silicon back and says it's on track to deliver products during the first half of 2009. The first chips support only PCI Express Gen 1, but Lucid claims that's sufficient for now given the way its scheme works. The A0 silicon demoed at IDF was capable of running without a heatsink, and my finger survived a quick touch test. Andrew Schmied, VP of Marketing for Lucid, pegged the chip's power draw at under 5W.

Assuming the Hydra 100 does work as advertised, the big questions now are "How does it really perform?" and "Who will make use of it?" As for the first question, we got a demo of Crysis running at 1920x1200 at the highest quality levels available in DirectX 9. The test system was using a pair of GeForce 9800 GTX cards, and performance ranged between 40 and 60 FPS on the game's built-in frame rate counter. The game played very, very smoothly, and I didn't perceive any latency between mouse inputs and on-screen responses. That seemed very promising, but we'll have to get one of these things into Damage Labs for a true test of Lucid's scaling claims before we can draw any real conclusions about performance.

We don't yet know exactly who Lucid's first customers might be, but we know that at least one major Taiwanese mobo and video card maker is working with them. Interest in the firm's technology at IDF seemed to be considerable.

We're also curious to see what AMD and Nvidia make of this upstart firm with apparently superior technology to their own load-balancing methods. We haven't yet spoken with either company about Lucid, but we plan to soon. Stay tuned.


Copyright ©1999-2009 The Tech Report. All rights reserved.