MATROX’S NEW CARD hit Damage Labs late last week, and we’ve only had a little bit of time to spend with it since then. Not nearly enough to put together a full review of this very complex and feature-fortified graphics chip. However, what we’ve learned so far has been interesting enough that we’d like to share our first impressions with you. Before we go any further, though, you must read our Parhelia technology preview. It will give you an excellent introduction to Matrox’s new graphics chip, complete with highly detailed explanations and even more highly smart-aleck comments. Plus, I spent a lot of time writing that thing, and then our web service clogged up and nobody read it. So indulge me.
To refresh your memory if you did read the preview the first time out, Parhelia is Matrox’s first new GPU in pretty much forever, and it’s a beast: 80 million transistors, 4 pixel pipelines with 4 texture units each, pixel and vertex shaders, a $399 price tag, and fully twice the memory bandwidth of a Radeon 8500. You wouldn’t know it from looking at the card, necessarily. Our review unit is a final production card with 128MB of memory, and it’s no larger than most video cards, and quite a bit smaller than a GeForce4 Ti 4400 or 4600.
The Parhelia 128MB card looks pretty conventional
Dual DVI outputs can serve two LCD panels or three VGA monitors via converters
That big ol’ chip
I was curious to see exactly how big this 80-million-transistor chip is, so I yanked the cooler off the GPU, only to find a big metal cap, or “heat spreader” if you wanna get fancy, on top of the chip. Like this:
The Parhelia-512 GPU is packaged in a metal heat spreader thingy
So we can’t tell exactly how big the chip is without pulling off the cap and probably ruining the card, and Matrox ain’t saying. We do know it’s fabbed by UMC on a 150nm process. I’m sticking with my estimate that it’s a little bigger than Rhode Island. The metal cap is roughly the size of a former Soviet breakaway republic, so I can’t be far off.
The final specs
One thing we’ve learned since the card arrived is the final GPU and memory clock speeds. As we kind of expected, that hefty chip can’t reach super-high clock speeds. The retail version of Parhelia runs at only 220MHz, in fact, which is a little slow compared to the competition. The memory clock runs at 275MHz, or 550MHz in DDR-speak, which isn’t fast enough to deliver the 20GB/s memory bandwidth Matrox initially intended. That’s no big deal, though, because the GPU’s clock speed is likely to be the limiting factor for performance in most situations.
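The bandwidth shortfall is easy to check: peak bandwidth is just the effective (DDR) memory clock times the bus width in bytes. Here’s a minimal Python sketch, assuming decimal gigabytes (10^9 bytes/s) and Parhelia’s 256-bit bus; the function names are our own, purely for illustration.

```python
# Peak memory bandwidth = effective (DDR) clock x bus width in bytes.
# GB/s here means 10^9 bytes per second.

BUS_WIDTH_BITS = 256  # Parhelia-512 memory bus width


def peak_bandwidth_gbs(effective_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a given effective memory clock and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return effective_clock_mhz * 1e6 * bytes_per_transfer / 1e9


def clock_needed_mhz(target_gbs: float, bus_width_bits: int) -> float:
    """Effective memory clock (MHz) required to hit a target bandwidth."""
    return target_gbs * 1e9 / (bus_width_bits / 8) / 1e6


retail = peak_bandwidth_gbs(550, BUS_WIDTH_BITS)  # 275MHz DDR = 550MHz effective
needed = clock_needed_mhz(20.0, BUS_WIDTH_BITS)   # Matrox's original 20GB/s goal
print(f"Retail Parhelia: {retail:.1f} GB/s")
print(f"20 GB/s needs {needed:.0f}MHz effective ({needed / 2:.1f}MHz before DDR doubling)")
```

So the original 20GB/s target implies 625MHz effective memory, versus the 550MHz Matrox actually shipped.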
There will also be an OEM or “bulk” version of Parhelia floating around out there in pre-built PCs and, most likely, at some mail-order houses. That version has a 200MHz GPU clock and 500MHz memory. The only way to tell the difference between the two cards for sure, according to Matrox, is a “B” in the model number of the bulk/OEM editions. So watch carefully. At least Matrox is being up-front about it, even though I’d prefer a better naming convention.
With that said, let’s whip out the GPU table and see how the final Parhelia specs fit into the big picture. Remember, as always, that specs aren’t destiny, and performance will vary. The chip table here is just a useful little guide to give you a sense of each chip’s capabilities.
| GPU | Core clock (MHz) | Pixel pipelines | Peak fill rate (Mpixels/s) | Texture units per pixel pipeline | Peak fill rate (Mtexels/s) | Memory clock (MHz, effective DDR) | Memory bus width (bits) | Peak memory bandwidth (GB/s) |
|---|---|---|---|---|---|---|---|---|
| GeForce4 MX 440 | 270 | 2 | 540 | 2 | 1080 | 400 | 128 | 6.4 |
| GeForce3 Ti 200 | 175 | 4 | 700 | 2 | 1400 | 400 | 128 | 6.4 |
| GeForce4 Ti 4200 128MB | 250 | 4 | 1000 | 2 | 2000 | 444 | 128 | 7.1 |
| Radeon 7500 | 290 | 2 | 580 | 3 | 1740 | 460 | 128 | 7.4 |
| GeForce3 Ti 500 | 240 | 4 | 960 | 2 | 1920 | 500 | 128 | 8.0 |
| GeForce4 Ti 4200 64MB | 250 | 4 | 1000 | 2 | 2000 | 500 | 128 | 8.0 |
| Radeon 8500LE | 250 | 4 | 1000 | 2 | 2000 | 500 | 128 | 8.0 |
| GeForce4 Ti 4400 | 275 | 4 | 1100 | 2 | 2200 | 550 | 128 | 8.8 |
| Radeon 8500 | 275 | 4 | 1100 | 2 | 2200 | 550 | 128 | 8.8 |
| GeForce4 Ti 4600 | 300 | 4 | 1200 | 2 | 2400 | 650 | 128 | 10.4 |
| Parhelia-512 OEM | 200 | 4 | 800 | 4 | 3200 | 500 | 256 | 16.0 |
| Parhelia-512 Retail | 220 | 4 | 880 | 4 | 3520 | 550 | 256 | 17.6 |
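Every derived column in that table comes from three multiplications: core clock times pipelines (pixels), times texture units per pipe (texels), and effective memory clock times bus width in bytes (bandwidth). This short Python sketch reproduces a few rows from the raw specs; the `Gpu` class is just an illustrative container of our own, not anything from Matrox or the benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    core_mhz: int
    pipes: int
    tmus_per_pipe: int
    mem_mhz: int   # effective (DDR) memory clock
    bus_bits: int

    @property
    def pixel_fill(self) -> int:
        """Peak Mpixels/s: one pixel per pipeline per clock."""
        return self.core_mhz * self.pipes

    @property
    def texel_fill(self) -> int:
        """Peak Mtexels/s: every texture unit busy on every clock."""
        return self.pixel_fill * self.tmus_per_pipe

    @property
    def bandwidth_gbs(self) -> float:
        """Peak GB/s (10^9 bytes/s): MHz x bytes per transfer / 1000."""
        return self.mem_mhz * (self.bus_bits // 8) / 1000


chips = [
    Gpu("GeForce4 Ti 4600", 300, 4, 2, 650, 128),
    Gpu("Radeon 8500", 275, 4, 2, 550, 128),
    Gpu("Parhelia-512 Retail", 220, 4, 4, 550, 256),
]
for g in chips:
    print(f"{g.name:20} {g.pixel_fill:5} Mpix/s {g.texel_fill:5} Mtex/s {g.bandwidth_gbs:5.1f} GB/s")
```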
Let me call your attention to a few key numbers here, so you can see how Parhelia’s final specs are likely to affect its performance in 3D applications.
- Peak pixel fill rate Because of its relatively low core clock speed, Parhelia’s pixel fill rate, its raw ability to draw pixels onscreen, is lower than many of its competitors’. At 880 Mpixels/s, it’s slower than even a GeForce4 Ti 4200 or a Radeon 8500LE, in fact. Pixel fill rates alone aren’t generally a limiting factor in high-end graphics cards, but they are worth noting.
- Peak texel fill rate The more important number in many respects is a GPU’s peak texel fill rate, or its ability to draw textured pixels onscreen. Here, the Parhelia has everything else outclassed thanks to its four texture units per pixel pipe. Even the mighty GeForce4 Ti 4600 is over 1000 Mtexels/s behind.
- Texel fill rate with two textures This number isn’t actually listed on the chart. However, in reality, this number may be more important than the other two numbers above. To get it, simply multiply a chip’s pixel fill rate by two, since all the cards have at least two texture units per pipeline. This number is key because lots of current 3D applications and games only lay down two textures per rendering pass, so it determines actual performance in a lot of cases. And here the Parhelia’s lower GPU clock speed is a real sticking point. At 1760 Mtexels/s, the big Matrox chip runs well behind the Ti 4600 (2400) and the Radeon 8500 (2200). Again, even the Ti 4200 and the Radeon 8500LE chips (both at 2000 Mtexels/s) are faster. Keep this in mind as the benchmarks unfold.
- Memory bandwidth We’ve long said that memory bandwidth is the primary limiting factor in terms of pixel-pushing power, and the Parhelia has gobs more memory bandwidth than the competition. However, we have to wonder how well the GPU can take advantage of all that bandwidth when it’s not using all four texture units at once.
Not only that, but efficient memory controllers and techniques like Z-buffer compression and (especially) occlusion detection (a/k/a hidden surface removal) have helped the latest NVIDIA and ATI chips alleviate the memory bandwidth bottleneck. Parhelia has a highly optimized memory controller, but it lacks occlusion detection. Regardless, memory bandwidth isn’t likely to be a significant bottleneck for Parhelia cards right now.
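To see why occlusion detection saves so much bandwidth, consider one pixel covered by several overlapping surfaces. This Python sketch is a deliberately simplified model, not how any particular chip implements it: without early rejection, every fragment is textured before the depth test throws it away; with it, hidden fragments are discarded before touching texture memory. The two-textures-per-fragment default is an assumption chosen to match the dual-texturing case discussed above.

```python
from typing import Iterable


def texture_fetches(frag_depths: Iterable[float], early_z: bool,
                    textures_per_frag: int = 2) -> int:
    """Count texture fetches for fragments landing on one pixel, in draw order.

    Smaller depth means closer to the viewer. Without early Z, each fragment
    is textured first and depth-tested afterward.
    """
    z_buffer = float("inf")
    fetches = 0
    for depth in frag_depths:
        visible = depth < z_buffer
        if early_z and not visible:
            continue  # rejected before any texture memory is read
        fetches += textures_per_frag
        if visible:
            z_buffer = depth  # only visible fragments update the Z-buffer
    return fetches


front_to_back = [1.0, 2.0, 3.0, 4.0]  # four overlapping surfaces, nearest drawn first
print(texture_fetches(front_to_back, early_z=False))
print(texture_fetches(front_to_back, early_z=True))
```

With four layers of overdraw drawn front to back, early rejection cuts texture traffic by 4X in this toy case; drawn back to front, every layer is visible when it arrives, and nothing is saved.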
So what does this festival of bulleted points tell us? In a nutshell, this: If an application can’t make full use of Parhelia’s four texture units per pipe, expect the benchmark results to look a bit pokey. If an application can use all four texture units, then duck.
Well, OK, maybe you don’t need to duck. After all, we are dealing with early revisions of Parhelia drivers. Past history tells us there’s probably lots of room for optimizations and improvements in any early-rev graphics drivers.
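To put numbers on the bulleted points above, here’s an idealized Python model of texel throughput when every pixel needs a given number of texture layers. It assumes a pipe applies up to its texture-unit count per clock and needs an extra clock (via loopback or another pass) for each additional batch, and it ignores all real-world overhead. The function name is ours, for illustration only.

```python
import math


def effective_texel_rate(core_mhz: int, pipes: int, tmus_per_pipe: int,
                         textures: int) -> float:
    """Idealized Mtexels/s when each pixel needs `textures` texture layers."""
    clocks_per_pixel = math.ceil(textures / tmus_per_pipe)  # extra clocks beyond TMU count
    pixel_rate = core_mhz * pipes / clocks_per_pixel
    return pixel_rate * textures


cards = {
    "GeForce4 Ti 4600": (300, 4, 2),
    "Radeon 8500": (275, 4, 2),
    "Parhelia-512 Retail": (220, 4, 4),
}
for n in (1, 2, 4):
    row = ", ".join(f"{name}: {effective_texel_rate(*spec, n):.0f}"
                    for name, spec in cards.items())
    print(f"{n} texture(s): {row}")
```

With one or two textures, Parhelia trails (880 and 1760 Mtexels/s); only at four textures per pass does it pull ahead, 3520 versus 2400 for the Ti 4600.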
Now, without further ado, let’s look at the few benchmarks we’ve had time to run on this thing.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least twice, and the results were averaged.
Our test system was configured like so:
| | Athlon XP |
|---|---|
| Processor | AMD Athlon XP 2200+ 1.8GHz |
| Front-side bus | 266MHz (133MHz double-pumped) |
| Motherboard | Shuttle AK35GT2/R |
| Chipset | VIA KT333 |
| North bridge | VT8367 |
| South bridge | VT8233A |
| Chipset drivers | VIA 4-in-1 4.38(2)v(a) |
| Memory size | 512MB (2 DIMMs) |
| Memory type | Corsair XMS3000 PC2700 DDR SDRAM |
| Sound | Creative SoundBlaster Live! |
| Storage | Maxtor DiamondMax Plus D740X 7200RPM ATA/100 hard drive |
| OS | Microsoft Windows XP Professional |
| OS updates | None |
We used Matrox’s 2.25 drivers, which are purportedly the release revision, for testing. For comparison, we used an Abit Siluro GF4 Ti 4600 128MB AGP card with NVIDIA’s new 29.42 drivers, plus an ATI Radeon 8500 128MB with ATI’s new CATALYST 7.72 drivers.
I want to give a big thanks to Corsair for providing us with DDR333 memory for our testing. Their XMS3000 DIMMs allowed us to run the memory on our Shuttle AK35GT2/R test motherboard at CAS2 timings at 166MHz (that’s 333MHz DDR, kids). If you’re looking to tweak out your system to the max and maybe overclock it a little, Corsair’s RAM is definitely worth considering. Using it makes life easier for us as we’re dealing with brand-new chipsets and pre-production motherboards, because we don’t have to worry so much about stability and compatibility. The stuff flat works.
The test systems’ Windows desktops were set at 1024×768 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- MadOnion 3DMark 2001 SE build 330
- Codecreatures Benchmark Pro
- Serious Sam SE v1.05
All the tests and methods we employed are publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Codecreatures Benchmark Pro
First up is Codecreatures’ Benchmark Pro. This test uses pixel and vertex shaders, and it can bring a GF4 Ti 4600 to its knees. Seems like an ideal candidate for testing a next-gen graphics chip.
At the lowest resolution, the Parhelia outruns the Radeon 8500, but after that, it runs solidly in last place. Honestly, I kind of expected Parhelia’s extra memory bandwidth to kick in and give it a boost at higher resolutions, but that just wasn’t the case here.
Serious Sam SE
Serious Sam is always fun because, well, you get to shoot things. But Serious Sam benchmarks are fun, too, because we can plot frame rates over time in a sort of graphics card polygraph test. We ran these tests at “Normal” quality settings, and we let the application’s self-tuning mechanisms do their thing, so remember: this is an application benchmark. Also, we used the OpenGL API, the game’s default, for testing. Here’s how Parhelia stacked up.
Once more, Matrox’s latest consistently runs behind the top-line cards from ATI and NVIDIA. At least we’re not seeing any weird spikes or dips out of Parhelia that we don’t see out of the other cards, so basic playability isn’t a problem.
However, Parhelia is slower at 1024×768 than the competing cards are at 1600×1200. That points to a polygon throughput or driver execution bottleneck, not a fill-rate limitation.
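The fill-rate argument comes down to pixels per frame. This Python sketch computes a frame-rate ceiling from peak pixel fill rate alone, assuming one write per pixel and the same overdraw on every card. The absolute figures are far above any real frame rate and ignore Serious Sam’s self-tuning, which can change the workload per card; only the comparison matters.

```python
def fill_ceiling_fps(peak_mpixels: int, width: int, height: int) -> float:
    """Frames/s if peak pixel fill rate were the only limit (one write per pixel)."""
    return peak_mpixels * 1e6 / (width * height)


ratio = (1600 * 1200) / (1024 * 768)
parhelia_1024 = fill_ceiling_fps(880, 1024, 768)   # Parhelia-512 Retail at 1024x768
ti4600_1600 = fill_ceiling_fps(1200, 1600, 1200)   # GeForce4 Ti 4600 at 1600x1200
print(f"1600x1200 has {ratio:.2f}x the pixels of 1024x768")
print(f"Fill-rate ceiling: Parhelia {parhelia_1024:.0f} fps vs Ti 4600 {ti4600_1600:.0f} fps")
```

Parhelia has about 73% of the Ti 4600’s pixel rate but draws only about 41% as many pixels per frame at 1024×768, so if fill rate were the limit, it should win that matchup comfortably.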
3DMark 2001 SE
Now for the big dawg of DirectX 8-class performance measurements, 3DMark 2001 SE. We used MadOnion’s just-released build 330 of the app for these tests. 3DMark puts pixel and vertex shaders to extensive use, so it’s a worthy measurement of a card with Parhelia’s capabilities.
Well, the Parhelia didn’t exactly knock the cover off the ball. Let’s break down 3DMark’s results to see what went wrong.
Game tests
The game tests all tell the same story: Parhelia is slower than the other cards. We don’t see an inordinate performance drop-off with Parhelia when moving from the low-detail to high-detail versions of the test, so it’s not just polygon throughput holding it back. And of course, at 3DMark’s default resolution of 1024×768, none of these cards is likely to be strictly fill-rate limited in these game tests.
Fill rate and pixel shader performance
I’ve tried to group 3DMark’s synthetic tests together as logically as possible. The first two of these tests measure fill rate.
As we predicted when we looked over the chip chart, Parhelia’s single-textured fill rate is nothing spectacular. The GPU’s low clock speed limits performance here. However, we do see a flash of brilliance (or is it three?) out of Parhelia: the multi-textured fill rate leads the pack, although it’s not anywhere near the chip’s theoretical peak of over 3500 Mtexels/s.
The next two tests measure performance with the two most popular forms of bump mapping: environment-mapped bump mapping (EMBM) and DOT3. Matrox was once the world’s leading proponent of environment-mapped bump mapping, back when it was one of the only companies with a hardware implementation.
Parhelia shows some strength here, beating out the Radeon 8500 in both tests.
Let’s look at pixel shader performance.
This is about the kind of performance one might expect given Parhelia’s clock speed disadvantage. I watched the Parhelia run through these tests, and everything looked right to me. In fact, the image quality was quite good.
Poly throughput and vertex shader performance
Matrox says the Parhelia has an array of four vertex shaders, which should give it an edge; the ATI and NVIDIA chips have two vertex shaders each. These first two tests, however, are a bit of a wild card, because they use fixed-function T&L instead of vertex shaders. Generally, fixed-function T&L capabilities are implemented as a vertex program on a vertex-shader-equipped GPU.
Parhelia’s a little slow with only one light in the scene, but it performs better when eight lights are present.
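To make the “fixed-function T&L as a vertex program” point concrete, here’s a minimal Python sketch of what such a program computes per vertex: a 4×4 matrix transform plus a diffuse term for each light. It’s a simplified model of our own (directional lights, diffuse only, no specular, fog, or texture-coordinate generation), not Matrox’s, ATI’s, or NVIDIA’s actual microcode. Note that the per-vertex work grows linearly with light count, which is what the one-light and eight-light tests stress.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 matrix


def transform(m: Mat4, p: Vec3) -> tuple:
    """Multiply a 4x4 matrix by the homogeneous point (x, y, z, 1)."""
    x, y, z = p
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def fixed_function_vertex(mvp: Mat4, pos: Vec3, normal: Vec3,
                          light_dirs: List[Vec3], light_colors: List[Vec3]) -> tuple:
    """Run one vertex through a simplified fixed-function T&L 'program'.

    light_dirs point toward each directional light; they and the normal are
    assumed to be unit length and in the same coordinate space.
    """
    clip_pos = transform(mvp, pos)
    r = g = b = 0.0
    for d, c in zip(light_dirs, light_colors):  # cost scales with light count
        n_dot_l = max(0.0, dot(normal, d))        # Lambertian diffuse term
        r += c[0] * n_dot_l
        g += c[1] * n_dot_l
        b += c[2] * n_dot_l
    return clip_pos, (min(r, 1.0), min(g, 1.0), min(b, 1.0))


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
one_light = fixed_function_vertex(identity, (1.0, 2.0, 3.0), (0.0, 0.0, 1.0),
                                  [(0.0, 0.0, 1.0)], [(0.5, 0.5, 0.5)])
print(one_light)
```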
Now let’s look at straight-up vertex shaders.
The new Matrox chip is competitive here, but even in a pure vertex shader test, the NVIDIA card is fastest.
Point sprites or particles are generally handled by the vertex shader, which is why they’re listed here. Matrox has some catching up to do in point sprite performance, which is key for certain types of effects.
Antialiasing performance
We haven’t had much time to test, but we’ll delve into antialiasing just a little bit because Parhelia’s 16X fragment AA is so intriguing. This antialiasing technique is an edge-only affair, which should make it quite efficient. And with a larger sample size (16X) than any GeForce, it’s bound to look good.
Boy, does it look good. Get a load of the screenshot below, which is a low-compression JPEG of an in-game screenshot at actual size (sorry, modem users).
Gorgeous, no? It looks just as good in motion, which is the true test.
Let’s take Parhelia’s antialiasing for a spin versus the competition.
Parhelia’s 16X FAA holds up quite well versus the other cards at 4X AA. Matrox is delivering 16 samples at some very nice speeds. We’ve included results for the Parhelia at 4X AA, where it’s doing ordered-grid supersampling instead of fragment AA. As you can see, the edges-only fragment AA is efficient enough to be much faster than 4X supersampling, even with four times the number of samples on edge pixels.
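Why can 16 samples beat 4? A rough sample-count model helps. This Python sketch assumes edge-only AA gives interior pixels one sample and edge pixels 16, while 4X supersampling gives every pixel four; that’s a simplification of our own, not Matrox’s published algorithm, and it counts samples only, not other costs. The edge fractions in the loop are hypothetical.

```python
def faa_samples(pixels: int, edge_fraction: float, edge_samples: int = 16) -> float:
    """Samples for edge-only AA: interior pixels get 1, edge pixels get edge_samples."""
    return pixels * (1 + (edge_samples - 1) * edge_fraction)


def ssaa_samples(pixels: int, factor: int = 4) -> int:
    """Samples for ordered-grid supersampling: every pixel gets `factor`."""
    return pixels * factor


def break_even_edge_fraction(ss_factor: int = 4, edge_samples: int = 16) -> float:
    """Edge fraction at which edge-only AA generates as many samples as supersampling."""
    return (ss_factor - 1) / (edge_samples - 1)


pixels = 1024 * 768
print(break_even_edge_fraction())
for e in (0.05, 0.10):  # hypothetical fractions of pixels lying on polygon edges
    print(e, faa_samples(pixels, e) / ssaa_samples(pixels))
```

Under this model, 16X edge AA generates fewer samples than 4X supersampling whenever fewer than 20% of the screen’s pixels sit on polygon edges, which is typical of most game scenes.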
Parhelia may just have the best edge antialiasing capabilities in a consumer card yet, both in terms of image quality and efficiency of implementation.
That other kind of AA
There is another kind of antialiasing, however: texture AA, which we commonly refer to as texture filtering. Matrox has claimed Parhelia offers “the world’s most advanced texture filtering units,” capable of delivering 64 texture supersamples per clock.
The best sort of texture filtering we tend to see is anisotropic filtering. Unfortunately, with current drivers, the “most advanced texture filtering units” can’t do better than 2X (16-sample) anisotropic filtering. I noticed this limitation and asked Matrox about it, and they confirmed to me that current drivers are limited to 2X aniso for performance reasons. The hardware can do 8X (64-sample) aniso, and Matrox is considering enabling that capability in future drivers. Given that the GF4 Ti can do 8X aniso and the Radeon 8500 can (with some caveats) handle 16X aniso, I think enabling stronger forms of anisotropic filtering would be wise.
That said, Parhelia can do 2X aniso and trilinear filtering simultaneously, which gives it a leg up on the Radeon 8500.
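The sample counts Matrox quotes follow a simple convention: each anisotropic probe is a trilinear lookup, which reads a 2×2 bilinear footprint from each of two mip levels, or 8 texels. This Python sketch assumes that convention; real hardware may take fewer or adaptive probes, which is part of why vendors’ aniso numbers aren’t always comparable.

```python
BILINEAR_TEXELS = 4   # 2x2 texel footprint per mip level
TRILINEAR_LEVELS = 2  # blend between two adjacent mip levels


def aniso_samples(degree: int) -> int:
    """Texels read per pixel for degree-X aniso built from trilinear probes."""
    return degree * TRILINEAR_LEVELS * BILINEAR_TEXELS


for degree in (1, 2, 8, 16):
    print(f"{degree}X -> {aniso_samples(degree)} texel samples")
```

That yields 16 samples for 2X and 64 for 8X, matching the figures above, and 64 is exactly the “64 texture supersamples per clock” Matrox claims for the hardware.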
Other image quality notes
Running Parhelia in Quake III at 1280×1024 in 10-bits/channel “gigacolor” with 16X AA and 2X anisotropic filtering cranked up looks amazing. Running any sort of pixel shader demo looks fantastic; reflective and refractive surfaces (or both at once) deliver clarity and color saturation that’s more than a match for a GeForce4, at least by my first impressions.
And Matrox’s Reef Demo is the second most stunning visual experience I’ve seen rendered in real time on a personal computer (Doom III, if a QuickTime movie counts, is first). The Reef Demo is Matrox’s showcase for Parhelia’s 3D rendering engine, and this demo alone may sell a few graphics cards.
The beefed-up Reef Demo now has loads of fish, each with its own pixel and vertex shaders
As for 2D, I’ve not hooked this card up to a really nice monitor yet, so I can’t comment on RAMDACs. In Matrox’s little viewer app, I can easily spot the difference 10-bit gigacolor makes on Matrox’s sample images.
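The visible difference is mostly banding in smooth gradients. This Python sketch counts the distinct shades in a black-to-white ramp at 8 and 10 bits per channel; the 1600-pixel ramp width is an assumption for illustration, and the model ignores dithering and the display’s own precision.

```python
def distinct_levels(width_px: int, bits_per_channel: int) -> int:
    """Distinct shades in a linear black-to-white ramp width_px pixels wide."""
    max_code = (1 << bits_per_channel) - 1
    codes = {round(x / (width_px - 1) * max_code) for x in range(width_px)}
    return len(codes)


for bits in (8, 10):
    levels = distinct_levels(1600, bits)
    print(f"{bits}-bit: {levels} shades, bands ~{1600 / levels:.1f} px wide")

print(f"{2 ** 24:,} colors at 8 bits/channel vs {2 ** 30:,} at 10 bits/channel")
```

At 8 bits/channel, each band in the ramp is over 6 pixels wide; at 10 bits, bands shrink to about 1.6 pixels, and the palette grows from roughly 16.7 million to over a billion colors, hence “gigacolor.”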
Conclusions
We’ve barely scratched the surface of what this 80-million-transistor chip can do, and unfortunately, that’s all the article we’ve had time to prepare. (You really, really should go read our Parhelia technology article to learn more about the chip’s features.) We’ll get to work soon on a more in-depth review of Parhelia in the usual TR style. Still, I think we’ve seen enough to draw some tentative conclusions about the basic 3D performance of the chip and about its place in the graphics market.
Matrox told us at the outset that they weren’t looking to capture huge chunks of market share with Parhelia, and obviously, they weren’t kidding. Matrox clearly prefers to run upmarket from ATI and NVIDIA, relying on unique features like DualHead or 10-bits/channel color in order to sell its premium-priced graphics cards. In current 3D games, Parhelia performs more like a Radeon 8500LE or a GeForce4 Ti 4200 than a high-end card. Future driver revisions may help performance considerably. Still, at $399, Parhelia isn’t going to be for everyone. But if you want 10-bits/channel “gigacolor,” surround gaming, hardware displacement mapping, and 16X FAA, you’re going to have to pay the price of admission.
Some users won’t bat an eye. The Parhelia cards ship with a Photoshop plug-in to enable 10-bits/channel color, and I expect to see some scuffles in the aisles of Best Buy this week as rabid graphics artists fight over that last card. Goatees will be yanked, hemp clothing torn.
Likewise, if Matrox can extend Parhelia’s reach into the workstation market, it may have a shot there. (16X fragment AA would be a big hit in workstation apps, as would 10-bits/channel color.) To do so, they’ll need to compete against the same GPUs we’ve used for comparison, but on different cards (Quadros and Fire GL cards) using different drivers. Matrox says they plan to offer certified OpenGL drivers for all Parhelia cards, which makes the card’s $399 price tag seem much more reasonable. (Fire GL 8800 128MB cards start at just under $500.) Given the current performance of Parhelia’s drivers, OpenGL drivers optimized for high polygon throughput can’t come soon enough.
So there you have it. Matrox is back, and with Parhelia, they’ve delivered a very noteworthy achievement. The current performance isn’t quite what we might have hoped, but Matrox has sacrificed that performance for features. Parhelia’s features are substantial enough to have captured, at least for a moment, a portion of the graphics technology lead for Matrox, no mean feat in the midst of a brutal struggle between ATI and NVIDIA. If UMC can make the transition to a 130nm fab process, Matrox is in position to take advantage. The Parhelia needs only a clock speed increase to scale up nicely; the memory bandwidth is already there.