
AMD’s Socket AM3 Phenom II processors

Scott Wasson

It seems like only last month, we were reviewing the first Socket AM2+ versions of the Phenom II processor.

Perhaps because, well, we were.

Yet here we are a month later, and AMD has produced a new revision of the Phenom II capable of working with Socket AM3-style motherboards and DDR3 memory. Hard to keep up sometimes, innit?

Fortunately, although the change is no small accomplishment for AMD, it is relatively simple in the grand scheme of things. Mainly, the Phenom II’s memory controller has been modified to add support for DDR3 memory. Another happy consequence of the new silicon revision is additional clock speed headroom for the “uncore” (as Intel might call it) portions of the Phenom II (the memory controller, L3 cache, and HyperTransport), whose clocks run at 2GHz in this wave of new Socket AM3 processors.

Beyond that, little has changed in a month. The chips are still manufactured using AMD’s 45nm SOI fab process, and AMD hasn’t even modified its die size or transistor count estimates: they’re still 258 mm² and 758 million, just like previous Phenom IIs. The new chips are still compatible with existing 7-series chipsets from AMD, as well.


A Socket AM2+ processor (left) next to a Socket AM3 CPU (right)

The move to a new memory type requires a new pinout configuration, though, and that’s where Socket AM3 comes into the picture. You’ll need a Socket AM3 motherboard in order to use these Phenom II processors with DDR3 memory. This new socket type looks an awful lot like the prior Socket AM2+, but it has two fewer pins, for a total of 938. As a result, Socket AM2+ processors can’t fit into Socket AM3 motherboards. But in a clever muggle trick, Socket AM3 processors will happily drop into Socket AM2+ motherboards and work with DDR2 memory.

Socket AM3 retains the same lever-style ZIF socket as Socket AM2+ and should be compatible with the same coolers

As you probably know by now, DDR3 memory enables higher clock speeds (and thus bandwidth) than DDR2-type memory, and it can operate at lower voltages, leading to reduced power consumption, as well. As with many such transitions, DDR3 isn’t magically better than DDR2 in every way; it’s just an incremental improvement. And, although it’s been around for a while now in Intel systems, DDR3 still costs more per megabyte than DDR2. Most folks expect shipment volumes to tip in favor of DDR3 at some point this year, though, and when that happens, prices should become more even. Heck, DDR3 is already pretty stinkin’ cheap, even if it does cost more than DDR2.

The new Phenom IIs officially support DDR3 memory at up to 1333MHz, but the multipliers are present for 1600MHz operation, as well, as they are in high-end Core 2 and Core i7 systems. Unlike the Core i7, the Phenom II still has “only” two memory channels onboard, not three. I say “only” because each channel of DDR3-1333 memory can transfer up to 10.7 GB/s. Combined with the 2GHz HyperTransport 3 link on each CPU, the total bandwidth available via Socket AM3 is roughly 37.3 GB/s, considerably more than the peak data rate of 10.7 GB/s available via a Core 2 processor’s front-side bus (even if it is less than the staggering 64 GB/s possible with a Core i7-965 Extreme and three channels of DDR3 at 1600MHz). One caveat: the Phenom II only supports 1333MHz DDR3 (at least, officially) with a single DIMM in each memory channel. With four DDR3 DIMMs, 1066MHz is the standard. Such limitations are nothing new, of course. Previous Phenoms have long supported 1066MHz DDR2 memory, but only with a single DIMM per channel.
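For the curious, the peak figures quoted above can be reproduced with a little arithmetic. The sketch below is only that, a sketch: the helper names are made up for illustration, and it assumes 64-bit memory channels and front-side bus, plus 16-bit-wide HyperTransport and QPI links counted in both directions. Those assumptions happen to land on the same numbers as the text.

```python
# Peak theoretical bandwidth in GB/s, under the assumptions stated above.

def channel_gbs(mt_per_s: float, width_bytes: int = 8) -> float:
    """A 64-bit channel or bus: transfers per second times bytes per transfer."""
    return mt_per_s * 1e6 * width_bytes / 1e9

def link_gbs(gt_per_s: float, width_bytes: int = 2, directions: int = 2) -> float:
    """A point-to-point link (assumed 16 bits wide), both directions combined."""
    return gt_per_s * width_bytes * directions

# "1333" is nominally 1333.33 MT/s (a 666.67 MHz clock, double data rate).
ddr3_1333 = channel_gbs(4000 / 3)                 # one DDR3-1333 channel
am3_total = 2 * ddr3_1333 + link_gbs(4.0)         # two channels + HT3 at 4.0 GT/s
core2_fsb = channel_gbs(4000 / 3)                 # 1333 MT/s front-side bus
i7_965 = 3 * channel_gbs(1600) + link_gbs(6.4)    # three channels + QPI at 6.4 GT/s

print(f"DDR3-1333 channel: {ddr3_1333:.1f} GB/s")
print(f"Socket AM3 total:  {am3_total:.1f} GB/s")
print(f"Core 2 FSB:        {core2_fsb:.1f} GB/s")
print(f"Core i7-965:       {i7_965:.1f} GB/s")
```

These are theoretical peaks, of course; measured bandwidth in the tests that follow is a different animal.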


An unnecessarily large close-up of the Phenom II X4 810

Oddly enough, the newest Phenom II chips aren’t the fastest ones. The first wave of Socket AM2+-only processors, including the X4 920 and 940, are higher-end products with faster core clock speeds. The first Socket AM3 parts are cut-down versions of the Phenom II with lower speeds and less cache. Here’s a list of ’em all.

| Model | Clock speed | North bridge/L3 cache speed | L3 cache size | Cores | TDP | Price |
|---|---|---|---|---|---|---|
| Phenom II X3 710 | 2.6 GHz | 2.0 GHz | 6MB | 3 | 95W | $125 |
| Phenom II X3 720 Black Edition | 2.8 GHz | 2.0 GHz | 6MB | 3 | 95W | $145 |
| Phenom II X4 805 | 2.5 GHz | 2.0 GHz | 4MB | 4 | 95W | |
| Phenom II X4 810 | 2.6 GHz | 2.0 GHz | 4MB | 4 | 95W | $175 |
| Phenom II X4 910 | 2.6 GHz | 2.0 GHz | 6MB | 4 | 95W | |

I haven’t listed it above, but as with all Phenoms, these Socket AM3 processors have 512KB of L2 cache per core. Also, notice that there’s no pricing for the Phenom II X4 805 and the X4 910. Both of these processors are only intended for large PC makers, so AMD hasn’t set any retail pricing for these products.

We have two of the retail products in hand today. The X4 810 is a quad-core processor with 4MB of L3 cache (the remaining 2MB in silicon has been disabled), and AMD has positioned it roughly opposite the Core 2 Quad Q8200, given its price tag of 175 bucks. Like the Phenom II X4 810, the Q8200 has a 95W TDP rating, so the matchup between these two rivals should be fairly straightforward.

Less so is the case of the Phenom II X3 720, which has a higher clock speed of 2.8GHz, a full 6MB L3 cache—and one core disabled. AMD cites the Core 2 Duo E8400 as the 720’s most direct competitor, and that’s a bold statement indeed, since the E8400 has been an enthusiast value favorite for some time now. The E8400 has two higher performance cores, against the 720’s three lower performance ones. We’ll have to see how that dynamic works itself out in the performance sweeps, but the answer is likely to be complicated. Another complication: the E8400 is a 65W part, while the X3 720 has a 95W TDP, so you may pay in added power consumption for the additional core. That downside may be offset by the fact that the X3 720 is a Black Edition processor with an unlocked upper multiplier for dead-simple overclocking. All in all, an intriguing matchup.


Asus’ M4A79T Deluxe mobo

The first Socket AM3 motherboard to make it into Damage Labs is the Asus M4A79T Deluxe, pictured above. This is a relatively high-end board based on the 790FX chipset, and it includes a total of 32 PCIe 2.0 lanes for graphics, which can be configured in various ways across its four physical PCIe x16 slots, including dual x16 and quad x8 arrangements. As you can see, this mobo packs the customary complement of high-end features, with more ports than Oakland (and probably a better football team, too.) The M4A79T Deluxe is already listed at a couple of online vendors for around 200 bucks. In my limited use of this board during CPU testing, I found it to be in pretty good shape for such an early product, with exemplary stability during normal use and decent overclocking headroom, as well. We’ll see about subjecting it to a full review soon.

Test notes
In order to gauge the impact of memory type on performance and power use, we’ve tested the Phenom II X4 810 both with DDR2 memory on a Socket AM2+ board and with DDR3 memory on a Socket AM3 board. You’ll find the results in the following pages, labeled appropriately.


The Core 2 Quad Q8300

Here’s a look at the Core 2 Quad Q8300 processor we used for testing. This processor came to us courtesy of the good folks at NCIX and NCIXUS. Thanks to them for making this comparison possible. We haven’t yet had a Core 2 Quad Q8000-series processor in house for testing, a situation we’re happy to remedy. This quad-core processor is based on a pair of 45nm dual-core Penryn chips, like other new Core 2 Quads, but the chips on the Q8300 have had their onboard L2 caches reduced from 6MB to 2MB, so the Q8300 has a total of 4MB L2 cache. That’s a big reduction, but these are value quad-cores. The Q8300 has a 1333MHz front-side bus and a core clock of 2.5GHz, and it sells for as little as $190 right now.

The more direct competition for the Phenom II X4 810 is the Core 2 Quad Q8200, which runs at 2.33GHz, so we’ve underclocked our Q8300 to simulate a Q8200 for this review. I’m sure we’ll get around to testing the Q8300 at its stock speed, as well, eventually.

We’ve simulated several other speed grades via underclocking, too. Specifically, the Phenom II X4 920 is an underclocked 940, and the Core 2 Quad Q9550 is an underclocked Core 2 Extreme QX9650. We expect the performance of these “simulated” speed grades to be identical to the real things, but we generally omit these processors from our power consumption testing because we do anticipate power use would vary slightly from the actual products.
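As a rough illustration of what simulating a speed grade involves, here is the clock arithmetic for the Q8300-as-Q8200 case. This is a sketch under assumptions: the review doesn't say how the underclock was applied, so the example simply assumes the multiplier was lowered while the bus speed stayed put, and the function names are invented for the example.

```python
# Core clock = base clock x multiplier. The front-side bus is quad-pumped,
# so a 1333 MT/s FSB implies a base clock of roughly 333 MHz.

def base_clock_mhz(fsb_mt_per_s: float) -> float:
    """Base clock of a quad-pumped front-side bus."""
    return fsb_mt_per_s / 4

def multiplier(core_mhz: float, base_mhz: float) -> float:
    """Nearest half-step multiplier that yields the given core clock."""
    return round(core_mhz / base_mhz * 2) / 2

base = base_clock_mhz(4000 / 3)      # nominal 1333.33 MT/s -> 333.33 MHz
q8300 = multiplier(2500, base)       # stock Q8300 at 2.5GHz
q8200 = multiplier(2333, base)       # simulated Q8200 at 2.33GHz

print(f"Q8300 multiplier: {q8300}")
print(f"Q8200 multiplier: {q8200} ({q8200 * base:.0f} MHz)")
```

Under that assumption, the simulated Q8200 is just the same chip running half a multiplier step lower on the same bus.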

Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.

Our test systems were configured like so:

Processor: Core 2 Quad Q6600 2.4 GHz | Core 2 Duo E8400 3.00 GHz | Core 2 Duo E8600 3.33 GHz | Core 2 Quad Q8200 2.33 GHz | Core 2 Quad Q9300 2.5 GHz | Core 2 Quad Q9400 2.66 GHz | Core 2 Quad Q9550 2.83 GHz | Core 2 Extreme QX9770 3.2 GHz | Dual Core 2 Extreme QX9775 3.2 GHz | Core i7-920 2.66 GHz | Core i7-940 2.93 GHz | Core i7-965 Extreme 3.2 GHz | Athlon 64 X2 6400+ 3.2 GHz | Phenom X3 8750 2.4 GHz | Phenom II X4 920 2.8 GHz | Phenom II X4 940 3.0 GHz | Phenom II X4 810 2.6 GHz | Phenom X4 9950 Black 2.6 GHz | Phenom II X3 720 2.8 GHz | Phenom II X4 810 2.6 GHz
System bus: 1066 MT/s (266 MHz) | 1333 MT/s (333 MHz) | 1600 MT/s (400 MHz) | 1600 MT/s (400 MHz) | QPI 4.8 GT/s (2.4 GHz) | QPI 6.4 GT/s (3.2 GHz) | HT 2.0 GT/s (1.0 GHz) | HT 3.6 GT/s (1.8 GHz) | HT 3.6 GT/s (1.8 GHz) | HT 4.0 GT/s (2.0 GHz) | HT 4.0 GT/s (2.0 GHz) | HT 4.0 GT/s (2.0 GHz)
Motherboard: Asus P5E3 Premium | Asus P5E3 Premium | Asus P5E3 Premium | Intel D5400XS | Intel DX58SO | Intel DX58SO | Asus M3A79-T Deluxe | Asus M3A79-T Deluxe | MSI DKA790GX Platinum | Asus M4A79T Deluxe
BIOS revision: 0605 | 0605 | 0605 | XS54010J.86A.1149.2008.0825.2339 | SOX5810J.86A.2260.2008.0918.1758 | SOX5810J.86A.2260.2008.0918.1758 | 0403 | 0403 11/25/08 0703 1.6 (1/21/09)
North bridge: X48 Express MCH | X48 Express MCH | X48 Express MCH | 5400 MCH | X58 IOH | X58 IOH | 790FX | 790FX | 790GX | 790FX
South bridge: ICH9R | ICH9R | ICH9R | 6321ESB ICH | ICH10R | ICH10R | SB750 | SB750 | SB750 | SB750
Chipset drivers: INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF Update 9.0.0.1008, Matrix Storage Manager 8.5.0.1032 | INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032 | INF update 9.1.0.1007, Matrix Storage Manager 8.5.0.1032 | AHCI controller 3.1.1540.61 | AHCI controller 3.1.1540.61 | AHCI controller 3.1.1540.61 | AHCI controller 3.1.1540.61
Memory size: 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 6GB (3 DIMMs) | 6GB (3 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs) | 4GB (2 DIMMs)
Memory type: Corsair TW3X4G1800C8DF DDR3 SDRAM | Corsair TW3X4G1800C8DF DDR3 SDRAM | Corsair TW3X4G1800C8DF DDR3 SDRAM | Micron ECC DDR2-800 FB-DIMM | Corsair TR3X6G1600C8D DDR3 SDRAM | Corsair TR3X6G1600C8D DDR3 SDRAM | Corsair TWIN4X4096-8500C5DF DDR2 SDRAM | Corsair TWIN4X4096-8500C5DF DDR2 SDRAM | Corsair TWIN4X4096-8500C5DF DDR2 SDRAM | Corsair TW3X4G1600C9DHXNV DDR3 SDRAM
Memory speed (effective): 1066 MHz | 1333 MHz | 1600 MHz | 800 MHz | 1066 MHz | 1600 MHz | 800 MHz | 1066 MHz | 1066 MHz | 1333 MHz
CAS latency (CL): 7 | 8 | 8 | 5 | 7 | 8 | 4 | 5 | 5 | 8
RAS to CAS delay (tRCD): 7 | 8 | 8 | 5 | 7 | 8 | 4 | 5 | 5 | 8
RAS precharge (tRP): 7 | 8 | 8 | 5 | 7 | 8 | 4 | 5 | 5 | 8
Cycle time (tRAS): 20 | 20 | 24 | 18 | 20 | 24 | 12 | 15 | 15 | 20
Command rate: 2T | 2T | 2T | 2T | 2T | 1T | 2T | 2T | 2T | 2T
Audio: Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers | Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers | Integrated ICH9R/AD1988B with SoundMAX 6.10.2.6480 drivers | Integrated 6321ESB/STAC9274D5 with SigmaTel 6.10.5713.7 drivers | Integrated ICH10R/ALC889 with Realtek 6.0.1.5704 drivers | Integrated ICH10R/ALC889 with Realtek 6.0.1.5704 drivers | Integrated SB750/AD2000B with SoundMAX 6.10.2.6480 drivers | Integrated SB750/AD2000B with SoundMAX 6.10.2.6480 drivers | Integrated SB750/ALC888 with Realtek 6.0.1.5704 drivers | Integrated SB750/ALC1200 with Realtek 6.0.1.5704 drivers
Hard drive: WD Caviar SE16 320GB SATA
Graphics: Radeon HD 4870 512MB PCIe with Catalyst 8.55.4-081009a-070794E-ATI drivers
OS: Windows Vista Ultimate x64 Edition
OS updates: Service Pack 1, DirectX redist update August 2008

Thanks to Corsair for providing us with memory for our testing. Their products and support are far and away superior to generic, no-name memory.

Our single-socket test systems were powered by OCZ GameXStream 700W power supply units. The dual-socket system was powered by a PC Power & Cooling Turbo-Cool 1KW-SR power supply. Thanks to OCZ for providing these units for our use in testing.

Also, the folks at NCIXUS.com hooked us up with a nice deal on the WD Caviar SE16 drives used in our test rigs. NCIX now sells to U.S. customers, so check them out.

The test systems’ Windows desktops were set at 1600×1200 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled.

We used the following versions of our test applications:

The tests and methods we employ are usually publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

Memory subsystem performance

These most excellent squiggly lines show bandwidth at various stages of the cache and memory hierarchy. Stunningly confusing, innit? The new Phenom IIs perform more or less as expected here, for what it’s worth.

Since it’s difficult to see the results once we get into main memory, let’s take a closer look at the 256MB block size:

We get a bit of a look at DDR3 in action here, as the Phenom II X4 810 on the Socket AM3 mobo transfers more data in this test than any other desktop processor save for the Core i7. The bandwidth boost when going from 1066MHz DDR2 to 1333MHz DDR3 isn’t huge, but it’s real and measurable.

The transition from DDR2 to DDR3 doesn’t exact a big penalty in terms of memory access latencies—just a single nanosecond on the X4 810. Notably, the Socket AM3 processors are a couple of nanoseconds quicker at getting to memory than the older Phenom IIs, likely due to the 200MHz higher L3 cache speeds of the Socket AM3 chips.

Crysis Warhead
We measured Warhead performance using the FRAPS frame-rate recording tool and playing over the same 60-second section of the game five times on each processor. This method has the advantage of simulating real gameplay quite closely, but it comes at the expense of precise repeatability. We believe five sample sessions are sufficient to get reasonably consistent results. In addition to average frame rates, we’ve included the low frame rates, because those tend to reflect the user experience in performance-critical situations. In order to diminish the effect of outliers, we’ve reported the median of the five low frame rates we encountered.

We tested at relatively modest graphics settings, 1024×768 resolution with the game’s “Mainstream” quality settings, because we didn’t want our graphics card to be the performance-limiting factor. This is, after all, a CPU test.
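To show why the median is the friendlier statistic for those low frame rates, here is a minimal sketch of the bookkeeping. The session numbers are invented purely for illustration (they are not results from this review), with one session deliberately given an outlier low.

```python
from statistics import mean, median

# Hypothetical (average FPS, lowest FPS) pairs for five 60-second sessions.
# These values are made up for the example; the third session has an outlier.
sessions = [(61.2, 38), (59.8, 41), (60.5, 22), (62.0, 40), (60.1, 39)]

avg_fps = mean(avg for avg, _ in sessions)        # reported average frame rate
median_low = median(low for _, low in sessions)   # reported low frame rate
mean_low = mean(low for _, low in sessions)       # what averaging the lows would give

print(f"Average FPS:    {avg_fps:.1f}")
print(f"Median of lows: {median_low}")
print(f"Mean of lows:   {mean_low}")
```

With these made-up numbers, one bad dip drags the mean of the lows noticeably below the median, while the median stays close to what the other four sessions saw.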

In Warhead, as in most of today’s games, a pair of fast cores translates into better performance than three or more slower cores. As a result, the Core 2 Duo E8400 maintains higher frame rates than the Phenom II X3 720, and both of those processors are faster than the Phenom II X4 810. Among the quad cores, the X4 810 outperforms the Q8200. And, in a photo finish, the DDR2 and DDR3 configurations of the X4 810 perform almost identically.

Far Cry 2
After playing around with Far Cry 2, I decided to test it a little bit differently by recording frame rates during the jeep ride sequence at the very beginning of the game. I found that frame rates during this sequence were generally similar to those when running around elsewhere in the game, and after all, playing Far Cry 2 involves quite a bit of driving around. Since this sequence was repeatable, I just captured results from three 90-second sessions.

Again, I didn’t want the graphics card to be our primary performance constraint, so although I tested at fairly high visual quality levels, I used a relatively low 1024×768 display resolution and DirectX 9.

The new Phenoms edge out their ostensible rivals from Intel by very small margins here. Again, fewer, faster cores prove to be the best choice for this game, and DDR2 continues to match DDR3 on the X4 810.

Incidentally, some of the scores for Core 2 processors here are higher than you may have seen in our other recent reviews. After running into some strange results, I wound up re-testing the Core 2 processors in Far Cry 2, and several of them came out faster. I’m not sure what the cause of the problem was, but I’m confident these scores are now correct. I’ll be going back to the older reviews and updating those scores, as well.

Unreal Tournament 3
As you saw on the preceding page, I did manage to find a couple of CPU-limited games to use in testing. I decided to try to concoct another interesting scenario by setting up a 24-player CTF game on UT3’s epic Facing Worlds map, in which I was the only human player. The rest? Bots controlled by the CPU. I racked up frags like mad while capturing five 60-second gameplay sessions for each processor.

Oh, and the screen resolution was set to 1280×1024 for testing, with UT3’s default quality options and “framerate smoothing” disabled.

There’s undoubtedly some variance built into these results, since I was playing a pretty darned random botmatch, but I’m not convinced the X3 720’s strong showing is a fluke. Instead, I think its faster L3 cache may be what gives it the advantage over, say, the Phenom II X4 940. Also, hey, our little botmatch idea has produced a real gaming scenario where having more than two cores seems to matter. The Core 2 Quad Q8200 beats out the Core 2 Duo E8400 here, for instance. Along those same lines, the X3 720 seems to have the ideal balance of core count and clock speed for this scenario.

The Phenom II X4 810, meanwhile, churns out frames faster than the Q8200, and it gets a nice little boost from DDR3, as well.

Half-Life 2: Episode Two
Our next test is a good, old custom-recorded in-game timedemo, precisely repeatable.

All of these frame rates are ridiculously high, of course. If we consider relative performance, the Phenom II X3 720 just trails the E8400 by a hair, while the X4 810 easily surpasses the Q8200.

Source engine particle simulation
Next up is a test we picked up during a visit to Valve Software, the developers of the Half-Life games. They had been working to incorporate support for multi-core processors into their Source game engine, and they cooked up some benchmarks to demonstrate the benefits of multithreading.

This test runs a particle simulation inside of the Source engine. Most games today use particle systems to create effects like smoke, steam, and fire, but the realism and interactivity of those effects are limited by the available computing horsepower. Valve’s particle system distributes the load across multiple CPU cores.

The X3 720’s third core makes itself known again here, as the X3 just beats out the Core 2 Duo E8400. The quad-core processors are very evenly matched, the X4 810 in a virtual tie with the Q8200.

WorldBench
WorldBench’s overall score is a pretty decent indication of general-use performance for desktop computers. This benchmark uses scripting to step through a series of tasks in common Windows applications and then produces an overall score for comparison. WorldBench also records individual results for its component application tests, allowing us to compare performance in each. We’ll look at the overall score, and then we’ll show individual application results alongside the results from some of our own application tests.

The Socket AM3 processors prove to be a little disappointing in WorldBench compared to their rivals from Intel. The E8400 opens up a big lead over any AMD product, in fact.

Productivity and general use software

MS Office productivity

Firefox web browsing

Multitasking – Firefox and Windows Media Encoder

WinZip file compression

Nero CD authoring

Through the MS Office, Firefox, and multitasking tests, the Socket AM3 processors look to be very competitive. In fact, the Core 2 Quad Q8200 has a much rougher time, finishing dead last in two of the three tests. However, the Phenoms suffer when we get to the WinZip and Nero tests, both of which tend to rely on disk controller performance to a degree.

Those are the breaks in these days of “platformization.” AMD’s entire lineup of south bridge chips for several years has had trouble with a key performance feature, Native Command Queuing for Serial ATA. Turning on NCQ can improve performance in these tests, but it comes at the cost of higher CPU utilization, which hurts performance in other tests—most notably, in WorldBench’s Photoshop test.

For this review, we’ve included results with AHCI (and thus NCQ and SATA hot-swapping) disabled, at AMD’s request, for all Phenom II processors. (The Athlon 64 and original Phenoms were tested with AHCI enabled.) When we disabled AHCI, we found that performance in Photoshop rose and performance in Nero and other tests dropped by offsetting amounts; the overall WorldBench score was unchanged.

Image processing

Photoshop

The Phenom IIs perform better here than the older Phenoms, as expected. Yet even the fastest Phenom II trails the slowest Intel processor, the Q8200, by over 30 seconds.

The Panorama Factory photo stitching
The Panorama Factory handles an increasingly popular image processing task: joining together multiple images to create a wide-aspect panorama. This task can require lots of memory and can be computationally intensive, so The Panorama Factory comes in a 64-bit version that’s widely multithreaded. I asked it to join four pictures, each eight megapixels, into a glorious panorama of the interior of Damage Labs. The program’s timer function captures the amount of time needed to perform each stage of the panorama creation process. I’ve also added up the total operation time to give us an overall measure of performance.

The Q8200 finishes just a little sooner than the X4 810 here. Despite the relatively strong performance of the Intel processors in this application, though, the Phenom II X3 720’s additional core puts it ahead of the E8400.

Below is a look at the individual operations required to create a panorama, if you care to see that sort of detail.

picCOLOR image analysis
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA. Many of the individual functions that make up the test are multithreaded.

The 720’s third core isn’t sufficient to give it the advantage over the E8400 in this application, even though it is multithreaded. The Q8200 and X4 810 are in a familiar place, meanwhile: a dead heat.

Media encoding and editing

x264 HD benchmark
This benchmark tests performance with one of the most popular H.264 video encoders, the open-source x264. The results come in two parts, for the two passes the encoder makes through the video file. I’ve chosen to report them separately, since that’s typically how the results are reported in the public database of results for this benchmark. These scores come from the newer, faster version 0.59.819 of the x264 executable.

We’ll give the X4 810 the win over the Q8200 on the strength of its performance in pass one of the encoding process. The X3 720 proves faster than the E8400 in both passes, another triumph of its triple-core config.

Windows Media Encoder x64 Edition video encoding
Windows Media Encoder is one of the few popular video encoding tools that uses four threads to take advantage of quad-core systems, and it comes in a 64-bit version. Unfortunately, it doesn’t appear to use more than four threads, even on an eight-core system. For this test, I asked Windows Media Encoder to transcode a 153MB 1080-line widescreen video into a 720-line WMV using its built-in DVD/Hardware profile. Because the default “High definition quality audio” codec threw some errors in Windows Vista, I instead used the “Multichannel audio” codec. Both audio codecs have a variable bitrate peak of 192Kbps.

Windows Media Encoder doesn’t know what to do with a triple-core CPU, so it only spins off two threads, and the X3 720’s encode times suffer as a result. At least the X4 810 does well, finishing ahead of the Q8200.

Windows Media Encoder video encoding

Roxio VideoWave Movie Creator

Sadly, neither of WorldBench’s video manipulation benchmarks appears to use more than two threads, and thus cores, to any good effect. Put simply, I prefer our other video encoding tests, which are good examples of multithreaded real-world applications.

LAME MT audio encoding
LAME MT is a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. Of course, multithreading works even better on multi-core processors. You can download a paper (in Word format) describing the programming effort.

Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. That means this test won’t really use more than two CPU cores.
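The linear pipelining described above can be sketched in a few lines. To be clear, this is not LAME MT's code: `analyze` and `encode` are hypothetical stand-ins for the psycho-acoustic analysis and the rest of the encoder, and the small bounded queue is just one simple way to keep the first stage running slightly ahead of the second.

```python
import queue
import threading

def analyze(frame):
    """Hypothetical stand-in for psycho-acoustic analysis of one frame."""
    return sum(frame) / len(frame)

def encode(frame, analysis):
    """Hypothetical stand-in for the remainder of the encoder."""
    return (len(frame), round(analysis, 2))

def run_pipeline(frames):
    buffered = queue.Queue(maxsize=1)   # lets stage one run about a frame ahead
    encoded = []

    def stage_one():
        for frame in frames:
            buffered.put((frame, analyze(frame)))   # blocks when the buffer is full
        buffered.put(None)                          # sentinel: no more frames

    def stage_two():
        while (item := buffered.get()) is not None:
            encoded.append(encode(*item))

    threads = [threading.Thread(target=stage_one),
               threading.Thread(target=stage_two)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return encoded

print(run_pipeline([[1, 2, 3], [4, 5, 6, 7]]))
```

Because there are only two stages, adding a third or fourth core gives this structure nothing extra to do, which fits the two-core ceiling noted above. (In CPython, the global interpreter lock also keeps these two threads from truly running in parallel, so treat this strictly as a structural illustration.)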

We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here.

This app favors fewer, faster cores, and as a result, some of our cheaper processors outperform their more expensive counterparts. At the same time, among the products we’re comparing today, the Intel CPUs finish encoding before their AMD rivals, regardless of the compiler used.

3D modeling and rendering

Cinebench rendering
Graphics is a classic example of a computing problem that’s easily parallelizable, so it’s no surprise that we can exploit a multi-core processor with a 3D rendering app. Cinebench is the first of those we’ll try, a benchmark based on Maxon’s Cinema 4D rendering engine. It’s multithreaded and comes with a 64-bit executable. This test runs with just a single thread and then with as many threads as CPU cores (or threads, in CPUs with multiple hardware threads per core) are available.

Chalk up another win for AMD’s triple-core wonder. As expected, it’s faster than the dual-core E8400 in this rendering app. The X4 810 is quicker than the Q8200, as well.

POV-Ray rendering
We’re using the latest beta version of POV-Ray 3.7 that includes native multithreading and 64-bit support. Some of the beta 64-bit executables have been quite a bit slower than the 3.6 release, but this should give us a decent look at comparative performance, regardless.

The chess scene shows us the sort of multicore performance scaling to which we’re accustomed with rendering applications, and the Socket AM3 processors perform well, as a result. The benchmark scene, on the other hand, involves a long calculation that’s not multithreaded, so the Core 2 Duo E8400 outruns the Phenom II X4 810.

3ds max modeling and rendering

Valve VRAD map compilation
This next test processes a map from Half-Life 2 using Valve’s VRAD lighting tool. Valve uses VRAD to pre-compute lighting that goes into games like Half-Life 2.

In our last two rendering tests, the Q8200 is just a few seconds faster than the X4 810, while the X3 720 rides its third core to wide victories over the E8400.

Folding@Home
Next, we have a slick little Folding@Home benchmark CD created by notfred, one of the members of Team TR, our excellent Folding team. For the unfamiliar, Folding@Home is a distributed computing project created by folks at Stanford University that investigates how proteins work in the human body, in an attempt to better understand diseases like Parkinson’s, Alzheimer’s, and cystic fibrosis. It’s a great way to use your PC’s spare CPU cycles to help advance medical research. I’d encourage you to visit our distributed computing forum and consider joining our team if you haven’t already joined one.

The Folding@Home project uses a number of highly optimized routines to process different types of work units from Stanford’s research projects. The Gromacs core, for instance, uses SSE on Intel processors, 3DNow! on AMD processors, and Altivec on PowerPCs. Overall, Folding@Home should be a great example of real-world scientific computing.

notfred’s Folding Benchmark CD tests the most common work unit types and estimates performance in terms of the points per day that a CPU could earn for a Folding team member. The CD itself is a bootable ISO. The CD boots into Linux, detects the system’s processors and Ethernet adapters, picks up an IP address, and downloads the latest versions of the Folding execution cores from Stanford. It then processes a sample work unit of each type.

On a system with two CPU cores, for instance, the CD spins off a Tinker WU on core 1 and an Amber WU on core 2. When either of those WUs is finished, the benchmark moves on to additional WU types, always keeping both cores occupied with some sort of calculation. Should the benchmark run out of new WUs to test, it simply processes another WU in order to prevent any of the cores from going idle as the others finish. Once all four of the WU types have been tested, the benchmark averages the points per day among them. That points-per-day average is then multiplied by the number of cores on the CPU in order to estimate the total number of points per day that CPU might achieve.
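The final scoring step described above boils down to one line of arithmetic. The sketch below uses invented points-per-day figures and placeholder names for the two WU types the text doesn't name here. It illustrates the calculation and is not the benchmark CD's actual script.

```python
def estimated_ppd(ppd_by_wu_type: dict, cores: int) -> float:
    """Average the per-WU-type points per day, then scale by core count."""
    average = sum(ppd_by_wu_type.values()) / len(ppd_by_wu_type)
    return average * cores

# Illustrative numbers only, not measured results. "Type 3" and "Type 4"
# are placeholders for the remaining two WU types.
sample = {"Tinker": 100.0, "Amber": 120.0, "Type 3": 200.0, "Type 4": 180.0}

for cores in (2, 3, 4):
    print(cores, "cores:", estimated_ppd(sample, cores), "points per day")
```

Note that the estimate scales linearly with core count by construction, so a triple-core chip gets exactly 1.5 times the score of a dual-core with the same per-core results.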

This may be a somewhat quirky method of estimating overall performance, but my sense is that it generally ought to work. We’ve discussed some potential reservations about how it works here, for those who are interested. I have included results for each of the individual WU types below, so you can see how the different CPUs perform on each.

Yeah, so: the Folding parity between the X4 810 and Q8200 couldn’t be much clearer. And, once more with feeling, the X3’s multiplicity trumps the E8400.

MyriMatch proteomics
Our benchmarks sometimes come from unexpected places, and such is the case with this one. David Tabb is a friend of mine from high school and a long-time TR reader. He has provided us with an intriguing new benchmark based on an application he’s developed for use in his research work. The application is called MyriMatch, and it’s intended for use in proteomics, or the large-scale study of proteins. I’ll stop right here and let him explain what MyriMatch does:

In shotgun proteomics, researchers digest complex mixtures of proteins into peptides, separate them by liquid chromatography, and analyze them by tandem mass spectrometers. This creates data sets containing tens of thousands of spectra that can be identified to peptide sequences drawn from the known genomes for most lab organisms. The first software for this purpose was Sequest, created by John Yates and Jimmy Eng at the University of Washington. Recently, David Tabb and Matthew Chambers at Vanderbilt University developed MyriMatch, an algorithm that can exploit multiple cores and multiple computers for this matching. Source code and binaries of MyriMatch are publicly available.

In this test, 5555 tandem mass spectra from a Thermo LTQ mass spectrometer are identified to peptides generated from the 6714 proteins of S. cerevisiae (baker’s yeast). The data set was provided by Andy Link at Vanderbilt University. The FASTA protein sequence database was provided by the Saccharomyces Genome Database.

MyriMatch uses threading to accelerate the handling of protein sequences. The database (read into memory) is separated into a number of jobs, typically the number of threads multiplied by 10. If four threads are used in the above database, for example, each job consists of 168 protein sequences (1/40th of the database). When a thread finishes handling all proteins in the current job, it accepts another job from the queue. This technique is intended to minimize synchronization overhead between threads and minimize CPU idle time.
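The job-queue scheme David describes can be sketched roughly as follows. This is a minimal Python illustration of the general technique, not MyriMatch's actual implementation; the function names and the stand-in database of integers are assumptions made for the example.

```python
import queue
import threading


def split_into_jobs(items, num_threads, jobs_per_thread=10):
    """Split the database into about num_threads * jobs_per_thread jobs."""
    num_jobs = num_threads * jobs_per_thread
    job_size = -(-len(items) // num_jobs)  # ceiling division
    return [items[i:i + job_size] for i in range(0, len(items), job_size)]


def process_all(items, num_threads, handle):
    """Let each thread pull the next job from a shared queue until it is empty."""
    jobs = queue.Queue()
    for job in split_into_jobs(items, num_threads):
        jobs.put(job)

    def worker():
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                return  # no work left, so this thread exits
            for item in job:
                handle(item)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Stand-in for the 6714 yeast protein sequences.
proteins = list(range(6714))
jobs = split_into_jobs(proteins, num_threads=4)
print(len(jobs), len(jobs[0]))
```

Because jobs are small relative to the whole database, a thread that finishes early simply grabs more work, and threads only synchronize once per job rather than once per protein.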

The most important news for us is that MyriMatch is a widely multithreaded real-world application that we can use with a relevant data set. MyriMatch also offers control over the number of threads used, so we’ve tested with one to eight threads.

I should mention that performance scaling in MyriMatch tends to be limited by several factors, including memory bandwidth, as David explains:

Inefficiencies in scaling occur from a variety of sources. First, each thread is comparing to a common collection of tandem mass spectra in memory. Although most peptides will be compared to different spectra within the collection, sometimes multiple threads attempt to compare to the same spectra simultaneously, necessitating a mutex mechanism for each spectrum. Second, the number of spectra in memory far exceeds the capacity of processor caches, and so the memory controller gets a fair workout during execution.

Here’s how the processors performed.

I had really hoped to see DDR3 make a big difference in this bandwidth-intensive application, but things just didn’t work out that way. Regardless, both of the Socket AM3 processors fare relatively well.

STARS Euler3d computational fluid dynamics
Charles O’Neill works in the Computational Aeroservoelasticity Laboratory at Oklahoma State University, and he contacted us to suggest we try the computational fluid dynamics (CFD) benchmark based on the STARS Euler3D structural analysis routines developed at CASELab. This benchmark has been available to the public for some time in single-threaded form, but Charles was kind enough to put together a multithreaded version of the benchmark for us with a larger data set. He has also put a web page online with a downloadable version of the multithreaded benchmark, a description, and some results here.

In this test, the application is basically doing analysis of airflow over an aircraft wing. I will step out of the way and let Charles explain the rest:

The benchmark testcase is the AGARD 445.6 aeroelastic test wing. The wing uses a NACA 65A004 airfoil section and has a panel aspect ratio of 1.65, taper ratio of 0.66, and a quarter-chord sweep angle of 45º. This AGARD wing was tested at the NASA Langley Research Center in the 16-foot Transonic Dynamics Tunnel and is a standard aeroelastic test case used for validation of unsteady, compressible CFD codes.

The CFD grid contains 1.23 million tetrahedral elements and 223 thousand nodes . . . . The benchmark executable advances the Mach 0.50 AGARD flow solution. A benchmark score is reported as a CFD cycle frequency in Hertz.

So the higher the score, the faster the computer. Charles tells me these CFD solvers are very floating-point intensive, but oftentimes limited primarily by memory bandwidth. He has modified the benchmark for us in order to enable control over the number of threads used. Here’s how our contenders handled the test with different thread counts.

The switch to DDR3 doesn’t contribute much to Phenom II performance here, either, but it is enough to lift the X4 810 past the Q8200.

Power consumption and efficiency
Our Extech 380803 power meter has the ability to log data, so we can capture power use over a span of time. The meter reads power use at the wall socket, so it incorporates power use from the entire system—the CPU, motherboard, memory, graphics solution, hard drives, and anything else plugged into the power supply unit. (We plugged the computer monitor into a separate outlet, though.) We measured how each of our test systems used power across a set time period, during which time we ran Cinebench’s multithreaded rendering test.

All of the systems had their power management features (such as SpeedStep and Cool’n’Quiet) enabled during these tests via Windows Vista’s “Balanced” power options profile.

Although we don’t usually include “simulated” CPU speed grades in our power results, I’ve made an exception for the Q8200 out of sheer curiosity.

Let’s slice up the data in various ways in order to better understand them. We’ll start with a look at idle power, taken from the trailing edge of our test period, after all CPUs have completed the render.

The Phenom II is a major, major improvement in idle power draw over the original Phenom. As with the past generation, though, the deactivation of one of the cores on the X3 product has no measurable benefit to power consumption at idle. However, I’d say the power draw of the DDR3 system is pretty decent, considering that our Asus Socket AM3 board is a high-end mobo.

Next, we can look at peak power draw by taking an average from the ten-second span from 15 to 25 seconds into our test period, during which the processors were rendering.

Looks like DDR3 saves a few watts in peak power draw, at least. Even so, the X4 810 consumes a little more power than Intel’s comparable quad-core processors. And the X3 720 draws just as much power as the X4 810. Looks like the 720’s higher clock speed and larger cache are making up the difference.

Another way to gauge power efficiency is to look at total energy use over our time span. This method takes into account power use both during the render and during the idle time. We can express the result in terms of watt-seconds, also known as joules.

We can quantify efficiency even better by considering specifically the amount of energy used to render the scene. Since the different systems completed the render at different speeds, we’ve isolated the render period for each system. We’ve then computed the amount of energy used by each system to render the scene. This method should account for both power use and, to some degree, performance, because shorter render times may lead to less energy consumption.

Boy, are these last two measures close. Our simulated Q8200 uses just a little less energy to render the scene than the X4 810. Intel may have a slight edge in power efficiency, but in this product category, the Phenom II is very, very similar.
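For those curious about the arithmetic behind these power figures, here is a rough Python sketch of how a logged power trace can be reduced to average power over a window and to energy in joules. It assumes one reading per second, which is an assumption for the example rather than a stated property of our meter, and the trace itself is synthetic, not measured data.

```python
def window(samples, start_s, end_s, interval_s=1.0):
    """Return the readings logged between start_s and end_s."""
    first = int(round(start_s / interval_s))
    last = int(round(end_s / interval_s))
    return samples[first:last]


def mean_power_watts(samples, start_s, end_s, interval_s=1.0):
    """Average power over a window, as used for the idle and peak figures."""
    chunk = window(samples, start_s, end_s, interval_s)
    return sum(chunk) / len(chunk)


def energy_joules(samples, start_s, end_s, interval_s=1.0):
    """Energy over a window: sum of power times time (watt-seconds)."""
    chunk = window(samples, start_s, end_s, interval_s)
    return sum(chunk) * interval_s


# Synthetic trace: 40 seconds of rendering at 150W, then 20 seconds idle at 100W.
log = [150.0] * 40 + [100.0] * 20

peak_power = mean_power_watts(log, 15, 25)   # span while rendering
idle_power = mean_power_watts(log, 50, 60)   # trailing edge of the test period
total_energy = energy_joules(log, 0, 60)     # whole test period
render_energy = energy_joules(log, 0, 40)    # render period only
print(peak_power, idle_power, total_energy, render_energy)
```

The render-only figure rewards a system that finishes sooner, since fewer seconds at load go into the sum.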

Overclocking
Because it’s a Black Edition, overclocking the Phenom II X3 720 is just a matter of turning up the CPU multiplier in the BIOS. For my run at glory with the 720, I used the Asus Socket AM3 board, just for fun, along with a ridiculously huge Cooler Master heatsink/fan combo. I tested stability by booting into Windows and running a multithreaded Prime95 stress test. The log of my attempts is below; the process was fairly simple, and I started at 3.4GHz based on my prior experience with Phenom IIs.

- 3.4GHz, 1.325V – BSOD on boot
- 3.4GHz, 1.35V – Boots Windows, reboot in P95
- 3.4GHz, 1.375V – BSOD in P95
- 3.4GHz, 1.4V – Seems OK
- 3.5GHz, 1.425V – Boots Windows, reboot in P95
- 3.5GHz, 1.45V – Seems OK
- 3.6GHz, 1.45V – Reboot in P95
- 3.6GHz, 1.475V – BSOD in P95
- 3.6GHz, 1.5V – BSOD in P95

With a relatively modest number of attempts, I’d determined that this X3 720 can run at 3.5GHz, at 1.45V, without much trouble. Not too shabby.

Overclocking the Phenom II X4 810 was a more complicated process since I had to turn up the base HyperTransport clock in order to raise the CPU speed. Making this work involved more voltage tweaks, reductions of the HyperTransport multiplier, adjustments to memory clocks, and several forms of psychotropic drugs. Eventually, I settled on the following stable configuration: a 3.458GHz core clock at 1.45V, with a 266MHz base clock, a 1596MHz HyperTransport link, and 1418MHz memory. I’d given additional juice to the north bridge and RAM, along with a slight increase in HyperTransport voltage, when all was said and done. Again, not bad, but getting there was a bit more work than with the X3 720, and I was less confident in the overall stability of the system at the end of the process.
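The reason a base clock increase drags other settings along with it is that several clocks are derived from that one base clock by multipliers. The Python sketch below illustrates the arithmetic. The multipliers are inferred for illustration and are not stated in the text above: it assumes the X4 810's stock 2.6GHz comes from a 13x multiplier on a 200MHz base clock, and that the 1596MHz HyperTransport link corresponds to a 6x multiplier, down from a stock 10x.

```python
# Assumed values (inferred, not taken from the article): 200MHz stock base
# clock, 13x CPU multiplier, 10x stock and 6x reduced HyperTransport multipliers.
STOCK_BASE_MHZ = 200
OVERCLOCKED_BASE_MHZ = 266
CPU_MULTIPLIER = 13
STOCK_HT_MULTIPLIER = 10
REDUCED_HT_MULTIPLIER = 6


def derived_clock_mhz(base_mhz, multiplier):
    """A derived clock is the base clock times its multiplier."""
    return base_mhz * multiplier


stock_core = derived_clock_mhz(STOCK_BASE_MHZ, CPU_MULTIPLIER)
overclocked_core = derived_clock_mhz(OVERCLOCKED_BASE_MHZ, CPU_MULTIPLIER)

# Leaving the HT multiplier alone would push the link well past its stock 2GHz.
ht_unreduced = derived_clock_mhz(OVERCLOCKED_BASE_MHZ, STOCK_HT_MULTIPLIER)
ht_reduced = derived_clock_mhz(OVERCLOCKED_BASE_MHZ, REDUCED_HT_MULTIPLIER)

print(stock_core, overclocked_core, ht_unreduced, ht_reduced)
```

Under these assumptions, the 266MHz base clock reproduces the 3.458GHz core and 1596MHz HyperTransport figures quoted above, and it shows why the HyperTransport multiplier had to come down.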

What about performance?

At 3.5GHz, the X3 720 is really stinkin’ fast. The X4 810 is less impressive at 3.46GHz, probably held back here by its smaller L3 cache and slower HT/L3 “uncore” frequency.

Conclusions
Well, jeez, it’s hard not to like the Phenom II X3 720, which is just a bundle of gimpy goodness. Thanks to its higher clock speed and larger cache, the X3 720 quite frequently outperforms its bigger brother, the Phenom II X4 810, even though it costs less. And, at 2.8GHz, the 720 is fast enough to match up pretty well against the Core 2 Duo E8400 in many applications—including games—that tend to run best with fewer and faster cores. In more widely multithreaded apps where the 720’s third core kicks in, the Phenom II X3 almost always outruns the E8400, sometimes dramatically. Oddly enough, the 720’s combination of three cores and relatively high clock speeds may be the ideal trade-off for the current state of PC software. Who knew?

Add in the X3 720’s fairly tame power consumption, its apparently excellent overclocking proposition, and the fact that—regardless of memory type—the Phenom II has a superior system architecture to the Core 2, and the E8400 starts to look rather weak by comparison. The Phenom II X3 720 is our new favorite among mid-range PC processors. Look for it to secure a place in one of the builds in our upcoming system guide refresh.

The Phenom II X4 810 is also generally faster and more attractive overall than the Core 2 Quad Q8200, but I can’t say I like the value proposition with either of these processors all that well. Because of their reduced cache sizes and clock speeds, these value quad-cores rely almost entirely on multithreaded applications to achieve strong performance. When software doesn’t oblige (and it often doesn’t), they stumble, as illustrated by the Q8200’s poor showings in several of our benchmarks, including MS Office, Firefox, and the gaming tests. For the vast majority of users, the Phenom II X3 720 will be a better choice, and it costs less.

Oh, and we didn’t see much in the way of performance gains when moving the Phenom II X4 810 from DDR2 memory to DDR3 memory. That’s no great shock, all things considered, and no knock on AMD’s implementation of Socket AM3. I suspect we may see more benefits from DDR3 once we get our hands on a non-neutered Socket AM3 quad-core, like a Phenom II X4 940 or something even faster, especially if AMD builds in support for higher memory frequencies. Until then, Socket AM3 is a fine upgrade path waiting for a reason to exist.
