Today, our wish is fulfilled in the form of the Athlon 64 X2 3800+, a dual-core processor running at 2GHz with 512K of L2 cache per core. AMD has priced this baby at $354, significantly less than any of its other dual-core products. It doesn’t take a Ph.D. in computer engineering to figure out that the X2 3800+ ought to offer a very potent combo of price and performance.
As is our custom, we compared the X2 3800+ against over a dozen single- and dual-core competitors to see just how it fits into the big picture. Then we overclocked the living daylights out of the thing, and everything went soft and fuzzy. Our heads are still spinning. Keep reading to see why.
Code name: Manchester
Our first exposure to the Athlon 64 X2 came in the form of the 4800+ model. That chip is code-named “Toledo,” and it packs 1MB of L2 cache per processor core, as do the dual-core Opterons. Toledo-core chips sport a transistor count of about 230 million, all crammed into a die size of 199 mm².
AMD also makes several models of Athlon 64 X2 that have only 512K of L2 cache. In the past, CPUs with smaller caches have sometimes been based on the exact same chip as the ones with more cache, but they’d have half of the L2 cache disabled for one reason or another. That’s not the case with the X2 3800+. AMD says this “Manchester”-core part has about 154 million transistors and a die size of 147 mm², so it’s clearly a different chip. AMD rates the max thermal power needed to cool the X2 3800+ at 89W, well below the 110W rating of the 4800+, and they’ve revised down the max thermal power of the X2 4200+ to 89W, as well. The Manchester core is obviously a smaller, cooler, and cheaper-to-manufacture chip than Toledo.
The Athlon 64 X2 3800+
Cosmetically, though, you’d never know it, because the X2 3800+ looks like pretty much any other Socket 939 processor. The X2 3800+ is intended to work with AMD’s existing Socket 939 infrastructure, and it may well be an upgrade option for current owners of Athlon 64 systems. You’ll want to check with your motherboard maker to see whether or not your board will support an X2 before making the leap, though. Some boards need only a BIOS update, but we’re finding out that some others just can’t handle X2 processors. Most newer motherboards should be fine.
Before we dive into the benchmark numbers, let’s have a quick look at where the X2 3800+ fits into the bigger picture. With its introduction, the Athlon 64 X2 family now looks like so:
CPU | Clock speed | L2 cache size | Price |
Athlon 64 X2 3800+ | 2.0GHz | 512KB | $354 |
Athlon 64 X2 4200+ | 2.2GHz | 512KB | $482 |
Athlon 64 X2 4400+ | 2.2GHz | 1024KB | $537 |
Athlon 64 X2 4600+ | 2.4GHz | 512KB | $704 |
Athlon 64 X2 4800+ | 2.4GHz | 1024KB | $902 |
At $354, the X2 3800+ isn’t exactly cheap, but it does extend the X2 line into more affordable territory. You’ve probably noticed the apparent hole in the X2 models at 4000+. Logic would dictate that the X2 4000+ would run at 2GHz and have 1MB of L2 cache. So where is it? I asked AMD this very question, and they told me that they won’t comment on unannounced products, and besides, there aren’t any plans for an X2 4000+ right now.
I’m not too broken up about that, because I’m not convinced the additional L2 cache is worth paying more money to get. We’ll address that issue in more detail when we look at the benchmark results.
Now for something really confusing. How does the Athlon 64 X2 3800+ stack up against the competition? Figuring out such things has become horribly puzzling as Model Number Mania has taken hold of the CPU market. Here’s my attempt at lining up the various AMD and Intel CPU models according to rough price parity:
CPU | Price | CPU | Price | CPU | Price | CPU | Price | CPU | Price |
Pentium 4 541 | $218 | Pentium 4 630 | $224 | | | Athlon 64 3200+ | $194 | | |
Pentium 4 551 | $278 | Pentium 4 640 | $237 | Pentium D 820 | $241 | Athlon 64 3500+ | $223 | | |
| | | | Pentium D 830 | $316 | Athlon 64 3800+ | $329 | Athlon 64 X2 3800+ | $354 |
Pentium 4 561 | $417 | Pentium 4 650 | $401 | | | Athlon 64 4000+ | $375 | | |
| | | | | | | | Athlon 64 X2 4200+ | $482 |
| | | | Pentium D 840 | $530 | | | Athlon 64 X2 4400+ | $537 |
Pentium 4 571 | $637 | Pentium 4 660 | $605 | | | | | Athlon 64 X2 4600+ | $704 |
| | Pentium 4 670 | $851 | | | Athlon 64 FX-55 | $827 | | |
Pentium 4 XE 3.73GHz | $999 | | | Pentium XE 840 | $999 | Athlon 64 FX-57 | $1031 | Athlon 64 X2 4800+ | $902 |
From this handy table, we learn that the X2 3800+’s dual-core competition from Intel is probably the Pentium D 830. You can have a single-core Pentium 4 551 for about 75 bucks less than the price of the X2 3800+, or you could pick up AMD’s single-core Athlon 64 3800+ in the same basic price range as the “equivalent” X2. The Athlon 64 3800+ runs at 2.4GHz and has a 512K L2 cache, so you lose 400MHz and pick up a whole second CPU core by going for the X2 3800+ instead. I’d say that’s an easy tradeoff to make, but the benchmark results will tell us more about the shape of that choice.
Why didn’t you…?
I wish I could have included results here for a number of interesting CPU models, including the X2 3800+’s most direct competitor, the Pentium D 830. The reason I didn’t include them is simple: lousy multiplier control. I couldn’t get the motherboards in my test rigs to clock down some of these CPUs to lower speeds in order to simulate lower-speed-grade processors. That’s why you won’t see results here for the Pentium D 830, and that’s mostly why there are only three of the five Athlon X2 models represented. Sorry about that. We will try again next time around with different motherboards.
Our testing methods
As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and the results were averaged.
Our test systems were configured like so:
Processor | Pentium D 820 2.8GHz | Pentium 4 660 3.6GHz; Pentium D 840 3.2GHz; Pentium Extreme Edition 840 3.2GHz; Pentium 4 670 3.8GHz | Pentium 4 Extreme Edition 3.73GHz | Athlon 64 3500+ 2.2GHz (Venice); Athlon 64 3800+ 2.4GHz (Venice); Athlon 64 4000+ 2.4GHz (130nm); Athlon 64 FX-55 2.6GHz (130nm); Athlon 64 FX-57 2.8GHz; Athlon 64 X2 3800+ 2.0GHz; Athlon 64 X2 4200+ 2.2GHz; Athlon 64 X2 4800+ 2.4GHz |
System bus | 800MHz (200MHz quad-pumped) | 800MHz (200MHz quad-pumped) | 1066MHz (266MHz quad-pumped) | 1GHz HyperTransport |
Motherboard | Intel D945GTP | Intel D955XBK | Intel D955XBK | Asus A8N-SLI Deluxe |
BIOS revision | NT94510J.86A.0897 | BK95510J.86A.1152; BK95510J.86A.1452 | BK95510J.86A.1234 | MCT2/dualcore |
North bridge | 945G MCH | 955X MCH | 955X MCH | nForce4 SLI |
South bridge | ICH7R | ICH7R | ICH7R | |
Chipset drivers | INF Update 7.0.0.1019 | INF Update 7.0.0.1019 | INF Update 7.0.0.1019 | SMBus driver 4.45 IDE driver 4.75 |
Memory size | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs) | 1GB (2 DIMMs) |
Memory type | Corsair XMS2 5400UL DDR2 SDRAM at 533MHz | Corsair XMS2 5400UL DDR2 SDRAM at 533MHz | Corsair XMS2 5400UL DDR2 SDRAM at 667MHz | Corsair XMS Pro 3200XL DDR SDRAM at 400MHz |
CAS latency (CL) | 3 | 3 | 4 | 2 |
RAS to CAS delay (tRCD) | 2 | 2 | 2 | 2 |
RAS precharge (tRP) | 2 | 2 | 2 | 2 |
Cycle time (tRAS) | 8 | 8 | 8 | 5 |
Hard drive | Maxtor DiamondMax 10 250GB SATA 150 | |||
Audio | Integrated ICH7R/STAC9221D5 with SigmaTel 5.10.4456.0 drivers | Integrated ICH7R/STAC9221D5 with SigmaTel 5.10.4456.0 drivers | Integrated ICH7R/STAC9221D5 with SigmaTel 5.10.4456.0 drivers | Integrated nForce4/ALC850 with Realtek 5.10.0.5820 drivers |
Graphics | GeForce 6800 Ultra 256MB PCI-E with ForceWare 71.84 drivers | |||
OS | Windows XP Professional x64 Edition | |||
OS updates | – |
All tests on the Pentium systems were run with Hyper-Threading enabled, except where otherwise noted.
We have included results for the Pentium D 840 in the following pages. We obtained these results by disabling Hyper-Threading on our Extreme Edition 840. Since the Pentium D 840 is just an Extreme Edition 840 sans HT, the numbers should be valid. Similarly, the Athlon 64 3500+ scores you’ll see in the following pages were obtained by underclocking an Athlon 64 3800+ (with the new “Venice” core) to 2.2GHz. The performance should be identical to a “real” 3500+.
Thanks to Corsair for providing us with memory for our testing. Their products and support are both far and away superior to generic, no-name memory.
Also, all of our test systems were powered by OCZ PowerStream power supply units. The PowerStream was one of our Editor’s Choice winners in our latest PSU round-up.
The test systems’ Windows desktops were set at 1152×864 in 32-bit color at an 85Hz screen refresh rate. Vertical refresh sync (vsync) was disabled for all tests.
We used the following versions of our test applications:
- SiSoft Sandra 2005 SR1 10.50 64-bit
- ScienceMark 2.0 64-bit
- Compiled binary of C Linpack port from Ace’s Hardware
- POV-Ray for Windows 3.6 64-bit
- SMPOV 4.3
- Cinebench 2003
- LAME MT 3.97a 64-bit
- Xmpeg 5.0.3 with DivX Video 5.21
- Windows Media Encoder 9
- Sphinx 3.3
- picCOLOR v4.0 build 545 64-bit
- DOOM 3 1.1 with trdelta1 demo
- Far Cry 1.3 with tr3-pier demo
- Unreal Tournament 2004 v3355 with trdemo1
- 3DMark05 v120
The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.
Memory performance
Up first are some simple memory performance tests. These results won’t tell us much about real-world performance by themselves, but memory performance does have an impact on it.
The X2 3800+ is the only Athlon 64 among the bunch, dual-core or single, that runs at 2GHz. As a result, its bandwidth scores are a little bit lower than the rest, but they’re still quite good. With dual channels of DDR400 memory and a built-in memory controller, the X2 3800+ has a very fast memory subsystem.
Linpack shows us, among other things, the basic performance of the cache hierarchies on these CPUs. A second CPU core is no help in this single-threaded test, and the X2 3800+ is again the lowest-clocked Athlon 64 in the group. You can see, also, how its performance drops off once Linpack starts crunching on matrix sizes above about 576K. That’s where we hit the limits of either core’s 64K L1 data cache combined with its 512K L2 cache. The Athlon 64 processors with 1MB of L2 cache perform better with larger data matrices, as does the Pentium D, which packs a 1MB L2 cache per core.
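To make the 576K figure concrete, here is a minimal sketch of the arithmetic. It assumes the L1 data cache and L2 cache hold different data, so their capacities add (which is what summing 64K and 512K implies), and it assumes the 1MB parts have the same 64K L1 data cache. The function names are made up for illustration.

```python
# Per-core on-chip capacity, assuming L2 does not duplicate what is in L1
# (an assumption implied by adding the 64K L1 data cache to the 512K L2).

def effective_cache_kb(l1_data_kb: int, l2_kb: int) -> int:
    """Capacity one thread can use before it must go to main memory."""
    return l1_data_kb + l2_kb


def fits_in_cache(working_set_kb: int, l1_data_kb: int, l2_kb: int) -> bool:
    """True if a working set of the given size stays on-chip."""
    return working_set_kb <= effective_cache_kb(l1_data_kb, l2_kb)


# X2 3800+ core: 64K L1 data + 512K L2
small_l2_core = effective_cache_kb(64, 512)
# 1MB L2 core, assumed to have the same 64K L1 data cache
large_l2_core = effective_cache_kb(64, 1024)

print(small_l2_core, large_l2_core)
print(fits_in_cache(600, 64, 512), fits_in_cache(600, 64, 1024))
```

A 600K matrix spills out of the smaller hierarchy but still fits in the larger one, which is the region where the 1MB chips pull ahead.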
When it comes time to grab data from main memory, the X2 3800+ is very quick. That’s the big advantage of its integrated memory controller.
Gaming performance
Up next are some gaming tests. Notice that we’ve included above each result a little graph generated by the Windows Task Manager as the benchmark ran on a dual Opteron 275 system (with four total CPU cores). This should give you some indication of the amount of threading in the application. In some cases with single-threaded apps like the games below, the task will oscillate back and forth between one CPU and the next, but total utilization generally won’t go above 50% for a dual-core or 25% for a quad-core (or quad-front-end, in the case of the XE 840 with Hyper-Threading) system.
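Those 50% and 25% ceilings fall out of simple division. A quick sketch, assuming Task Manager reports total utilization as an average across all of the CPUs it sees (the function name is hypothetical):

```python
def single_thread_ceiling(logical_cpus: int) -> float:
    """Highest total utilization (in percent) that one busy thread can produce
    when the total is averaged across all logical CPUs."""
    return 100.0 / logical_cpus


for cpus in (1, 2, 4):
    print(cpus, single_thread_ceiling(cpus))
```

It doesn't matter whether the thread stays put or hops between CPUs; only one of them is busy at any instant, so the average can't exceed this ceiling.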
Doom 3
We tested performance by playing back a custom-recorded demo that should be fairly representative of most of the single-player gameplay in Doom 3.
Far Cry
Our Far Cry demo takes place on the Pier level, in one of those massive, open outdoor areas so common in this game. Vegetation is dense, and view distances can be very long.
Unreal Tournament 2004
Our UT2004 demo shows yours truly putting the smack down on some bots in an Onslaught game.
The gaming performance of the X2 3800+ isn’t stellar compared to higher-clocked Athlon 64 models, but it’s still better than any Pentium D or Pentium 4 in the bunch, even the Extreme Edition 3.73GHz. The X2 3800+ surprisingly manages to best the Athlon 64 3500+ in a couple of tests, even though the X2 3800+ runs at a lower clock speed.
3DMark05
3DMark’s main test is obviously graphics-bound, since the CPU doesn’t seem to matter much to the overall score. The CPU test, however, uses multiple software threads to handle vertex processing, and the dual-core processors get to strut their stuff. The X2 3800+ finishes just behind the Pentium Extreme Edition 840 and just ahead of AMD’s single-core monster, the Athlon 64 FX-57.
POV-Ray rendering
POV-Ray just recently made the move to 64-bit binaries, and thanks to the nifty SMPOV distributed rendering utility, we’ve been able to make it multithreaded, as well. SMPOV spins off any number of instances of the POV-Ray renderer, and it will bisect the scene in several different ways. For this scene, the best choice was to divide the screen up horizontally between the different threads, which provides a fairly even workload.
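For the curious, here is a rough sketch of what dividing a scene between render instances might look like. This is not SMPOV's actual code; it assumes the split is into horizontal bands of rows, one per instance, and the function name is made up.

```python
def split_rows(height: int, workers: int) -> list[tuple[int, int]]:
    """Divide image rows 0..height-1 into contiguous bands, one per worker.

    Returns (start, end) pairs with end exclusive. Band sizes differ by at
    most one row, so each worker gets a similar number of rows.
    """
    base, extra = divmod(height, workers)
    bands = []
    start = 0
    for i in range(workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if i < extra else 0)
        bands.append((start, start + size))
        start += size
    return bands


print(split_rows(480, 2))
print(split_rows(10, 3))
```

Equal row counts only give an even workload if the scene's complexity is spread fairly evenly from top to bottom, which is presumably why the best split varies from scene to scene.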
The X2 3800+ rips through this POV-Ray scene faster than a Pentium Extreme Edition 840, and it trounces any single-core would-be competition. When tasks are easily parallelizable like rendering, dual-core processors reign supreme.
Cinema 4D rendering
Cinema 4D’s rendering engine does a very nice job of distributing the load across multiple processors, as the Task Manager graph shows.
Here again the X2 3800+ puts in a strong showing. It’s a little quicker than the Pentium D 840, a processor that costs quite a bit more than the X2 3800+. Once more, the single-core CPUs are left in the dust.
The tables turn in Cinebench’s single-threaded shading tests. The X2 3800+ runs near the bottom of the pack here.
LAME audio encoding
LAME MT is, as you might have guessed, a multithreaded version of the LAME MP3 encoder. LAME MT was created as a demonstration of the benefits of multithreading specifically on a Hyper-Threaded CPU like the Pentium 4. You can even download a paper (in Word format) describing the programming effort.
Rather than run multiple parallel threads, LAME MT runs the MP3 encoder’s psycho-acoustic analysis function on a separate thread from the rest of the encoder using simple linear pipelining. That is, the psycho-acoustic analysis happens one frame ahead of everything else, and its results are buffered for later use by the second thread. The author notes, “In general, this approach is highly recommended, for it is exponentially harder to debug a parallel application than a linear one.”
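Here is a minimal sketch of that kind of two-stage linear pipeline. It is not LAME MT's code: the `analyze` and `encode` functions are hypothetical stand-ins, and the one-slot queue is my way of modeling analysis running one frame ahead.

```python
import queue
import threading


def analyze(frame: int) -> int:
    """Hypothetical stand-in for the psycho-acoustic analysis of one frame."""
    return frame * 2


def encode(frame: int, analysis: int) -> int:
    """Hypothetical stand-in for the rest of the encoder."""
    return frame + analysis


def pipelined_encode(frames: list[int], lookahead: int = 1) -> list[int]:
    """Run analysis on its own thread, buffering results for the encoder."""
    buffered = queue.Queue(maxsize=lookahead)  # holds analysis results

    def analysis_stage() -> None:
        for frame in frames:
            buffered.put((frame, analyze(frame)))  # blocks when buffer is full
        buffered.put(None)  # sentinel: no more frames

    worker = threading.Thread(target=analysis_stage)
    worker.start()

    output = []
    while True:
        item = buffered.get()
        if item is None:
            break
        frame, analysis = item
        output.append(encode(frame, analysis))
    worker.join()
    return output


print(pipelined_encode([1, 2, 3]))
```

Because frames move through the stages strictly in order, the output matches what a single-threaded encoder would produce, which is the debugging advantage the author describes. In Python this shows only the structure; the interpreter's global lock means it won't demonstrate the speedup.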
We have results for two different 64-bit versions of LAME MT from different compilers, one from Microsoft and one from Intel, doing two different types of encoding, variable bit rate and constant bit rate. We are encoding a massive 10-minute, 6-second 101MB WAV file here, as we have done in our previous CPU reviews.
The mighty Athlon 64 FX-57 struggles to keep pace with the X2 3800+ in the multithreaded MP3 encoding tests, falling behind in three of the four instances. The Pentium D does relatively well here, though, with the 840 topping the X2 3800+ most of the time.
Xmpeg/DivX video encoding
We used the Xmpeg/DivX combo to convert a DVD .VOB file of a movie trailer into DivX format. Like LAME MT, this application is only dual threaded.
Windows Media Encoder video encoding
We asked Windows Media Encoder to convert a gorgeous 1080-line WMV HD video clip into a 640×460 streaming format using the Windows Media Video 8 Advanced Profile codec.
Despite its relatively low clock speed, the X2 3800+ makes a very decent media encoding processor.
ScienceMark
We’re using the 64-bit beta version of ScienceMark for these tests, and several of its components are multithreaded. ScienceMark author Alexander Goodrich says this about the Molecular Dynamics simulation:
Molecular Dynamics is lightly multithreaded – one thread takes care of U/I aspects, and the other thread takes care of the computation. The computation itself is not multithreaded, though Tim and I were looking into ways of changing the algorithm to support multi-threading programming a couple years ago – it’s a lot of effort, unfortunately. When MD [is] running there [is] a total of 2 threads for the process.
Here are the results:
The Primordia test “calculates the Quantum Mechanical Hartree-Fock Orbitals for each electron in any element of the periodic table.” Alex says this about it:
Primordia is multithreaded. Two main tasks occur which allow this to happen. Essentially, we identified 2 parallel tasks that could be done. We could probably take this a step further and optimize it even more. There is an issue, however, with the Pentium Extreme Edition that we’ve identified. The second computation thread gets executed on the logical HT thread rather than the 2nd core, so performance isn’t as good as it could be. This will be fixed in the next revision. This doesn’t effect [sic] the regular Pentium D. A workaround could include disabling HT on Pentium EE. There are 3 threads for primordia – 2 threads for computation, 1 thread for U/I.
Yet again, the X2 3800+ is running closely with the Athlon 64 FX-57, oddly enough. The X2 processors congregate at the top of the pack in the molecular dynamics simulation, while the X2 3800+ falls to the middle of the bunch in Primordia.
SiSoft Sandra
Next up is SiSoft’s Sandra system diagnosis program, which includes a number of different benchmarks. The one of interest to us is the “multimedia” benchmark, intended to show off the benefits of “multimedia” extensions like MMX and SSE/2. According to SiSoft’s FAQ, the benchmark actually does a fractal computation:
This benchmark generates a picture (640×480) of the well-known Mandelbrot fractal, using 255 iterations for each data pixel, in 32 colours. It is a real-life benchmark rather than a synthetic benchmark, designed to show the improvements MMX/Enhanced, 3DNow!/Enhanced, SSE(2) bring to such an algorithm.
The benchmark is multi-threaded for up to 64 CPUs maximum on SMP systems. This works by interlacing, i.e. each thread computes the next column not being worked on by other threads. Sandra creates as many threads as there are CPUs in the system and assignes [sic] each thread to a different CPU.
We’re using the 64-bit port of Sandra. The “Integer x16” version of this test uses integer numbers to simulate floating-point math. The floating-point version of the benchmark takes advantage of SSE2 to process up to eight Mandelbrot iterations at once.
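As a rough illustration of the interlacing SiSoft describes, here is a sketch that deals image columns out to threads in round-robin fashion and counts Mandelbrot iterations for a single point. This is one reading of the FAQ's description, not Sandra's actual code, and the function names are made up.

```python
def interlaced_columns(width: int, threads: int) -> list[list[int]]:
    """Assign columns round-robin: thread t gets columns t, t+threads, ..."""
    return [list(range(t, width, threads)) for t in range(threads)]


def mandelbrot_iterations(c: complex, max_iter: int = 255) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2, up to max_iter."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


# A 640-pixel-wide image split across two threads
assignment = interlaced_columns(640, 2)
print(len(assignment[0]), len(assignment[1]))
print(assignment[0][:4], assignment[1][:4])
```

Interleaving columns rather than handing each thread one big block tends to balance the load, since neighboring columns of the fractal take a similar amount of work.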
The Pentiums rock and roll in this test, thanks to their prowess with vector math. If you are doing vector math, though, it’s nice to have a second core to help out. The X2 3800+ beats any single-core Athlon 64 by a wide margin.
Sphinx speech recognition
Ricky Houghton first brought us the Sphinx benchmark through his association with speech recognition efforts at Carnegie Mellon University. Sphinx is a high-quality speech recognition routine. We use two different versions, built with two different compilers, in an attempt to ensure we’re getting the best possible performance. However, the versions of Sphinx we’re using are only single-threaded.
You will compromise some single-threaded performance by going with the X2 3800+, as these results illustrate. The X2 3800+’s relatively low clock speed catches up with it here.
picCOLOR
picCOLOR was created by Dr. Reinert H. G. Müller of the FIBUS Institute. This isn’t Photoshop; picCOLOR’s image analysis capabilities can be used for scientific applications like particle flow analysis. Dr. Müller has supplied us with new revisions of his program for some time now, all the while optimizing picCOLOR for new advances in CPU technology, including MMX, SSE2, and Hyper-Threading. Naturally, he’s ported picCOLOR to 64 bits, so we can test performance with the x86-64 ISA.
At our request, Dr. Müller, the program’s author, added larger image sizes to this latest build of picCOLOR. We were concerned that the thread creation overhead on the test’s rather small default image size would overshadow the benefits of threading. Dr. Müller has also made picCOLOR multithreading more extensive. Eight of the 12 functions in the test are now multithreaded.
Scores in picCOLOR, by the way, are indexed against a single-processor Pentium III 1GHz system, so that a score of 4.14 works out to 4.14 times the performance of the reference machine.
Another strong finish for the X2 3800+, ahead of some processors that cost nearly three times as much.
Power consumption
We measured the power consumption of our entire test systems, except for the monitor, at the wall outlet using a Watts Up PRO watt meter. The test rigs were all equipped with OCZ PowerStream 520W power supply units. The idle results were measured at the Windows desktop, and we used SMPOV and the 64-bit version of the POV-Ray renderer to load up the CPUs. In all cases, we asked SMPOV to use the same number of threads as there were CPU front ends in Task Manager, so four for the Pentium XE 840, two for the Athlon 64 X2, and so on.
The graphs below have results for “power management” and “no power management.” That deserves some explanation. By “power management,” we mean SpeedStep or Cool’n’Quiet. In the case of the Pentium 4 600-series processors and the Pentium D 840 and Pentium XE 840 CPUs, the C1E halt state is always active, even in the “no power management” tests. The Pentium D 820 and P4 Extreme Edition 3.73GHz don’t support the C1E halt state or SpeedStep.
The beta BIOS for our Asus A8N-SLI Deluxe mobo wouldn’t support Cool’n’Quiet on the X2 processors. I was able to update to Asus’ 1011 BIOS rev and get Cool’n’Quiet support for the FX-57, and using BIOS version 1013-002 allowed me to enable Cool’n’Quiet on the X2 models 3800+ and 4200+. Oddly, I couldn’t get Cool’n’Quiet working on the X2 4800+ with any of these BIOS revisions.
I’ve seen ’em before when we’ve reviewed other X2 processors, but these results continue to astonish. The system based on the X2 3800+ draws less power at idle and under load than anything here but the single-core A64 3800+. Under load, the Pentium D 840-based rig draws 292W at the wall socket, while the X2 3800+ system draws 166W. And the X2 3800+ outperforms the Pentium D 840 more often than not. The performance-per-watt picture on the X2 3800+ is impressive indeed.
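To put the load numbers in perspective, here is a quick bit of arithmetic on the two wall-socket readings quoted above. Keep in mind these are whole-system figures, so the gap includes motherboard and memory differences, not just the CPUs. The function and variable names are made up for illustration.

```python
def percent_lower(watts: float, baseline_watts: float) -> float:
    """How much lower `watts` is than `baseline_watts`, in percent."""
    return (baseline_watts - watts) / baseline_watts * 100.0


x2_3800_load_watts = 166       # X2 3800+ system under load
pentium_d_840_load_watts = 292  # Pentium D 840 system under load

difference = pentium_d_840_load_watts - x2_3800_load_watts
saving = percent_lower(x2_3800_load_watts, pentium_d_840_load_watts)

print(f"{difference}W less at the wall, about {saving:.0f}% lower")
```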
Overclocking
With very little effort and even less drama, I was able to get the X2 3800+ running stable at 2.4GHz by setting the HyperTransport clock to 240MHz. The Asus A8N-SLI Deluxe mobo on our test system was giving the X2 3800+ about 1.31V by default. I turned that up to 1.3375V, backed the HyperTransport multiplier down to 3X, and the X2 3800+ seemed quite happy.
Now, that’s a sweet overclock all by itself, but hitting 2.4GHz has the added benefit of bringing everything into line. When the memory clock is set to the proper divider for DDR333 operation and the HyperTransport clock is raised to 240MHz, the memory actually runs at 400MHz even. Lock down the PCI and PCI-E bus speeds using the motherboard’s BIOS, and you’re running virtually everything but the CPU and HyperTransport link at stock speeds. I was able to leave the RAM timings at 2-2-2-5, nice and tight. This is the sort of overclock I could live with for everyday use.
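Here is a sketch of the clock math behind that overclock. It rests on a few assumptions that aren't spelled out above: a stock reference clock of 200MHz, a CPU multiplier of 10 (2GHz divided by 200MHz), and a DDR333 setting that runs the memory at 5/6 of the reference clock. The real memory divider logic may be more involved than this simple ratio.

```python
from fractions import Fraction

CPU_MULTIPLIER = 10             # assumption: 2.0GHz stock / 200MHz reference
DDR333_RATIO = Fraction(5, 6)   # assumption: DDR333 setting = 166.67MHz / 200MHz


def cpu_clock_mhz(ref_mhz: int) -> int:
    """CPU core clock for a given reference (HyperTransport) clock."""
    return ref_mhz * CPU_MULTIPLIER


def ddr_rate_mhz(ref_mhz: int, ratio: Fraction) -> Fraction:
    """Effective DDR data rate: memory clock times two transfers per cycle."""
    return 2 * ref_mhz * ratio


def ht_link_mhz(ref_mhz: int, ht_multiplier: int) -> int:
    """HyperTransport link clock for a given reference clock and multiplier."""
    return ref_mhz * ht_multiplier


print(cpu_clock_mhz(240))                # CPU core clock in MHz
print(ddr_rate_mhz(240, DDR333_RATIO))   # memory data rate in MHz
print(ht_link_mhz(240, 3))               # HyperTransport link in MHz
```

Under these assumptions, the 240MHz reference clock with the DDR333 setting lands the memory at its rated DDR400 speed, and the 3X multiplier keeps the HyperTransport link below its stock 1GHz.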
With a little more coaxing, I managed to get the X2 3800+ running at 2.5GHz long enough to record benchmark scores, but I had to back off of the memory timings a little bit in order to do it. Here’s how it performed.
The extra clock speed headroom translates into quite a bit more performance, as one might expect. The smaller cache doesn’t hold it back much, either; the X2 3800+ challenges the X2 4800+ pretty well when they’re both at 2.4GHz.
Well, we asked for a cheaper Athlon 64 X2, and AMD delivered. As expected, the Athlon 64 X2 3800+ performs quite well in our test suite, which is heavy on multithreaded applications and 64-bit binaries, the types of programs that an X2 purchased today should spend much of its life running. In fact, in multithreaded applications, the X2 beats out AMD’s single-core flagship, the Athlon 64 FX-57, more often than not. There is a tradeoff involved in the X2 3800+, because its 2GHz clock speed is relatively low, and as a result, its performance in single-threaded applications is decent, but not stellar. Still, the X2 3800+ plays today’s single-threaded games better than any form of Pentium 4 or D.
The Pentium D 820 is still a good value at $241, but I suspect most enthusiasts will think the extra hundred bucks or so is worth it to step up to the X2 3800+. AMD’s cheapest dual-core processor generally outruns the Pentium D 840, and in some cases, the Pentium Extreme Edition 840, as well. I’d still like to see AMD compete at the $250 range with a dual-core offering, but I suppose that will come with time. The X2 3800+ is a step in that direction.
In fact, now that the entry point for dual-core Athlon 64 processors has dropped to $354, I am almost ready to stop recommending single-core processors for anything but budget PCs. Unless you absolutely cannot afford it, I’d suggest picking a dual-core CPU for your next system. Even for gamers, there’s little point in passing on a second CPU core just to get a somewhat higher clock speed, in my view. The X2 3800+ is more than passable for today’s games, and multithreaded game engines and graphics drivers are already on the horizon. For anything but games, having a second CPU around, even if it’s just to handle antivirus and antispyware chores, makes perfect sense.
Now, if you’ll excuse me, I’m going to step out of the way. AMD says these chips should be available for purchase right now. If most X2 3800+ chips overclock like our review sample did, then PC enthusiasts are going to stampede toward this thing en masse.