The Silverthorne pipeline
To better understand this new CPU, we met with a trio of project managers from the Silverthorne group, including Jonathan Tyler, Gian Gerosa, and Haytham Samarchi. Tyler gave a detailed presentation on the guts of the processor, with Gerosa and Samarchi interjecting comments along the way. One of the most important things we heard from them was a characterization of the team's overall approach to the Silverthorne design, which was focused on fitting into the chip's targeted power budget and then adding as much performance as possible. Echoing sentiments we've heard expressed by Glenn Henry at Centaur, the trio said one of their biggest challenges was getting the engineers involved to adapt their mentality to this project's goals.
The Silverthorne team imposed discipline on this front by beginning with a simple single-issue, in-order CPU pipeline. They then added new features bit by bit, iteratively, until their performance and power efficiency goals were met. Potential features were vetted in quantifiable ways for efficiency, and those that failed to make the cut were not included.
The result of this process was a very distinctive new design. Most modern CPUs use out-of-order execution to achieve best performance, but Silverthorne's designers didn't like the tradeoff involved. As a rule, Tyler said, they avoided aggressive control and data speculation for efficiency's sake.
They found another efficiency win in optimizing this in-order pipeline to handle x86 instructions atomically. Virtually all of today's x86-compatible processors decode x86's CISC-style instructions into their own internal instructions, but the Silverthorne team tailored its pipeline to handle those translated instructions as single, atomic units. As you may know, Intel calls x86 instructions macro-ops and internal CPU core instructions micro-ops. Like other recent Intel chips, Silverthorne has the ability to fuse multiple macro-ops into a single micro-op. On Silverthorne, basic x86 instructions with memory operands translate as a single micro-op, which brings higher efficiencies for both decoding and scheduling.

Silverthorne instruction translation data. Source: Intel.
Tyler presented the data above to illustrate how Silverthorne handles some typical workloads. Complex x86 instructions like cosine are still micro-coded, but otherwise, an average of 96% of macro-ops execute as directly translated (1:1) or fused single micro-ops. This behavior obviously gives the processor a higher IPC, and Tyler claimed issuing "big chunks" like this increases power efficiency, as well.
Once they had this foundation, Silverthorne's designers were able to extract considerably more performance by adding another new feature: simultaneous multithreading, better known by Intel's marketing name, Hyper-Threading. Silverthorne is both dual-issue and dual-threaded; the chip can issue two instructions per clock and manage the execution of two separate threads interleaved together. Tyler characterized the threading as very fine-grained throughout the pipeline, with dual instruction queues and cycle-by-cycle scheduling based on availability. Threads are controlled in hardware and treated as equals. The pipeline itself is non-blocking, and the two threads can be completely intermixed within it.
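To make the idea of cycle-by-cycle scheduling based on availability a little more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the `run` function, the stall-latency encoding, and the rotating thread priority are all invented for illustration and say nothing about how Intel's actual scheduler is built. It simply shows how a second thread can fill issue slots that a stalled in-order thread would otherwise waste.

```python
def run(threads, issue_width=2):
    """Count cycles needed to issue every instruction in a toy in-order core.

    threads: one list per hardware thread. Each entry is one instruction,
             and its value is the number of extra cycles that thread must
             wait after issuing it (0 = no stall, 3 = e.g. a cache miss).
    All of this is a hypothetical model, not Silverthorne's real pipeline.
    """
    n = len(threads)
    next_instr = [0] * n   # per-thread index of the next instruction to issue
    ready_at = [0] * n     # first cycle at which each thread may issue again
    cycle = 0
    while any(next_instr[t] < len(threads[t]) for t in range(n)):
        slots = issue_width
        # Rotate which thread gets first pick so the threads are treated as equals.
        for t in [(cycle + i) % n for i in range(n)]:
            # In-order issue: a thread keeps issuing until it stalls,
            # runs out of instructions, or the issue slots are used up.
            while slots and next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][next_instr[t]]
                next_instr[t] += 1
                slots -= 1
                if stall:
                    ready_at[t] = cycle + 1 + stall
        cycle += 1
    return cycle


# A made-up instruction stream: the second instruction stalls for 3 cycles.
stream = [0, 3, 0, 0]

alone = run([stream])             # one thread on the dual-issue core
together = run([stream, stream])  # two copies sharing the core
print(alone, together)
```

In this contrived case, running the two threads one after the other would take twice the single-thread cycle count, while interleaving them costs only one extra cycle. Real workloads will see far smaller gains than that, but the mechanism is the same.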
The team found that this form of thread-level parallelism worked well in conjunction with their in-order pipeline for improving performance in a power-efficient manner. Tyler estimated that Silverthorne's SMT contributes a 36% to 47% increase in performance at the expense of a 17% to 19% increase in power consumption, a clear win.
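For a rough sense of what those SMT figures mean for efficiency, the quick calculation below turns them into a performance-per-watt ratio. The numbers are Tyler's estimates as quoted; the function name and the way the ends of the two ranges are paired are my own assumptions, since Intel didn't say which performance gain goes with which power cost.

```python
# Tyler's estimated ranges for enabling SMT on Silverthorne (from the article).
PERF_GAIN = (0.36, 0.47)   # +36% to +47% performance
POWER_COST = (0.17, 0.19)  # +17% to +19% power consumption


def perf_per_watt_ratio(perf_gain, power_cost):
    """Performance per watt with SMT on, relative to SMT off (1.0 = no change)."""
    return (1.0 + perf_gain) / (1.0 + power_cost)


# Assumption: pair the extremes to bracket the possible outcomes.
worst = perf_per_watt_ratio(PERF_GAIN[0], POWER_COST[1])  # least gain, most power
best = perf_per_watt_ratio(PERF_GAIN[1], POWER_COST[0])   # most gain, least power
print(f"{worst:.2f}x to {best:.2f}x")
```

By this arithmetic, SMT improves performance per watt by roughly 1.14x to 1.26x, assuming the quoted estimates hold.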
Along with its focus on power efficiency, Silverthorne was intended to be a thoroughly modern CPU. That's another reason Intel chose not to base this processor on an older design. Moving an older design to this fabrication process and adding all of the latest features, they claimed, would have been more work than producing this new architecture. Silverthorne does have almost all of the latest bells, whistles, and ISA extensions Intel has introduced over the years. As I've mentioned, it supports SSE3 and the newer Supplemental SSE3 instructions added with the original Core 2 Duo, though it lacks SSE4 support. It also has extensions for virtualization (VT) and is compatible with AMD's x86-64 extensions for 64-bit addressing. These ISA extensions should grant Silverthorne better IPC and higher efficiency, in addition to new capabilities. The architecture is even multi-core capable, a fact that may prove handy when Diamondville ships.
Tyler said one of the key considerations in selecting clock frequency targets for Silverthorne was keeping the number of gates per clock cycle relatively small. Accordingly, they chose a 16-stage main pipeline, breaking the pipe into many relatively simple stages. This decision allowed the use of low-power circuits and power-efficient algorithms at each step along the way. Also, surprisingly enough, it enabled Intel's designers to use extensive automation in the design process. Inside of Silverthorne's execution core, the ROMs and the thermal sensor are the only fully custom blocks. The rest are cell-based designs, with half of those synthesized and the rest made up of structured data paths. (The core is only part of the story, of course. Only about 30% of the chip's transistors are there.)

Silverthorne's pipeline stages. Source: Intel.
Here's a look at the various stages of Silverthorne's pipeline. Tyler said this pipeline is capable of very high frequencies, up to 2.5GHz with typical silicon at 1.2V. In fact, he offered this look at clock frequency scaling versus core voltage.

Silverthorne voltage and frequency scaling. Source: Intel.
Intel has chosen to stay at the lower end of this curve for Menlow-based products, keeping under 2GHz and likely under 1.0V. Given these numbers, I wouldn't be surprised to see higher clock frequencies from Silverthorne in Diamondville systems and in other low-cost applications that aren't as power sensitive.

Atom web page rendering performance versus ARM 11. Source: Intel.
We don't yet have a Silverthorne-based device we can benchmark ourselves, so we'll have to rely on Intel's numbers for the time being. This web page rendering benchmark offers our first sense of how the chip might perform compared to its competition. Tyler flashed a number of other scores in front of us during his presentation, but most of them were unofficial, preliminary, and had various caveats attached. The one comparison worth mentioning was a quick one they'd done pitting a 2W Atom processor (presumably at 1.6GHz) against a 3W "Dothan" Pentium M ULV at 800MHz. In an array of tests, the Atom's performance ranged between 1X and 1.3X that of the 800MHz Dothan. That should give you a sense of this CPU's performance, and I think it's pretty impressive when you think about it.
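To put the Dothan comparison in performance-per-watt terms, here's a back-of-the-envelope sketch. It assumes the 2W and 3W figures are comparable nominal power ratings and that each chip actually draws about that much in these tests, neither of which Intel confirmed, so treat the result as illustrative only.

```python
ATOM_WATTS = 2.0       # quoted power for the Atom part
DOTHAN_WATTS = 3.0     # quoted power for the 800MHz Pentium M ULV
REL_PERF = (1.0, 1.3)  # Atom performance relative to Dothan, per Intel's tests


def relative_efficiency(rel_perf):
    """Atom performance per watt relative to Dothan (assumes nominal power draw)."""
    return rel_perf * DOTHAN_WATTS / ATOM_WATTS


low, high = (relative_efficiency(p) for p in REL_PERF)
print(f"{low:.2f}x to {high:.2f}x")
```

On those assumptions, the Atom delivers somewhere between 1.5x and 1.95x the performance per watt of the 800MHz Dothan.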

