Nehalem now: 10 reasons to upgrade to the new Intel microarchitecture

Intel's Nehalem processor architecture offers some impressive new technologies, including enhanced Hyper Threading, TurboBoost, and Quick Path Interconnect. See why Deb Shinder says a Nehalem-based system will let you do more and do it faster -- and at the lowest operating cost ever.

The new Intel microarchitecture, code named Nehalem after an American Indian tribe in Oregon, is more than just another "tock" in Intel's Tick-Tock model for delivery of new processors. Sure, Nehalem follows the "tock" strategy of maximizing power and speed on the smaller transistors introduced in the "tick" phase (in this case, the 45 nm manufacturing process introduced with the Penryn chip). But Nehalem does this by making some big changes to its processor architecture.

Whether you're a power user building a new desktop machine for high-end applications, such as graphics design and video editing, you're upgrading a server to run new resource-hungry services, such as databases and terminal services, or you're planning to consolidate servers on a single physical computer using virtualization, the new microarchitecture can make it easier for you to do more and do it faster, at a lower operating cost than ever before. Let's look at some specific reasons you should consider upgrading to a new Nehalem-based system.

1: 7 on 7 -- The perfect marriage for desktop computing

Windows 7 is Microsoft's next client operating system, and even though it's only at the Release Candidate stage at this writing, it has received excellent reviews and looks to be on track for final release before the end of 2009. One of the oft-praised characteristics of Windows 7 is its ability to run on lower-powered machines than Vista can, but the new OS really shines on Intel's Nehalem-based Core i7 processors.

Gamers and others who run demanding applications will appreciate the top-of-the-line 3.2 GHz model 965, with the unlocked clock multiplier -- although they might find the $1,000 price tag a little steep. But you can get into the Core i7 world for a lot less money; the low-end model 920 costs less than $300 and still provides speedy performance at 2.66 GHz. And for those who prefer to take the middle road, the 2.93 GHz model 940 comes in at just over $560.

A recent World in Conflict benchmark showed the Core i7 reaching 250 fps, compared with 136 fps for AMD's Phenom and 220 fps for Intel's Core 2 Extreme.

For more details, see Core i7 Versus the World.

2: Nehalem Xeon -- Power to spare for heavy duty workloads

For servers and high-end workstations, the Xeon brand has been the gold standard for many generations, and the new Nehalem-based Xeons are ready to take on the heaviest workloads. Bloomfield is the codename for the 3500 series of Xeon processors. These are dual- and quad-core single-socket processors that enjoy the same new features as the Core i7 (and which we'll discuss in detail below), while adding support for ECC memory. They range from 2.40 to 3.2 GHz clock speeds.

The 5500 series of Xeons, codenamed Gainestown, consists of dual-socket quad-core processors ranging from 1.86 to 3.2 GHz and is intended for general purpose high volume servers, HPC systems, and workstations. You can learn more about the Xeon 5500s here.

A recent demo of the Xeon 5500 showed it to be double the speed of the Xeon 5400 (pre-Nehalem) running database queries, and 5500 series servers are reported to have set at least 30 world performance records on a wide range of benchmarks.

3: Enhanced Hyper-Threading -- More bang for the buck

Hyper-Threading Technology (HTT) is Intel's implementation of simultaneous multi-threading (SMT). Each physical processor core presents itself to the operating system as two logical processors, so an OS that supports symmetric multi-processing treats a single core as if it were two processors. The two logical processors share the core's execution resources, which keeps more of those resources busy. Thus a quad-core Nehalem processor can work on eight threads simultaneously.

Simultaneous multi-threading was first studied by IBM in the late 1960s as a way to increase processor efficiency and throughput. Intel's version first appeared in some models of the Pentium 4, was left out of the Core 2 line, and returns with Nehalem. How much SMT improves performance depends on the application: to benefit, software must be written so that its work can be divided among multiple threads. SMT also lets you run more applications at the same time with less slowdown.
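
The idea of dividing work among threads can be sketched in a few lines of Python. This is a minimal illustration, not Intel's method: the helper names are hypothetical, and the sketch assumes two logical processors per core, as Hyper-Threading provides. In CPython the global interpreter lock keeps CPU-bound threads from truly running in parallel, so treat this as a picture of the decomposition rather than a benchmark; the same split in C, Java, or a process pool could keep all eight logical processors of a quad-core Nehalem chip busy.

```python
from concurrent.futures import ThreadPoolExecutor


def logical_processors(physical_cores: int, threads_per_core: int = 2) -> int:
    """Logical processors the OS sees when SMT (Hyper-Threading) is enabled."""
    return physical_cores * threads_per_core


def split_range(n: int, parts: int) -> list[range]:
    """Divide range(n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    size, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(range(start, end))
        start = end
    return chunks


def parallel_sum_of_squares(n: int, workers: int) -> int:
    """Sum i*i for i in range(n), handing one chunk to each worker thread."""
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: sum(i * i for i in r), chunks))


if __name__ == "__main__":
    threads = logical_processors(4)  # a quad-core Nehalem part
    print(threads, parallel_sum_of_squares(1_000_000, threads))
```

The even split matters because a thread that gets a much larger chunk than the others leaves the remaining logical processors idle while it finishes.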

4: TurboBoost -- Performance where and when you need it

The Nehalem architecture features a technology called TurboBoost, which works similarly to overclocking -- but only when you need it. When the workload calls for maximum performance and the processor is running below its limits on temperature, current, and power, it can raise its clock frequency above its rated speed in small steps (about 133 MHz each) so that it works faster. The fewer cores that are active, the more headroom there is to boost the remaining ones. When the workload decreases, the processor slows back down to its normal frequency. Because the processor itself enforces those limits, you don't have to worry about it overheating, as you do with manual overclocking.
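
To show the shape of that decision, here is a toy Python model. It is only a sketch under stated assumptions: real TurboBoost runs in the processor's own power-control hardware and firmware, not in software like this, and the thermal limit, power limit, two-bin ceiling, and function names are all hypothetical. Only the roughly 133 MHz step size reflects Nehalem.

```python
from dataclasses import dataclass

BIN_MHZ = 133  # Nehalem raises its clock in steps ("bins") of roughly 133 MHz


@dataclass
class Limits:
    max_temp_c: float   # hypothetical thermal limit
    max_power_w: float  # hypothetical power limit


def next_frequency(current_mhz: int, base_mhz: int, max_bins: int,
                   busy: bool, temp_c: float, power_w: float,
                   limits: Limits) -> int:
    """Toy turbo decision, evaluated once per control interval (hypothetical)."""
    ceiling = base_mhz + max_bins * BIN_MHZ
    within_limits = temp_c < limits.max_temp_c and power_w < limits.max_power_w
    if not within_limits:
        # Over a limit: back off one bin, but not below the rated speed
        # (a real chip can throttle further; this sketch does not model that).
        return max(current_mhz - BIN_MHZ, base_mhz)
    if busy:
        # Demand plus headroom: step up one bin, capped at the turbo ceiling.
        return min(current_mhz + BIN_MHZ, ceiling)
    # Workload has dropped: return to the rated frequency.
    return base_mhz


if __name__ == "__main__":
    limits = Limits(max_temp_c=95.0, max_power_w=130.0)
    freq = 2666  # e.g., a 2.66 GHz part such as the Core i7 920
    for busy, temp, power in [(True, 60, 90), (True, 65, 100),
                              (True, 70, 110), (False, 50, 40)]:
        freq = next_frequency(freq, 2666, 2, busy, temp, power, limits)
        print(freq)
```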

5: QPI -- No more waiting for the bus

One of the big changes that comes with the Nehalem architecture is the replacement of the shared Front Side Bus (FSB) with a new way of communicating between the processor(s) and the Input/Output hubs on the motherboard, called QuickPath Interconnect (QPI). QPI is a point-to-point link similar to AMD's HyperTransport technology, but faster. It allows for up to 6.4 gigatransfers per second (GT/s) in each direction, with each transfer carrying 2 bytes of data. At that rate a single link provides 12.8 GB/s per direction, or 25.6 GB/s in total, which is twice the raw bandwidth of the 1,600 MHz FSB. Not every model runs its links at the top rate; the Core i7 920 and 940, for example, use 4.8 GT/s.
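
The bandwidth figures above come from simple arithmetic, sketched here in Python. The assumptions are ours: 2 bytes of payload per QPI transfer in each direction, a 64-bit (8-byte) FSB, and decimal gigabytes. The function names are illustrative.

```python
def qpi_bandwidth_gb_s(gt_per_s: float, bytes_per_transfer: int = 2) -> tuple[float, float]:
    """Return (per-direction, total) bandwidth of one QPI link in GB/s."""
    per_direction = gt_per_s * bytes_per_transfer
    return per_direction, 2 * per_direction  # full duplex: both directions at once


def fsb_bandwidth_gb_s(mt_per_s: float, width_bytes: int = 8) -> float:
    """Peak bandwidth of a shared front side bus in GB/s (one direction at a time)."""
    return mt_per_s * width_bytes / 1000


per_dir, total = qpi_bandwidth_gb_s(6.4)
fsb = fsb_bandwidth_gb_s(1600)
print(f"QPI @ 6.4 GT/s: {per_dir:.1f} GB/s each way, {total:.1f} GB/s total")
print(f"FSB @ 1,600 MT/s: {fsb:.1f} GB/s; ratio {total / fsb:.1f}x")
print(f"QPI @ 4.8 GT/s: {qpi_bandwidth_gb_s(4.8)[1]:.1f} GB/s total")
```

Note that the "twice the FSB" comparison adds both directions of the QPI link together; in any one direction, a 6.4 GT/s link matches the 1,600 MHz FSB's peak.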

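Nehalem also moves the memory controller onto the processor itself (one integrated memory controller per processor, shared by all of its cores), so memory requests no longer travel over the same bus as I/O requests: QPI carries traffic between processors and I/O hubs, while memory has its own separate bus. Each QPI link is also full duplex, with separate paths for each direction, so data can flow both ways at the same time.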

For more in-depth information about how QPI works, see An Introduction to the Intel QuickPath Interconnect.

6: DDR3 -- Thanks for the (faster) memory

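Not only is communication with the memory improved through the integrated memory controller, but the Nehalem architecture also supports faster (albeit more expensive) RAM: DDR3, which is designed to transfer data at twice the rate of DDR2 for a given internal memory clock. DDR3 gets there by doubling the prefetch buffer, from 4 bits per data pin on DDR2 to 8 bits. The prefetch buffer holds the data read from the memory array in a single internal access, so a larger buffer lets the module's external interface run faster than the memory cells themselves.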

The memory controller on Nehalem-based processors also supports a triple-channel memory configuration. This means that instead of installing RAM modules in pairs, they're installed in groups of three for maximum bandwidth.
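
Why groups of three? Peak memory bandwidth grows with the number of channels. This Python sketch computes theoretical peaks under simple assumptions: each DDR3 channel is 64 bits (8 bytes) wide, the modules are DDR3-1066 (about 1,066.67 million transfers per second), and real-world throughput will be lower. The function name is illustrative.

```python
def peak_memory_bandwidth_gb_s(channels: int, mt_per_s: float, bus_bytes: int = 8) -> float:
    """Theoretical peak: channels x 64-bit (8-byte) bus x transfers per second."""
    return channels * bus_bytes * mt_per_s / 1000


triple = peak_memory_bandwidth_gb_s(3, 1066.67)  # triple-channel DDR3-1066
dual = peak_memory_bandwidth_gb_s(2, 1066.67)    # same modules, only two channels
print(f"triple: {triple:.1f} GB/s, dual: {dual:.1f} GB/s")
```

Populating only two of the three channels leaves a third of the potential bandwidth unused, which is why modules are installed in sets of three.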

7: Get Smart -- Shared last-level cache

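Intel's Core 2 processors had only two levels of cache, whereas the Nehalem design expands that to three. Each processor core has its own L1 and L2 cache, but the L3 cache is shared among all the cores. It's also much larger (8 MB on quad-core models, compared with 256 KB of L2 per core) and functions as a snoop filter. The L3 cache is inclusive; that is, it contains a copy of everything held in the cores' L1 and L2 caches. So when a request misses in L3, the processor knows that no core has the data and doesn't have to query (snoop) the cores' private caches, which cuts coherency traffic and improves performance.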
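
Here is a toy Python model of how an inclusive cache acts as a snoop filter. It is a simplified sketch, not Intel's design: the class and method names are hypothetical, and a Python set stands in for the per-line hardware bits that record which cores may hold a copy.

```python
class InclusiveL3:
    """Toy model of a shared, inclusive L3 acting as a snoop filter."""

    def __init__(self) -> None:
        # line address -> ids of cores whose private L1/L2 may hold the line
        self.lines: dict[int, set[int]] = {}

    def fill(self, address: int, core: int) -> None:
        # Inclusion: a line entering any core's private caches is also kept in L3,
        # and that core is recorded as a possible holder.
        self.lines.setdefault(address, set()).add(core)

    def evict(self, address: int) -> set[int]:
        # Evicting from an inclusive L3 forces the line out of the private caches
        # too; return the cores that must invalidate their copies.
        return self.lines.pop(address, set())

    def cores_to_snoop(self, address: int, requester: int) -> set[int]:
        # L3 miss: inclusion guarantees no core holds the line, so no snoops at all.
        holders = self.lines.get(address, set())
        return holders - {requester}


if __name__ == "__main__":
    l3 = InclusiveL3()
    l3.fill(0x40, core=0)
    l3.fill(0x40, core=2)
    print(l3.cores_to_snoop(0x40, requester=0))  # only core 2 needs checking
    print(l3.cores_to_snoop(0x80, requester=1))  # miss: nobody is snooped
```

The payoff is in the miss case: without inclusion, a miss in the shared cache would still require asking every core whether it holds the line.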

8: Go green -- Reduced power consumption

Energy consumption is a hot topic today, as increased concern for the environment is coupled with tightened company budgets. The Nehalem architecture is built with energy conservation in mind. The Xeon 5500 series consumes 50% less power at idle than its predecessor, with an idle power level of 10 watts, and Intel's Intelligent Power Node Manager for Xeon servers automatically adjusts power use for the optimum power-performance ratio. Integrated power gates allow idle cores to be powered down individually, and the 45 nm high-k metal gate process, which uses a hafnium-based insulator, reduces electrical leakage. In addition, DDR3 memory consumes approximately 30% less power than DDR2 because of its lower operating voltage (1.5 V vs. 1.8 V for DDR2).
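
The roughly 30% figure for DDR3 is consistent with the usual rule of thumb that switching power scales with the square of the supply voltage. This Python sketch applies that rule; the assumption (ours, not a measured result) is that capacitance and frequency stay constant and that static leakage and I/O termination power can be ignored.

```python
def dynamic_power_ratio(new_volts: float, old_volts: float) -> float:
    """Ratio of switching power at new vs. old supply voltage.

    Assumes dynamic power ~ C * V^2 * f with capacitance and frequency held
    constant; static leakage and I/O termination power are ignored.
    """
    return (new_volts / old_volts) ** 2


ratio = dynamic_power_ratio(1.5, 1.8)  # DDR3 vs. DDR2 supply voltage
print(f"DDR3 draws about {ratio:.2f}x the power of DDR2, i.e. {1 - ratio:.0%} less")
```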

9: Virtually possible -- Enhanced virtualization capabilities

Virtualization is a popular way to reduce hardware costs by consolidating servers, but running multiple virtual machines on a single physical machine requires a lot of processing power and memory. Current dual-socket motherboards for Xeon 5500 processors can handle as much as 144 GB of memory per physical machine (18 x 8 GB modules), providing plenty of memory to allocate to each VM.
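
The 144 GB figure falls out of the slot count, as this Python sketch shows. It assumes two sockets, three memory channels per socket, three DIMMs per channel (the 18 slots), and 8 GB modules. The 4 GB per VM and 8 GB hypervisor reserve in the usage line are hypothetical planning numbers, and the sketch ignores memory overcommit.

```python
def max_memory_gb(sockets: int = 2, channels_per_socket: int = 3,
                  dimms_per_channel: int = 3, dimm_gb: int = 8) -> int:
    """Total installable memory with every DIMM slot filled with the same module size."""
    return sockets * channels_per_socket * dimms_per_channel * dimm_gb


def vms_that_fit(total_gb: int, vm_gb: int, hypervisor_reserve_gb: int) -> int:
    """How many equally sized VMs fit after setting memory aside for the hypervisor."""
    return (total_gb - hypervisor_reserve_gb) // vm_gb


total = max_memory_gb()  # 2 x 3 x 3 = 18 DIMMs of 8 GB each
print(total, vms_that_fit(total, vm_gb=4, hypervisor_reserve_gb=8))
```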

In addition, Nehalem platforms support VT-d (Intel Virtualization Technology for Directed I/O), which lets direct memory access (DMA) capable I/O devices be assigned directly to individual VMs. This reduces the performance hit that results when a VM has to exit to the virtual machine manager (hypervisor) so the hypervisor can handle the I/O each time a packet is processed. For more information on this, see Intel Nehalem Will Give Virtualization a Boost.
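
At the heart of VT-d is DMA remapping: the hypervisor gives each assigned device its own translation table, so when the device writes to what the guest believes is physical memory, the hardware redirects the access to the host memory actually backing that VM and blocks anything outside it. The Python class below is a toy model under stated assumptions: the class and device names are hypothetical, pages are 4 KB, and a flat dictionary stands in for the multi-level tables real VT-d hardware walks.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


class DmaRemapper:
    """Toy IOMMU: per-device tables mapping guest page numbers to host page numbers."""

    def __init__(self) -> None:
        self.tables: dict[str, dict[int, int]] = {}

    def assign(self, device: str, guest_to_host_pages: dict[int, int]) -> None:
        # The hypervisor sets this up once, when it hands the device to a VM.
        self.tables[device] = dict(guest_to_host_pages)

    def translate(self, device: str, guest_addr: int) -> int:
        # Done in hardware on every DMA, with no exit to the hypervisor.
        page, offset = divmod(guest_addr, PAGE_SIZE)
        table = self.tables.get(device, {})
        if page not in table:
            raise PermissionError(f"DMA fault: {device} touched unmapped address {guest_addr:#x}")
        return table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    iommu = DmaRemapper()
    iommu.assign("nic0", {0: 0x500, 1: 0x9A0})   # hypothetical device and pages
    print(hex(iommu.translate("nic0", 0x1010)))  # guest page 1, offset 0x10
```

The fault path is what makes direct assignment safe: a device handed to one VM cannot scribble over another VM's memory or the hypervisor's.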

10: Intel RAS -- Reliability, availability, and serviceability

Some folks will be quick to point out the new features in the Nehalem architecture that are similar to concepts in existing AMD processors (integrated memory controller, QPI/HyperTransport, independently powered processor cores). And with its new Shanghai chips, AMD has finally transitioned to the 45 nm fabrication that was first introduced by Intel. But Intel is already gearing up to go to a 32 nm process. Still, with AMD's Phenom II 940 costing a little less than the Core i7 920 (and its motherboards costing about half as much), there is certainly a price advantage to going with AMD. The cost difference, however, becomes smaller with the newer AMD processors that will use DDR3 RAM.

In performance tests, Intel's desktop processor beats AMD's, including when both processors are overclocked, as noted in this Tom's Hardware comparison.

With the price/performance trade-off, perhaps the deciding factor comes down to this: Which company do you trust more? Which one is more likely to be around in the future? Which has produced the more reliable product in the past? AMD suffered a blow to its reputation and sales when its quad-core Barcelona chips proved to be buggy.

In the meantime, Intel's commitment to the RAS concept, along with its relentless push to build better and faster processors, has made its products the choice of more hardware vendors, consumers, and enterprises. Ironically, the competition from AMD has undoubtedly been a major motivating factor that has made Intel's processors better.