System hardware information utilities such as WCPUID are an important part of a PC technician’s troubleshooting arsenal. They simplify information gathering for analysis and allow a technician to form a mental picture of a system with little effort.
While many utilities can be used to acquire basic information without a profound knowledge of system hardware, getting the most out of them requires a thorough understanding of the PC hardware concepts involved. H.Oda!’s WCPUID is a Windows-based utility that is very popular with hardware technicians. It reports details about the CPU, the system buses, and the caches. WCPUID is a powerful utility that becomes more useful as your understanding of system hardware increases.
Note that both Intel and AMD have their own utilities that help identify their CPUs. Intel’s is the Processor Frequency ID Utility, and AMD provides the AMD CPU Information Display Utility.
To download the WCPUID utility, go to H.Oda!'s home page and click on the download button in the left nav bar. When you run the program, one of the first items listed by WCPUID is the processor type, as shown in Figure A.
|WCPUID output for a Celeron 950 MHz processor|
The CPU Type field reports the processor type code returned by the CPUID instruction. In the small box on the right side of the field, a 0 indicates an original OEM processor, which is the standard type for a processor installed in a system. A 1 indicates an Intel OverDrive processor, an upgrade chip sold to end users. The Family field represents the processor’s generation, while the Model field identifies the specific core design within that family.
A Stepping ID is to a processor what a version number is to software. A new Stepping ID generally indicates one of two things: the manufacturer has fixed bugs in the processor, or it has modified the processor to run more reliably at higher speeds.
Programs obtain the Family, Model, and Stepping ID values by executing the processor’s CPUID instruction. Taken together, the values WCPUID shows in these fields form the CPUID value. This value can help you determine the type of core your processor uses and the revisions it has gone through. For Intel processors, you can cross-reference the CPUID value against the Processor Specification Update for the relevant processor family, available at Intel’s Developer Site.
For the Celeron 950 example shown in Figure A, the CPUID value is 068A. It is formed by writing the Family (6), Model (8), and Stepping ID (A) values in hexadecimal, preceded by a leading zero that corresponds to the processor type field (0, an OEM processor). Note that A is hexadecimal for 10. Using this CPUID value, we can cross-reference the Celeron Processor Specification Update to determine that the core stepping of this processor is D0 and, more specifically, cD0.
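The arithmetic behind the 068A value can be sketched in Python. This is a minimal illustration that assumes the bit layout Intel documents for the value CPUID returns in EAX:

- stepping in bits 3:0
- model in bits 7:4
- family in bits 11:8
- processor type in bits 13:12

The sketch ignores the extended family and model fields that later processors added. The function names are hypothetical.

```python
def cpuid_signature(proc_type: int, family: int, model: int, stepping: int) -> int:
    """Pack the fields into the low 16 bits of the CPUID signature."""
    # Bits 13:12 = processor type, 11:8 = family, 7:4 = model, 3:0 = stepping.
    return (proc_type << 12) | (family << 8) | (model << 4) | stepping


def decode_signature(sig: int) -> dict:
    """Unpack a 16-bit CPUID signature back into its fields."""
    return {
        "type": (sig >> 12) & 0x3,
        "family": (sig >> 8) & 0xF,
        "model": (sig >> 4) & 0xF,
        "stepping": sig & 0xF,
    }


# Celeron 950 from Figure A: type 0, family 6, model 8, stepping A (10).
sig = cpuid_signature(proc_type=0, family=6, model=8, stepping=0xA)
print(f"{sig:04X}")            # 068A
print(decode_signature(sig))   # {'type': 0, 'family': 6, 'model': 8, 'stepping': 10}
```

The leading zero in 068A is therefore not padding. It is the processor type digit, which happens to be 0 for a standard processor.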
Note that the CPUID value alone cannot tell us exactly which processor this is, because processors with different internal clock and system bus speeds may use the same core. If we assume that this really is a 950-MHz processor, we can narrow the possibilities. Intel uses S-Spec values to identify its processors. If you need to know exactly which Intel processor a system is using, you can find the S-Spec value marked on the processor itself. The Processor Specification Update shows you where to look.
Another Intel resource that you might find quicker to use is the Processor Spec Finder. You can use it instead of the Processor Specification Updates to find the core stepping of an Intel processor and to zero in on its S-Spec. If we choose Intel Celeron as the processor and 950 MHz as the core speed, the Spec Finder returns only two possible S-Spec values. Clicking either value displays a chart that details the processor’s specifications. By examining both charts, we can determine that the two values identify the same processor.
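One S-Spec represents the boxed (retail) version of the processor, and the other represents the OEM version. You can see that the values in the chart correspond to the WCPUID output shown in Figure A. We can also be fairly certain that our sample processor is not overclocked, because WCPUID reports a 100-MHz system clock and a 950-MHz internal clock, matching the chart's rated values. The 9.5 multiplier is locked, so overclocking this processor would mean raising the system clock, which would push both reported clocks above those values.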
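A quick sanity check for overclocking can be sketched as a comparison of the clocks WCPUID reports against the rated values in the S-Spec chart. The function name and the 2 percent tolerance are assumptions for illustration, not values from Intel or WCPUID. The tolerance allows for the small jitter in measured clocks, which is why WCPUID shows "approximately 950 MHz" rather than an exact figure.

```python
def exceeds_rating(measured_mhz: float, rated_mhz: float, tolerance_pct: float = 2.0) -> bool:
    """True if a measured clock is higher than its rated value by more than tolerance_pct."""
    return measured_mhz > rated_mhz * (1 + tolerance_pct / 100)


# Rated values for the sample Celeron: 100-MHz system clock, 950-MHz core.
print(exceeds_rating(100.0, 100.0), exceeds_rating(950.0, 950.0))   # False False

# A hypothetical system with the bus pushed to 110 MHz (9.5 x 110 = 1045 MHz core).
print(exceeds_rating(110.0, 100.0), exceeds_rating(1045.0, 950.0))  # True True
```

Checking both clocks matters. With a locked multiplier, the system clock is where an overclock shows up first.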
PC bus architecture
Figure B below shows a simplified diagram of a typical modern PC architecture. The system’s chipset components are represented by the North Bridge and South Bridge integrated circuits. Intel calls these components the memory controller hub and I/O controller hub, respectively, when referring to Pentium 4 systems.
Regardless of their names, their functions remain the same. The North Bridge handles access to the fastest buses: those connecting the CPU, memory, and the video subsystem (via the AGP bus). The South Bridge controls access to slower I/O connections, such as IDE devices, USB, and various other I/O ports. Data that the South Bridge receives must then be passed to the North Bridge on its way to the CPU or memory. In many systems, this link is the PCI bus, although newer architectures use dedicated high-speed channels between the North and South Bridges.
|One of the main uses of WCPUID is to determine a system’s bus speeds.|
Looking back at Figure A, you can see that the Internal Clock field shows a speed of approximately 950 MHz. As the name implies, this is the clock speed at which the processor runs internally. Because this processor is not overclocked, it is also the processor's rated speed. The internal clock is derived by multiplying the value in the System Clock field by a factor called the multiplier. The System Clock field shows approximately 100 MHz for this system. Other common system clock speeds are 66 MHz and 133 MHz, and newer Athlon-based systems may use 166 MHz.
WCPUID indicates that the multiplier being used is 9.5 (9.5 x 100 = 950). You should be aware that in most modern processors, the multiplier value is locked to prevent overclocking. So if you were to use this same processor with a 66-MHz system clock, you would have an underclocked internal clock value of about 627 MHz (9.5 x 66).
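The multiplier relationship is simple enough to express directly. The sketch below uses hypothetical function names and shows the relationship in both directions. It recovers the multiplier from the two clocks WCPUID reports, and it computes the internal clock from a given system clock.

```python
def internal_clock_mhz(system_clock_mhz: float, multiplier: float) -> float:
    """Internal (core) clock = system clock x bus/core multiplier."""
    return system_clock_mhz * multiplier


def multiplier_from_clocks(internal_mhz: float, system_clock_mhz: float) -> float:
    """Recover the multiplier from the two measured clocks."""
    return internal_mhz / system_clock_mhz


print(multiplier_from_clocks(950, 100))  # 9.5
print(internal_clock_mhz(100, 9.5))      # 950.0
print(internal_clock_mhz(66, 9.5))       # 627.0 (same locked multiplier, slower bus)
```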
Even though they're closely related, don't confuse the system clock with the system bus speed. The system bus, also called the front side bus, connects the CPU to the North Bridge. In the example shown in Figure A, the system bus runs at the same speed as the system clock and memory bus, but this isn't always the case. Many newer processors, such as the Pentium 4 and Athlon XP, transfer data more than once per clock cycle, so their effective system bus speed is higher than the system clock. For instance, a typical Pentium 4 system bus transfers data four times per cycle of a 100-MHz clock, for an effective speed of 400 MHz.
With fast memory such as RDRAM or DDR SDRAM, many newer systems have memory bus speeds equal to the system bus speed. As a result, the term front side bus is sometimes used loosely to mean the memory bus speed as well, although this isn’t technically correct. You'll also see systems whose chipsets can clock the memory bus and system bus independently, usually to accommodate slower memory.
For example, the WCPUID segment in Figure C comes from an older Athlon 750-MHz Slot A processor. This processor uses a 100-MHz system clock and a 200-MHz front side bus. Even so, the motherboard's chipset clocks the memory independently at 133 MHz to accommodate 133-MHz SDRAM. Without a program like WCPUID, these separate clock domains can be difficult to visualize.
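The relationship between the base clock and the effective bus rate can be sketched as follows. The sketch assumes that the Pentium 4 bus transfers data four times per clock (quad-pumped) and that the Athlon's bus transfers data twice per clock (double-pumped). The memory clock is treated as an independent input rather than a derived value, which is the point of the Figure C example. Names are illustrative.

```python
def effective_bus_mhz(clock_mhz: float, transfers_per_clock: int) -> float:
    """Effective bus rate = base clock x data transfers per clock cycle."""
    return clock_mhz * transfers_per_clock


# Pentium 4: quad-pumped system bus on a 100-MHz clock.
p4_fsb = effective_bus_mhz(100.0, 4)

# Athlon 750 (Slot A), Figure C.
athlon_system_clock = 100.0
athlon_multiplier = 750.0 / athlon_system_clock            # derived from the rated speed: 7.5
athlon_core = athlon_system_clock * athlon_multiplier
athlon_fsb = effective_bus_mhz(athlon_system_clock, 2)     # double-pumped
athlon_memory = 133.0  # set independently by the chipset, not derived from the FSB

print(p4_fsb)                                  # 400.0
print(athlon_core, athlon_fsb, athlon_memory)  # 750.0 200.0 133.0
```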
|This is a portion of a WCPUID screen for a classic Athlon 750-MHz processor.|
Latency reduction has become a primary goal of system designers. Processors are extremely fast, but the subsystems they read data from and write data to (e.g., memory and hard drives) are slow by comparison. To reduce the effects of this disparity, designers have invested heavily in techniques that reduce latency. One of the most common is a hardware cache.
A hardware cache is a small amount of fast memory that reduces the effective latency of read and write operations.

The general idea is to keep data and instructions that are likely to be used again in fast, low-latency memory. That way, the processor doesn't have to wait on slower system memory or on the much slower disk subsystem. For CPUs, this is done with a hierarchy of cache levels, such as L1 and L2.
When a processor looks for data (a read operation), its internal cache controller first checks whether the data is in the L1 cache. If it is, a hit is registered, and the CPU receives its data quickly. If it isn't, a miss is registered, and the request is passed on to the L2 cache. If the data isn't in L2 either, the request goes to system memory. If the data isn't in memory, it must be read from the disk subsystem, a task handled by the operating system rather than by the cache hardware. Each level traversed adds latency, so the CPU waits longer each time the data is not found.
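The read path described above can be modeled as a walk down the hierarchy. In this sketch, the per-level latencies are made-up round numbers chosen only to show how the wait grows at each miss. They are not measurements of any real system. The model also simplifies things by adding each level's latency as the search passes through it.

```python
from typing import Dict, Set, Tuple

# Illustrative latencies in CPU cycles: assumed round numbers, not measurements.
LEVELS = [
    ("L1 cache", 3),
    ("L2 cache", 10),
    ("system memory", 100),
    ("disk", 10_000_000),
]


def read(address: int, contents: Dict[str, Set[int]]) -> Tuple[str, int]:
    """Search each level from fastest to slowest.

    Return where the data was found and the accumulated wait in cycles.
    """
    waited = 0
    for level, latency in LEVELS:
        waited += latency                        # every level checked adds its latency
        if address in contents.get(level, set()):
            return level, waited                 # hit at this level
        # miss: fall through to the next, slower level
    raise KeyError(f"address {address:#x} not found at any level")


# Hypothetical snapshot of which addresses each level currently holds.
contents = {
    "L1 cache": {0x1000},
    "L2 cache": {0x1000, 0x2000},
    "system memory": {0x1000, 0x2000, 0x3000},
    "disk": {0x1000, 0x2000, 0x3000, 0x4000},
}

print(read(0x1000, contents))  # ('L1 cache', 3)
print(read(0x3000, contents))  # ('system memory', 113)
```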
The L1 cache is a very small amount of fast RAM that communicates with the processor at its full internal clock speed. The L2 cache is larger. Depending on the processor and cache setup, it may communicate with the processor at full speed or at a fraction of the processor’s internal speed. Over time, the L2 cache has migrated from the motherboard into the processor package and, in newer processors, onto the CPU’s die.
Today, all new PC processors from Intel and AMD contain full-speed L2 caches. Figure C shows an example of an L2 cache running at a fraction (2/5) of its processor’s internal clock speed: 2/5 of 750 MHz, or 300 MHz. In this case, the L2 cache is contained within the same package as the processor but isn't on the processor's die itself.
Cache systems are composed of two main elements: the cache directory and the cache memory (or data store). The cache directory acts like an index to the cache memory. The cache memory contains the contents (i.e., data or instructions) of the memory addresses that are likely to be reused. The cache memory is the component referred to when specifying a cache size. The information within the cache memory is stored in cache lines. Figure D shows that the cache lines for our Celeron 950-MHz example are 32 bytes long. (Extra cache information can be displayed with WCPUID by choosing View | Cache Info from the File menu.) When data is placed into the cache, it is organized in 32-byte blocks.
|Extra Cache Information can be viewed with WCPUID.|
Each data or instruction cache in the example is set up as a four-way set associative cache. This means the cache is divided into sets of four cache lines apiece, and each of the four lines in a set belongs to a different section called a way. From the WCPUID screen, you can tell that for the Celeron 950, each of these caches has 128 sets of four 32-byte lines. That works out to 128 x 4 x 32 bytes = 16,384 bytes, or 16 KB. You can picture the cache as a table with 128 rows (the sets) and four columns (the ways), where each cell holds one 32-byte cache line.
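Cache size, line size, associativity, and set count are related by the formula sets = size / (line size x ways). A small Python helper makes the arithmetic concrete. It also shows how many address bits each field needs, which matters when the address is split into tag and set addresses. The 16-KB size used here is not read from Figure D. It is computed from the 128 sets, four ways, and 32-byte lines given above, and the helper name is hypothetical.

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> dict:
    """Derive the set count and address-field widths of a set associative cache.

    Assumes the line size and set count are powers of two.
    """
    sets = size_bytes // (line_bytes * ways)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size): byte within a line
        "index_bits": sets.bit_length() - 1,         # log2(sets): which set to search
    }


# 16 KB, 32-byte lines, four ways (the Celeron figures discussed above).
print(cache_geometry(16 * 1024, 32, 4))
# {'sets': 128, 'offset_bits': 5, 'index_bits': 7}
```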
Each way also has a cache directory that assigns a tag address to each cache line. The tag address identifies the system memory address whose contents are stored in that cache line. A given memory address can be cached only in the one set selected by its set address, but within that set it can occupy any of the four ways. Four-way set associativity strikes a good balance between hit rate and response time. More ways reduce the chance that frequently used lines evict one another, but every added way means another tag comparison when checking whether data is in the cache.
Different portions of the address that the CPU places on its address bus serve as the tag address and the set address. The lowest bits select a byte within the 32-byte cache line. When the CPU places an address on the address bus, the set-address portion is applied to the cache directories and the cache memories simultaneously. All cache lines that belong to the selected set place their contents in data buffers connected to the system’s data bus.
However, the data in these buffers isn't allowed on the system’s data bus unless one of the cache directories contains a tag address that matches the tag address provided by the CPU. If the tag addresses match, a portion of the 32-byte cache line placed in the data buffer for this way is allowed onto the data bus. This represents a cache hit.
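The lookup described in the last two paragraphs can be modeled in software. This simplified sketch follows the Celeron's organization of 128 sets, four ways, and 32-byte lines. It splits an address into tag, set, and offset fields, then compares the tag against all four directory entries in the selected set.

The sketch differs from real hardware in two ways:

- Real hardware performs the four tag comparisons in parallel, while this loop does them one at a time.
- The round-robin replacement policy is an assumption made to keep the example short, since the text doesn't say how lines are replaced.

Class and function names are hypothetical.

```python
from typing import List, Optional, Tuple

LINE_BYTES = 32   # cache line size from Figure D
NUM_SETS = 128
WAYS = 4
OFFSET_BITS = 5   # log2(32)
INDEX_BITS = 7    # log2(128)


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


class SetAssociativeCache:
    def __init__(self) -> None:
        # directory[s][w] is the tag stored for way w of set s (None = empty)
        self.directory: List[List[Optional[int]]] = [[None] * WAYS for _ in range(NUM_SETS)]
        # data[s][w] is the 32-byte line held in way w of set s
        self.data: List[List[bytes]] = [[bytes(LINE_BYTES)] * WAYS for _ in range(NUM_SETS)]
        # Round-robin victim pointer per set (assumed replacement policy)
        self.next_way = [0] * NUM_SETS

    def fill(self, addr: int, line: bytes) -> None:
        """Place a 32-byte line in the set selected by addr, recording its tag."""
        tag, set_index, _ = split_address(addr)
        way = self.next_way[set_index]
        self.directory[set_index][way] = tag
        self.data[set_index][way] = line
        self.next_way[set_index] = (way + 1) % WAYS

    def read_byte(self, addr: int) -> Optional[int]:
        """Return the cached byte on a hit, or None on a miss."""
        tag, set_index, offset = split_address(addr)
        for way in range(WAYS):                           # hardware checks all ways at once
            if self.directory[set_index][way] == tag:
                return self.data[set_index][way][offset]  # tag match: cache hit
        return None                                       # no matching tag: cache miss


cache = SetAssociativeCache()
cache.fill(0x12340, bytes(range(LINE_BYTES)))
print(cache.read_byte(0x12345))  # 5
print(cache.read_byte(0x22345))  # None: same set, different tag
```

The second read lands in the same set (26) as the first but carries a different tag, so none of the directory entries match and the buffered data is never released.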