There is an abundance of PC system information utilities available today. Some are considered parts of operating systems, while others are third-party utilities available for a variety of operating systems. In this Daily Drill Down, I will describe the hardware concepts involved with understanding the CPU, bus, and cache information obtained using H. Oda’s WCPUID utility, a Windows-based utility that is very popular with hardware technicians. WCPUID is a strong example of a utility that becomes more useful as your understanding of system hardware increases.
To download the WCPUID utility, go to H. Oda’s home page and click on the download button on the left of the page. When you run the program, one of the first items listed by WCPUID is the Processor type, as shown in Figure A.
|WCPUID output for a Celeron 950 MHz processor|
The CPU Type indicates whether a processor is an OEM version or a retail end-user version. OEM processors are intended for professional installation, and an end-user version is purchased by a consumer. (In the small box on the right side of the CPU Type line, a 0 indicates an OEM version, and a 1 indicates an end-user version.) The Family is the processor’s generation, while the Model describes the architecture of the processor. A Stepping ID is to a processor what a version number is to software. Generally, a different Stepping ID value signals that the manufacturer has fixed a bug in the processor or has made modifications to keep the processor stable at higher speeds.
Programs can obtain the Family, Model, and Stepping ID values by using a processor’s CPUID instruction. The CPUID value that is shown by the WCPUID utility as the Family, Model, and Stepping ID values can help you determine the type of core and revisions that your processor has gone through. For Intel processors, you can cross-reference the CPUID value using the Processor Specification Updates for the particular processor family you’re searching for. The Processor Specification Updates are available at Intel’s Developer Site.
For the Celeron 950 example shown in Figure A, the CPUID value is 068A. A leading zero is added to the combination of the Family, Model, and Stepping ID values, and the entire value is represented in hexadecimal (Family 6, Model 8, Stepping ID A, where A = 10 in decimal). Using this CPUID value, we can cross-reference the Celeron Processor Specification Update to determine that the core stepping of this processor is D0 and, more specifically, cD0. Notice that, based solely on the CPUID value, we cannot determine exactly which processor this is, because processors that use different Internal Clock and System Bus speeds may use the same core. If we make the assumption that this really is a 950-MHz processor, then we can limit our possibilities as to which processor this is. Notice that Intel uses S-Spec values to identify its processors. If you know exactly which Intel processor a system is using, you can find the S-Spec value on the processor itself. The Specification Update shows you where to look.
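To make the relationship between the CPUID value and its three fields concrete, here is a minimal Python sketch that splits a value such as 068A into its parts. The function name is hypothetical, and the sketch assumes the classic layout of the processor signature (Stepping ID in bits 3:0, Model in bits 7:4, Family in bits 11:8, Type in bits 13:12). It ignores the extended family and model fields that later processors added.

```python
def decode_cpuid_signature(signature: int) -> dict:
    """Split a processor signature into Type, Family, Model, and Stepping ID.

    Assumes the classic layout only; extended family/model bits are ignored.
    """
    return {
        "type": (signature >> 12) & 0x3,    # bits 13:12
        "family": (signature >> 8) & 0xF,   # bits 11:8
        "model": (signature >> 4) & 0xF,    # bits 7:4
        "stepping": signature & 0xF,        # bits 3:0
    }


if __name__ == "__main__":
    # The Celeron 950 value from the article, written as a hexadecimal literal.
    print(decode_cpuid_signature(0x068A))
    # {'type': 0, 'family': 6, 'model': 8, 'stepping': 10}
```

The Stepping ID prints as 10 because Python shows the hexadecimal digit A in decimal.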
Another resource provided by Intel that you might find quicker to use is the Processor Spec Finder. You can use it as an alternative to the Processor Specification Updates to find the core stepping for an Intel processor and to zero in on its S-Spec. If we choose Intel Celeron processors for the processor and 950 MHz for the core speed, we are given a choice of only two possible S-Spec values. If you click on one of the values, a chart comes up detailing that processor’s specifications. By examining both of the S-Spec choices given, we can determine that these two values identify the same processor. One S-Spec represents the processor when it is packaged in a box. You can see that the values in the chart correspond to the WCPUID output shown in Figure A. We can also be fairly certain that our example processor is not overclocked, because if it were, the Bus/Core Ratio (Multiplier) of 9.5 would be a different value.
PC bus architecture
Figure B below shows a simplified diagram of a typical modern PC architecture. The system’s chipset components are represented by the North Bridge and South Bridge integrated circuits. Intel calls these components the Memory Controller Hub and I/O Controller Hub, respectively, when referring to Pentium 4 systems. Regardless of their names, their respective functions remain the same. The North Bridge is responsible for handling access to the fastest buses. These include access to the CPU, memory, and the video subsystem via the AGP bus. The South Bridge controls data access to slower I/O connections, such as IDE devices, USB, and various other I/O ports. Once the South Bridge receives this data, it must pass it on to the North Bridge en route to the CPU or memory via some channel. In many systems, this channel is the PCI bus, although newer architectures have incorporated special high-speed channels between the North and South Bridges.
|One of the main uses of WCPUID is to determine a system’s bus speeds.|
Looking back at Figure A, you can see that the Internal Clock speed is approximately 950 MHz. As the name implies, this item corresponds to the clock speed at which the processor is running internally. Since this processor is not overclocked, this is its rated speed. In order to configure the processor at this speed, the System Clock must be multiplied by a factor (Multiplier). The System Clock speed for this system is approximately 100 MHz. (Other common System Clock speeds are 66 MHz and 133 MHz, although you should not be surprised to see a System Clock speed of 166 MHz on the newer Athlon-based systems.) WCPUID indicates that the Multiplier being used is 9.5 (9.5 x 100 = 950). You should be aware that in most modern processors, the multiplier value is locked in order to prevent overclocking. Therefore, if you were to use this same processor with a 66-MHz System Clock, you would have an underclocked Internal Clock value of about 627 MHz (9.5 x 66).
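The multiplier arithmetic above can be checked with a few lines of Python. This is only a sketch using the nominal figures quoted in the text; the function name is hypothetical, and a real system reports slightly different measured values.

```python
def internal_clock_mhz(system_clock_mhz: float, multiplier: float) -> float:
    """Internal Clock = System Clock x Multiplier (nominal values)."""
    return system_clock_mhz * multiplier


if __name__ == "__main__":
    # The Celeron 950 at its intended 100-MHz System Clock.
    print(internal_clock_mhz(100, 9.5))   # 950.0
    # The same locked 9.5 multiplier on a 66-MHz System Clock (underclocked).
    print(internal_clock_mhz(66, 9.5))    # 627.0
```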
Even though they are closely related, do not confuse the System Clock with the System Bus speed. The System Bus, which is often referred to as the Front Side Bus, is the bus that connects the CPU to the North Bridge. In the example shown in Figure A, the System Bus runs at the same speed as the System Clock and the Memory Bus, but this is not always the case. In fact, many newer processors, such as the Pentium 4 and Athlon XP, have System Buses that use clocking techniques that allow the System Bus to run faster than the System Clock. For instance, a typical Pentium 4 System Bus speed is 400 MHz.
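One way to picture the difference between the System Clock and the System Bus speed is to treat the bus speed as the clock multiplied by the number of transfers per clock cycle. The Python sketch below is illustrative only. The function name is hypothetical, and the figure of four transfers per clock on a 100-MHz clock for the Pentium 4 is an assumption not stated in the text above.

```python
def effective_bus_mhz(system_clock_mhz: float, transfers_per_clock: int) -> float:
    """Effective bus speed when several transfers occur per clock cycle."""
    return system_clock_mhz * transfers_per_clock


if __name__ == "__main__":
    # Celeron 950 from Figure A: one transfer per clock, so bus speed equals the clock.
    print(effective_bus_mhz(100, 1))   # 100
    # Pentium 4 (assumed: 100-MHz clock, four transfers per clock).
    print(effective_bus_mhz(100, 4))   # 400
```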
With the use of fast memory such as RDRAM or DDR, many newer processors have Memory Bus speeds that are equal to the System Bus speed. Therefore, the term “Front Side Bus” may refer to the speed of the Memory Bus as well, although this isn’t technically correct. You will also see systems whose chipsets can independently clock the Memory Bus and System Bus. This is usually done in order to accommodate slower memory on a system. For example, the WCPUID segment in Figure C comes from an older Athlon 750-MHz Slot A processor. Even though this processor uses a 100-MHz System Clock and has a Front Side Bus of 200 MHz, the chipset on the motherboard allows the memory to be independently clocked at 133 MHz in order to accommodate 133-MHz SDRAM. Relationships like these can be confusing and difficult to visualize without the assistance of a program like WCPUID.
|This is a portion of a WCPUID screen for a classic Athlon 750-MHz processor.|
Latency reduction has become the primary goal of system designers. While processors are unbelievably fast, the subsystems (memory, hard drives, etc.) that they use to read data from and write data to are terribly slow by comparison. In order to reduce the effects of this disparity, tremendous effort has gone into developing techniques that reduce latency. One of the most common techniques is the use of a hardware cache system.
You can think of a hardware cache as a complex but highly efficient system of fast memory that helps to reduce system latency by allowing fast read-and-write operations.
The general idea is to reduce the time required to read from and write to slow memory and the much slower disk subsystems by storing data and instructions that are likely to be used often in an area of fast, low-latency memory. For CPUs, this is accomplished in cache levels. When a processor looks for data (a read operation), its internal cache controller first checks to see if the data is in its L1 cache. If it is, a hit is registered, and the CPU receives its data quickly. If the data isn’t in the L1 cache, a miss is registered, and the request is passed on to the L2 cache and subsequently to system memory if the data is not in L2. Finally, the processor can search for data in a disk subsystem. Each level that is traversed represents an increase in latency. That is, the CPU’s wait becomes increasingly longer each time the data is not found.
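The read sequence just described can be sketched as a simple walk down the hierarchy, adding latency at each level that misses. The Python below is a toy model, not a description of any real processor. The latency numbers are made up purely for illustration, and the function and variable names are hypothetical.

```python
# Hypothetical latencies in CPU cycles; the values are illustrative only.
LEVELS = [
    ("L1 cache", 3),
    ("L2 cache", 10),
    ("system memory", 100),
    ("disk", 1_000_000),
]


def read(address: int, contents: dict) -> tuple:
    """Search each level in order; return (level that hit, total cycles waited)."""
    total_cycles = 0
    for name, latency in LEVELS:
        total_cycles += latency          # every level visited adds to the wait
        if address in contents[name]:    # hit: stop searching
            return name, total_cycles
    raise KeyError(f"address {address:#x} not found at any level")


if __name__ == "__main__":
    contents = {
        "L1 cache": {0x10},
        "L2 cache": {0x10, 0x20},
        "system memory": {0x10, 0x20, 0x30},
        "disk": {0x10, 0x20, 0x30, 0x40},
    }
    for addr in (0x10, 0x20, 0x30, 0x40):
        print(hex(addr), read(addr, contents))
```

Each additional miss pushes the total higher, which is the increasing wait the paragraph above refers to.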
The L1 cache is a very small amount of fast RAM that communicates with the processor at its full Internal Clock speed. The L2 cache is a larger cache that, depending on the processor and cache setup, may communicate with the processor at full speed or at a fraction of the processor’s internal speed. Over time, L2 cache has migrated from the motherboard onto the CPU’s die in newer processors. Today, all new PC processors from Intel and AMD contain full-speed L2 caches, as shown in Figure A. An example of an L2 that is running at a fraction (2/5) of its processor’s Internal Clock speed can be seen in Figure C. In this case, the L2 cache, although contained within the same package as the processor die, is not on the processor’s die itself.
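As a quick check on what a fractional L2 speed means, the following Python sketch multiplies the Internal Clock by the cache ratio. The function name is hypothetical, and the clock figures are the nominal ones from the two examples in this article.

```python
from fractions import Fraction


def l2_clock_mhz(internal_clock_mhz: int, ratio: Fraction) -> Fraction:
    """L2 cache clock = Internal Clock x cache ratio (nominal values)."""
    return internal_clock_mhz * ratio


if __name__ == "__main__":
    # Classic Athlon 750 from Figure C: L2 runs at 2/5 of the Internal Clock.
    print(l2_clock_mhz(750, Fraction(2, 5)))   # 300
    # Celeron 950 from Figure A: full-speed L2.
    print(l2_clock_mhz(950, Fraction(1, 1)))   # 950
```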
Cache systems are composed of two main elements: the cache directory and the cache memory (data store). The cache directory acts like an index to the cache memory. The cache memory contains the contents (data or instructions) of the memory addresses that are likely to be reused. The cache memory is the component referred to when specifying a cache size. The information within the cache memory is stored in cache lines. Figure D shows that the cache lines for our Celeron 950-MHz example are 32 bytes long (extra cache info can be displayed with WCPUID by choosing View | Cache Info from the menu). This means that when data is placed into the cache, it is organized in 32-byte blocks.
|Extra Cache Information can be viewed with WCPUID.|
Each data or instruction cache in our example is set up as a four-way set associative cache, indicating that the cache is divided into sets that contain four cache lines apiece. Each one of these cache lines is in a section called a “way.” We can tell from the WCPUID screen that for the Celeron 950, there are 128 sets, each containing four cache lines (a 16-KB L1 cache divided into sets of 32 bytes x 4 ways = 128 bytes each yields 128 sets). You can think of this as a three-dimensional array composed of a column of cache lines (each line belonging to a different set) that is four ways deep. In each way, there is also a cache directory that assigns a tag address to each cache line. The tag address represents the system memory address, the contents of which are stored in the cache line. Each set can contain the contents from a limited range of system memory addresses. The reason the cache system is organized in this complex fashion is to optimize it for efficiency. Using a four-way set associative cache provides a good hit rate as well as a good response time when determining whether data is contained in the cache.
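The number of sets follows from the cache size, the line length, and the number of ways. Here is a minimal Python sketch of that calculation, assuming a 16-KB L1 cache for the Celeron 950 example; the function name is hypothetical.

```python
def cache_sets(cache_bytes: int, line_bytes: int, ways: int) -> int:
    """Number of sets = cache size / (line length x number of ways)."""
    bytes_per_set = line_bytes * ways
    if cache_bytes % bytes_per_set != 0:
        raise ValueError("cache size must be a whole number of sets")
    return cache_bytes // bytes_per_set


if __name__ == "__main__":
    # Assumed 16-KB L1 cache, 32-byte lines, four ways.
    print(cache_sets(16 * 1024, 32, 4))   # 128
```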
Portions of the CPU’s address bus are used for the tag address and set address. When the CPU places an address on the address bus, the portion that is set aside for the sets is applied to the cache directories and the cache memories simultaneously. All of the cache lines that belong to the set indicated by the set address place their contents in data buffers that are connected to the system’s data bus. However, the data in these buffers is not allowed on the system’s data bus unless one of the cache directories contains a tag address that matches the tag address provided by the CPU. If the tag addresses match, then a portion of the 32-byte cache line that was placed in the data buffer for this way is allowed onto the data bus. This represents a cache hit.
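The lookup described above can be modeled in software. The Python sketch below is a simplified illustration, not the actual hardware design. It assumes 32-byte lines, 128 sets, and four ways, so the low 5 address bits select a byte within the line, the next 7 bits select the set, and the remaining bits form the tag. All names are hypothetical, and real hardware compares the tags of all four ways at once rather than in a loop.

```python
LINE_BYTES = 32    # cache line length from the article
NUM_SETS = 128     # number of sets from the article
WAYS = 4           # four-way set associative
OFFSET_BITS = 5    # 2**5 = 32 bytes per line
SET_BITS = 7       # 2**7 = 128 sets


def split_address(address: int) -> tuple:
    """Return (tag, set index, byte offset) for a memory address."""
    offset = address & (LINE_BYTES - 1)
    set_index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + SET_BITS)
    return tag, set_index, offset


class SetAssociativeCache:
    def __init__(self):
        # directory[set][way] holds a tag (or None); data[set][way] holds a line.
        self.directory = [[None] * WAYS for _ in range(NUM_SETS)]
        self.data = [[bytes(LINE_BYTES)] * WAYS for _ in range(NUM_SETS)]

    def fill(self, address: int, line: bytes, way: int) -> None:
        """Store a 32-byte line and record its tag in the chosen way."""
        tag, set_index, _ = split_address(address)
        self.directory[set_index][way] = tag
        self.data[set_index][way] = line

    def read_byte(self, address: int):
        """Return the cached byte on a hit, or None on a miss."""
        tag, set_index, offset = split_address(address)
        for way in range(WAYS):                       # sequential stand-in for a parallel compare
            if self.directory[set_index][way] == tag:
                return self.data[set_index][way][offset]
        return None


if __name__ == "__main__":
    cache = SetAssociativeCache()
    cache.fill(0x12345, bytes(range(LINE_BYTES)), way=2)
    print(split_address(0x12345))     # (18, 26, 5)
    print(cache.read_byte(0x12345))   # 5 (hit)
    print(cache.read_byte(0x13345))   # None (same set, different tag: miss)
```

The second read lands in the same set as the first but carries a different tag, so no directory entry matches and nothing is released to the caller.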
Know thy system hardware
System hardware information utilities such as WCPUID are an important part of a PC technician’s troubleshooting arsenal. They simplify information gathering for analysis and allow a technician to form a mental picture of a system with little effort. While many utilities can be used to acquire basic information without a profound knowledge of system hardware, to obtain the maximum benefit, you need a thorough understanding of the PC hardware concepts involved. Fortunately, this understanding can be acquired without a graduate course in computer engineering.