
Keeping the pipeline busy requires that the processor begin executing a second instruction before the first has traveled completely through the pipeline. However, suppose a program has an instruction that requires summing three numbers:

X = A + B + C

If the processor already has A and B stored in registers but needs to get C from memory, this causes a "bubble," or stall, in the pipeline: the processor cannot execute the instruction until it obtains the value of C from memory. This bubble must move all the way through the pipeline, forcing each stage that contains the bubble to sit idle and wasting execution resources during those clock cycles. Clearly, the longer the pipeline, the more significant this problem becomes.
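To make the bubble concrete, here is a minimal C sketch (an illustration, not code from the original document): the final add depends on a load, so if C is not already in a register or cache, the add cannot issue until the load completes.

```c
/* Hypothetical sketch of X = A + B + C.
 * If *c_ptr misses in the cache, the dependent add stalls and a
 * bubble travels down the pipeline until the load returns. */
int sum3(int a, int b, const int *c_ptr)
{
    int c = *c_ptr;     /* load: may take many cycles on a cache miss */
    return a + b + c;   /* dependent adds: cannot execute until c arrives */
}
```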

Processor stalls often occur as a result of one instruction being dependent on another. If the program has a branch, such as an IF–THEN construct, the processor has two options: it can wait for the critical instruction to finish (stalling the pipeline) before deciding which program branch to take, or it can predict which branch the program will follow.

If the processor predicts the wrong branch, it must flush the pipeline and resume execution at the IF–THEN statement along the correct branch. The longer the pipeline, the higher the performance cost of a branch misprediction, because more speculative instructions have entered the pipeline and must be discarded when the misprediction is detected. Specific to the NetBurst design was an improved branch-prediction algorithm aided by a large branch target array that stored branch predictions.
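To illustrate the cost (a hedged sketch, not from the original text), consider a loop whose branch depends on unpredictable data; each misprediction forces a pipeline flush. A branchless rewrite turns the control dependency into a data dependency, leaving nothing to mispredict:

```c
#include <stddef.h>

/* Branchy version: each iteration contains a conditional branch.
 * With random data the predictor guesses wrong often, and every
 * wrong guess flushes the speculative work in the pipeline. */
size_t count_above(const int *v, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            count++;
    }
    return count;
}

/* Branchless version: the comparison produces 0 or 1 directly,
 * so there is no branch for the predictor to get wrong. */
size_t count_above_branchless(const int *v, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] > threshold);
    return count;
}
```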

Hyper-Threading Technology

Intel Hyper-Threading (HT) Technology is a design enhancement for server environments. It takes advantage of the fact that, according to Intel estimates, the utilization rate for the execution units in a NetBurst processor is typically only about 35 percent. To improve the utilization rate, HT Technology adds Multi-Thread-Level Parallelism (MTLP) to the design. In essence, MTLP means that the core receives two instruction streams from the operating system (OS) to take advantage of idle cycles on the execution units of the processor. For one physical processor to appear as two distinct processors to the OS, the design replicates the pieces of the processor with which the OS interacts, creating two logical processors in one package. These replicated components include the instruction pointer, the interrupt controller, and other general-purpose registers, all of which are collectively referred to as the Architectural State, or AS (see Figure 5).
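As a concrete aside (a sketch assuming GCC or Clang on x86; this code is not part of the original text), software can detect whether a package exposes multiple logical processors by checking the HTT flag, bit 28 of EDX returned by CPUID leaf 1:

```c
#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang x86 helper for the CPUID instruction */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID leaf 1 returns the processor's feature flags. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;    /* leaf 1 not supported */

    if (edx & (1u << 28))    /* EDX bit 28: HTT */
        printf("HTT flag set; package reports up to %u logical processors\n",
               (ebx >> 16) & 0xffu);   /* EBX[23:16]: logical CPU count */
    else
        printf("HTT flag not set\n");
    return 0;
}
```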

Figure 5. Hyper-Threading Technology

[Figure: An IA-32 processor with Hyper-Threading Technology presents two architectural states (AS1 and AS2) on a single processor core, so the OS sees two logical processors sharing one system bus. By contrast, a traditional dual-processor (DP) system provides two complete processor cores, each with its own architectural state and its own system bus connection.]


Since multi-processing operating systems such as Microsoft Windows and Linux are designed to divide their workload into threads that can be independently scheduled, these operating systems can send two distinct threads to work their way through execution in the same device. This provides the opportunity for parallelism at a higher level of abstraction: at the thread level rather than simply at the instruction level, as in the Pentium 4 design. Table 3 illustrates how instruction-level parallelism takes advantage of opportunities in the instruction stream to execute independent instructions at the same time. Thread-level parallelism, shown in Table 4, takes this a step further, since two independent instruction streams are available for simultaneous execution opportunities. A code-level sketch of this pattern follows the paragraph below.
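The thread-level pattern described above looks like the following sketch (hypothetical POSIX-threads code, not from the original document): the program splits its work into two independently schedulable threads, which the OS can dispatch to the two logical processors of a single HT-enabled package.

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000000

static int  data[N];        /* shared input, zero-initialized here */
static long partial[2];     /* one partial sum per thread */

/* Each thread sums half of the array. The two instruction streams are
 * independent, so an HT core can interleave them on idle execution units. */
static void *sum_half(void *arg)
{
    int id = *(int *)arg;
    long s = 0;
    for (int i = id * (N / 2); i < (id + 1) * (N / 2); i++)
        s += data[i];
    partial[id] = s;
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int ids[2] = {0, 1};

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, sum_half, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("total = %ld\n", partial[0] + partial[1]);
    return 0;
}
```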

It should be noted that the performance gain from adding HT Technology does not equal the expected gain from adding a second physical processor or processor core. The overhead to maintain the threads and the requirement to share processor resources limit HT Technology performance.

Nevertheless, HT Technology was a valuable and cost-effective addition to the Pentium 4 design.

Table 3. Example of instruction-level parallelism

  Instruction number   Instruction thread   Instruction execution
  1                    Read register A      Operations 1, 2, and 3 are independent
  2                    Write register B     and can execute simultaneously if
  3                    Read register C      resources permit.
  4                    Add A + B            This operation must wait for instructions
                                            1 and 2 to complete, but it can execute
                                            in parallel with operation 3.
  5                    Inc A                This operation must wait for the completion
                                            of instruction 4 before executing.

Table 4. Example of thread-level parallelism

  Thread 1                      Thread 2
  Number  Instruction           Number  Instruction
  1a      Read register A       1b      Add D + E
  2a      Write register B      2b      Inc E
  3a      Read register C       3b      Read F
  4a      Add A + B             4b      Add E + F
  5a      Inc A                 5b      Write E

  Instruction execution: None of the instructions in Thread 2 depend on those in
  Thread 1; therefore, to the extent that execution units are available, any of them
  can execute in parallel with those in Thread 1. As an example, instruction 2b must
  wait for instruction 1b, but it does not need to wait for 1a. Similarly, if two
  arithmetic units are available, 4a and 4b can execute at the same time.

According to Intel’s simulations, HT Technology achieves its objective of significantly improving the utilization rate of the microarchitecture. Improved performance is the real goal, though, and Intel reports that the performance gain can be as high as 30 percent.

The performance gained by these design changes is limited by the fact that two threads now share and compete for processor resources, such as the execution pipeline and the Level 1 (L1) and L2 caches. There is some risk that data needed by one thread will be evicted from a cache by data that the other thread is using, resulting in a higher turnover of cache data (referred to as thrashing) and a reduced hit rate.
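The thrashing risk can be sketched as follows (hypothetical code; the working-set size is an assumption chosen to exceed half of a shared cache): if the OS schedules these two threads onto the two logical processors of one physical core, each thread streams through its own buffer and keeps evicting the other's cache lines.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumed: the shared cache holds about 1 MB, so two 1 MB working sets
 * cannot coexist in it and the threads evict each other's lines. */
#define WORKING_SET (1u << 20)

static char buf[2][WORKING_SET];

static void *stream(void *arg)
{
    volatile long sink = 0;   /* keeps the loads from being optimized away */
    char *p = arg;
    for (int pass = 0; pass < 100; pass++)
        for (size_t i = 0; i < WORKING_SET; i += 64)  /* one touch per line */
            sink += p[i];
    (void)sink;
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    pthread_create(&t[0], NULL, stream, buf[0]);
    pthread_create(&t[1], NULL, stream, buf[1]);
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    return 0;
}
```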

