Hyperthreading is used by most recent Intel servers, Intel describe it here.
I tried searching for information on the performance impact and performance monitoring impact and found several studies that describe improvements in terms of throughput that ranged from negative impact (a slowdown) to speedups of 15-20% per CPU. I didn't see much discussion of the effects on observability or of effects on response time, so thats what I'm going to concentrate on writing about.
First, to summarize the information that I could find, the early measurements of performance included most of the negative impact cases. This is due to two separate effects, software and hardware improvements mean that the latest operating systems on the most recent Intel Pentium 4 and Xeon chipsets have a larger benefit and minimise any negative effects.
From a software point of view, Intel advises that Hyperthreading should be disabled for anything other than Windows XP and recent releases of RedHat and SuSe Linux. Older operating systems are naiive about Hyperthreading and while they run, they don't schedule jobs optimally. I would add that older Solaris x86 releases are also unaware of Hyperthreading and Solaris 9 and 10 for x86 include optimized support. I have also been told that Linux 2.6 includes optimizations for Hyperthreading that have been backported to recent patches for Linux 2.4.
From a hardware point of view, the benefit of Hyperthreading increases as CPU clock rates and pipeline lengths increase. Longer waits for pipeline stalls due to memory references and branch mispredictions create larger "pipeline bubbles" for the second Hyperthread to run in. It is possible that there may also be improvements in the implementation of Hyperthreading at the silicon level as Intel learns from early experiences and improves its designs.
The fundamental problem with monitoring a Hyperthreaded system is that one of the most basic and common assumptions made by capacity planners is invalid. For many years the CPU utilization of a normal system has been used as a primary capacity metric since it is assumed to be linearly related to the capacity and throughput of a normal system that has not saturated. In other words, if you keep the CPU usage below 80-90% busy, you expect that the throughput at 60% busy is about twice the throughput at 30% busy for a constant workload.
Hyperthreaded CPUs are non-linear, in other words they do not "scale" properly. A typical two CPU Hyperthreaded system behaves a bit like two normal fast CPUs with two very slow CPUs which are only active when the system is busy. The OS will normally report it as a four CPU system.
Whereas the service time (CPU used) for an average transaction remains constant on a normal CPU as the load increases, the service time for a Hyperthreaded CPU increases as the load increases.
If all you are looking at is the CPU %busy reported by the OS, you will see a Hyperthreaded system perform reasonably well up to 50% busy then as a small amount of additional load is added the CPU usage shoots up and the machine saturates.
To be clear, the Hyperthreads do normally increase the peak throughput of the system, they often provide a performance benefit, but they make it extremely difficult to manage the capacity of a system.
I've been looking at calibration methods and models that can deal with the capacity measurement problem, and will discuss them in future posts.