Friday, May 20, 2005

Performance monitoring with Hyperthreading

Hyperthreading is used by most recent Intel servers; Intel describes it here.

I searched for information on Hyperthreading's effect on performance and on performance monitoring, and found several studies reporting throughput changes that ranged from a slowdown to speedups of 15-20% per CPU. I didn't see much discussion of the effects on observability or on response time, so that's what I'm going to concentrate on here.

First, to summarize the information that I could find: the early performance measurements account for most of the negative-impact cases. Two separate effects are at work, since both software and hardware have improved, so the latest operating systems on the most recent Intel Pentium 4 and Xeon processors see a larger benefit and fewer negative effects.

From a software point of view, Intel advises that Hyperthreading should be disabled for anything other than Windows XP and recent releases of Red Hat and SuSE Linux. Older operating systems are naive about Hyperthreading: they run, but they don't schedule jobs optimally. I would add that older Solaris x86 releases are also unaware of Hyperthreading, while Solaris 9 and 10 for x86 include optimized support. I have also been told that Linux 2.6 includes optimizations for Hyperthreading that have been backported to recent patches for Linux 2.4.

From a hardware point of view, the benefit of Hyperthreading increases as CPU clock rates and pipeline lengths increase. Longer waits for pipeline stalls due to memory references and branch mispredictions create larger "pipeline bubbles" for the second Hyperthread to run in. It is possible that there may also be improvements in the implementation of Hyperthreading at the silicon level as Intel learns from early experiences and improves its designs.

The fundamental problem with monitoring a Hyperthreaded system is that one of the most basic and common assumptions made by capacity planners is invalid. For many years the CPU utilization of a normal system has been used as a primary capacity metric since it is assumed to be linearly related to the capacity and throughput of a normal system that has not saturated. In other words, if you keep the CPU usage below 80-90% busy, you expect that the throughput at 60% busy is about twice the throughput at 30% busy for a constant workload.
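The linear assumption described above can be written down in a couple of lines. This is only a sketch of the conventional rule of thumb; the function name and the 80% ceiling used as a default are illustrative choices, not taken from any particular capacity planning tool.

```python
def linear_throughput_estimate(measured_throughput, measured_busy, target_busy=0.8):
    """Extrapolate throughput to target_busy, assuming throughput is
    directly proportional to CPU utilization (the conventional assumption).

    measured_busy and target_busy are fractions between 0 and 1.
    """
    if not 0.0 < measured_busy <= 1.0:
        raise ValueError("measured_busy must be in (0, 1]")
    if not 0.0 <= target_busy <= 1.0:
        raise ValueError("target_busy must be in [0, 1]")
    return measured_throughput * target_busy / measured_busy


if __name__ == "__main__":
    # 100 transactions/s measured at 30% busy, extrapolated to 60% busy.
    print(linear_throughput_estimate(100.0, 0.30, 0.60))
```

On a normal system this kind of extrapolation is usually good enough for planning purposes; the rest of this post is about why it stops working with Hyperthreading enabled.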

Hyperthreaded CPUs are non-linear; in other words, they do not "scale" properly. A typical two-CPU Hyperthreaded system behaves a bit like two normal fast CPUs plus two very slow CPUs that are only active when the system is busy. The OS will normally report it as a four-CPU system.

Whereas the service time (CPU used) for an average transaction remains constant on a normal CPU as the load increases, the service time for a Hyperthreaded CPU increases as the load increases.

If all you are looking at is the CPU %busy reported by the OS, you will see a Hyperthreaded system perform reasonably well up to 50% busy. Then, as a small amount of additional load is added, the CPU usage shoots up and the machine saturates.
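A toy model makes the shape of this curve easier to see. The sketch below is not a calibrated model; it assumes an HT-aware scheduler that puts one thread on each physical CPU before using any second Hyperthread, and it assumes each second Hyperthread adds a fixed 20% of a CPU (picked from the top of the 15-20% range quoted earlier). The function names and parameters are made up for illustration.

```python
def relative_throughput(reported_busy, cores=2, ht_gain=0.2):
    """Throughput, in units of one unshared physical CPU, at an
    OS-reported busy fraction (0 to 1).

    Assumptions (hypothetical): the scheduler fills one thread per
    physical CPU first, and the second Hyperthread on a CPU contributes
    only ht_gain of a CPU's worth of work.
    """
    if not 0.0 <= reported_busy <= 1.0:
        raise ValueError("reported_busy must be between 0 and 1")
    busy_threads = reported_busy * 2 * cores  # the OS sees 2 logical CPUs per physical CPU
    first_threads = min(busy_threads, cores)
    second_threads = max(busy_threads - cores, 0.0)
    return first_threads + ht_gain * second_threads


def service_time_factor(reported_busy, cores=2, ht_gain=0.2):
    """CPU time charged per transaction, relative to a lightly loaded system."""
    if reported_busy == 0.0:
        return 1.0
    busy_threads = reported_busy * 2 * cores
    return busy_threads / relative_throughput(reported_busy, cores, ht_gain)


if __name__ == "__main__":
    peak = relative_throughput(1.0)
    for pct in (25, 50, 75, 100):
        busy = pct / 100
        print(f"{pct:3d}% busy: {relative_throughput(busy) / peak:.0%} of peak throughput, "
              f"service time x{service_time_factor(busy):.2f}")
```

Under these assumptions a machine that reports 50% busy is already delivering about 83% of its peak throughput, and the CPU time charged per transaction only starts to grow once the second Hyperthreads come into use. Real systems will not have such a sharp knee, but the general shape is the problem being described.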

To be clear, the Hyperthreads do normally increase the peak throughput of the system and often provide a performance benefit, but they make it extremely difficult to manage the capacity of a system.

I've been looking at calibration methods and models that can deal with the capacity measurement problem, and will discuss them in future posts.

4 comments:

  1. I got a comment via email from Jaime Cardoso. I've added my own responses and italicized his questions:


    I think I understood your point and it makes absolute sense, but do
    you think this will also happen with CPUs with heavy TLP?

    I'll try to explain my question.

    I never saw Solaris running on an Intel with HT but, assuming Sun took
    the same approach as it did with US-IV (dual core), Solaris only shows
    you 1 CPU per US-IV (and not one per core).


    From a performance and capacity metrics viewpoint, Solaris shows you TWO CPUs per US-IV. The licencing argument is important but doesn't do anything to hide the two cores, which look exactly like two US-III CPUs by all measures. There is a command that licence managers are supposed to use which will report the number of CPU chips/sockets rather than cores.

    This was done to gain ground with Oracle's CPU licencing (yeah, I too
    thought the excuse was lame but it's what Sun told us).

    Some vendors do charge per chip, others still charge per schedulable entity. In theory this could lead to Oracle charging twice as much with Hyperthreading enabled. I'm not sure what their actual policy is.

    Now, if I have something like vmstat running on a Solaris x86 machine in
    a box with an HT processor, would your thoughts about observability
    still hold up?

    Yes, vmstat will show you the Hyperthreads as if they were CPUs. It doesn't matter whether you run Solaris or Linux; you see twice as many CPUs with HT turned on.

    In an Intel CPU with HT, things could be misleading but, in a Niagara,
    this error in observability can be critical

    There is an interesting discussion of Niagara from The Inquirer, late last year. The OS will see it as 32 CPUs, and its capacity-to-utilization characteristic will also be non-linear, so my comments on Hyperthreading will apply.

    I'm guessing that vmstat (and sar, and iostat and, ...) should be "TLP
    aware" and report the CPU loads accordingly.

    --
    Jaime Cardoso

    These tools all get their data from the same kernel statistic, which reports on every schedulable entity the kernel can see, so every Hyperthread looks like a CPU.

  2. hi any news on the exacct source? just curious! take care

  3. Great post!

    Some comments:
    You wrote:
    "Hyperthreading should be disabled for anything other than Windows XP"
    ->actually, Windows 2003 uses the same hyper-threading-aware logic as XP, so it is an excellent choice.

    "From a hardware point of view, the benefit of Hyperthreading increases as CPU clock rates and pipeline lengths increase."
    -> you can look at it as "the faster the CPU is versus the RAM access time, the more the CPU is doing nothing, so the more taking complex steps to make sure the CPU is busy gets useful"

    ->Also, one point to make about CPU utilization/capacity planning is: if you have hyperthreading enabled, CPU utilization is more than just the utilization %.

    You can find more details in my recent MSDN Magazine article discussing hyperthreading performance (June 05).

    // Yaniv Pessach
