Monday, December 31, 2007
Some New Performance Monitoring Tools
There is a simple and extensible open source daemon called collectd, written in C, that writes to RRD files. It is an alternative to Orca/procallator for people who don't want the memory footprint of the Perl-based procallator. I'll check it out on my Gumstix millicomputer sometime.
There is yet another open source, full-function monitoring tool called Zabbix that looks similar to Cacti in scope, possibly with more features, and with a SQL database backend. It has a commercial company backing it with support contracts etc., somewhat like the XE Toolkit.
The most interesting commercial tool I saw at CMG earlier this month is a capacity monitoring tool called PAWZ from Perfcap Corporation. The key thing they have worked on is taking the human out of the loop as much as possible with sophisticated capacity modelling algorithms and a simple and scalable operational model. It is very similar in concept to the capacity planning research I was working on and publishing in 2002-2004. The core idea is that you care about "headroom" in a service, and anything that limits that headroom is taken into account. Running out of CPU power, network bandwidth, memory, threads etc. will increase response time of the service, so monitor them all, track trends in headroom and calculate the point in time where lack of headroom will impact service response time. At eBay we used to call this the "time to live" for a service. You can easily focus on the services that have the shortest time to live, and proactively make sure that you have a low probability of poor response time. I'm going to take a closer look at this one...
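To make the "time to live" idea concrete, here is a minimal sketch in Python. It is not how PAWZ works (I don't know their algorithms) and it is much cruder than a real capacity model: it assumes each resource is sampled as a utilization fraction over time, fits a straight-line trend to the samples, and extrapolates to the point where utilization reaches a chosen limit. The function names, the 0.8 limit and the use of days as the time unit are all hypothetical choices for illustration.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (time in days, utilization as a fraction 0..1)


def linear_fit(samples: List[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares line through the samples: (slope, intercept)."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("need samples at two or more distinct times")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = cov / var_t
    return slope, mean_u - slope * mean_t


def time_to_live(samples: List[Sample], limit: float = 0.8) -> Optional[float]:
    """Days after the last sample until the trend line reaches `limit`.

    Returns None when the trend is flat or falling (headroom is not shrinking).
    Assumes a linear trend, which is a deliberate simplification.
    """
    slope, intercept = linear_fit(samples)
    if slope <= 0:
        return None
    last_t = max(t for t, _ in samples)
    t_hit = (limit - intercept) / slope
    return max(0.0, t_hit - last_t)  # 0.0 means the limit is already reached


def service_time_to_live(
    resources: Dict[str, List[Sample]], limit: float = 0.8
) -> Optional[Tuple[str, float]]:
    """The resource that runs out of headroom first, and its time to live."""
    ttls = {name: time_to_live(s, limit) for name, s in resources.items()}
    growing = {name: ttl for name, ttl in ttls.items() if ttl is not None}
    if not growing:
        return None
    first = min(growing, key=lambda name: growing[name])
    return first, growing[first]
```

Calling `service_time_to_live` with a dictionary of per-resource samples (CPU, network, memory, threads and so on) returns the name of the limiting resource and the number of days left for the service. Sorting services by that number gives the "shortest time to live first" view described above.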