Monday, April 30, 2007

Jobs at Netflix

A few people have asked me if there are interesting jobs available at Netflix. I haven't started there yet, but they do seem to be looking for senior developers who have a clue about performance.

Here are some job descriptions; let me know if you apply for any of them.

http://jobs.netflix.com/DetailFlix.asp?flix1449

http://jobs.netflix.com/DetailFlix.asp?flix1576

http://jobs.netflix.com/DetailFlix.asp?flix1440

Friday, April 27, 2007

Leaving eBay to join Netflix

I recently gave notice at eBay and will start at Netflix on May 7th. I've had a lot of fun working at eBay Research Labs, but I'm making a strategic move to a smaller company (its technology organization has about a tenth as many people), which makes it easier to take a broader role and develop skills and experience in new areas. My primary role will be directing a few senior engineers who develop part of the Netflix web site. It's an exciting new challenge!

Tuesday, April 24, 2007

Mobile Phones at Maker Faire

The Silicon Valley Homebrew Mobile Phone Club is going to be featured at Maker Faire this May 19/20th. One of my phone design pictures was used on the site :-)

I've also been continuing to develop the general purpose millicomputing concepts and have been documenting them on my companion blog. I have bought a Gumstix module for use in a phone, and I'm working on benchmarking it at the moment.

Saturday, April 21, 2007

Load Average Differences Between Solaris and Linux

A lot of people monitor their servers using load average as the primary metric. Tools such as Ganglia colorize all the nodes in a cluster view using load average. However, a few things about the calculation, and how it varies between Solaris and Linux, aren't well understood.

For a detailed explanation of the algorithm behind the metric, Neil Gunther has posted a series of articles that show how Load Average is a time-decayed metric that reports the number of active processes on the system with a one, five and fifteen minute decay period.
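As a minimal sketch of that time decay, each sample of the active process count can be folded into an exponentially damped running average. The 5-second sampling interval, the floating-point arithmetic (real kernels use fixed-point), and the function names are assumptions made for illustration, not taken from any kernel source.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed for this sketch)


def update_load(load, active, period, interval=SAMPLE_INTERVAL):
    """Fold one sample of the active process count into a damped average.

    `period` is the decay period in seconds (60, 300 or 900 for the
    one, five and fifteen minute figures).
    """
    decay = math.exp(-interval / period)
    return load * decay + active * (1.0 - decay)


def load_averages(samples, periods=(60.0, 300.0, 900.0)):
    """Run a sequence of active-process counts through all three averages."""
    loads = [0.0] * len(periods)
    for active in samples:
        loads = [update_load(l, active, p) for l, p in zip(loads, periods)]
    return loads


if __name__ == "__main__":
    # One minute of samples with 4 active processes, starting from idle.
    one, five, fifteen = load_averages([4] * 12)
    print(f"{one:.2f} {five:.2f} {fifteen:.2f}")
```

Starting from an idle system, the one minute figure climbs towards the true process count fastest and the fifteen minute figure slowest, which is why the three numbers together hint at whether load is rising or falling.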

The source of the number of active processes can be seen in vmstat as the first few columns, and this is where Solaris and Linux differ. For example, some Linux vmstat from a busy file server is shown below.
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 4 43      0  32384 2993312 3157696  0    0  6662  3579 11345 7445  7 65  0 27


The first two columns show the number of processes that are in the run queue waiting for CPU time ("r") and in the blocked queue waiting for disk I/O to complete ("b"). These metrics are calculated in a similar manner in both Linux and Solaris. The difference is that the load average calculation is fed by just the "r" column on Solaris, and by the "r" plus the "b" column on Linux. This means that a Linux-based file server with many disks could be running quite happily from a CPU perspective but still show a large load average.
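To make the difference concrete, here is a small sketch that takes the "r" and "b" values from the vmstat line above and works out what each operating system would feed into its load average, following the description in this post. The function name and the "solaris"/"linux" labels are hypothetical, chosen only for this example.

```python
def active_count(r, b, os_name):
    """Process count fed into the load average for one sample.

    Follows the description in this post: Solaris uses only the run
    queue, Linux adds the processes blocked on I/O.
    """
    if os_name == "solaris":
        return r
    if os_name == "linux":
        return r + b
    raise ValueError(f"unknown OS: {os_name}")


# The data line from the vmstat output shown above.
sample = " 4 43      0  32384 2993312 3157696  0    0  6662  3579 11345 7445  7 65  0 27"
r, b = (int(field) for field in sample.split()[:2])

print(active_count(r, b, "solaris"))  # 4
print(active_count(r, b, "linux"))    # 47
```

The same moment in time contributes 4 to a Solaris-style load average and 47 to a Linux-style one, even though the CPU run queue is identical.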

The logic behind the load average metric is that it should be a kind of proxy for responsiveness on a system. To get a more scalable measure of responsiveness, it is common to divide the load average by the number of CPUs in the system, since more CPUs will take jobs off the run queue faster. For disk intensive workloads on Linux, it may also make sense to divide the load average by the number of active disks, but this is an awkward calculation to make.

It would be best to take r/CPU count and b/active disk count, combine the two into a single time-decayed average, and give the result a new name. Maybe "slowed average" would be good?
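A rough sketch of how such a metric might be computed follows. Several things here are assumptions: the two ratios are simply added together (the idea above doesn't pin down how to combine them), samples arrive every 5 seconds, the decay period is one minute, and the CPU and disk counts in the example are invented.

```python
import math


def slowed_sample(r, b, cpus, active_disks):
    """Combine run queue per CPU and blocked queue per active disk.

    Adding the two ratios is an assumption of this sketch.
    """
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    cpu_part = r / cpus
    # With no active disks there is nothing to scale by, so skip the term.
    disk_part = b / active_disks if active_disks > 0 else 0.0
    return cpu_part + disk_part


def slowed_average(samples, cpus, active_disks, period=60.0, interval=5.0):
    """Time-decayed average of slowed_sample over (r, b) pairs."""
    decay = math.exp(-interval / period)
    avg = 0.0
    for r, b in samples:
        value = slowed_sample(r, b, cpus, active_disks)
        avg = avg * decay + value * (1.0 - decay)
    return avg


if __name__ == "__main__":
    # Hypothetical file server: 4 CPUs, 43 active disks, steady r=4, b=43.
    print(round(slowed_average([(4, 43)] * 5000, cpus=4, active_disks=43), 3))
```

On this made-up server the result settles at about 2 (one runnable process per CPU plus one blocked process per disk), where a Linux-style load average fed by the same samples would settle at 47.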

Monday, April 16, 2007

SEtoolkit and XEtoolkit releases

The SEtoolkit was developed in 1993 by Rich Pettit, and I used it as a way to prototype many new tools and ideas over the years. It's a Solaris-specific performance tool scripting language that supports very rapid development of new tools. The SEtoolkit has been widely deployed as the Solaris collector for the popular system performance monitor Orca. Rich gave up development of the SEtoolkit a few years ago and put the code into open source under the GPL. It's now available via SourceForge, where it is being maintained by Dagobert Michelsen. A bug in the SEtoolkit was causing it to crash when used with complex disk subsystems, and this has now been fixed in the SE3.4.1 release (April 10th, 2007).

Meanwhile, Rich has been trying to make a multi-platform (Solaris, Windows, Linux, FreeBSD, OSX, HP-UX, AIX) version of SE for a long time, and finally gave up trying to implement his own language, and based his latest development, the XEtoolkit, on Java 5. The first full release XEtoolkit 1.0 came out on April 15th, 2007. The code is released and supported under both open source and commercial licenses, by Rich's new company - Captive Metrics. The GPL license allows full free use of the provided tools, and development of new and derived tools that are also contributed to the community. The commercial license allows custom XEtoolkit development for proprietary tools, with a higher level of support.

The XEtoolkit 1.0 release doesn't support HP-UX or AIX, but AIX support is coming soon. I encourage you to try it out, give Rich some feedback and make it worth his while to continue. He's one of the very best programmers and performance tool architects I've ever met.

Thursday, April 12, 2007

myPhone 2.0 Case comes off the 3D printer



More pictures of the latest myPhone 2.0 case design.

This version has a deeper case and was printed at a simple angle. The previous attempt ended up warping, so I hope these changes will keep it flat. It also has a slot in the bottom end to take an iPod connector (which carries power, USB, stereo line level output etc.), an antenna mounting hole at the top, and a retaining clip design to hold the LCD in place.

The case contains 2.6 cubic inches of ABS plastic and used 1.4 cubic inches of support material. It cost about $40 in materials and took 8 hours to print at Techshop.