Monday, December 31, 2007

Some New Performance Monitoring Tools

There is a simple and extensible open source C based daemon called collectd that writes to RRD files, an alternative to Orca/procallator for people who don't want the Perl based memory footprint of procallator. I'll check it out on my Gumstix millicomputer sometime.

There is yet another open source full-function monitoring tool called Zabbix that looks similar to Cacti in scope, possibly with more features, and with a SQL database backend. It has a commercial company backing it with support contracts etc, somewhat like the XE Toolkit.

The most interesting commercial tool I saw at CMG earlier this month is a capacity monitoring tool called PAWZ from Perfcap Corporation. The key thing they have worked on is taking the human out of the loop as much as possible with sophisticated capacity modelling algorithms and a simple and scalable operational model. It is very similar in concept to the capacity planning research I was working on and publishing in 2002-2004. The core idea is that you care about "headroom" in a service, and anything that limits that headroom is taken into account. Running out of CPU power, network bandwidth, memory, threads etc. will increase response time of the service, so monitor them all, track trends in headroom and calculate the point in time where lack of headroom will impact service response time. At eBay we used to call this the "time to live" for a service. You can easily focus on the services that have the shortest time to live, and proactively make sure that you have a low probability of poor response time. I'm going to take a closer look at this one...
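
The "time to live" idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not PAWZ's algorithm or the method used at eBay: it assumes each resource is sampled as a utilization fraction over time in days, fits a straight-line trend, and treats a hypothetical 80% threshold as the point where response time starts to suffer. The function names and the threshold are invented for this example.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (time in days, utilization as a fraction 0..1)


def linear_trend(samples: List[Sample]) -> Tuple[float, float]:
    """Least-squares fit of utilization = slope * t + intercept."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("need samples at two or more distinct times")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = cov / var_t
    return slope, mean_u - slope * mean_t


def days_until_limit(samples: List[Sample], limit: float) -> Optional[float]:
    """Days after the last sample until the trend line reaches the limit.

    Returns None when the trend is flat or falling, so headroom is not shrinking.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing_time = (limit - intercept) / slope
    return max(0.0, crossing_time - samples[-1][0])


def time_to_live(resources: Dict[str, List[Sample]],
                 limit: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (limiting resource, days left), taking the minimum over all resources."""
    estimates = {name: days_until_limit(s, limit) for name, s in resources.items()}
    shrinking = {name: d for name, d in estimates.items() if d is not None}
    if not shrinking:
        return None
    worst = min(shrinking, key=shrinking.get)
    return worst, shrinking[worst]


if __name__ == "__main__":
    service = {
        "cpu": [(0, 0.50), (1, 0.52), (2, 0.54), (3, 0.56)],
        "network": [(0, 0.30), (1, 0.31), (2, 0.32), (3, 0.33)],
        "memory": [(0, 0.60), (1, 0.60), (2, 0.60), (3, 0.60)],
    }
    print(time_to_live(service))
```

Sorting services by the days-left figure gives the list of services that need attention first. A real tool would need something better than a straight-line fit, since utilization is bursty and response time degrades non-linearly as headroom runs out.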




Sunday, December 30, 2007

Thoughts on iPhone 1.1.3, Macworld, 3G and business users

Reports indicate that the 1.1.3 update is likely to ship before MacWorld in January. It includes an update to Google Maps that adds the same location feature recently shipped on other non-GPS platforms (basically locating to the nearest cell tower). It also provides mechanisms to select and arrange the applications installed on the iPhone. This is a key new feature, since the iPhone has been updated by Apple as a monolithic all-or-nothing set of applications so far. It greatly reduces the need to hack into the iPhone to customize it.

The updates themselves are fairly small, and to me it makes sense to get them installed before MacWorld. We will then see announcements at MacWorld of a set of optional applications that users can pick and choose to install on their iPhones. This could include the ability to add the Mail application and other "missing" iPhone apps to the iPod Touch. It could include brand new applications (I'm going to nominate iChat as a candidate yet again...). The public announcement of the SDK is due at MacWorld, with shipping in February, which will open up anyone to build officially sanctioned applications. However key vendors will have been testing the SDK over the last few months so I expect a bunch of third party applications to be announced or ready to ship at MacWorld.

The other leaks and rumors indicate that there is likely to be a second generation iPhone with 3G support shipping in the spring, and announced at MacWorld. This would also support launching the iPhone in Asian markets like Japan, where there is no GSM support.

I also expect that Apple will start to make moves towards business use of the iPhone, with some tools and upgrades provided by Apple, and others by key third parties.

I currently carry a Verizon Blackberry 8703e for work use, and my iPhone for personal use and iPod functionality. In order to use the iPhone as my work phone I need a few key features.

  • Firewall support - the BB is inside the corporate firewall, the iPhone can't access it. We use Juniper Network Connect which is a Java based VPN solution on MacOS/XP.
  • WiFi support - we use LEAP to login to WiFi at work, need support for LEAP on iPhone, it works fine on my MacOS X laptop, should be a simple feature to add.
  • Exchange support - I can't use the IMAP workaround due to firewall issues, properly integrated Exchange email and calendar support is what everyone is asking for.
  • Ideally RIM will port the Blackberry application suite to the iPhone, like they did for the Treo...

The two other biggest missing features are Flash and Java support. I know there are lots of issues with CPU/memory/battery life. Perhaps the next generation iPhone will be based on a more advanced ARM CPU (e.g. the ARM Cortex based Qualcomm Scorpion) with more performance and more memory so it can run Flash and Java apps alongside the existing apps?

We'll find out in a few weeks...
Happy new year.

Friday, December 28, 2007

MacOS X Leopard, iPhone and Stereo Bluetooth Headphones - A2DP

Over a year ago I bought stereo Bluetooth headphones on eBay. It's a multifunction unit, the OMIZ OMS600, which includes an MP3 player (with MicroSD slot), FM radio, Stereo A2DP headphone and Headset with Mono audio/Microphone. When I tried to configure it on MacOS X Tiger it didn't work because there was no A2DP headphone support.

MacOS X Leopard now supports A2DP and "just works" with this headset. The Mac sees both the Headset (Mono audio/mic for Skype etc) and Headphone (A2DP Stereo) as separate devices. After the usual Bluetooth device wizard setup, simply put the OMS600 into the headphone mode and pick "Use Headphone" from the Bluetooth dropdown menu on the Mac. The Mac's internal speakers mute, and the Headphones play. It worked over a 10 foot range walking around a room, crackled a bit at the limit and dropped the connection if I went too far away.

I don't see the OMS600 for sale any more, but there are plenty of A2DP headphones out there now, and it's nice to see that Apple finally got the devices to work, and made it "Just Work" as usual.

The headset mode also works on my iPhone, but the A2DP mode doesn't. I paired the headset with the iPhone and was able to make calls and receive them. When using the FM radio or the MP3 player built into the headset it paused and resumed for incoming calls. However, when the Headphone mode was paired with my Leopard machine the iPhone didn't route calls to the Headset. The OMS600 headset has the microphone built into the left side earpiece with no voice tube or boom down nearer my mouth, and it doesn't pick up my voice very clearly. The noise cancelling Jawbone headset works far better.

I listened to some music on iTunes (Radiohead In Rainbows and Gorillaz D-Sides are my current albums for serial listening) and waited forever for XP to start up in Parallels so I could fire up IE7 and see a Netflix Watch Instantly show about the Pixies reunion tour (called LoudQuietLoud). [Yes I know it would be nice if it worked natively on the Mac, but the studios only approve Windows DRM, and the alternatives all have issues that are taking way too much time to sort out].

Happy new year...

Sunday, December 09, 2007

A. A. Michelson Award Acceptance Speech

Last week I was selected as this year's winner of the A. A. Michelson Award for Lifetime Achievement and Contribution by the Computer Measurement Group. Here is a transcript of my acceptance speech.


Thanks to CMG, the Michelson Award winners who voted to add me to their ranks, and everyone who I have met here since I first attended CMG in 1994. I treat the CMG conference as my annual training class, a chance to mingle with my peers, learn and share the things I have learned.


There are some people who I have met at CMG or while working in this field who have become special friends, not just people I see at the conference each year. Cathy Nolan is not only national CMG president this year, she is president of my local CMG group in Northern California and was also instrumental in the process of getting me nominated for this award. We have all learned a lot over the years from Neil Gunther. Neil and I have jointly presented many training classes so I have had plenty of opportunity to learn his material. He has made queuing theory accessible and useful for very many people. Mario Jauvin has been my CMG conference buddy for many years, and for the last three years we have jointly presented the Capacity Planning with Free Tools workshop. Yefim (Fima) Somin encouraged me to attend CMG in the first place, we met in 1994 while working to port BGS Best/1 to Solaris, and we have kept in touch over the years, even though he doesn't get to come to CMG nowadays. Fima and Henry Newman were part of the Universal Measurement Architecture standards body that I joined in 1995, and I have also kept in touch with Henry. We worked to get some of his ideas (like extended system accounting) to be implemented in Solaris. Bob Sneed worked with me at Sun and is now the main contact point and conscience for CMG at Sun.

Outside CMG there are some very significant people I would also like to thank. Brian Wong is one of my closest friends, responsible for getting me to move to the USA in 1993, and is also author of the Configuration and Capacity Planning for Solaris Servers book. Brian introduced me to Rich Pettit, and we worked together for many years to build and extend the SE Toolkit, the vehicle for most of my performance measurement ideas. Allan Packer was the co-author of my first CMG paper on Database Sizing. He knows everything about database performance and wrote the book Configuring and Tuning Databases on the Solaris Platform. We borrowed Allan from Sun's office in Adelaide, Australia for a month or two, and he introduced me to Richard McDougall, who we also borrowed from Adelaide. Allan and Richard both ended up working for Sun's Performance group in California. Richard built the first tools that could measure memory usage, fixed the Solaris 8 memory system, and co-authored the Solaris Internals books. He became a Sun Distinguished Engineer and recently joined VMware as Chief Performance Architect. Jim Mauro is the other author of Solaris Internals; we borrowed him from Sun's New Jersey office a few times before he also joined the performance group full time. Finally Dave Fisk worked with Brian Wong and me at Sun, and became one of my closest friends. He knows more about Unix storage performance than anyone else I've met, and I've called on him many times (he's now a consultant) to help figure out what's really going on.

I would also like to thank my family. My father is a retired Statistics Professor from the University of Hertfordshire (a.k.a. Hatfield Polytechnic) in the UK. He was programming himself in the 1960's and gently encouraged me to tinker with computers from an early age. My son recently graduated with his BSc in Computer Science from the same University, and now works as a project manager at eBay in California. My daughter has just started a degree course in European Literature at Royal Holloway College, London, so I'm no help at all with her assignments :-)  I'm very proud of them. I moved to the USA on my own a few years ago and met my lovely wife Laurel in 2003. I'm very happy that she has also been able to attend CMG with me and meet many of you.


I have always been glad that I studied Physics. The work we do in performance, and the diagnosis and measurement of performance, leans heavily on my training as a Physicist, and I think that pure Computer Scientists are lacking some of the tools they need to understand measurement and performance issues. Since A. A. Michelson was a Physicist this award is especially gratifying for me.


In 1972 my High School had a link to the computer that my Father used at Hatfield Polytechnic. I taught myself BASIC and Algol during recess....


This is what most people know me for. It's getting a little out of date now, since the second edition was published in 1998, but a big thank-you to my co-author Rich Pettit and everyone that bought a copy!


The book documented metrics and performance rules, and they were implemented as an SE Toolkit script called Virtual Adrian. I spent a lot of time with all the tools vendors helping them to manage Solaris better, and with Solaris engineering getting them to make Solaris more manageable. This is what justified my involvement with CMG during my time at Sun, and it's the legacy that I am most proud of.


To finish, some of the things I have learned over the years that you may find useful...


If you ever try to write a book, you need to pay attention to Cockcroft's Law of Book Writing. It seems counter-intuitive to people who have never attempted to write a book, but it is the most important thing to know if you ever intend to finish a book. For example, you scope a 200 page book and start writing. After you have written 50 pages you re-scope and find your book is now a 300 page book. You write another 50 pages and find you are now writing a 400 page book. The number of pages you have left to write to finish the book is increasing! You have to reduce scope continuously and plot a trend line that slopes downwards if you ever hope to be published.
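
The arithmetic behind the law is easy to check. The short sketch below uses the page counts from the example above and adds a hypothetical second schedule where scope is held down; the function names are made up for illustration.

```python
from typing import List, Tuple

Checkpoint = Tuple[int, int]  # (pages written so far, current scoped length)


def pages_remaining(checkpoints: List[Checkpoint]) -> List[int]:
    """Pages still to write at each checkpoint."""
    return [scope - written for written, scope in checkpoints]


def is_converging(checkpoints: List[Checkpoint]) -> bool:
    """True only if the remaining page count falls at every checkpoint."""
    remaining = pages_remaining(checkpoints)
    return all(later < earlier for earlier, later in zip(remaining, remaining[1:]))


# The example from the text: scope grows by 100 pages for every 50 written.
growing_scope = [(0, 200), (50, 300), (100, 400)]
print(pages_remaining(growing_scope))   # [200, 250, 300]
print(is_converging(growing_scope))     # False

# A hypothetical schedule where scope creep is smaller than the pages written.
controlled_scope = [(0, 200), (50, 220), (100, 230)]
print(pages_remaining(controlled_scope))  # [200, 170, 130]
print(is_converging(controlled_scope))    # True
```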


If you collect information and share it freely, you will become an expert. You become a magnet for questions, issues and information. Some people tightly control their expertise, this is a huge mistake as it gives others incentives to look for alternative experts or to become experts themselves. If you give out your expertise freely, you become the go-to person in your field, and gather many more recommendations from the people you help. Try it!



Some observations that may help you deal with executive level management. The first one is a big problem that Sun and other manufacturers have, because the things they build take many years to go from conception to delivery. If the managers in charge of these projects don't stick around then the project is likely to be tinkered with and delayed further or dropped. It's nice being at Netflix, where the executives have been there for years and the projects take months to deliver. The second observation is that Capacity Planners have a vertical role in the organization: they gather the lowest level metrics from operations and have to present findings to upper management. We love our detailed metrics, but the only metric that upper management should have to deal with is the dollar. If you can present everything in terms of return on investment (spend $100K to save or grow by $1M) then your executive presentations will be much more effective.


What am I working on now? I'm managing a team that develops the Netflix web site. It's a personalized web service and I think that is the future of the computer industry. The scalability and performance challenges are very interesting. I'm also pushing an idea I call "Millicomputing" which could have a disruptive effect on the industry as storage moves from disks (spinning rust) to flash, and the computers move to ultra low power system-on-a-chip devices. I gave a paper on Millicomputing at CMG07 and have a separate blog on the subject.



My paper at last year's CMG was called "Utilization is Virtually Useless as a Metric", and this answer sums it up. Those who ask questions about utilization don't understand that their questions have no meaning so the answers are irrelevant :-)

Thanks again!

Adrian

CMG07 and the A. A. Michelson Award

From the Computer Measurement Group website:

Each year, CMG considers exceptional individuals who have made significant contributions to our profession over their entire professional careers as possible recipients of the A.A. Michelson Award.

Albert Abraham Michelson, for whom the award is named, was known for his outstanding technical accomplishments in measuring the speed of light as well as for his role as a teacher and inspirer of others. CMG presents this lifetime achievement award to a single individual to recognize and encourage the same combination of technical excellence and professional contributions found in only an exceptional few.

The recipient is nominated and chosen by the past winners, and the award has been granted to one person every year since 1974. This year I was very honored to be the recipient! It was announced on Monday 3rd December, at the start of the CMG07 conference. I gave an acceptance speech, which I have summarized in a blog post.

Thanks to everyone who has supported me over the years, reading the blogs, papers and books; coming to training classes; the engineers at Sun who fixed my bugs and added the features I lobbied for to make Solaris more measurable; and the friends, mentors and mentees that I have worked with for many years.

Cheers!
Adrian



Saturday, December 01, 2007

Jobs at Netflix - One down, one to go...

I have been working on hiring for the last month or so. We found and hired one good match and had a bunch of near misses, so I have revised and reissued the job description to be clearer about what we are looking for. I need one more senior Java/SQL developer with experience of large scale consumer web site development. We are looking for someone who has a lot of experience in product development, may have spent some time in management or team lead roles, but still "has their head in the code". We run small agile teams, so you get to architect and code new product features yourself and deliver to the site every few weeks.

My group develops core personalization algorithms that are used to generate lists of candidate movies and filter them down to the top few to be shown in any particular block on the site. We work on the pages grouped under the main "Browse DVDs" tab at www.netflix.com. We collaborate with a sister group that collects and predicts star ratings and runs the Netflix Prize; they work on pages under the "Movies You'll Love" tab.




The business logic is written in Java, pulling its data out of Oracle and via middle tier services. The front end presentation layer (JavaScript/CSS etc.) work is done by engineers working for Bill Scott, Director of UI Engineering, who used to work at Yahoo! where he was chief AJAX evangelist. He is also hiring....

Development is very rapid, agile and iterative. Features go through rigorous statistical A/B testing, and have to show a significant benefit before all the users get to see them. There is a great deal of freedom to try out ideas and an extremely analytical approach to picking the winners.
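
As a rough illustration of what "show a significant benefit" can mean, here is a minimal sketch of a one-sided two-proportion z-test in Python. This is a textbook test chosen for the example, not a description of the tests Netflix actually runs; the function name, the conversion counts and the 5% significance level are all assumptions.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Test whether variant B converts better than control A.

    Returns (z, p_value), where p_value is the one-sided probability of
    seeing a difference at least this large if A and B were really the same.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in the data, test is undefined")
    z = (p_b - p_a) / std_err
    # Upper tail of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 users per cell, 10% vs 11% conversion.
    z, p = two_proportion_z_test(1000, 10000, 1100, 10000)
    print(f"z = {z:.2f}, p = {p:.4f}")
    print("ship it" if p < 0.05 else "no significant benefit")
```

With small cells the same one point difference would not reach significance, which is why a feature is exposed to enough users to make the result trustworthy before it goes to everyone.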

Netflix is also something of a social experiment, the company culture is unique and very employee oriented. There is minimal process, a huge reliance on personal judgement, and zero tolerance for antisocial behaviors. What would normally be a fat binder of HR policies has been summarized into a few lines: "Act in Netflix's best interest" and the vacation policy is "take some". If you don't have the personal judgement to do the right thing, we say goodbye...

Web Engineering consists of a relatively small number of senior people working very efficiently and productively. We like to hire the best "stars" we can find and build up "bench strength" like a championship winning sports team. We are looking for some more star talent...

I'm hiring one engineer at this time - apply at the above link or find me on LinkedIn or Facebook to discuss.