I was one of the beta testers for Sun's new x4500 high density storage server, and it turned out pretty well. I was able to hire Dave Fisk as a consultant to help me do the detailed evaluation using his in-depth tools, and it turned into a fascinating investigation of the detailed behavior of the ZFS file system.
ZFS is simple to use, has lots of extremely useful features, and the price is right (bundled with Solaris 10 6/06 or OpenSolaris). However, it's doing lots of clever things under the hood and it behaves like nothing else. It's far harder to predict its performance than that of any other file system we've looked at. It even baffled Dave at first; he had to change his tools to support ZFS, but he's got it pretty well figured out now.
For a start, it uses a write-anywhere file layout (WAFL), which is similar in some ways to a NetApp filer. This means that random writes are batched up, sorted by file, file system etc., and every few seconds a big burst of sequential writes commits the data to disk as a transaction. Since sequential writes to disk are always much more efficient than random writes, this means that it gets much more performance per disk than UFS or VxFS for random writes.
The combination of the x4500 and ZFS works well, since ZFS knows that the firmware on the 48 SATA drives in the x4500 has a write cache that can safely be enabled and flushed on demand. This greatly improves performance and fixes an issue that I have been complaining about for years. Finally, a safe way to use the write caches that exist in every modern drive.
It's actually easier to list the things that ZFS on the x4500 doesn't have.
- No extra cost - it's bundled in a free OS
- No volume manager - it's built in
- No space management - file systems use a common pool
- No long wait for newfs to finish - we created a 3TB file system in a second
- No fsck - its transactional commits mean it's always consistent on disk
- No rsync - snapshots can be differenced and replicated remotely
- No silent data corruption - all data is checksummed on write and verified as it is read
- No bad archives - all the data in the file system is scrubbed regularly
- No penalty for software RAID - RAID-Z has a clever optimization
- No downtime - mirroring, RAID-Z and hot spares
- No immediate maintenance - double parity disks if you need them
- No hardware failures in our testing - we didn't get to try out some of these features!
and finally, on the downside
- No way to know how much performance headroom you have
- No way to get at the disks without taking the top off the x4500
- No clustering support - I guess they couldn't put everything on the wish list...
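Several of those bullets boil down to one-line commands. As a rough sketch (the pool name, device names, and hostname here are made up; exact syntax and output vary by Solaris release):

```shell
# Build a mirrored pool from whole disks - no separate volume manager step
zpool create tank mirror c0t0d0 c1t0d0

# A new file system from the common pool is ready in about a second - no newfs
zfs create tank/home

# Snapshots are instant, and incremental differences replace rsync
zfs snapshot tank/home@monday
zfs snapshot tank/home@tuesday
zfs send -i tank/home@monday tank/home@tuesday | ssh backuphost zfs receive tank/home

# Scrub the pool in the background to verify every checksum, then check on it
zpool scrub tank
zpool status tank
```

These need to run as root on a system with real disks to spare, so treat them as illustrative rather than something to paste into a production box.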
The performance is actually very good, and in normal use it's going to be fine, but when we tried to drive ZFS to its limit, we found that the results were less consistent and predictable than with more conventional file systems. Some of the issues we ran into are present in the Solaris 10 6/06 release, but when the x4500 ships it will have an update to ZFS that includes performance fixes to speed things up in general and reduce the impact of the worst case issues, so it should be more consistent.
We've put ZFS on some of our internal file servers to see how it goes in light usage. However, it always takes a while to build up confidence in a large body of new code, especially if it's storage-related. If we can add this one to the list:
- No nasty bugs or surprises?
Then ZFS looks like a good way to take a lot of cost out of the storage tier.
I'm interested to hear how other people are getting on with ZFS, especially mission critical production uses.