Tuesday, August 30, 2011

I come to use clouds, not to build them...

[Update: Thanks for all the comments and Ryan Lawler's GigaOM summary. I would also like to credit James Urquhart's posting on Competing With Amazon Part 1 for giving me the impetus to write this.]

My question is what are the alternatives to AWS from a developer perspective, and when might they be useful? However I will digress into a little history to frame the discussion.

There are really two separate threads of development in cloud architectures: the one I care about is how to build applications on top of public cloud infrastructure; the other is about how to build cloud infrastructure itself.

In 1984 I didn't care about how the Zilog Z80 or the Motorola 6809 microprocessors were made, but I built my own home-brew 6809 machine and wrote a code generator for a C compiler because I thought it was the best architecture, and I needed something to distract me from a particularly mind-numbing project at work.

In 1988 I joined Sun Microsystems and was one of the people who could argue in detail how SPARC was better than MIPS or whatever as an instruction set, or how Solaris and the OpenLook window system were better. However, I never designed a RISC microprocessor, wrote kernel code or developed window system libraries. I helped customers use them to solve their own problems.

In 1993 I moved to the USA and worked to help customers scale their applications on the new big multiprocessor systems that Sun had just launched. I didn't re-write the operating system myself, but I figured out how to measure it and explain how to get good performance in some books I wrote at the time.

In 1995, when Java was released and the WWW was taking off, I didn't work on the Java implementation or in the IETF standards body. I helped people figure out how to use Java and get the first web servers to scale, so they could build new kinds of applications.

In 2004 I made a career change to move from the Enterprise Computing marketplace with Sun to the Consumer Web Services space with eBay. At the time eBay was among the first sites to have a public web API. It seemed to me that the interesting innovation was now taking place in the creation and mash-up of web services and APIs; no one cared what operating system they ran on, what hardware that ran on, or who sold those computers.

Over time, the interesting innovation that matters has moved up the food chain to higher and higher levels of abstraction, leveraging and taking for granted the layers underneath. A few years ago I had to explain to friends who still worked at Sun, how I was completely uninterested in whether my servers ran Linux or Solaris, but I did care what version of Java we were using.

Now I'm working on a cloud architecture for Netflix. We don't really care which Content Delivery Network is used to stream the TV shows over the Internet to the customers; we use three vendors interchangeably at present. I also don't care how the cloud works underneath. I hear that AWS uses Xen, but it's invisible to me. What I do care about is how the cloud behaves, i.e. does it scale and does it have the feature set that I need.

That brings me back to my original question: what are the alternatives to AWS, and when might they be useful?

Last week I attended an OpenStack meetup, thinking that I might learn about its feature set, scale and roadmap as an alternative to AWS. However the main objective of the presenters seemed to be to recruit the equivalent of chip designers and kernel developers to help them build out the project itself, and to appeal to IT operations people who want to build and run their own cloud. There was no explanation or outreach to developers who might want to build applications that run on OpenStack.

I managed to get the panel to spend a little while explaining what OpenStack consists of, and figured out a few things. The initial release is only really usable via the AWS clone APIs and doesn't have an integrated authentication system across the features. The "Diablo" release this fall should be better integrated and will have native APIs; it is probably suitable for proof of concept implementations by people building private clouds. The "Essex" version targeted at March next year is supposed to be the first production oriented release.

There are several topics that I would like to have seen discussed, perhaps people could discuss them in the comments to this blog post? One is a feature set comparison with AWS, and a discussion of whether OpenStack plans to continue to support the AWS clone APIs for equivalent features as it adds them. So far I think OpenStack has a basic EC2 clone and S3 clone, plus some networking and identity management that doesn't map to equivalent AWS APIs.
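
A feature set comparison of the kind asked for above could start life as a simple lookup table. The sketch below is illustrative only: the feature names are AWS services mentioned in this post and its comments, and the OpenStack entry reflects nothing more than the rough impression stated above (a basic EC2 clone and S3 clone), not an audited list. The names AWS_FEATURES, ALTERNATIVES, feature_gaps and coverage_table are made up for this example.

```python
# Illustrative only: service names come from this post and its comments.
# The coverage sets are assumptions, not an audited feature matrix.
AWS_FEATURES = ["EC2", "S3", "EBS", "SQS", "SimpleDB", "IAM", "Elastic IP"]

# Which AWS-compatible APIs each alternative is assumed to offer.
ALTERNATIVES = {
    "OpenStack (initial release)": {"EC2", "S3"},
}


def feature_gaps(aws_features, supported):
    """Return the AWS features, in listed order, that an alternative lacks."""
    return [feature for feature in aws_features if feature not in supported]


def coverage_table(aws_features, alternatives):
    """Build a header row plus one yes/no row per AWS feature."""
    names = sorted(alternatives)
    rows = [["AWS feature"] + names]
    for feature in aws_features:
        cells = ["yes" if feature in alternatives[name] else "no" for name in names]
        rows.append([feature] + cells)
    return rows


if __name__ == "__main__":
    for row in coverage_table(AWS_FEATURES, ALTERNATIVES):
        print(" | ".join(row))
    gaps = feature_gaps(AWS_FEATURES, ALTERNATIVES["OpenStack (initial release)"])
    print("Gaps to fill:", ", ".join(gaps))
```

Adding another provider is one more entry in ALTERNATIVES; the useful (and hard) part is keeping the sets accurate as AWS keeps adding features.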

The point of my history lesson in the introduction is that a few very specialized engineers are needed to build microprocessors, operating systems, servers, datacenters, CDNs and clouds. It's difficult and interesting work, but in the end if it's done right it's a commodity that is invisible to developers and their customers. One of the slides proudly showed how many developers OpenStack had, a few hundred, mostly building it from scratch. There wasn't room on the slide to show how many developers AWS has on the same scale. Someone said recently that the far bigger AWS team has more open headcount than the total number of people working on OpenStack. When you consider the developer ecosystem around AWS, there must be hundreds of thousands of developers familiar with AWS concepts and APIs.

Some of the proponents of OpenStack argue that because it's an open source community project it will win in the end. I disagree: the most successful open source projects I can think of have a strong individual leader who spends a lot of time saying no to keep the project on track. Some of the least successful are large multi-vendor industry consortiums.

The analogy that seems to fit is Apple's iOS vs. Google's Android in the smartphone market. The parts of the analogy that resonate with me are that Apple came out first and dominates the market, taking most of the profit and forcing its competitors to try and band together to compete, changing the rules of the game and creating new products like the iPad that leave their competition floundering. By adding together all of the incompatible, fragmented Android market it's possible to claim that Android is selling in a similar volume to iPhone. However, it's far harder for developers to build Android apps that work on all devices, and then they usually make much less money from them. Apple and its ecosystem are dominant, growing fast, and extremely profitable.

In the cloud space, OpenStack appears to be the consortium of people who can't figure out how to compete with AWS on their own. AWS is dominant, growing its feature set and installed capacity very fast. Every month that passes, AWS is refining and extending its products to meet real customer needs. Measured by the reserved IP address ranges used by its instances, AWS has more than doubled in the last year and now has over 800,000 IP addresses assigned to its regions worldwide.

The problem with a consortium is that it is hard to get it to agree on anything, and Brooks's law applies (The Mythical Man-Month - adding resources to a late software project makes it later). While it seems obvious that adding more members to OpenStack is a good thing, in practice it will slow the project down. I was once told that the way to kill a standards body or consortium is to keep inviting new people to join and adding to its scope. With the huge diversity of datacenter hardware and many vendors with a lot to lose if they get sidelined, I expect OpenStack to fracture into multiple vendor specific "stacks" with narrow test matrixes and extended features that lock customers in and don't interoperate well.

I come to use clouds, because I work for a developer oriented company that has decided that building and running infrastructure on a global scale is undifferentiated heavy lifting, and we can leverage outside investment from AWS and others to do a better job than we could do ourselves, while we focus on the real job of developing global streaming to TVs.

Operations oriented companies tend to focus on costs and ways to control their developers. They want to build clouds, and may use OpenStack, but their developers aren't going to wait. They may be allowed to use AWS "just for development and testing", but when the time comes to deploy on OpenStack, its lack of features is going to add a significant burden of complexity to the development team. OpenStack's lack of scale and immaturity compared to AWS is also going to make it harder to deploy products. I predict that the best developers will get frustrated and leave to work at places like Netflix (hint, we're hiring).

I haven't yet seen a viable alternative to AWS, but that doesn't mean I don't want to see one. My guess is that about two to three years from now there may be a credible alternative. Netflix has already spent a lot of time helping AWS scale as we figured out our architecture, and we don't want to do that again, so I'm also waiting for someone else (another large end-user) to kick the tires and prove that an alternative works.

Here's my recipe for a credible alternative that we could use:

AWS has too many features to list. We use almost all of them, because they were all built to solve real customer problems and make life easier for developers. The last slide of my recent cloud presentations at http://slideshare.net/adrianco contains a partial list as a terminology glossary. AWS is adding entirely new capabilities and additional detailed features every month, so this is a moving target that is accelerating fast away from the competition...

From a scale point of view Netflix has several thousand instances organized into hundreds of different instance types (services), and routinely allocates and deallocates over a thousand new instances each day as we autoscale to the traffic load and push new code. Often a few hundred instances are created in a few minutes. Some other cloud vendors we have talked to consider a hundred instances a large customer, and their biggest instances are too small for us to use. We mostly use m2.4xl and we need the 68GB RAM for memcached, Cassandra or our very large Java applications, so a 15GB max doesn't work.

In summary, although the CDN space is already a commodity with multiple interchangeable vendors, we are several years from having multiple cloud vendors that have enough features and scale to be interchangeable. The developer ecosystem around AWS concepts and APIs is dominant, so I don't see any value in alternative concepts and APIs. Please try to build AWS clones that scale. Good luck :-)

24 comments:

  1. I think part of the problem is the amount of investment that must be made to be able to provide the capacity that you are requesting. Amazon has the advantage of using the tools internally, verifying that they scale in the way needed, and then slowly building up capacity over the years to be in 5 regions with multiple zones in each region.

  2. At this point, the only person I expect who would build an AWS clone is an existing large customer who decides it would be cheaper to move their infrastructure to a new platform, but who doesn't want to rewrite all their code.

    Rackspace don't want to build an AWS clone - I think if they did, they would've done it by now, and the same goes for everyone else I see on the OpenStack project. Hopefully I'm wrong, but OpenStack's focus is (from what I can see) on building a core dynamic compute platform with its own APIs and functionality, and mapping that onto the AWS feature set is, as you say, always going to end in gaps and mismatches.

    If you were spending perhaps $50-$100 million per year on AWS, it might become interesting financially to build an alternative platform to reduce your own costs, including selling capacity to increase scale and help with your own peak load requirements.

  3. You are always an interesting read, Adrian. My only concern (in general and not only applicable to this specific AWS / OpenStack discussion) is that cloning/overlaying an interface has never been an easy task, especially at the pace AWS is innovating (new stuff every month as you pointed out).

    An interesting space to monitor indeed. Lots of things going on.

    Massimo Re Ferre' (VMware)

  4. I have to agree with much of what you write. Where I disagree is that as far as I can observe your conclusions don't apply to everyone.

    By using "almost all" of the AWS features you have consciously chosen to effectively lock yourself in to AWS. It is very clear that using services like SimpleDB or features like IAM increases one's switching cost to another cloud. One could easily draw analogies to the operating systems world where using proprietary features makes it harder to switch OS. I'm not judging whether your choice is a good one or bad one, I'm just stating what I see.

    Take one of AWS' other very top customers. They have chosen to limit which features they use and they have successfully completed the build-out of a private cloud based on CloudStack. They don't want to leave AWS, far from it, but they want to have alternatives from a risk mitigation point of view, from a cost point of view, and also from a flexibility point of view. They evidently decided that the cost of not using some nifty AWS feature was preferable to not having the alternative.

    Also, while what you write certainly applies to users of your size, it's not at all clear how much applies to smaller users (i.e. a large fraction of the market by $$ and virtually 100% by user count). For many users other factors come into play. Most of the up-coming cloud providers are not trying to compete with AWS head-on, they offer somewhat different value props.

    Anyway, I'm not trying to say that you're wrong, but that there are other equally valid perspectives on what's happening.

    Thorsten - CTO RightScale

  5. "In the cloud space, OpenStack appears to be the consortium of people who can't figure out how to compete with AWS on their own."

    My first question is http://www.datacenterknowledge.com/archives/2009/02/02/is-amazons-cloud-profitable/ ??

    Does anyone think EC2 is profitable? We can say YouTube is the largest video site, but without a large parent company paying all its bills what does its success mean?

    Maybe no one can "compete" with EC2 because no one else is sitting on top of $300,000,000 in unclaimed gift cards, and everyone else has to make a profit.

    Also, while I will say that Amazon's technology and implementation are great, it is not a revolutionary business concept. http://en.wikipedia.org/wiki/Cobalt_RaQ http://www.ensim.com/ Using an XML API to create a new instance is not really a stretch from having Ensim create a jail after filling out a form and making you an ssh account.

    I hear stuff like this a lot: "we are in a paradigm shift; infrastructure moves from IT to DEV"

    Off topic rant coming...
    Good luck with that! I have worked in data centers for a long time. Do you know what I always see? IT having to buy more hardware because of bad code! How many databases I have seen with terrible indexing, or improper use of relational design.

    Devs do not see it that way. IT are just these guys that are supposed to prep more and more servers for them, because their waterfall says they have to be done by today, or because the 500 unneeded jars in their classpath are blowing out permgen.

    All devs want control of infrastructure, but are so far abstracted from hardware they do not know what IOwait is. Put this type of person into Amazon and their only answer will be "It scales, buy more instances". Everyone will be happy till the end-of-the-month bill comes.

    Anyway, back to OpenStack. Most small companies only need a small overhead in terms of gear to be "elastic". Maybe if you have 30 nodes you need 5 free ones. If you can get the good parts of cloud computing like shared storage without having to pay someone else for the privilege, that is a win.

  6. Thanks for all the comments. In reply to Thorsten, I don't think it's necessary to implement every AWS feature, but building simple versions of EC2 instance provisioning and S3 object store and declaring AWS compatibility doesn't cut it. To pick a specific example, the Simple Queue Service (SQS) is very useful, easy to use, scalable and reliable. We ran our own message queue service in our datacenter for many years, and it wasn't a happy experience. I'm very glad it's someone else's problem, and we use SQS heavily in our architecture.

    Each application developer is going to want a slightly different set of features. The full set of features AWS provides is driven by requests from their customer base, and in part by external productization of things Amazon has built to support its own web site.

    If we decided that portability was more important than functionality to avoid cloud vendor lock-in, then we would be swapping vendor lock-in for abstraction layer lock-in. We would be tied to Rightscale, Smartscale, Enomaly or whatever. There is also nothing to stop developers reaching around the abstraction layer to use specific AWS features, so we would end up locked into a lowest common denominator abstraction layer and to the underlying platform.

    Instead, we have built our own Java-based abstraction layer that is tuned to meet our needs and scale. We could port it to another cloud vendor if we had to, but would prefer to just point it at another AWS compatible cloud. It's not a lot of work to build this ourselves, because AWS is doing the heavy lifting underneath.
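
    The general idea of coding against your own thin abstraction layer and then pointing it at a different backend can be sketched roughly as follows. This is not Netflix's code: it is written in Python rather than Java for brevity, and every name in it (MessageQueue, InMemoryQueue, BACKENDS, make_queue, the "memory" key) is a hypothetical chosen for illustration. A real backend would wrap SQS, or an SQS-compatible service on another cloud, behind the same interface.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Dict, Optional


class MessageQueue(ABC):
    """Hypothetical interface that application code depends on."""

    @abstractmethod
    def send(self, body: str) -> None:
        """Enqueue one message."""

    @abstractmethod
    def receive(self) -> Optional[str]:
        """Return the oldest message, or None if the queue is empty."""


class InMemoryQueue(MessageQueue):
    """Stand-in backend for local testing; makes no network calls."""

    def __init__(self) -> None:
        self._items = deque()

    def send(self, body: str) -> None:
        self._items.append(body)

    def receive(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


# Registry of backends. A class wrapping SQS (or an AWS-compatible service)
# would be registered here under its own key.
BACKENDS: Dict[str, Callable[[], MessageQueue]] = {"memory": InMemoryQueue}


def make_queue(backend: str) -> MessageQueue:
    """Select a backend by name so callers never import a vendor class."""
    try:
        return BACKENDS[backend]()
    except KeyError:
        raise ValueError(f"unknown queue backend: {backend!r}") from None


if __name__ == "__main__":
    queue = make_queue("memory")
    queue.send("hello")
    print(queue.receive())
```

    Because callers only ever see MessageQueue, switching providers becomes a configuration change plus one new backend class, which is only practical when the other cloud offers an equivalent service.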

    What I would like to see is a table of all the AWS features and APIs in detail, with the alternatives matched alongside so developers can figure out what is common, and cloud providers can work on filling in the gaps.

  7. +1, I agree with your conclusions and also long for a flexible alternative to AWS.

    Taking Rightscale as an example, it's not currently possible to create a complete operating system image from scratch and provision it as a virtual machine in the Rightscale cloud. Coming from an information security background, it's simply not an option for us to take an existing Rightscale image and customize it. We want to start from our own binaries and build up from there. I don't mean to just pick on Rightscale, because really none of the others offer this in a meaningful way. Azure, for example, has zero support for Linux of any kind. IMO that one is a political decision more than anything else - if that was a standalone company they would 100% support Linux as an option. Perhaps the same person that made the decision for no IE9 support on Windows XP is behind that one. Joyent has similar limitations, and they have a longer term commitment. There's also no good EBS-like option to scale up a small instance with a huge amount of POSIX-compliant direct attach storage (i.e., not S3-like object storage). It makes it very challenging to support a horizontally scaled application with huge storage requirements but low CPU+memory requirements.

    This is just scratching the surface of limitations of other cloud vendors. I'm a customer of many of them, mostly because I want to have an option for multi-provider redundancy. Every time I look outside of the AWS ecosystem I start immediately banging into limitations. They will eventually be dealt with but for now the "public cloud" really lives in a data center in Virginia.

  8. Adrian - I'm curious what you think of Netflix relying on a cloud provider that competes with Netflix's core business (watching movies). Don't you think this leads to a conflict of interest that you have to worry about?

    Also, it is typically unwise for any business to have a single supplier for the core of their business. I might be biased because I used to work on a competing platform some time ago, but if I were a multi-billion dollar company running solely on AWS, I would be looking to mitigate that supplier-sourcing risk by straddling multiple cloud providers.

  9. RJB: there is no "RightScale cloud", RightScale is a management system for many clouds. Also, you can fully build your own image and use it in RightScale. We provide our agent in source form. IMHO we don't make it as easy as we should to put it all together, but building production quality images from scratch isn't easy no matter how you slice it in our experience. Sigh.

    But you're completely correct that no matter how hard it is to create images for AWS, creating them for other clouds is even more difficult and often impossible. One of the first things we ask cloud providers is how we can build an image from scratch (as opposed to launching a server from some existing image, hacking it up, and then snapshotting). Nine times out of ten we get blank stares. "Why would you want to do that?" Duuuh.

  10. I misspoke: I meant "Rackspace", not "RightScale" - doofus typo on my part.

  11. Your analogy is flawed. Your claim is that Android will be the less dominant, slower growing and less profitable platform than iOS just because it is currently the single largest player in the market. Android's current 41.8% vs iPhone's 27% market share, coupled with Android's high growth rate, currently +5.4% vs +1% over a 3-month period, will allow Android to far exceed iOS's market over time. That, coupled with the target demographic of the iOS platform, Pareto's principle and the long tail, makes it more logical that Android will be the more dominant, higher growth and more profitable platform. Those are the principles behind Amazon's and even Netflix's business models. Also, additional time spent developing on a more fragmented platform is relative to the market share that you are trying to target.

    While I think that AWS will be a dominant player for a while, it will not last forever as technology adoption and trends change over time and we are only beginning to see the potential of the cloud. With so many contributors to OpenStack and Citrix's recent announcement of CloudStack being open sourced and merged with OpenStack, I see this project having huge potential. Red Hat recently started a project called Aeolus that uses some of OpenStack's code base, so there are large industry players focusing their attention on an open cloud solution. More competition is good for the industry as a whole and we can only reap the rewards of all the advancements invested in what will serve to grow the general public cloud ecosystem.

    As for portable VM images, isn't that the purpose of the OpenStack image service (Glance)? I need to look over the code, but I was looking into moving our KVM images over to the Rackspace cloud using that solution in a future POC I had planned.

  12. Agree with just about everything you said. I'm sure your architects were largely aware of the vendor lock-in issues and still decided to proceed, given TCO and the lack of alternatives. I have some experience with that.

    OpDemand happens to offer a management tool similar to RightScale, called C2. C2 uses a cloud-agnostic technology much like AWS's CloudFormation to distill cloud infrastructure management to its simplest form.

    However, in order to facilitate our levels of orchestration and resulting savings on runtime IaaS costs, C2 must rely heavily on (for example) Elastic Addressing and boot-from-EBS. I've yet to see credible alternatives from other cloud providers.

    While we are actively developing against OpenStack and others, the bottom line is our users will continue to see the best orchestration and the most cost savings when they deploy C2 platforms to AWS.

    Most of our users would rather leverage EIP/EBS and save up to 90% on hosting costs for their complex environments, than receive a promise of hosting provider portability.

    "To the [technological] victor go the spoils".

    Gabriel Monroy
    CTO - OpDemand

  13. Great comments, thanks. On the Android vs. iOS analogy, it took about 2-3 years for Android to reach a competitive aggregate installed base and feature set, but it has so far failed to make any real dent in the tablet market against iPad. That's partly why I think it may be 2-3 years until there are viable alternatives to AWS at scale.

    However, Apple is currently making more profit from iOS devices than the whole of the rest of the mobile/tablet industry put together, not just Android.

    I'm an iPhone developer in my spare time, I've got a good installed base and it takes little or no effort to update my app as new iPhones come out. Netflix's iOS apps were written quickly and rolled out to many users in one go. In contrast, Netflix on Android took forever to develop, and is rolling out piecemeal month by month to individual devices with a far bigger effort in testing, and a much lower usage rate in terms of movie viewing. The ROI for Android is clearly much lower, which will slow down growth of its developer ecosystem.

  14. For the single supplier and competitor issues raised by Sriram, there is nothing new here, we went into this with our eyes open 2-3 years ago. Before that we had a single supplier of databases from Oracle and datacenter hardware from IBM, so it's not unprecedented. As I said in the blog post, in a few years time we expect a credible alternative to appear.

    It is far more important to Netflix to leverage AWS as much as possible to accelerate our business plans. It would not have made sense to wait for a year before going international because we wanted to build up an alternative supplier or build our own datacenter. The prize of business agility is bigger than the risk of single supplier.

    If AWS treated Netflix in an unfavorable manner because Amazon Prime is a competitor it would be a big issue for them. AWS supports many competitors to Amazon businesses, and they would lose credibility if they discriminated. In fact the opposite is true, AWS is extremely supportive of Netflix, we have much better access to AWS management and engineering than typical AWS customers because we are pushing their limits and giving them so much feedback on how to improve their product. This is also a great reason to work at Netflix....

  15. About the 2-3 year time frame. Eucalyptus under Rich Wolski at UCSB was mimicking and benchmarking AWS compatibility 2-3 years ago. After its metamorphosis from an NSF-funded UCSB CS project for researchers into a company led by Marten Mickos, Eucalyptus seems to have good AWS coverage...

    http://www.eucalyptus.com/resources/AmazonAWS

  16. Yes, I have been disappointed that no competitor with the full breadth of features of AWS has emerged, as this would create price competition and drive the market.
    OpenStack looks more like a commodity open source cloud infrastructure that can be used by cloud suppliers to build new products to compete in new or specialized markets like Platform as a Service.
    Most conservative large companies (finance, manufacturing etc) continue to use IBM, Oracle, SAP in their own data centres. The major costs are people costs, so unless a cloud can make a big reduction in those they won't move to Amazon to save a little on the server costs. So cloud companies are investing higher up the cloud stack in Platform as a Service and in applications (SaaS), where the actual infrastructure is irrelevant to the customer and the customer could make bigger reductions in the number of IT personnel they need.

    http://ec2dream.blogspot.com

  17. If I got to advise/beg an AWS competitor, I'd ask them to do stuff that AWS hasn't and won't -- on the hardware side:

    - Have instances with disk arrays or conventional SANs. You don't want to implement EBS.

    - Offer instances with Flash storage, maybe in the hybrid SSD/HDD pool setups offered by, for example, Sun and (now) LSI (CacheCade 2.0).

    On the software side, I'd love to see some vendor do for frontend hosting/PaaS what Amazon did for VM hosting -- do it so well, scalably, and cheaply that customers who would have never considered app hosting will have to take it seriously.

    - Scalability and cost. I want to be able to run a ton of frontends and have a ton of DB servers or a scalable datastore back there. Automatic and cheap are good.

    - Familiar environment. This is what hurt App Engine and helped EC2: App Engine makes you write apps for only their platform, and those apps won't run elsewhere. Give us familiar frameworks and libraries, and support a range of stacks.

    - VMs and app hosting on the same local network. Often just a couple of bits of a product are too nonstandard to run in application hosting, but they're totally non-negotiable pieces. (Certainly true at my work.) If I can get a low-latency secure connection to my special-sauce VM from my bog-standard commodity frontend, that makes the app hosting much more interesting.

    If they build it, I'm sure some large sites will decide sysadmin'ing frontend and database clusters isn't a core competency, much the way AWS clients have decided that managing physical machines is best left to somebody else.

  18. Adrian,
    Great write up. I was at the meetup on OpenStack where you asked the last, but most important question - What is OpenStack? I thought you were being sarcastic, but I agree, someone should have started the session with what OpenStack is instead of going all about the history. This is not about that meetup feedback. This is a great post on OpenStack and AWS, and mainly I caught the abstraction layer you mentioned. As someone helping customers on their move to the cloud, I find that APIs and abstraction layers, and questions of vendor independence and standards, come up a lot. As we look into figuring out which abstraction layer needs to be prioritized for support, I was wondering if your team has looked into jClouds and others out there. It would be good to hear your opinion on them. Also, since you already have an abstraction layer, are there any considerations on making it available outside of Netflix in some form?

    Thank you for the great write up.

    Rag
    (API, software integration guy at Savvis)

  19. If we say we don't want to use the feature set in order to avoid vendor lock-in, then what do you do when you need a specific feature? Do you reinvent the wheel and write your own services and capabilities when Amazon is providing all of them (like Adrian mentioned with the messaging service)? Why dedicate developer resources to those mundane tasks when they can focus on the applications they need to write?

    You picked AWS and pay money to them for a reason. Take advantage of what they have to offer and don't treat them like any other black box.

  20. @Neill

    For some companies the traditional IT vendors are the answer, but for many companies it's not, and for many others, the answer is a mix of both.

  21. Hi Rag, I'm planning to make my QConSF talk in November be a more in depth view of the platform Netflix has built, and how it leverages specific AWS APIs. Haven't started to write the slides yet, but that's the concept.

  22. I think you forget that there is such a thing as systems engineering. And that Netflix seems to have lacked a strong Sys Eng counterpart.

    Netflix seems to have built an operational and sys eng culture of "working around" AWS flaws and shortcomings. And somehow, this is better than not going around those? I fail to see how.

    Netflix's case seems to be of a company that had horribly implemented physical operations, and they "ran to the cloud" because they could not figure out how to "fix it".

    This is an HR issue, not a technology one.

    You guys used to run IBM hardware, horribly expensive networks, and Oracle DBs. Of course things are going to seem inexpensive, when you were running possibly the most expensive SW/HW combination known to the IT world. Again, an HR issue, not a technology one ;)

    Netflix's scale is such that you CAN run your own efficient and modular infrastructure. A good knowledge of hardware and its lifecycle, and using the correct mix of OSS technologies and home grown technologies, goes a long way. Using virtualization where it makes sense, physical hardware where it does.

    Again, it's an HR issue. Seems the tech org said "we can't do this". This is not the same for every company and startup out there, and there are HUGE productivity and COST savings from running efficient and agile System Operations.... You guys should try it ;)


    PS: AWS can be great for many small to medium startups, if they use it correctly. I just fail to see how, at Netflix's scale, you guys can't do equal or better. Amazon's tech staff is likely similar in size to yours ;)

    And yes, I know I deviated. But it seems Netflix as a company, fell for the "smoke and mirrors" trick, looking for the "clouds". Get your head off the clouds, and become a technology company.

  23. @OneMinuteRationalist - when we architected for cloud we did not have scale; that was a result of the cloud migration. Our datacenter has a few tens of web servers. Our initial cloud deployment was a few hundred instances and it's now many thousands about a year later. Amazon started out with 100 times as many people as we did working on IT automation, and probably still has 10 times as many as we do now.

    We tried working the HR side of this, the management in datacenter ITops was replaced a few times over the last few years, then we gave up and outsourced most of it.

  24. I worked for Amazon for 5 years as a sysadmin from 2001-2006.

    I successfully built a roughly 800-server datacenter recently and did so significantly cheaper than going with AWS. After doing the math it appears that AWS is 3-4 times more expensive than doing it yourself in the datacenter (cheaply). And you get benefits: with hard servers you can rely on the IOPS to your disks instead of having to deal with interesting performance problems with EBS storage (and having to throw lots of RAM-heavy instances at the problem, which looks to be your solution based on your comments about 68GB RAM servers -- I don't find that comment particularly impressive, I find it more telling...).

    What you can't do is replace AWS if you don't understand how to do things cheaply. If you buy a big SAN and drop fat VMware instances on it and hook it all up with 10G Ethernet bought from Cisco and then drop $100k on every IT management vendor who gives you some free schwag, then it'll be substantially more costly. Also, if you have to support massive legacy in your infrastructure and can't get any of the product owners to change anything and unify your management of simple things (e.g. DNS should be well structured and trivial to manage at 800 servers), then your costs go up.

    So, if you're willing to make decisions to make IT cheap and simple, then you can easily beat Amazon. If not, then what outsourcing to Amazon does is force you to become cheap and simple.

    The real good part of the Amazon cloud is that all the dinosaurs who think IT needs to be done expensively and have survived inside of large organizations with large IT budgets -- in companies with otherwise healthy business models -- are going to have competition and will ultimately get flushed out of the industry.
