Adrian Cockcroft's Blog: google

Showing posts with label google.

Thursday, April 03, 2014

Public Cloud Instance Pricing Wars - Detailed Context and Analysis

As part of my opening keynote at Cloud Connect in Las Vegas I summarized the latest moves in cloud. The slides are available via the new Powered by Battery site as "The Good the Bad and the Ugly: Critical Decisions for the Cloud Enabled Enterprise". This blog post is a detailed analysis of just part of what happened.

Summary points

AWS users should migrate from obsolete m1, m2, c1, c2 to the new m3, r3, c3 instances to get better performance at lower prices with the latest Intel CPUs.
Any cloud benchmark or cost comparison that uses the AWS m1 family as a basis should be called out as bogus benchmarketing.
AWS and Google instance prices are essentially the same for similar specs.
Microsoft doesn’t appear to have the latest Intel CPUs generally available and only matches prices for obsolete AWS instances.
IBM Softlayer pricing is still higher, especially on small instance types.
Google's statement that prices should follow Moore’s law implies that we should expect prices to halve every 18-24 months.
Pricing pages by AWS, Google Compute Engine, Microsoft Azure, IBM Softlayer
Adrian’s spreadsheet summary of instances from the above vendors at http://bit.ly/cloudinstances
Analysis of the prices by Rightscale

On Tuesday 25th March 2014 Google announced some new features and steep price cuts, and the next day Amazon Web Services also announced new features and matching price cuts. On Monday 31st March Microsoft Azure also reduced prices. Many pundits repeated talking points from press releases in blog posts, but unfortunately there was little attempt to understand what really happened and to explain the context and outcome. When I wrote up a summary for my opening keynote at Cloud Connect on 31st March I looked at the actual end result and came up with a different perspective and a list of gaps.

I’m only going to discuss instance types and on-demand prices here. There was a lot more in the announcements that other people have done a good job of summarizing. The Rightscale blog linked above also gives an accurate and broader view on what was announced. I will discuss other pricing models beyond on-demand in future blog posts.

There are some things you need to know to get the right background context for the instance price cuts. The most important is to understand that AWS has two generations of instance types, and is in a transition from Intel CPU technology they introduced five or more years ago to a new generation introduced in the last year. The new generation CPUs are based on an architecture known as Sandybridge. The latest tweak is called Ivybridge and has incremental improvements that give more cores per chip and slightly higher performance. Since Google is a recent entrant to the public cloud market, all their instance types are based on Sandybridge. To compare AWS prices and features with Google correctly, the like-for-like comparison is between Google's instances and the newer AWS generation. AWS is encouraging the transition by pricing its newer faster instances at a lower cost than the older slower ones. In the recent announcement, AWS cut the prices of obsolete instance type families by a smaller percentage than the newer instance type families, so the gap has just widened.

Old AWS instance types have names starting with m1, m2 and c1, c2. They all have newer replacements known as m3, r3 and c3 except the smallest one – the m1.small. The newer instances have a similar amount of RAM and CPU threads, but the CPU performance is significantly higher. The new equivalents also replace small slow local disks with smaller but far faster and more reliable solid-state disks, and the underlying networks move from 1Gbit/s to 10Gbit/s. The newer instance families should also have lower failure rates.

Most people are much more familiar with the old generation instance types, so when competitors write their press releases they are able to get away with claiming that they are both faster and cheaper than AWS by comparing against the old generation products. This is an old “benchmarketing” trick: compare your new product against the competition’s older and more recognizable product.

For the most commonly used instance types there is a close specification match between the AWS m3 and the Google n1-standard. They are also exactly the same price per hour. Since AWS released its changes after Google, this implies that AWS deliberately matched Google’s price. The big architectural difference between the vendors is that Google instances are diskless, with all their storage network attached, while AWS instances include various amounts of SSD. The AWS hypervisor also makes slightly more memory available per instance, and ratings for the c3 imply that AWS is supplying a slightly higher CPU clock rate for that instance type. I think that this is because AWS has based its compute intensive c3 instance types on a higher clock rate Ivybridge CPU rather than the earlier Sandybridge specification. For the high memory capacity instance types it is a little different. The Google n1-himem instances have less memory available than the AWS r3 equivalents, and cost a bit less. This makes intuitive sense as this instance type is normally bought for its memory capacity.

Microsoft previously committed to match AWS prices, and in their announcement their comparisons matched the m1 range exactly at its new price. They also presented their memory oriented A5 instance as cheaper than an old m2.xlarge, but the A5 has an older slower CPU type, is more expensive ($0.22 vs. $0.18) and has less memory (14GB vs. 15GB) than the AWS r3.large. The common CPU options on Azure are aligned with the older AWS instance types. Azure does have Intel Sandybridge CPUs for compute use cases as the A8 and A9 models, but I couldn't find list pricing for them and they appear to be a low volume special option. The Azure pricing strategy ignores the current generation AWS product, so the price match guarantee doesn’t deliver. In addition the Google and AWS price changes were effective from April 1st, but the Azure changes take effect on May 1st.
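To make the A5 versus r3.large comparison concrete, the quoted figures can be turned into a price per GB of RAM. This is only a rough sketch using the numbers quoted above; the dictionary layout and the function name are my own illustration, and it ignores CPU speed and storage differences.

```python
# Hourly on-demand price and RAM for the two instances compared above.
# The figures are the ones quoted in this post; the names are illustrative.
instances = {
    "Azure A5": {"price_per_hour": 0.22, "ram_gb": 14},
    "AWS r3.large": {"price_per_hour": 0.18, "ram_gb": 15},
}


def price_per_gb_hour(spec):
    """Return the hourly price divided by the RAM in GB."""
    return spec["price_per_hour"] / spec["ram_gb"]


for name, spec in instances.items():
    print(f"{name}: ${price_per_gb_hour(spec):.4f} per GB-hour")
```

On these numbers the A5 costs noticeably more per GB of memory, before even considering the older CPU.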

IBM Softlayer has a choose-what-you-want model rather than a specific set of instance types. The smaller instances are $0.10/hr where AWS and Google n1-standard-1 are $0.07/hr. As you pick a bigger instance type on Softlayer the cost doesn’t scale up linearly, while Google and AWS double the price each time the configuration doubles. The Softlayer equivalent of the n1-standard-16 is actually slightly lower cost than Google. Softlayer pricing on most instances is in the same ballpark as AWS and Azure were before the cuts, so I expect they will eventually have to cut prices to match the new level.
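The linear scaling described above is easy to sketch. This assumes that the number at the end of the instance name is the size multiplier and that the price is exactly proportional to it, starting from the $0.07/hr quoted for the n1-standard-1; check the vendor pricing pages for the actual list prices.

```python
# Starting point quoted above: n1-standard-1 at $0.07/hr, held in cents
# so that the arithmetic stays exact.
BASE_CENTS_PER_HOUR = 7


def linear_price_cents(size, base_cents=BASE_CENTS_PER_HOUR):
    """Hourly price in cents if cost scales in direct proportion to size."""
    return base_cents * size


for size in (1, 2, 4, 8, 16):
    cents = linear_price_cents(size)
    print(f"n1-standard-{size}: ${cents / 100:.2f}/hr")
```

A vendor whose prices grow more slowly than this starts out more expensive on small instances and can end up cheaper on the largest ones, which is the pattern described for Softlayer.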

Gaps and Missing Features

The remaining anomaly in AWS pricing is the low-end m1.small. There is no newer technology equivalent at present, so I wouldn’t be surprised to see AWS do something interesting in this space soon. Generally AWS has a much wider range of instances than Google, but AWS is missing an m3.4xlarge to match Google's n1-standard-16, and the Google hicpu range has double the CPU to RAM ratio of the AWS c3 range so they aren’t directly comparable.

Google has no equivalent to the highest memory and CPU AWS instances, and has no local disk or SSD options. Instead they have better attached disk performance than AWS Elastic Block Store, but attached disk adds to the instance cost, and can never be as fast as local SSD inside the instance.

Microsoft Azure needs to refresh its instance type options: it has a much smaller range, older slower CPUs, and no SSD options. It doesn’t look particularly competitive.

Conclusion

If you buy hardware and capitalize it over three years, and later on there is a price cut, you don’t get to reduce your monthly costs. Towards the end your CPUs are getting old, leading to less competitive response times and higher failure rates. With public cloud vendors driving the costs down several times a year and upgrading their instances, your model of public vs. private costs needs to factor in something like Moore’s law for cost reductions and a technology refresh more often than every three years. Google actually said we should expect Moore’s law to apply in their announcement, which I interpret to mean that we can expect costs to halve about every 18-24 months. This isn’t a race to zero; it’s a proportional reduction every year. Over a three-year period the cost at the end is a third to a quarter of the cost at the start.
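The "third to a quarter" figure follows directly from the halving period. As a minimal sketch, assuming a smooth exponential decline rather than the stepwise cuts vendors actually make (the function name is just for illustration):

```python
def remaining_cost_fraction(months_elapsed, halving_period_months):
    """Fraction of the starting price left if the price halves every period."""
    return 0.5 ** (months_elapsed / halving_period_months)


for halving_period in (18, 24):
    fraction = remaining_cost_fraction(36, halving_period)
    print(f"halving every {halving_period} months: "
          f"{fraction:.2f} of the starting cost after three years")
```

Halving every 18 months leaves a quarter of the starting cost after 36 months, and halving every 24 months leaves about 35%, or roughly a third.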

I still hear CIOs worry that cloud vendor lock-in would let vendors raise prices. This ruse is used to justify private cloud investments. Even without switching vendors, you will see repeated price reductions for the public cloud systems you are already using. This was the 42nd price cut for AWS, so the argument is ridiculous.

I’ve previously published presentation materials on cost optimization with AWS. I’m researching this area and over the coming months will publish a series of posts on all aspects of cloud optimization.

Tuesday, January 01, 2013

Looking back at 2012, with pointers to 2013

A collection of things that seem to have pivoted in 2012.

Mobile Bandwidth Greater than Fixed Bandwidth

I've been talking about LTE and the growth in mobile since 2008, but I started 2012 with a Verizon iPhone 4 which maxed out at 2Mbit/s over 3G, and at home in the mountains I would get less than 1Mbit/s. I ended 2012 with a Verizon iPhone 5 which is about ten times faster at home, where I regularly see 8-9Mbit/s, and the best speed I have seen anywhere so far was in downtown Los Gatos at over 50Mbit/s. My home fixed wire Internet is a 3Mbit/s DSL that has neighborhood congestion at peak times. I now find it works better to have WiFi turned off on my iPhone at home.

This is one of those pivotal changes, similar to the change from having predominantly fixed wire telephone service at home, to having many people use mobile phones exclusively. It costs more, but if you already have a high bandwidth connection to your phone with a high data cap because you use it a lot, why pay to also have a low bandwidth connection to your house? Bandwidth caps and data usage plans will slow the switchover, but the writing is on the wall.

Cutting The Cable/Satellite TV Feed

In 2012 we finally turned off our TiVo and shut down our DirecTV account. We weren't using it enough to make it worthwhile. For some of the sports events (Laurel follows the Stanford Cardinal), we go to a sports bar to watch, which is more fun anyway. Everything else that we have time to watch, we can watch online, and we get all our news updates from Twitter, RSS feeds and Facebook posts. By the time it's on TV or in a newspaper, it's already old news.

The TV has an AppleTV connected to it, which gets almost all the usage. We watch a few things on laptops, and sometimes I connect a laptop to the TV. I also stream music from my iPhone to the AppleTV because I can't get Pandora or Spotify on it. Come on Apple, where's the AppleTV App Store? Maybe that's a 2013 thing.

The Netflix Open Source Cloud Platform Got Traction

We started the year with a handful of disconnected projects, and ended it with a large chunk of the platform on Github, and some high profile users. Most people are still picking it up piecemeal but in 2013 we plan to get the whole thing put together as an installable bundle. This is the Alan Kay approach, "The best way to predict the future is to invent it". Netflix has been out in front of the industry in terms of cloud adoption, inventing the future. Next we make it easier for others to join us in that future, and have some ideas for how to drive adoption to new heights.

Netflix Cloud Architecture Presentations

I was going to list all the talks I gave, but there are too many, so go see the slides I posted at http://www.slideshare.net/adrianco. Highlights were QConSF, QConLondon, Gluecon in Colorado, GOTO in Aarhus and of course AWS Re:Invent in Las Vegas. The impact of these talks grew through the year, reaching a peak at Re:Invent, where we had lots of speakers and attention to the way the Netflix cloud and open source story was bringing value to the company and reaching out into the technical community. A big thanks to everyone who came to my talks, and all the other Netflix speakers who have been out there broadening the story. It's almost impossible to write an article or do a presentation about cloud without mentioning Netflix. In 2013 there will be even more talks; I focus on local and US based events that are strongly developer oriented like QCon, Gluecon, and GOTO. We will definitely be back at AWS Re:Invent next November.

The Concept of Anti-Fragility Took Off

Nassim Taleb's latest book crystallizes the way I tend to approach things and gives it a name. The Netflix cloud architecture is anti-fragile: we run "Chaos Monkeys" continuously to try and break it, and that makes it stronger. The Netflix culture is anti-fragile: it's decentralized, with as little process and as few rules as possible and a lot of local autonomy. Netflix management is not afraid of change or of being first to do something and tends to navigate disruptive transitions well. From the outside this can look chaotic or confusing, but it works, and recovers well from missteps, which are always going to happen. If you're not failing occasionally you aren't trying hard enough, and you are missing opportunities. Getting stronger through failure is the basis of anti-fragility. Avoiding failure at all costs (as many people try to do) makes you brittle and vulnerable to unexpected Black Swan events that will have a much bigger impact.

Cloud, Open Source, SaaS and the End of Enterprise Computing

Taleb makes the point that big companies become increasingly fragile as they lose agility and the ability to move with the markets, and we are seeing that play out in the Enterprise Computing space. There is still money to be made from the late adopter customers, but the trend is clearly towards development using exclusively open source tools, with applications and infrastructure delivered as a service. There is zero revenue for traditional Enterprise Computing vendors in this model. The current interest in building out private cloud infrastructure is real and will continue to support traditional vendors into 2013, but it's a short term investment. At best you end up with a much better automated datacenter, but it isn't elastic and it has far fewer features than AWS, so it's going to be marginalized over time. At worst, you discover just how hard it is to run a reliable private cloud based on immature software, with incompatible upgrade paths, and it turns out to be much more expensive to run.

The Enterprise Computing vendors haven't been able to build a public cloud that competes with AWS on scale, price or features, and AWS is now focused on building everything their customers need to take the next generation of application investment out of the datacenter, so the high margin revenue is going to gradually go away for the traditional vendors.

The most interesting development in 2012 was the re-launch of Google as a public cloud infrastructure vendor, and the mini-price-war between AWS and Google over instance and storage costs makes it clear where the real action is. During 2013 we will see if Google manages to invest heavily and execute well enough to build up a big user base. In mobile, as I predicted years ago, we are now in an iPhone vs. Android battle that is wiping out everyone else. I personally think in 2014 we will likely see a similar effect as the scale, features and price point of AWS and Google clouds make everyone else irrelevant. The only question in my mind is whether AWS runs away with this on their own, or Google manages to get some traction as the alternative.

Note to sales reps (who won't listen), I'm not interested in anything to do with datacenters, private cloud, or other public clouds in 2013. I'm only interested in SaaS apps, things that run on AWS, and interesting open source projects.

Solar Powered Electric Cars Are For Real Now

We drive our Nissan Leaf all the time; it's fun to drive and the first car we pick for most trips, adding up to almost 1000 miles a month. The marginal cost of running the car is near zero. New tires and a cabin air filter at 15K miles is all the maintenance it needs. We have an excess of solar power generation that added up to $500 in unused electricity over the year. At 10c/kWh and 3.5 miles/kWh that's plenty for us to run a second electric car before we start paying for the power, and there are a lot more choices coming in 2013. There are many charging stations around the Bay Area, lots of other people running Leafs, and the Tesla Model S got car of the year awards. It takes a test drive to realize what fun it is to have instant torque and no gear shifts. This is a case of the future being unevenly distributed. If you don't live in California, it's a bit further out, but it's coming.
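The second-car claim can be checked with some quick arithmetic. This sketch takes the efficiency figure as 3.5 miles per kWh, which is my reading of the number above, and treats the $500 surplus and 10c/kWh rate as exact; the function name is just for illustration.

```python
def surplus_miles(surplus_dollars, price_per_kwh, miles_per_kwh):
    """Miles of driving covered by unused solar generation."""
    surplus_kwh = surplus_dollars / price_per_kwh
    return surplus_kwh * miles_per_kwh


# $500 of unused electricity at 10c/kWh, assuming 3.5 miles per kWh.
miles_per_year = surplus_miles(500, 0.10, 3.5)
print(f"{miles_per_year:.0f} miles a year, "
      f"about {miles_per_year / 12:.0f} miles a month")
```

That is comfortably more than the roughly 1000 miles a month the first car already covers.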

A friend recently got a quote for Solar Power installation which was about half what we paid two years ago, and we got a good deal then. Prices have dropped fast and are much lower than most people think. If you don't already have solar panels on your roof, you should get them. If you don't use enough electricity to justify solar panels, get an electric car as well, and save at the gas pump.

Global Warming Arrived in the USA in 2012

The well funded Merchants of Doubt (read the book) managed to confuse and suppress public discussion of global warming in the USA for the last few years, but the effects just became too obvious this year and it broke through, creating the scenarios that James Hansen warned of in Storms of My Grandchildren. The arctic ice cap melt continues to accelerate, seas are warming and rising, drought and record heat hit most of the USA, and everything wrapped up with Hurricane Sandy, pushing the topic onto the front page. The dice are all loaded now, and 2013 is already rolling those dice as the drought continues and the Mississippi river is empty. I've been saying for the last few years that if you own property at sea level, you should find someone who doesn't believe in global warming to sell it to, because it's going to become increasingly uninsurable and end up as worthless as the houses along the New Jersey shoreline that were swept away.

The Republican party is still in denial, a combination of funding from big oil companies and an inability to accept or admit that their demonized Al Gore could have been right all along. In 2013 it will be interesting to see how they deal with losing the election, and perhaps there will be a split into a group of Republicans that see the path to re-election in 2014 as needing to accept reality by voting for some Global Warming related legislation, versus the hard core that are trying to pray their way out. The current battle is over stopping the Keystone XL pipeline that would move the dirtiest kind of tar oil from Alberta Canada to Texas. It may be symbolic, but if KXL is stopped, the tide will have turned. Carbon needs to be left in the ground. For 2013, I'm going to try and re-balance my 401K retirement accounts to divest from oil companies. Many students are now pressuring their colleges to divest from oil as well.

Twitter and Snapchat

Personally, 2012 was an excellent year for me. I've made lots of new friends and learned a lot by being active on Twitter, ending the year with about 6500 followers. I joked on Twitter that I posted my New Year's resolutions for 2013 to Snapchat, but you missed them. If you don't know what Snapchat is for, ask a teenager. You'll probably hear a lot more about it in 2013. Then, when their parents figure it out and join too, the teens will be onto the next thing....

Friday, November 16, 2012

Cloud Outage Reports

The detailed summaries of outages from cloud vendors are comprehensive and the response to each highlights many lessons in how to build robust distributed systems. For outages that significantly affected Netflix, the Netflix techblog report gives insight into how to effectively build reliable services on top of AWS. I've included some Google and Azure outages here because they illustrate different failure modes that should be taken into account. Recent AWS and Azure outage reports have far more detail than Google outage reports.

I plan to collect reports here over time, and welcome links to other write-ups of outages and how to survive them. My naming convention is {vendor} {primary scope} {cause}. The scope may be global, a specific region, or a zone in the region. In some cases there are secondary impacts with a wider scope but shorter duration such as regional control planes becoming unavailable for a short time during a zone outage.
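For anyone keeping a similar list, the naming convention maps naturally onto a small record type. This is a hypothetical sketch, not something I use to generate the headings below; the class and field names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutageReport:
    date: str    # free-form date string, as used in the headings
    vendor: str  # e.g. "AWS", "Azure", "Google"
    scope: str   # "Global", a region, or a zone in a region
    cause: str   # short description of what failed

    def title(self) -> str:
        """Build a heading of the form '{date} - {vendor} {scope} {cause}'."""
        return f"{self.date} - {self.vendor} {self.scope} {self.cause}"


report = OutageReport(
    date="July 20th, 2008",
    vendor="AWS",
    scope="Global",
    cause="S3 Gossip Protocol Corruption",
)
print(report.title())
```

Keeping the scope as its own field makes it easy to group outages by global, regional or zone impact later.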

This post was written while researching my AWS Re:Invent talk.
Slides: http://www.slideshare.net/AmazonWebServices/arc203-netflixha
Video: http://www.youtube.com/watch?v=dekV3Oq7pH8

November 18th, 2014 - Azure Global Storage Outage

Microsoft Reports

http://azure.microsoft.com/blog/2014/11/19/update-on-azure-storage-service-interruption/

http://azure.microsoft.com/blog/2014/12/17/final-root-cause-analysis-and-improvement-areas-nov-18-azure-storage-service-interruption/

January 10th, 2014 - Dropbox Global Outage

Dropbox Report

https://tech.dropbox.com/2014/01/outage-post-mortem/

April 20th, 2013 - Google Global API Outage

Google Report

http://googledevelopers.blogspot.com/2013/05/google-api-infrastructure-outage.html

February 22nd, 2013 - Azure Global Outage Cert Expiry

Azure Report

http://blogs.msdn.com/b/windowsazure/archive/2013/03/01/details-of-the-february-22nd-2013-windows-azure-storage-disruption.aspx

December 24th, 2012 - AWS US-East Partial Regional ELB State Overwritten

AWS Service Event Report

http://aws.amazon.com/message/680587/

Netflix Techblog Report

http://techblog.netflix.com/2012/12/a-closer-look-at-christmas-eve-outage.html

October 26th, 2012 - Google AppEngine Network Router Overload

Google Outage Report

http://googleappengine.blogspot.com/2012/10/about-todays-app-engine-outage.html

October 22nd, 2012 - AWS US-East Zone EBS Data Collector Bug

AWS Outage Report

http://aws.amazon.com/message/680342/

Netflix Techblog Report

http://techblog.netflix.com/2012/10/post-mortem-of-october-222012-aws.html

June 29th, 2012 - AWS US-East Zone Power Outage During Storm

AWS Outage Report

http://aws.amazon.com/message/67457/

Netflix Techblog Report

http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html

June 13th, 2012 - AWS US-East SimpleDB Region Outage

AWS Outage Report

http://aws.amazon.com/message/65649/

February 29th, 2012 - Microsoft Azure Global Leap-Year Outage

Azure Outage Report

http://blogs.msdn.com/b/windowsazure/archive/2012/03/09/summary-of-windows-azure-service-disruption-on-feb-29th-2012.aspx

August 17th, 2011 - AWS EU-West Zone Power Outage

AWS Outage Report

http://aws.amazon.com/message/2329B7/

April 2011 - AWS US-East Zone EBS Outage

AWS Outage Report

http://aws.amazon.com/message/65648/

Netflix Techblog Report

http://techblog.netflix.com/2011/04/lessons-netflix-learned-from-aws-outage.html

February 24th, 2010 - Google App Engine Power Outage

Google Forum Report

https://groups.google.com/forum/#!topic/google-appengine/p2QKJ0OSLc8

July 20th, 2008 - AWS Global S3 Gossip Protocol Corruption

AWS Outage Report

http://status.aws.amazon.com/s3-20080720.html
