
Saturday, June 16, 2012

Cloud Application Architectures: GOTO Aarhus, Denmark, October

Thanks to an invite via Michael Nygard, I ended up as a track chair for the GOTO Aarhus conference in Denmark in early October. The track subject is Cloud Application Architectures, and we have three speakers lined up. You can register with a discount using the code crof1000.

We are starting out with a talk from the Citrix Cloudstack folks about how to architect applications on open portable clouds. These implement a subset of functionality but have more implementation flexibility than public cloud services. [Randy Bias of Cloudscaling was going to give this talk but had to pull out of the trip due to other commitments].

To broaden our perspective somewhat, and get our hands dirty with real code, the next talk is a live demonstration by Ido Flatow: Building secured, scalable, low-latency web applications with the Windows Azure Platform.
In this session we will construct a secured, durable, scalable, low-latency web application with Windows Azure - Compute, Storage, CDN, ACS, Cache, SQL Azure, Full IIS, and more. This is a no-slides presentation!
Finally, I will be giving my latest update on Globally Distributed Cloud Applications at Netflix.
Netflix grew rapidly and moved its streaming video service to the AWS cloud between 2009 and 2010. In 2011 the architecture was extended to use Apache Cassandra as a backend, and the service was internationalized to support Latin America. Early in 2012 Netflix launched in the UK and Ireland, using the combination of AWS capacity in Ireland and Cassandra to create a truly global backend service. Since then the code that manages and operates the global Netflix platform has been released as a series of open source projects at netflix.github.com (Asgard, Priam etc.). The platform is structured as a large scale PaaS, strongly leveraging advanced features of AWS to deploy many thousands of instances. The platform has primary language support for Java/Tomcat, with most management tools built using Groovy/Grails and operations tooling in Python. Continuous integration and deployment tooling leverages Jenkins, Ivy/Gradle and Artifactory. This talk will explain how to build your own custom PaaS on AWS using these components.

There are many other excellent speakers at this event, which is run by the same team as the global series of QCon conferences. Unfortunately, the cloud track runs at the same time as Michael Nygard and Jez Humble on Continuous Delivery and Continuous Integration. However, I'm also doing a talk in the NoSQL track (along with Martin Fowler and Coda Hale): Running Netflix on Cassandra in the Cloud.
Netflix used to run a traditional datacenter based architecture with a few large Oracle database backends. Now it is one of the largest cloud based architectures, with master copies of all data living in Cassandra. This talk will discuss how we made the transition; how we automated and open sourced Cassandra management for tens of clusters and hundreds of nodes using Priam and Astyanax; and backups, archiving, and performance and scalability benchmarks.

I'm looking forward to meeting old friends, getting to know some new people, and visiting Denmark for the first time.  See you there!

Monday, March 19, 2012

Ops, DevOps and PaaS (NoOps) at Netflix

There has been a sometimes heated discussion on twitter about the term NoOps recently, and I've been quoted extensively as saying that NoOps is the way developers work at Netflix. However, there are teams at Netflix that do traditional Operations, and teams that do DevOps as well. To try and clarify things I need to explain the history and current practices at Netflix in chunks of more than 140 characters at a time.

When I joined Netflix about five years ago, I managed a development team, building parts of the web site. We also had an operations team who ran the systems in the single datacenter that we deployed our code to. The systems were high end IBM P-series virtualized machines with storage on a virtualized Storage Area Network. The idea was that this was reliable hardware with great operational flexibility, so that developers could assume low failure rates and concentrate on building features. In reality we had the usual complaints about how long it took to get new capacity, the lack of consistency across supposedly identical systems, and failures in Oracle, the SAN and the networks that took the site down too often for too long.

At that time we had just launched the streaming service, and it was still an experiment, with little content and no TV device support. As we grew streaming over the next few years, we saw that we needed higher availability and more capacity, so we added a second datacenter. This project took far longer than initial estimates, and it was clear that deploying capacity at the scale and rates we were going to need as streaming took off was a skill set that we didn't have in-house. We tried bringing in new ops managers, and new engineers, but they were always overwhelmed by the fire fighting needed to keep the current systems running.

Netflix is a developer oriented culture, from the top down. I sometimes have to remind people that our CEO Reed Hastings was the founder and initial developer of Purify, which anyone developing serious C++ code in the 1990's would have used to find memory leaks and optimize their code. Pure Software merged with Atria and Rational before being swallowed up by IBM. Reed left and went on to found Netflix. Reed hired a team of very strong software engineers who are now the VPs who run developer engineering for our products. When we were deciding what to do next, Reed was directly involved in deciding that we should move to cloud, and even pushing us to build an aggressively cloud optimized architecture based on NoSQL. Part of that decision was to outsource the problems of running large scale infrastructure and building new datacenters to AWS. AWS has far more resources to commit to getting cloud to work and scale, and to building huge datacenters. We could leverage this rather than try to duplicate it at a far smaller scale, with greater certainty of success. So the budget and responsibility for managing AWS and figuring out cloud was given directly to the developer organization, and the ITops organization was left to run its datacenters. In addition, the goal was to keep datacenter capacity flat, while growing the business rapidly by leveraging additional capacity on AWS.

Over the next three years, most of the ITops staff have left and been replaced by a smaller team. Netflix has never had a CIO, but we now have an excellent VP of ITops, Mike Kail (@mdkail), who runs the datacenters. These still support the DVD shipping functions of Netflix USA, and he also runs corporate IT, which is increasingly moving to SaaS applications like Workday. Mike runs a fairly conventional ops team and is usually hiring, so there are sysadmin, database, storage and network admin positions. The datacenter footprint hasn't increased since 2009, although there have been technology updates, and the overall size is order-of-magnitude a thousand systems.

As the developer organization started to figure out cloud technologies and build a platform to support running Netflix on AWS, we transferred a few ITops staff into a developer team that formed the core of our DevOps function. They build the Linux based base AMI (Amazon Machine Image), and after a long discussion we decided to leverage developer oriented tools such as Perforce for version control, Ivy for dependencies, Jenkins to automate the build process and Artifactory as the binary repository, and to construct a "bakery" that produces complete AMIs that contain all the code for a service. Along with AWS Autoscale Groups this ensured that every instance of a service would be totally identical. Notice that we didn't use the typical DevOps tools Puppet or Chef to create builds at runtime. This is largely because the people making decisions are development managers, who have been burned repeatedly by configuration bugs in systems that were supposed to be identical.

By 2012 the cloud capacity has grown to be order-of-magnitude 10,000 instances, ten times the capacity of the datacenter, running in nine AWS Availability Zones (effectively separate datacenters) on the US East and West coasts and in Europe. A handful of DevOps engineers working for Carl Quinn (@cquinn - well known from the Java Posse podcast) are coding and running the build tools and bakery, and updating the base AMI from time to time. Several hundred development engineers use these tools to build code, run it in a test account in AWS, then deploy it to production themselves. They never have to have a meeting with ITops, or file a ticket asking someone from ITops to make a change to a production system, or request extra capacity in advance. They use a web based portal to deploy hundreds of new instances running their new code alongside the old code, and put one "canary" instance into traffic; if it looks good the developer flips all the traffic to the new code. If there are any problems they flip the traffic back to the previous version (in seconds), and if it's all running fine, some time later the old instances are automatically removed. This is part of what we call NoOps. The developers used to spend hours a week in meetings with Ops discussing what they needed, figuring out capacity forecasts and writing tickets to request changes for the datacenter. Now they spend seconds doing it themselves in the cloud. Code pushes to the datacenter are rigidly scheduled every two weeks, with emergency pushes in between to fix bugs. Pushes to the cloud are as frequent as each team of developers needs them to be; incremental agile updates several times a week are common, and some teams are working towards several updates a day. Other teams and more mature services update every few weeks or months. There is no central control; the teams are responsible for figuring out their own dependencies and managing the AWS security groups that restrict who can talk to whom.
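For illustration, the canary push flow just described boils down to logic like the following sketch. These class names are hypothetical stand-ins, not the actual Netflix portal code, and the error-rate threshold and soak time are made up.

    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch of a canary push decision, not the actual Netflix tooling.
    public class CanaryPush {
        interface TrafficRouter {                  // assumed abstraction over the load balancer
            void sendCanaryTraffic(String asgName);
            void flipAllTraffic(String from, String to);
        }
        interface MetricSource {                   // assumed abstraction over monitoring
            double errorRate(String asgName);
        }

        private static final double MAX_ERROR_RATE = 0.01;  // illustrative threshold

        static void push(String oldGroup, String newGroup, TrafficRouter router,
                         MetricSource metrics) throws InterruptedException {
            router.sendCanaryTraffic(newGroup);            // one canary instance takes traffic
            TimeUnit.MINUTES.sleep(15);                    // let metrics accumulate
            if (metrics.errorRate(newGroup) < MAX_ERROR_RATE) {
                router.flipAllTraffic(oldGroup, newGroup); // promote the new code
                // the old instances are reaped automatically some time later
            } else {
                router.flipAllTraffic(newGroup, oldGroup); // roll back in seconds
            }
        }
    }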

Automated deployment is part of the normal process of running in the cloud. The other big issue is what happens if something breaks. Netflix ITops always ran a Network Operations Center (NOC), which was staffed 24x7 with system administrators. They were familiar with the datacenter systems, but had no experience with cloud. If there was a problem, they would start and run a conference call, and get the right people on the call to diagnose and fix the issue. As the Netflix web site and streaming functionality moved to the cloud it became clear that we needed a cloud operations reliability engineering (CORE) team, and that it would be part of the development organization. The CORE team was lucky enough to get Jeremy Edberg (@jedberg - well known from running Reddit) as its initial lead engineer, and also picked up some of the 24x7 shift sysadmins from the original NOC. The CORE team is still staffing up, looking for the Site Reliability Engineer skill set, and is the second group of DevOps engineers within Netflix. There is a strong emphasis on building tools to make as much of their processes go away as possible; for example, they have no run-books, they develop code instead.

To get themselves out of the loop, the CORE team has built an alert processing gateway. It collects alerts from several different systems, does filtering, has quenching and routing controls (that developers can configure), and automatically routes alerts either to the PagerDuty system (a SaaS application service that manages on call calendars, escalation and alert life cycles) or to a developer team email address. Every developer is responsible for running what they wrote, and the team members take turns to be on call in the PagerDuty rota. Some teams never seem to get calls, and others are more often on the critical path. During a major production outage conference call, the CORE team never makes changes to production applications; they always call a developer to make the change. The alerts mostly refer to business transaction flows (rather than typical operations oriented Linux level issues) and contain deep links to dashboards and developer oriented Application Performance Management tools like AppDynamics, which let developers quickly see where the problem is at the Java method level and what to fix.
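As a rough sketch of the kind of routing logic such a gateway implements (the class and configuration shapes here are hypothetical, not the actual Netflix code):

    import java.util.Map;

    // Hypothetical sketch of alert gateway routing, not the actual Netflix code.
    public class AlertGateway {
        enum Route { PAGERDUTY, EMAIL }

        static class Alert {
            final String service;   // which application raised the alert
            final String message;
            Alert(String service, String message) { this.service = service; this.message = message; }
        }

        private final Map<String, Route> routes;      // developer-configured routing table
        private final Map<String, Boolean> quenched;  // temporarily silenced services

        AlertGateway(Map<String, Route> routes, Map<String, Boolean> quenched) {
            this.routes = routes;
            this.quenched = quenched;
        }

        void process(Alert alert) {
            if (Boolean.TRUE.equals(quenched.get(alert.service))) {
                return;                               // filtered out, nobody is paged
            }
            if (routes.getOrDefault(alert.service, Route.EMAIL) == Route.PAGERDUTY) {
                pageOnCallDeveloper(alert);           // escalate via the PagerDuty API
            } else {
                emailTeam(alert);                     // send to the team email address
            }
        }

        private void pageOnCallDeveloper(Alert alert) { /* PagerDuty REST call goes here */ }
        private void emailTeam(Alert alert) { /* SMTP send to the team alias goes here */ }
    }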

The transition from datacenter to cloud also invoked a transition from Oracle, initially to SimpleDB (which AWS runs) and now to Apache Cassandra, which has its own dedicated team. We moved a few Oracle DBAs over from the ITops team and they have become experts in helping developers figure out how to translate their previous experience in relational schemas into Cassandra key spaces and column families. We have a few key development engineers who are working on the Cassandra code itself (an open source Java distributed systems toolkit), adding features that we need, tuning performance and testing new versions. We have three key open source projects from this team available on github.com/Netflix. Astyanax is a client library for Java applications to talk to Cassandra, CassJmeter is a Jmeter plugin for automated benchmarking and regression testing of Cassandra, and Priam provides automated operation of Cassandra including creating, growing and shrinking Cassandra clusters, and performing full and incremental backups and restores. Priam is also written in Java. Finally we have three DevOps engineers maintaining about 55 Cassandra clusters (including many that span the US and Europe), a total of 600 or so instances. They have developed automation for rolling upgrades to new versions, and sequencing compaction and repair operations. We are still developing our Cassandra tools and skill sets, and are looking for a manager to lead this critical technology, as well as additional engineers. Individual Cassandra clusters are automatically created by Priam, and it's trivial for a developer to create their own cluster of any size without assistance (NoOps again). We have found that the first attempts to produce schemas for Cassandra use cases tend to cause problems for engineers who are new to the technology, but with some familiarity and assistance from the Cloud Database Engineering team, we are starting to develop better common patterns to work to, and are extending the Astyanax client to avoid common problems.
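To give a flavor of what Astyanax code looks like, here is a minimal connection and single-write sketch. Treat it as illustrative only: the cluster, keyspace and column family names are made up, and the builder methods may differ between Astyanax versions, so check the project on github.com/Netflix for the current API.

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    public class AstyanaxExample {
        public static void main(String[] args) throws Exception {
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")                       // illustrative names
                .forKeyspace("TestKeyspace")
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                .withConnectionPoolConfiguration(
                    new ConnectionPoolConfigurationImpl("MyPool")
                        .setPort(9160)
                        .setMaxConnsPerHost(3)
                        .setSeeds("127.0.0.1:9160"))
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());
            context.start();
            Keyspace keyspace = context.getEntity();

            ColumnFamily<String, String> cf = new ColumnFamily<String, String>(
                "Favorites", StringSerializer.get(), StringSerializer.get());
            MutationBatch m = keyspace.prepareMutationBatch();
            m.withRow(cf, "user1234").putColumn("movie5678", "queued", null);
            m.execute();
            context.shutdown();
        }
    }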

In summary, Netflix still does Ops to run its datacenter DVD business. We have a small number of DevOps engineers embedded in the development organization who are building and extending automation for our PaaS, and we have hundreds of developers using NoOps to get their code and datastores deployed in our PaaS and to get notified directly when something goes wrong. We have built tooling that removes many of the operations tasks completely from the developer, and which makes the remaining tasks quick and self service. There is no ops organization involved in running our cloud, no need for the developers to interact with ops people to get things done, and less time spent actually doing ops tasks than developers would spend explaining what needed to be done to someone else. I think that's different from the way most DevOps places run, but it's similar to other PaaS environments, so it needs its own name, NoOps. [Update: the DevOps community argues that although it's different, it's really just a more advanced end state for DevOps, so let's just call it PaaS for now, and work on a better definition of DevOps].

Thursday, January 19, 2012

Thoughts on SimpleDB, DynamoDB and Cassandra

I've been getting a lot of questions about DynamoDB, and these are my personal thoughts on the product, and how it fits into cloud architectures.

I'm excited to see the release of DynamoDB; it's a very impressive product with great performance characteristics. It should be the first option for startups and new cloud projects on AWS. I also think it marks the turning point for solid state disks; they will be the default for new database products and benchmarks going forward.

There are a few use cases where SimpleDB may still be useful, but DynamoDB replaces it in almost all cases. I've talked about the history of Netflix's use of SimpleDB before, but it's relevant to the discussion on DynamoDB, so here goes.

When Netflix was looking at moving to cloud about three years ago we had an internal debate about how to handle storage on AWS. There were strong proponents for using MySQL, SimpleDB was fairly new, and other alternatives were nascent NoSQL projects or expensive enterprise software. We started some pathfinder projects to explore the two alternatives and decided to port an existing MySQL app to AWS, while building a replication pipeline that copied data out of our Oracle datacenter systems into SimpleDB to be consumed in the cloud. The MySQL experience showed that we would have trouble scaling, and SimpleDB seemed reliable, so we went ahead and kept building more data sources on SimpleDB, with large blobs of data on S3.

Along the way we put memcached in front of SimpleDB and S3 to improve read latency. The durability of SimpleDB is its strongest point: we have had Oracle and SAN data corruption bugs in the datacenter over the last few years, but never lost or corrupted any SimpleDB data. The limitations of SimpleDB are its biggest problem. We worked around limits on table size, row and attribute size, and the per-request overhead caused by HTTP requests needing to be authenticated for every call.

So the lesson here is that for a first step into NoSQL, we went with a hosted solution so that we didn't have to build a team of experts to run it, and we didn't have to decide in advance how much scale we needed. Starting again from scratch today, I would probably go with DynamoDB. It's a low "friction" and developer friendly solution.

Late in 2010 we were planning the next step: turning off Oracle and making the cloud the master copy of the data. One big problem was that our backups and archives were based on Oracle, and there was no way to take a snapshot or incremental backup of SimpleDB. The only way to get data out in bulk is to run "SELECT * FROM table" over HTTP and page the requests. This adds load, takes too long, and costs a lot, because SimpleDB charges for the time taken in SELECT calls.
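For reference, that bulk extraction loop looks roughly like this with the AWS SDK for Java (the domain name and credentials are illustrative):

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.simpledb.AmazonSimpleDB;
    import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
    import com.amazonaws.services.simpledb.model.Item;
    import com.amazonaws.services.simpledb.model.SelectRequest;
    import com.amazonaws.services.simpledb.model.SelectResult;

    public class SimpleDBDump {
        public static void main(String[] args) {
            AmazonSimpleDB sdb = new AmazonSimpleDBClient(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));  // illustrative
            String nextToken = null;
            do {
                // Each page is a separate authenticated HTTP call, and SimpleDB
                // charges for the machine time that each SELECT consumes.
                SelectRequest req = new SelectRequest("select * from mydomain")
                    .withNextToken(nextToken);
                SelectResult result = sdb.select(req);
                for (Item item : result.getItems()) {
                    System.out.println(item.getName());  // write to your archive here
                }
                nextToken = result.getNextToken();       // null when the scan is done
            } while (nextToken != null);
        }
    }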

We looked at about twenty NoSQL options during 2010, trying to understand what the differences were, and eventually settled on Cassandra as our candidate for prototyping. In a week or so, we had ported a major data source from SimpleDB to Cassandra and were getting used to the new architecture, running benchmarks etc. We evaluated some other options, but decided to take Cassandra to the next stage and develop a production data source using it.

The things we liked about Cassandra were that it is written in Java (we have a building full of Java engineers), it is packed full of state of the art distributed systems algorithms, we liked the code quality, we could get commercial support from DataStax, it is scalable, and as an extra bonus it had multi-region support. What we didn't like so much was that we had to staff a team to own running Cassandra for us, but we retrained some DBAs and hired some good engineers, including Vijay Parthasarathy, who had worked on the multi-region Cassandra development at Cisco Webex and who recently became a core committer on the Apache Cassandra project. We also struggled with the Hector client library, and have written our own (which we plan to release soon). The blessing and curse of Cassandra is that it is an amazingly fast moving project. New versions come fast and furiously, which makes it hard to pick a version to stabilize on; however, the changes we make turn up in the mainstream releases after a few weeks. Saying "Cassandra doesn't do X" is more of a challenge than a statement. If we need "X" we work with the rest of the Apache project to add it.

Throughout 2011 the product teams at Netflix gradually moved their backend data sources to Cassandra, we worked out the automation we needed to do easy self service deployments, and we ended up with a large number of clusters. In preparation for the UK launch we also made changes to Cassandra to better support multi-region deployment on EC2, and we are currently running several Cassandra clusters that span the US and European AWS regions.

Now that DynamoDB has been released, the obvious question is whether Netflix has any plans to use it. The short answer is no, because it's a subset of the Cassandra functionality that we depend on. However, that doesn't detract from DynamoDB being a major step forward over SimpleDB in performance, scalability and latency. For new customers, or people who have outgrown the scalability of MySQL or MongoDB, DynamoDB is an excellent starting point for data sources on AWS. The advantages of zero administration combined with the performance and scalability of a solid state disk backend are compelling.

Personally my main disappointment with DynamoDB is that it doesn't have any snapshot or incremental backup capability. The AWS answer is that you can extract data into EMR then store it in S3. This is basically the same answer as for SimpleDB: a full table scan data extraction, which takes too long, costs too much, and isn't incremental. The mechanism we built for Cassandra leverages the way that Cassandra writes immutable files: we get a callback and compress/copy them to S3 as they are written, which is extremely low overhead. If we corrupt our data with a code bug and need to roll back, or take a backup in production and restore in test, we have all the files archived in S3.
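A much-simplified sketch of the idea: because sstables are immutable once written, each new file can be compressed and copied to S3 as it appears. The callback wiring and bucket name here are illustrative; Priam is the real implementation.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.zip.GZIPOutputStream;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;

    // Sketch of a Priam-style incremental backup: copy each newly written
    // immutable sstable to S3. Not the actual Priam code.
    public class SSTableBackup {
        private final AmazonS3 s3 = new AmazonS3Client();      // credentials from the environment
        private final String bucket = "my-cassandra-backups";  // illustrative bucket name

        // Assume this is called whenever Cassandra writes a new sstable,
        // e.g. by watching the incremental backup hard-link directory.
        public void onSSTableWritten(File sstable) throws IOException {
            File compressed = compress(sstable);
            s3.putObject(bucket, "sstables/" + sstable.getName() + ".gz", compressed);
        }

        private File compress(File f) throws IOException {
            File out = File.createTempFile(f.getName(), ".gz");
            try (InputStream in = new FileInputStream(f);
                 OutputStream gz = new GZIPOutputStream(new FileOutputStream(out))) {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) > 0) {
                    gz.write(buf, 0, n);
                }
            }
            return out;
        }
    }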

One argument against DynamoDB is that it is available on AWS only, so customers could get locked in. However, it's easy to upgrade applications from SimpleDB to DynamoDB and on to Cassandra; they have similar schema models, consistency options and availability models. It's harder to go backwards, because Cassandra has more features and fewer restrictions. Porting between NoSQL data stores is trivial compared to porting between relational databases, due to the complexity of the SQL language dialects and features and the simplicity of the NoSQL offerings. Starting out on DynamoDB, then switching to Cassandra when you need more direct control over the installation or Cassandra specific features like multi-region support, is a very viable strategy.

As early adopters we have had to do a lot more pioneering engineering work than more recent cloud converts. Along the way we have leveraged AWS heavily to accelerate our own development, and built a lot of automation around Cassandra. While SimpleDB has been a minor player in the NoSQL world DynamoDB is going to have a much bigger impact. Cassandra has matured and got easier to use and deploy over the last year but it doesn't scale down as far. By that I mean a single developer in a startup can start coding against DynamoDB without needing any support and with low and incremental costs. The smallest Cassandra cluster we run is six m1.xlarge instances spread over three zones with triple replication.

I've been saying for a while that 2012 is the year that NoSQL goes mainstream; DynamoDB is another major step in validating that move. The canonical CEO to CIO conversation is moving from 2010's "what's our cloud strategy?" through 2011's "what's our big data strategy?" to 2012's "what's our NoSQL strategy?".

Wednesday, March 02, 2011

Maslow's Hierarchy of NoSQL Reads (and Writes)

I tried out Prezi to create this presentation; it was more fun to create than PowerPoint but has a few limitations. I would like better drawing tools. The talk itself puts some of the main NoSQL solutions into context.



Friday, October 29, 2010

NoSQL Netflix Use Case Comparison for Cassandra

Jonathan Ellis (@spyced) of Riptano kindly provided a set of answers that I have interspersed with the questions below.

The original set of questions is posted here. Each NoSQL contender will get their own blog post with answers; when there are enough to be interesting, I will write some summary comparisons.

If you have answers or would like to suggest additional questions, comment here, tweet me @adrianco or blog it yourself.

Use Case Scenario for Comparison Across NoSQL Contenders
While each NoSQL contender has different strengths and will be used for different things, we need a basis for comparison across them, so that we understand the differences in behavior. Here is a sample scenario that I am publishing to put to each vendor; I will post the results here. The example is non-trivial and is based on a simplified Netflix related scenario that is applicable to any web service that reliably collects data from users via an API. I assume that it is running on AWS and use that terminology, but the concepts are generic.

Use Case
A TV based device calls the API to add a movie to its favorites list (like the Netflix instant queue, but I have simplified the concept here), then reads back the entire list to ensure it is showing the current state. The API does not use cookies, and the load balancer (Amazon Elastic Load Balancer) is round robin, so the second request goes to a different API server, which happens to be in a different Amazon Availability Zone and needs to respond with the modified list.

Favorites Storage
The favorites store is implemented using a NoSQL mechanism that persistently stores a single key=user, value=movielist record on writes, and returns the movielist on reads.
The most natural way to model per-user favorites in Cassandra is to have one row per user, keyed by the userid, whose column names are movie IDs. The combination of allowing dynamic column creation within a row and allowing very large rows (up to 2 billion columns in 0.7) means that you can treat a row as a list or map, which is a natural fit here. Performance will be excellent since columns can be added or modified without needing to read the row first. (This is one reason why thinking of Cassandra as a key/value store, even before we added secondary indexes, was not really correct.)
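To make that concrete, here is a sketch of the favorites model using a Java client in the style of our Astyanax library (names are illustrative): adding a favorite writes one column without reading the row first, and reading the list back is a single row read.

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.model.Column;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ColumnList;
    import com.netflix.astyanax.serializers.StringSerializer;

    public class Favorites {
        // One row per user; column names are movie IDs, so the row is the list.
        private static final ColumnFamily<String, String> CF_FAVORITES =
            new ColumnFamily<String, String>("Favorites",
                StringSerializer.get(), StringSerializer.get());

        private final Keyspace keyspace;  // connected elsewhere, e.g. via AstyanaxContext

        public Favorites(Keyspace keyspace) { this.keyspace = keyspace; }

        // Write without read: just insert a column named after the movie ID.
        public void addFavorite(String userId, String movieId) throws Exception {
            MutationBatch m = keyspace.prepareMutationBatch();
            m.withRow(CF_FAVORITES, userId).putColumn(movieId, "", null);
            m.execute();
        }

        // Read the whole list back: one row read returns all the column names.
        public void printFavorites(String userId) throws Exception {
            ColumnList<String> row = keyspace.prepareQuery(CF_FAVORITES)
                .getKey(userId).execute().getResult();
            for (Column<String> c : row) {
                System.out.println(c.getName());  // a movie ID
            }
        }
    }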

The best introduction to Cassandra data modeling is Max Grinev's series on basics, translating SQL concepts, and idempotence.

Question 1: Availability Zones
When an API reads and writes to a queue store using the NoSQL mechanism, is the traffic routing Availability Zone aware? Are reads satisfied locally, or spread over all zones, is the initial write local or spread over the zones, is the write replication zone aware so data is replicated to more than one zone?
Briefly, both reads and writes have a ConsistencyLevel parameter controlling how many replicas across how many zones must reply for the request to succeed. Routing is aware of current response times as well as network topology, so given an appropriate ConsistencyLevel, reads can be routed around temporarily slow nodes.

On writes, the coordinator node (the one the client sent the request to) will send the write to all replicas; as soon as enough success messages come back to satisfy the desired consistency level, the coordinator will report success to the client.

For more on consistency levels, see Ben Black's excellent presentation.
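In client terms, the per-operation ConsistencyLevel looks like this (an Astyanax-flavored sketch; names are illustrative):

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ConsistencyLevel;
    import com.netflix.astyanax.serializers.StringSerializer;

    public class ConsistencyExample {
        private static final ColumnFamily<String, String> CF =
            new ColumnFamily<String, String>("Favorites",
                StringSerializer.get(), StringSerializer.get());

        // W=QUORUM: a majority of replicas must acknowledge the write.
        static void quorumWrite(Keyspace ks, String user, String movie) throws Exception {
            MutationBatch m = ks.prepareMutationBatch()
                .setConsistencyLevel(ConsistencyLevel.CL_QUORUM);
            m.withRow(CF, user).putColumn(movie, "", null);
            m.execute();
        }

        // R=ONE: the first replica to answer wins, for the lowest latency.
        static void fastRead(Keyspace ks, String user) throws Exception {
            ks.prepareQuery(CF)
                .setConsistencyLevel(ConsistencyLevel.CL_ONE)
                .getKey(user)
                .execute();
        }
    }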
Question 2: Partitioned Behavior with Two Zones
If the connection between two zones fails, and a partition occurs so that external traffic coming into and staying within a zone continues to work, but traffic between zones is lost, what happens? In particular, which of these outcomes does the NoSQL service support?
  • one zone decides that it is still working for reads and writes but at half the size, and the other zone decides it is offline
  • both zones continue to satisfy reads, but refuse writes until repaired
  • data that has a master copy in the good zone supports read and write, slave copies stop for both read and write
  • both zones continue to accept writes, and attempt to reconcile any inconsistency on repair
Cassandra has no 'master copy' for any piece of data; all copies are equal. The other behaviors are supported by different ConsistencyLevel values for reads (R) and writes (W):

R=QUORUM, W=QUORUM: One zone decides that it is still working for reads and writes, and the other zone decides it is offline
R=ONE, W=ALL: Both zones continue to satisfy reads, but refuse writes
R=ONE, W=ONE: Both zones continue to accept writes, and reconcile any inconsistencies when the partition heals

I would also note that reconciliation is timestamp-based at the column level, meaning that updates to different columns within a row will never conflict, but when writes have been allowed in two partitions to the same column, the highest timestamp will win. (This is another way Cassandra differs from key/value stores, which need more complex logic called vector clocks to be able to merge updates to different logical components of a value.)
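The merge rule itself fits in a few lines (a sketch of the rule described above, not the actual Cassandra source):

    // Cassandra-style column reconciliation: the highest timestamp wins.
    public class ColumnReconcile {
        static class Column {
            final String name;
            final byte[] value;
            final long timestamp;  // set by the client, typically microseconds since epoch
            Column(String name, byte[] value, long timestamp) {
                this.name = name; this.value = value; this.timestamp = timestamp;
            }
        }

        // When two replicas disagree about the same column, keep the newer write.
        static Column reconcile(Column a, Column b) {
            return (a.timestamp >= b.timestamp) ? a : b;
        }
    }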
Question 3: Appending a movie to the favorites list
If an update is performed by read-modify-write of the entire list, what mechanisms can be used to avoid race conditions? If multiple attribute/values are supported for a key, can an additional value be written directly without reading first? What limits exist on the size of the value or number of attribute/values, and are queries by attribute/value supported?
Cassandra's ColumnFamily model generally obviates the need for a read before a write, e.g., as above using movie IDs as column names. (If you wanted to allow duplicates in the list for some reason, you would generally use a UUID as the column name on insert instead of the movie ID.)

The maximum value size is 2GB, although in practice we recommend using 8MB as a more practical maximum. Splitting a larger blob across multiple columns is straightforward given the dynamic ColumnFamily design. The maximum row size is 2 billion columns. Queries by attribute value are supported with secondary indexes in 0.7.
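If duplicates were wanted, the insert side would use a generated UUID as the column name and store the movie ID as the value, along these lines (an Astyanax-flavored sketch; in practice a time-based version 1 UUID would keep the list in insertion order, while the random UUID used here would not):

    import java.util.UUID;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;
    import com.netflix.astyanax.serializers.UUIDSerializer;

    public class FavoritesWithDuplicates {
        // Column names are UUIDs, so repeated adds of the same movie don't collide.
        private static final ColumnFamily<String, UUID> CF =
            new ColumnFamily<String, UUID>("FavoritesDup",
                StringSerializer.get(), UUIDSerializer.get());

        static void addFavorite(Keyspace ks, String userId, String movieId) throws Exception {
            MutationBatch m = ks.prepareMutationBatch();
            m.withRow(CF, userId).putColumn(UUID.randomUUID(), movieId, null);
            m.execute();
        }
    }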
Question 4: Handling Silent Data Corruption
When the storage or network subsystem corrupts data without raising an error, does the NoSQL service detect and correct this? When is it detected and corrected, on write, on read or asynchronously?
Cassandra handles repairing corruption the same way it does other data inconsistencies, with read repair and anti-entropy repair.
Question 5: Backup and Restore
Without stopping incoming requests, how can a point in time backup of the entire dataset be performed? What is the performance and availability impact during the backup? For cases such as roll-back after a buggy application code push, how is a known good version of the dataset restored, how is it made consistent, and what is the performance and availability impact during the restore? Are there any scalability limits on the backed up dataset size, what's the biggest you have seen?
Because Cassandra's data files are immutable once written, creating a point-in-time snapshot is as simple as hard-linking the current set of sstables on the filesystem. Performance impact is negligible since hard links are so lightweight. Rolling back simply consists of moving a set of snapshotted files into the live data directory. The snapshot is as consistent as your ConsistencyLevel makes it: any write visible to readers at a given ConsistencyLevel before the snapshot will be readable from the snapshot after restore. The only scalability problem with snapshot management is that past a few TB, it becomes impractical to try to manage snapshots centrally; most companies leave them distributed across the nodes that created them.
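The hard-link trick is easy to see in a few lines of Java NIO (a sketch of what Cassandra's snapshot command does for you, with an illustrative data directory path):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    // Because sstables are immutable, a snapshot is just a directory of hard
    // links to the live files: no data is copied, so it is nearly free.
    public class Snapshot {
        public static void main(String[] args) throws IOException {
            Path dataDir = Paths.get("/var/lib/cassandra/data/Keyspace1/Standard1");
            Path snapDir = dataDir.resolve("snapshots/my-snapshot");
            Files.createDirectories(snapDir);
            try (Stream<Path> files = Files.list(dataDir)) {
                files.filter(Files::isRegularFile).forEach(f -> {
                    try {
                        // Both names now point at the same blocks on disk.
                        Files.createLink(snapDir.resolve(f.getFileName()), f);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
            }
        }
    }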

Wednesday, October 27, 2010

Comparing NoSQL Availability Models

Let's risk feeding the CAP trolls, and try to get some insight into the differences between the many NoSQL contenders. I have circulated an earlier version of this to a few people and got at least one good response. If you have answers, or would like to suggest additional questions, comment here, tweet me @adrianco or blog it yourself.


Use Case Scenario for Comparison Across NoSQL Contenders
While each NoSQL contender has different strengths and will be used for different things, we need a basis for comparison across them, so that we understand the differences in behavior. Here is a sample scenario that I am publishing to put to each vendor; I will post the results here. The example is non-trivial and is based on a simplified Netflix related scenario that is applicable to any web service that reliably collects data from users via an API. I assume that it is running on AWS and use that terminology, but the concepts are generic.

Use Case
A TV based device calls the API to add a movie to its favorites list (like the Netflix instant queue, but I have simplified the concept here), then reads back the entire list to ensure it is showing the current state. The API does not use cookies, and the load balancer (Amazon Elastic Load Balancer) is round robin, so the second request goes to a different API server, which happens to be in a different Amazon Availability Zone and needs to respond with the modified list.

Favorites Storage
The favorites store is implemented using a NoSQL mechanism that persistently stores a single key=user, value=movielist record on writes, and returns the movielist on reads.

Question 1: Availability Zones
When an API reads and writes to a queue store using the NoSQL mechanism, is the traffic routing Availability Zone aware? Are reads satisfied locally, or spread over all zones, is the initial write local or spread over the zones, is the write replication zone aware so data is replicated to more than one zone? 

Question 2: Partitioned Behavior with Two Zones
If the connection between two zones fails, and a partition occurs so that external traffic coming into and staying within a zone continues to work, but traffic between zones is lost, what happens? In particular, which of these outcomes does the NoSQL service support?
  • one zone decides that it is still working for reads and writes but at half the size, and the other zone decides it is offline
  • both zones continue to satisfy reads, but refuse writes until repaired
  • data that has a master copy in the good zone supports read and write, slave copies stop for both read and write
  • both zones continue to accept writes, and attempt to reconcile any inconsistency on repair
Question 3: Appending a movie to the favorites list
If an update is performed by read-modify-write of the entire list, what mechanisms can be used to avoid race conditions? If multiple attribute/values are supported for a key, can an additional value be written directly without reading first? What limits exist on the size of the value or number of attribute/values, and are queries by attribute/value supported?

Question 4: Handling Silent Data Corruption
When the storage or network subsystem corrupts data without raising an error, does the NoSQL service detect and correct this? When is it detected and corrected, on write, on read or asynchronously?

Question 5: Backup and Restore
Without stopping incoming requests, how can a point in time backup of the entire dataset be performed? What is the performance and availability impact during the backup? For cases such as roll-back after a buggy application code push, how is a known good version of the dataset restored, how is it made consistent, and what is the performance and availability impact during the restore? Are there any scalability limits on the backed up dataset size, what's the biggest you have seen?