These answers are for the Translattice Application Platform (TAP)'s database component. Unlike other stores that have answered this question set, TAP contains a relational database that scales out over identical nodes. TAP further allows applications written to the J2EE platform to scale out across the same collection of nodes.
Use Case
A TV-based device calls the API to add a movie to its favorites list (like the Netflix instant queue, but I have simplified the concept here), then reads back the entire list to ensure it is showing the current state. The API does not use cookies, and the load balancer (Amazon Elastic Load Balancer) is round robin, so the second request goes to a different API server, which happens to be in a different Amazon Availability Zone and needs to respond with the modified list.
Favorites Storage
The favorites store is implemented using a NoSQL mechanism that persistently stores a single key=user, value=movielist record on writes and returns the movielist on reads.
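For concreteness, here is the record shape that describes, rendered as a single-row-per-user key/value table. This is only an illustration in a PostgreSQL-style SQL dialect; the table and column names are not part of any particular store's API:

```sql
-- Illustrative key/value shape: one record per user, the whole movie list as the value.
CREATE TABLE favorites_kv (
    user_id   BIGINT PRIMARY KEY,  -- key = user
    movielist TEXT   NOT NULL      -- value = serialized movie list
);

-- Write: replace the user's entire list.
INSERT INTO favorites_kv (user_id, movielist)
VALUES (42, '[1234, 5678]')
ON CONFLICT (user_id) DO UPDATE SET movielist = EXCLUDED.movielist;

-- Read: return the whole list.
SELECT movielist FROM favorites_kv WHERE user_id = 42;
```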
When an API reads and writes to a queue store using the NoSQL mechanism, is the traffic routing Availability Zone aware? Are reads satisfied locally or spread over all zones? Is the initial write local or spread over the zones? Is the write replication zone aware, so that data is replicated to more than one zone?
In the Translattice Application Platform, data in relational tables is transparently sharded behind the scenes by the data store, and these shards are stored redundantly across the nodes. Reads are satisfied from the most local copy of the data available on the network, unless that resource is currently overloaded, in which case the system may fall back to reads from more distant locations.

For writes, applications can choose the durability and isolation level of each change. A transaction may run in a fully synchronous, serializable isolation level, or in a locked, eventually consistent mode that provides ACID serializable semantics except that durability may be sacrificed if an availability zone fails. A further asynchronous mode allows potentially conflicting changes to be made, with a user-provided reconciliation function deciding which change "wins". A final commit requires a majority of the nodes storing a shard to be available; in the fully synchronous mode this delays or prevents the return of success if a critical subset of the cluster fails.

Policy mechanisms in the system allow administrators to specify how physical and cloud database instances correspond to administratively relevant zones. An administrator can require, for instance, that each piece of information be replicated to at least three database nodes across a total of two availability zones. An administrator may also use these mechanisms to require that particular tables, or portions of tables, must or must not be stored in a given zone (for instance, to meet compliance or security requirements). Within the constraints set by policy, the system tracks usage patterns and places information in the most efficient locations.
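As a concrete illustration of this per-transaction choice, consider the sketch below in a PostgreSQL-style SQL dialect. The `favorites(user_id, movie_id)` table is the hypothetical one-row-per-favorite layout sketched further below, and `translattice.commit_mode` is a made-up setting name standing in for whatever mechanism actually selects the mode; neither is TAP's documented syntax:

```sql
-- Fully synchronous, serializable transaction: success is not reported until a
-- majority of the nodes storing each affected shard have committed the change.
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO favorites (user_id, movie_id) VALUES (42, 1234);
COMMIT;

-- Locked, eventually consistent transaction: same ACID serializable semantics,
-- but durability may be sacrificed if an availability zone fails.
-- 'translattice.commit_mode' is a hypothetical setting name, not actual TAP syntax.
BEGIN;
SET LOCAL translattice.commit_mode = 'locked_eventual';
INSERT INTO favorites (user_id, movie_id) VALUES (42, 5678);
COMMIT;
```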
Question 2: Partitioned Behavior with Two Zones
If the connection between two zones fails, and a partition occurs so that external traffic coming into and staying within a zone continues to work, but traffic between zones is lost, what happens? In particular, which of these outcomes does the NoSQL service support?
- one zone decides that it is still working for reads and writes but half the size, and the other zone decides it is offline
- both zones continue to satisfy reads, but refuse writes until repaired
- data that has a master copy in the good zone supports read and write, slave copies stop for both read and write
- both zones continue to accept writes, and attempt to reconcile any inconsistency on repair
Assuming that the SQL transaction in question is running in the fully synchronous or locked eventually consistent mode, writes will only be allowed in one of the two zones. Reads will continue in both zones, but will only be able to satisfy requests for which at least one replica of the requested data exists in the local zone (policy can be specified to ensure that this is always the case). In the asynchronous, eventually consistent mode, multiple partitioned portions of the system can accept writes and reconcile them on repair. Essentially, any of the desired modes above can be used on a transaction-by-transaction basis, depending on application and performance requirements.
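For the partition scenario above, a write made in the asynchronous, reconciling mode might look like the following sketch; as before, the dialect is PostgreSQL-style and the `translattice.commit_mode` setting name is hypothetical. Either zone could accept such a write during the outage, with the user-provided reconciliation function deciding the winner once connectivity is restored:

```sql
-- Asynchronous, eventually consistent write: both partitioned zones may accept
-- it; conflicting changes are reconciled on repair by a user-provided function.
-- 'translattice.commit_mode' is a hypothetical setting name, not actual TAP syntax.
BEGIN;
SET LOCAL translattice.commit_mode = 'async';
INSERT INTO favorites (user_id, movie_id) VALUES (42, 9012);
COMMIT;
```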
Because fully relational primitives are provided, there can easily be one row in the database per favorite. Read-modify-write of the whole list is not required, and the only practical limits are application-defined.

Any SQL query is supported against the store and is transformed by the query planner into an efficient plan for execution across the distributed system. Of course, how efficiently a query executes will depend on the structure of the data and on the indexes an administrator has created. We think this allows for considerable flexibility and business agility, as the exact access methods that will be used on the data do not need to be fully determined in advance.
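A minimal sketch of the one-row-per-favorite layout described above, in a PostgreSQL-style dialect (table, column, and index names are illustrative, not TAP-specific):

```sql
-- One row per favorite, rather than one list blob per user.
CREATE TABLE favorites (
    user_id  BIGINT      NOT NULL,
    movie_id BIGINT      NOT NULL,
    added_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (user_id, movie_id)
);

-- Adding a favorite touches a single row; no read-modify-write of the whole list.
INSERT INTO favorites (user_id, movie_id) VALUES (42, 1234);

-- Reading the list back is an ordinary query that the planner can distribute.
SELECT movie_id
FROM favorites
WHERE user_id = 42
ORDER BY added_at;

-- A secondary index an administrator might add later, e.g. to answer
-- "which users favorited this movie" efficiently.
CREATE INDEX favorites_by_movie ON favorites (movie_id);
```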
Network activity is protected by cryptographic hash authentication, which provides integrity verification as a side benefit. Distributed transactions also take place through a global consensus protocol that uses hashes to ensure that checkpoints are in a consistent state (this is also how the system maintains transactional integrity and consistency when changes cross many shards). Significant portions of the on-disk data are also presently protected by checksums, allowing the database to "fail" a disk if corrupt data is read.
A good portion of this relational database's consistency model is implemented through a distributed multi-version concurrency control (MVCC) system. Tuples that are in use are preserved: the autovacuum process will not remove tuples until it is guaranteed that no transaction can still see them. This allows a consistent version of the tables as of a point in time to be viewed from within a transaction, so BEGIN TRANSACTION; SELECT ... ; COMMIT; (or COPY, for backups) works. We provide mechanisms that allow database dumps to be taken in this way.
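A sketch of the kind of consistent dump this enables, again in a PostgreSQL-style dialect (the output path is illustrative):

```sql
-- Every statement inside the transaction sees the same MVCC snapshot,
-- even while concurrent writes continue elsewhere.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Consistent point-in-time reads...
SELECT count(*) FROM favorites;

-- ...or a consistent export for backup purposes (path is illustrative).
COPY favorites TO '/backups/favorites.csv' WITH (FORMAT csv);

COMMIT;
```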
In the future we are likely to use this mechanism to allow quick snapshots of the entire database and rollbacks to previous snapshot versions (as well as to allow the use of snapshots to stage development versions of application code without affecting production state).