Tuesday, June 14, 2011

Scalable Datastores


Personal notes on

"10 Rules for Scalable Performance in 'Simple Operation' Datastores"
By Michael Stonebraker and Rick Cattell.
From
Communications of the ACM, June 2011, Vol. 54, No. 6, p. 75


The dominant data storage systems all look about the same today, but many new ones are coming.


Systems - Link

Asterdata  http://asterdata.com
BigTable  http://labs.google.com/papers/bigtable.html 
Clustrix  http://clustrix.com
CouchDB  http://couchdb.apache.org
DB2  http://ibm.com/software/data/db2
Dynamo  http://portal.acm.org/citation.cfm?id=1294281
Exadata  http://oracle.com/exadata
Greenplum  http://greenplum.com
HBase  http://hbase.apache.org
HyperTable  http://hypertable.org
MongoDB  http://mongodb.org
MySQL  http://mysql.com/products/enterprise
MySQL Cluster  http://mysql.com/products/database/cluster
Netezza  http://netezza.com
NimbusDB  http://nimbusdb.com
Oracle  http://oracle.com
Oracle RAC  http://oracle.com/rac
Paraccel  http://paraccel.com
PNUTS  http://research.yahoo.com/pub/2304
PostgreSQL  http://postgresql.org
Riak  http://basho.com/Riak.html
Scalaris  http://code.google.com/p/scalaris
SimpleDB  http://amazon.com/simpledb
SQL Server  http://microsoft.com/sqlserver
Teradata  http://teradata.com
Terrastore  http://code.google.com/p/terrastore
Tokyo Cabinet  http://1978th.net/tokyocabinet
Vertica  http://vertica.com
Voldemort  http://project-voldemort.com
VoltDB  http://voltdb.com


New kinds of stores for simple-operation (SO) databases

Key-value Stores - each object has a key and a payload
Dynamo
Voldemort

Document Stores - objects with a variable number of attributes
CouchDB
MongoDB

Extensible Record Stores - variable-width record sets, partitioned vertically and horizontally
BigTable
Cassandra

SQL DBMSs - retain SQL and ACID (Atomicity, Consistency, Isolation, Durability)
MySQL Cluster
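
To make the difference between the first three kinds of store more concrete, here is a minimal Python sketch using plain in-memory dictionaries. Everything in it (the keys, the attribute names, the "group:column" naming, and the two helper functions) is made up for illustration and is not the API of any system listed above.

```python
# Hypothetical in-memory stand-ins; not the API of any real system.

# Key-value store: the store sees only a key and an opaque payload.
kv_store = {"user:42": b"\x01\x02opaque-bytes"}

# Document store: each object has its own, variable set of attributes.
doc_store = {
    "user:42": {"name": "Ann", "email": "ann@example.com"},
    "user:43": {"name": "Bob", "tags": ["admin"], "age": 30},
}

# Extensible record store: rows of variable width, keyed by row key,
# with columns grouped so they can be split vertically.
record_store = {
    "row-a": {"profile:name": "Ann", "stats:logins": 7},
    "row-m": {"profile:name": "Bob"},
    "row-z": {"profile:name": "Cy", "stats:logins": 2, "stats:posts": 9},
}


def horizontal_partition(rows, split_key):
    """Split rows into two shards by row key (horizontal partitioning)."""
    low = {k: v for k, v in rows.items() if k < split_key}
    high = {k: v for k, v in rows.items() if k >= split_key}
    return low, high


def vertical_partition(rows, group):
    """Keep only the columns of one column group (vertical partitioning)."""
    prefix = group + ":"
    return {
        k: {c: v for c, v in cols.items() if c.startswith(prefix)}
        for k, cols in rows.items()
    }
```

Calling `horizontal_partition(record_store, "row-n")` splits the rows into two shards by key, while `vertical_partition(record_store, "stats")` keeps only the "stats" columns of every row. A real extensible record store would place those pieces on different nodes.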


Ten Rules

Look for shared-nothing scalability
High-level languages are good and need not hurt performance
Plan to carefully leverage main memory databases
High availability and automatic recovery are essential
Online everything
Avoid multi-node operations
Don't try to build ACID consistency yourself
Look for administrative simplicity
Pay attention to node performance
Open source gives you more control over your future
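
The rules "Look for shared-nothing scalability" and "Avoid multi-node operations" go together, and a small Python sketch may help show why. It assumes simple hash partitioning with a fixed number of nodes; the `Node` and `Cluster` classes are hypothetical names for illustration, and real systems typically use more elaborate placement schemes.

```python
import hashlib


class Node:
    """One shared-nothing node: owns its own data, shares no memory or disk."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Hypothetical cluster that routes each key to exactly one node."""

    def __init__(self, node_names):
        self.nodes = [Node(n) for n in node_names]

    def node_for(self, key):
        # Stable hash so the same key always routes to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        # Single-node operation: only the owning node is touched.
        self.node_for(key).data[key] = value

    def get(self, key):
        # Single-node operation: only the owning node is touched.
        return self.node_for(key).data.get(key)

    def count_all(self):
        # Multi-node operation: every node must be visited, so the cost
        # grows with cluster size instead of staying constant.
        return sum(len(node.data) for node in self.nodes)
```

In this sketch `put` and `get` each touch one node no matter how many nodes there are, which is what lets simple operations scale as nodes are added. `count_all` has to visit every node, which is the kind of operation the sixth rule suggests keeping off the common path.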
