NoSQL For Dummies – Rise Of Non Relational Database Engines

Apache has recently released Cassandra 0.6, a large-scale distributed database system that was originally developed at Facebook and is now maintained by the Apache Software Foundation. Cassandra is popular and is used by big names like Rackspace, Twitter and Digg. Is this a threat to MySQL and the like? Cassandra now comes with built-in support for Hadoop (the Apache Hadoop project develops open-source software for reliable, scalable, distributed computing).

What is NoSQL?

NoSQL (also called NoREL, or "Not Only SQL", since the name itself is misleading) is a movement that is catching on and looks set to become the most popular emerging next-generation database concept of 2010. NoSQL databases are a rapidly evolving new breed that is clashing with the horde of traditional relational database systems like MySQL, MS SQL and PostgreSQL. They are:

  • Non Relational
  • Distributed, Large Scale Databases
  • Horizontally Scalable (More nodes/servers)
  • Open Source
  • Schema Free
  • Eventually Consistent (BASE – Basically Available, Soft state, Eventual consistency)
  • Easy Replication Support
  • Simple API support
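
To make the "eventually consistent" property in the list above concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how Cassandra or any real system implements replication: the `Replica` and `Cluster` classes and the `propagate` step are hypothetical names invented for this illustration.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data (hypothetical toy class)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Accepts a write on one replica and propagates it to the others later."""

    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}
        # Queued (target node, key, value) updates that have not been applied yet.
        self.pending = deque()

    def write(self, node, key, value):
        # The write is applied on the local replica immediately...
        self.replicas[node].data[key] = value
        # ...and only queued for every other replica (asynchronous propagation).
        for name in self.replicas:
            if name != node:
                self.pending.append((name, key, value))

    def read(self, node, key):
        # A read only sees the local copy, which may be stale.
        return self.replicas[node].data.get(key)

    def propagate(self):
        # Apply every queued update; afterwards all replicas agree.
        while self.pending:
            name, key, value = self.pending.popleft()
            self.replicas[name].data[key] = value


cluster = Cluster(["a", "b", "c"])
cluster.write("a", "user:1", "Alice")
stale = cluster.read("b", "user:1")  # node b has not received the update yet
cluster.propagate()
fresh = cluster.read("b", "user:1")  # node b now sees the value written on a
```

Between the write and the call to `propagate`, different nodes give different answers for the same key. That window is the price paid for staying available without waiting on every replica.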

Why Non Relational, Distributed Databases?

Relational database systems have been around for a while, powering many giant e-commerce websites. The essence of relational database is non redundancy, and non redundancy is desired: tables are designed so that redundant data is minimized (normalization). But this becomes a problem for a huge database, because there we need to keep redundant copies of the data across servers and nodes. Efficient redundancy and parallelism are therefore hard to achieve in relational database systems (at least not trivially), and a database that lives on one server becomes a single point of failure.
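
The trade-off can be seen in a small Python sketch using the standard `sqlite3` module. The `customers` and `orders` tables and the sample rows are made up for illustration. The normalized form stores each customer once and needs a join to read an order; the denormalized form repeats the customer data in every record, so each record can be fetched by key alone.

```python
import sqlite3

# Normalized: each fact is stored once, and a join reassembles it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Alice', 'Chennai');
    INSERT INTO orders VALUES (10, 1, 'book');
    INSERT INTO orders VALUES (11, 1, 'lamp');
""")
normalized_rows = conn.execute(
    "SELECT o.id, c.name, c.city, o.item "
    "FROM orders o JOIN customers c ON c.id = o.customer_id "
    "ORDER BY o.id"
).fetchall()

# Denormalized: the customer's name and city are repeated in every order,
# so a single lookup by key returns everything without touching another table.
denormalized = {
    "order:10": {"customer": "Alice", "city": "Chennai", "item": "book"},
    "order:11": {"customer": "Alice", "city": "Chennai", "item": "lamp"},
}
one_order = denormalized["order:11"]
```

The join needs both tables to be reachable in one place, while each denormalized record could sit on a different machine. The cost is that a change of city now has to be written to every copy.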

So, for huge databases running into multiple terabytes, a relational database is not a good fit. This is why Amazon, Google and Facebook started working on non-relational databases. In distributed databases, information is spread redundantly across a ring of identical computers (nodes or servers), and data is queried through a key map. This reduces the risk of a single point of failure. Data is redundant and striped across nodes, so a change made in one place is propagated asynchronously to the other nodes and eventually reaches all of them, hence the name Eventually Consistent.
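
One common way to place keys on such a ring is consistent hashing. The Python sketch below is a simplified illustration of that technique under stated assumptions, not the actual code of Dynamo or Cassandra: the `Ring` class, the node names and the choice of MD5 as the hash are all assumptions made for the example. Each key is stored on the first node found clockwise from the key's position, plus the next node(s) as redundant copies.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a position on the ring using MD5 (an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring of nodes; each key lives on `replicas` neighbouring nodes."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Nodes sorted by their position on the ring.
        self.points = sorted((ring_position(node), node) for node in nodes)
        self.positions = [position for position, _ in self.points]

    def nodes_for(self, key):
        # First node clockwise from the key, wrapping around at the end of the ring.
        start = bisect.bisect_right(self.positions, ring_position(key)) % len(self.points)
        count = min(self.replicas, len(self.points))
        # The owner plus the following nodes hold the redundant copies.
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=2)
owners = ring.nodes_for("user:42")  # two distinct nodes that store this key
```

Because any client can compute `nodes_for` on its own, no central lookup server is needed, and losing one node still leaves a copy of every key on a neighbour.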

Notable Proprietary Implementations of NoSQL

  • Amazon’s Dynamo – A distributed storage system. Unlike a relational database system, it does not break data into tables; instead, all objects are stored and looked up via a key map.
  • Google’s BigTable – BigTable is a compressed, high-performance database built on Google’s proprietary platform. It is an extremely large DBMS, capable of spanning several thousand servers and holding databases in the range of several petabytes.

Notable Open Source Implementations of NoSQL

  • Apache’s Cassandra - The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo’s fully distributed design and Bigtable’s ColumnFamily-based data model.
  • HBase - HBase is the Hadoop database. Use it when you need random, real-time read/write access to your Big Data. This project’s goal is the hosting of very large tables — billions of rows X millions of columns — atop clusters of commodity hardware.
  • Hypertable - Hypertable is an open source project based on published best practices in solving large-scale data-intensive tasks. It tries to bring the benefits of new levels of both performance and scale to many data-driven businesses who are currently limited by previous-generation platforms.
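
As a rough picture of the ColumnFamily-based data model mentioned above, the following Python sketch treats a table as nested maps: row key, then column family, then column name, then value. It is a deliberately simplified illustration (real systems such as Bigtable also keep a timestamp per value, for example), and `ColumnFamilyTable` with its methods is a hypothetical class written for this post, not any project's API.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy table: row key -> column family -> column name -> value."""

    def __init__(self, families):
        # Column families are fixed up front; columns inside them are not.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column, default=None):
        # Use .get() throughout so that reads never create empty rows.
        row = self.rows.get(row_key)
        if row is None:
            return default
        return row.get(family, {}).get(column, default)

    def columns(self, row_key, family):
        # Each row can carry a different set of columns (schema free).
        row = self.rows.get(row_key)
        if row is None:
            return []
        return sorted(row.get(family, {}))


table = ColumnFamilyTable(["profile", "posts"])
table.put("alice", "profile", "email", "alice@example.com")
table.put("alice", "posts", "2010-04-01", "Hello NoSQL")
table.put("bob", "profile", "city", "Berlin")

alice_email = table.get("alice", "profile", "email")
bob_email = table.get("bob", "profile", "email")  # bob's row has no such column
```

Note that `alice` and `bob` have different columns in the same family and nothing had to be declared in advance, which is what "schema free" means in practice.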

However, it should be noted that all the technology discussed here applies to very large-scale database systems. Our usual small-scale web systems can continue to use an RDBMS like MySQL, which is more than sufficient to handle a couple of GBs of data in a non-distributed environment.

Via [ReadWriteWeb] and other sources

  • German

    Hi.

    Maybe you should also mention object databases here, they are also NoREL.
    For example, db4o is mentioned in this blog post about NoSQL:
    http://blog.wekeroad.com/2010/02/06/nosql-a-practical-approach-part-1
    OODBs share many of the features described in the NoSQL movement (although they deal with real objects rather than key/value pairs or other abstractions). Versant’s VOD (http://www.versant.com) for example is very scalable and could be used in many of the scenarios described for Cassandra, MongoDB, etc.

    Best!

    German

    • OpenTube

      Thanks for your time, we shall update the post in the coming days. There are many kinds, basically: object databases; key-value tuple stores like Amazon SimpleDB, Chordless and Redis; document stores like CouchDB and MongoDB; graph databases; grid databases; and many more.

  • David

    “The essence of relational database is non redundancy”. I think not. The essence of the relational database model is the concept of RELATIONS. Non-redundancy is frequently desirable, but that’s also true of most database designs. There is no reason why the relational model should not be used for scalable, distributed databases. In many ways it is better suited for such purposes than graph-based data models for example. NoSQL specifically addresses limitations of the current crop of SQL DBMSs but it does not follow that No SQL must always mean non-relational as well.

    • OpenTube

      Good points David, you are right. Relations are the essence of an RDBMS, and yes, they can definitely be used for large-scale or distributed environments. But there are some improvements in these NoSQL implementations. The guys at Amazon or Facebook would surely have pondered whether the column and key-map based stuff is really needed or not :) So we would not debate whether or not it is needed! This blog is about MySQL anyway.

  • Nikita

    hey!!!Thanks a lot…really cleared up a lot of concepts..good work:):):)
