MongoDB (NoSQL): An Architectural Overview

MongoDB is one of the forerunners of the NoSQL movement, an effort to promote non-relational, schema-free data stores. It has no table JOINs, which lets it avoid a performance bottleneck commonly seen with traditional SQL servers.

Why NoSQL?

NoSQL servers are not meant to replace traditional SQL servers. They are meant for problems without heavy transactional requirements but with the potential to scale massively if needed. They work well for providing quick access to large numbers of documents, serving pages on high-traffic websites, or delivering streaming media.

Document Storage: Database > Collection > Document

MongoDB is a document database. Documents are stored in collections, and collections are in turn stored in a database. A collection is similar to a table in MySQL (it's a named group of documents), but a collection has no fixed schema: documents in the same collection can have different fields.

Documents

A document contains the actual data and is stored as a binary JSON object (called BSON). A single document has a storage limit of 4 MB. Queries are also expressed as JSON-style objects, which makes saving and retrieving data fairly painless:

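var data = {first_name: "John", last_name: "Doe", lottery: [42, 16, 29]};
db.users.save(data);                // deprecated since MongoDB 4.2; newer code uses insertOne()
db.users.find({last_name: "Doe"});  // a query is itself a JSON-style object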
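
To make the idea of JSON-style queries concrete without a running server, here is a toy, in-memory sketch in plain JavaScript (Node.js). The matches and find helpers are hypothetical and are not MongoDB's implementation; they handle only top-level fields and plain equality. They do mirror one real MongoDB behavior, though: a query such as {lottery: 42} matches a document whose lottery array contains 42. The second document also shows that a schema-free collection can hold documents with different fields.

```javascript
// Toy, hypothetical sketch of MongoDB-style equality matching.
// NOT MongoDB's implementation: top-level fields and scalar values only
// (no $gt, $in, nested paths, or exact-array matching).

// A document matches when every field in the query matches.
function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const actual = doc[field];
    // Array field: match if any element equals the query value,
    // the way {lottery: 42} matches [42, 16, 29] in MongoDB.
    if (Array.isArray(actual)) {
      return actual.some((element) => element === expected);
    }
    return actual === expected;
  });
}

// A schema-free "collection": the two documents have different fields.
const users = [
  { first_name: "John", last_name: "Doe", lottery: [42, 16, 29] },
  { first_name: "Jane", last_name: "Roe", email: "jane@example.com" },
];

// Return every document in the collection that matches the query object.
function find(collection, query) {
  return collection.filter((doc) => matches(doc, query));
}

console.log(find(users, { last_name: "Doe" }).length); // 1
console.log(find(users, { lottery: 42 }).length);      // 1
console.log(find(users, { lottery: 7 }).length);       // 0
```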
 

MongoDB can store files using the GridFS specification. Because a document can be at most 4 MB, GridFS splits a large file into chunks (usually 256 KB each), stores those chunks in a chunks collection, and records the file's metadata in a files collection. Unfortunately, MongoDB doesn't do any automatic cleanup if a processing error occurs partway through: your collections can be left holding fragments of the corrupted file.
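
The chunking step can be sketched in a few lines of Node.js. This is a minimal in-memory illustration, not a GridFS driver: the field names (files_id, n, data, length, chunkSize) follow the GridFS specification, but the helper functions are hypothetical, a random UUID stands in for a real ObjectId, and nothing is written to MongoDB. The findOrphanChunks helper shows what cleaning up after a failed upload would involve: finding chunks whose files_id matches no file document.

```javascript
// Minimal, hypothetical sketch of GridFS-style chunking on an in-memory
// Buffer. Nothing here talks to a MongoDB server.
const crypto = require("crypto");

const CHUNK_SIZE = 256 * 1024; // 256 KB, the chunk size cited above

// Split a buffer into chunk documents plus one file metadata document.
function splitIntoChunks(filename, buffer, chunkSize = CHUNK_SIZE) {
  const fileId = crypto.randomUUID(); // stand-in for a real ObjectId
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId, // links the chunk back to its file document
      n,                // zero-based sequence number of the chunk
      data: buffer.subarray(offset, offset + chunkSize),
    });
  }
  const fileDoc = {
    _id: fileId,
    filename,
    length: buffer.length,
    chunkSize,
    uploadDate: new Date(),
  };
  return { fileDoc, chunks };
}

// Reassemble a file by ordering its chunks on n and concatenating the data.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

// Orphaned chunks: fragments whose files_id has no matching file document,
// e.g. left behind when an upload fails partway through.
function findOrphanChunks(fileDocs, chunks) {
  const knownIds = new Set(fileDocs.map((f) => f._id));
  return chunks.filter((c) => !knownIds.has(c.files_id));
}

// A 600 KB file needs three 256 KB chunks (256 + 256 + 88 KB).
const file = Buffer.alloc(600 * 1024, 7);
const { fileDoc, chunks } = splitIntoChunks("song.mp3", file);
console.log(chunks.length);                   // 3
console.log(reassemble(chunks).equals(file)); // true
```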

Horizontal Scalability

MongoDB uses (auto-)sharding, not replication, as its way of achieving high scalability. Sharding essentially involves "breaking your database down into smaller chunks called shards and spreading those across a number of distributed servers." With Mongo, we tend to think of replication as a way to gain reliability and failover rather than scalability. (See this article for more.)
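
Range-based sharding can be pictured as a lookup table of chunks. The sketch below is a simplified, hypothetical model assuming a single numeric shard key: each chunk owns a half-open range [min, max) and lives on one shard, and routeKey finds the owning chunk with a binary search. In a real cluster this metadata lives on config servers and queries are routed through mongos; the table, shard names, and helpers here are illustrative only.

```javascript
// Hypothetical model of range-based sharding on one numeric shard key.
// Each chunk owns [min, max) and is assigned to a single shard.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Binary search over sorted, non-overlapping chunk ranges.
function routeKey(key, chunks = chunkTable) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const { min, max, shard } = chunks[mid];
    if (key < min) hi = mid - 1;
    else if (key >= max) lo = mid + 1;
    else return shard;
  }
  throw new Error(`no chunk covers key ${key}`);
}

// Splitting a chunk that has grown too large: returns a new table in which
// the upper half of the range is assigned to newShard.
function splitChunk(chunks, index, splitPoint, newShard) {
  const { min, max, shard } = chunks[index];
  if (!(splitPoint > min && splitPoint < max)) {
    throw new Error("split point outside chunk");
  }
  return [
    ...chunks.slice(0, index),
    { min, max: splitPoint, shard },
    { min: splitPoint, max, shard: newShard },
    ...chunks.slice(index + 1),
  ];
}

console.log(routeKey(42));   // shard0
console.log(routeKey(1000)); // shard1
console.log(routeKey(9999)); // shard2
```

Splitting here only rewrites the metadata; actually moving documents between shards (the job of MongoDB's balancer) is not modelled.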

Performance

  • There's a native client driver for each supported language
    • By contrast, CouchDB exposes a REST (HTTP) interface
  • Documents have a maximum size of 4 MB
    • This limit is not configurable (later MongoDB releases raised it to 16 MB)
  • Memory-mapped files for data storage
    • Because the data files are mapped into the process's address space, data is limited to around 2 GB on 32-bit systems
    • MongoDB stores as much data in RAM as possible
  • Update-in-place (instead of MVCC, as in CouchDB)
  • Written in C++ 
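
The update-in-place versus MVCC bullet is easier to see side by side. Both stores below are hypothetical in-memory Maps, not either database's storage engine, and the integer _rev field only loosely echoes CouchDB's revision IDs (which are actually strings).

```javascript
// Hypothetical in-memory stores contrasting the two update strategies.

// Update-in-place: the stored document is mutated; the old value is gone.
function updateInPlace(store, id, changes) {
  Object.assign(store.get(id), changes);
}

// MVCC-style: each write appends a new revision object instead of touching
// the old one, so a reader holding an older revision still sees it intact.
function updateMvcc(store, id, changes) {
  const revisions = store.get(id);
  const latest = revisions[revisions.length - 1];
  revisions.push({ ...latest, ...changes, _rev: latest._rev + 1 });
}

const inPlace = new Map([["u1", { name: "John", visits: 1 }]]);
updateInPlace(inPlace, "u1", { visits: 2 });
console.log(inPlace.get("u1").visits); // 2

const mvcc = new Map([["u1", [{ name: "John", visits: 1, _rev: 1 }]]]);
updateMvcc(mvcc, "u1", { visits: 2 });
console.log(mvcc.get("u1").map((r) => r.visits)); // [ 1, 2 ]
```

The trade-off this hints at: update-in-place does not accumulate old revisions, while an append-only MVCC design keeps them until they are compacted away (CouchDB has an explicit compaction step), in exchange for letting readers keep a consistent older version.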

Stay tuned for my next post, which will cover the creation of a simple web app using Mongo and the PHP driver.