Approaching MongoDB Design - Basic Principles

Introduction 

 
In this article, we will go through some basic principles for building a good design for your MongoDB database. 
 
MongoDB is a NoSQL database that does not impose a fixed schema. It stores data in a JSON-like format (BSON), and documents can contain different kinds of structures. Two documents in the same collection can have different fields and components, as in the example below:
    {
      person_id: "4",
      name: "John",
      age: "54",
      addresses: [
        { street: '123 Church St', city: 'Miami', cc: 'USA' },
        { street: '123 Mary Av', city: 'Los Angeles', cc: 'USA' }
      ]
    }

    {
      person_id: "34",
      name: "Samantha",
      department: "New Business",
      email: "samantha@example.com"
    }
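Because documents in one collection can differ, application code usually has to tolerate missing fields. A minimal sketch in plain JavaScript, with in-memory objects standing in for the collection; the helper name `describePerson` is hypothetical and not part of MongoDB:

```javascript
// Two documents shaped like the ones above (plain objects standing in for a collection).
const people = [
  {
    person_id: "4",
    name: "John",
    age: "54",
    addresses: [
      { street: "123 Church St", city: "Miami", cc: "USA" },
      { street: "123 Mary Av", city: "Los Angeles", cc: "USA" },
    ],
  },
  { person_id: "34", name: "Samantha", department: "New Business" },
];

// Hypothetical helper: reads optional fields defensively,
// because no schema guarantees that they exist.
function describePerson(doc) {
  const cities = (doc.addresses ?? []).map((address) => address.city);
  const department = doc.department ?? "unknown department";
  return `${doc.name} (${department}), cities: ${cities.join(", ") || "none"}`;
}

const descriptions = people.map(describePerson);
console.log(descriptions);
```

The `??` fallbacks are one possible convention; a schema validation layer in the application would be another.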
To get the best out of MongoDB, you have to understand and follow some basic database design principles. Before getting to tips on design and performance, let's quickly recap how MongoDB structures data.

MongoDB stores data in collections, documents, and fields. The diagram below compares this with the classic relational model: a collection corresponds roughly to a table, a document to a row, and a field to a column.
 
[Diagram: data storage in a relational database compared with MongoDB collections, documents, and fields]
Design Principles
 

How do I store data? Normalization vs Denormalization


Understanding normalization and denormalization is key to building an efficient database.
 
Denormalization means storing related data together in a single document. For example, a document for a person can also embed that person's addresses. Denormalized data is faster to read, because one query returns everything, but it is slower to write and takes up more space, because duplicated data has to be updated in every document that contains it.
 
Normalization means splitting data across multiple collections with references between them. For example, persons are stored in one collection and addresses in another. Each piece of data is defined only once, which makes writes (updates) easier. Reads are the downside: to retrieve data from multiple collections you have to run multiple queries (or a $lookup aggregation stage), which makes reads slower.
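The read-cost difference between the two models can be sketched with in-memory arrays standing in for collections. This is only an illustration under stated assumptions: the `lookups` counter, `findOne`, `findMany`, `readEmbedded`, and `readNormalized` are hypothetical names, and each call to `findOne` or `findMany` stands in for one round trip to the database.

```javascript
// Denormalized: addresses are embedded, so one lookup returns everything.
const personsEmbedded = [
  {
    person_id: "4",
    name: "John",
    addresses: [{ street: "123 Church St", city: "Miami", cc: "USA" }],
  },
];

// Normalized: persons and addresses live in separate collections, linked by person_id.
const persons = [{ person_id: "4", name: "John" }];
const addresses = [
  { person_id: "4", street: "123 Church St", city: "Miami", cc: "USA" },
];

let lookups = 0; // counts simulated round trips to the database

function findOne(collection, predicate) {
  lookups += 1;
  return collection.find(predicate);
}

function findMany(collection, predicate) {
  lookups += 1;
  return collection.filter(predicate);
}

// Read from the denormalized model: a single lookup.
function readEmbedded(personId) {
  return findOne(personsEmbedded, (p) => p.person_id === personId);
}

// Read from the normalized model: one lookup per collection, joined in the application.
function readNormalized(personId) {
  const person = findOne(persons, (p) => p.person_id === personId);
  const found = findMany(addresses, (a) => a.person_id === personId);
  // Drop the linking key so the result has the same shape as the embedded document.
  return { ...person, addresses: found.map(({ person_id, ...rest }) => rest) };
}

lookups = 0;
const embedded = readEmbedded("4");
const embeddedLookups = lookups;

lookups = 0;
const normalized = readNormalized("4");
const normalizedLookups = lookups;

console.log({ embeddedLookups, normalizedLookups });
```

Both reads return the same data; the normalized read simply needs one lookup per collection involved.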
 
Choosing how to store your data depends on how you’ll use the database:
  • On one hand, if your data is rarely updated, your documents are small and grow slowly, immediate consistency on updates is not critical, and you need good read performance, then denormalization may be the smart choice.

  • On the other hand, if your database has large documents that are updated frequently and you want good write performance, then you may want to consider normalization.

“One-to-N” Relationships

 
MongoDB offers more options for modeling “One-to-N” relationships than a relational database. At first, you may be tempted to denormalize data by embedding an array of documents in the parent document, but this is not always the best move. As we’ve seen above, knowing when to use each of the two concepts is the key. Before starting, consider the cardinality of the relationship. Is it “one-to-few”, “one-to-many” or “one-to-squillions”? Each cardinality calls for a different modeling approach.
 
Below is an example of “One-to-few” cardinality. The best choice here is to embed the N side (addresses) in the parent document (person):
    > db.person.findOne()
    {
      name: 'Joseph Langer',
      ssn: '2223-234-75554',
      addresses : [
         { street: '123 Church St', city: 'Miami', cc: 'USA' },
         { street: '123 Mary Av', city: 'Los Angeles', cc: 'USA' }
      ]
    }
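With embedded addresses, a person can be found by a field of an address without touching a second collection. In the mongo shell this would be a dot-notation query such as `db.person.find({ "addresses.city": "Miami" })`. The sketch below mimics that match on an in-memory array; `findByCity` is a hypothetical helper and the second person is made-up sample data.

```javascript
// In-memory stand-in for the person collection.
const personCollection = [
  {
    name: "Joseph Langer",
    ssn: "2223-234-75554",
    addresses: [
      { street: "123 Church St", city: "Miami", cc: "USA" },
      { street: "123 Mary Av", city: "Los Angeles", cc: "USA" },
    ],
  },
  {
    // Made-up document, added only so the filter has something to exclude.
    name: "Sample Person",
    addresses: [{ street: "1 Example Rd", city: "Boston", cc: "USA" }],
  },
];

// Mimics db.person.find({ "addresses.city": city }):
// a document matches if at least one embedded address has that city.
function findByCity(collection, city) {
  return collection.filter((doc) =>
    (doc.addresses ?? []).some((address) => address.city === city)
  );
}

const inMiami = findByCity(personCollection, "Miami").map((doc) => doc.name);
console.log(inMiami);
```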
In a “One-to-many” example, we may consider two collections: the products collection and the parts collection. Each part has its own _id (an ObjectID), and each product document holds an array of the ObjectIDs of its parts:
    > db.parts.findOne()
    {
        _id : ObjectID('AAAA'),
        partno : '1224-dsdf-2215',
        name : 'bearing',
        price: 2.63
    }

    > db.products.findOne()
    {
        name : 'wheel',
        manufacturer : 'Fiat',
        catalog_number: 2244,
        parts : [     // array of references to Part documents
            ObjectID('AAAA'),    // reference to the bearing above
            ObjectID('F17C'),    // reference to a different Part
            ObjectID('D2AA'),
            // etc
        ]
    }
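Reading a product together with its parts then takes an application-level join: fetch the product, then fetch the parts whose _id appears in its parts array. The sketch below shows the idea on in-memory arrays, with plain strings standing in for ObjectIDs; `partsForProduct` is a hypothetical helper and the second part is made-up sample data. The comments show the shell queries it stands in for.

```javascript
// In-memory stand-ins for the two collections; string ids replace ObjectIDs.
const parts = [
  { _id: "AAAA", partno: "1224-dsdf-2215", name: "bearing", price: 2.63 },
  // Made-up part, added only so the join returns more than one document.
  { _id: "F17C", partno: "sample-0001", name: "spoke", price: 0.4 },
];

const products = [
  {
    name: "wheel",
    manufacturer: "Fiat",
    catalog_number: 2244,
    parts: ["AAAA", "F17C", "D2AA"], // "D2AA" has no matching part in this sample
  },
];

// Shell equivalent of the two steps:
//   product = db.products.findOne({ catalog_number: 2244 })
//   db.parts.find({ _id: { $in: product.parts } }).toArray()
function partsForProduct(catalogNumber) {
  const product = products.find((p) => p.catalog_number === catalogNumber);
  if (!product) return [];
  const wanted = new Set(product.parts);
  // References without a matching part are skipped silently:
  // nothing enforces them the way a relational foreign key would.
  return parts.filter((part) => wanted.has(part._id));
}

const wheelParts = partsForProduct(2244).map((part) => part.name);
console.log(wheelParts);
```

Note that the join costs two queries, which is the read penalty of the normalized model.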

Visualize Data & Schema

 
Even though MongoDB is “schemaless”, there are still ways to visualize the collections as diagrams. Seeing the database as a diagram makes it much easier to understand the collections and the relations between them. The easiest way is to use an ERD tool and draw your database from scratch.
 
Creating virtual foreign keys for MongoDB can also be very helpful for data visualization.

Indexes

 
A good indexing strategy also contributes to better database performance. Keep in mind that MongoDB limits the memory a sort operation may use when it cannot rely on an index: 32 MB in older releases (raised to 100 MB in MongoDB 4.4). Without a suitable index, the database is forced to hold all the documents being sorted in memory, and how many that is depends on your data. If the sort exceeds the limit, the query fails with an error unless it is allowed to spill to disk (allowDiskUse).
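A minimal sketch of an index-backed sort, assuming `collection` is a Collection object from the official Node.js driver (connection setup is omitted). The function name `personsSortedByName` and the choice of the `name` field are assumptions made for illustration; `createIndex`, `find`, `sort`, `limit`, and `toArray` are the driver calls used.

```javascript
// `collection` is assumed to be a Collection from the official Node.js driver,
// for example client.db("test").collection("person").
async function personsSortedByName(collection, limit) {
  // Ascending index on `name`. Creating an index that already exists
  // with the same specification is a no-op.
  await collection.createIndex({ name: 1 });

  // With the index in place, the server can return documents in index order
  // instead of sorting them in memory.
  return collection.find({}).sort({ name: 1 }).limit(limit).toArray();
}

// Usage (inside an async function, with a connected client):
//   const firstTen = await personsSortedByName(collection, 10);
```

In practice the index would be created once, at deployment time, rather than on every query; it is inlined here only to keep the sketch short.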
 

Conclusion

 
A thorough understanding of MongoDB combined with a clear view of what you want to achieve with the database can be the recipe for good database design and best performance.