MongoDB - Day 18 (Data Models)

MongoDB uses a flexible schema for data storage: by default, collections do not enforce a document structure. MongoDB supports several approaches to data modeling, and we can choose whichever model best matches our application and its performance requirements.

Introduction to Data Modeling

In an RDBMS we must define the structure (schema) of a table before inserting data, but in MongoDB we are not required to define a schema for a collection. By default, a collection does not enforce the structure of its documents. This dynamic schema makes development faster, but that is only one side of the coin: the other side is that a dynamic schema requires us to balance the needs of the application against the performance characteristics of the database engine. When designing a data model, always consider how the application uses the data (its queries, updates, and processing) as well as the structure of the data itself.

Document Structure

The main decision in designing a data model for a MongoDB application is how to structure the documents and how the application represents relationships between data. MongoDB offers two tools for representing these relationships:

  • Embedded Documents
  • References

Embedded Documents

Embedded documents capture a relationship by storing the related data in a single document structure. In MongoDB we can embed a document either as a single field or as an element of an array within another document. The main advantage of embedding is that the application can retrieve and manipulate related data in a single query.

The embedded data model stores related information in the same document. Because the related data travels with its parent, MongoDB needs fewer queries to retrieve or manipulate it, and the related data can be updated in a single atomic write operation. Embedded documents are generally used for one-to-one and one-to-many relationships.
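
As a rough illustration of embedding, the plain JavaScript object below plays the role of one document. The `user` document and all of its field names are hypothetical and are not taken from this article; it only shows the two shapes described above, a single embedded field and an array of embedded documents.

```javascript
// Hypothetical document: every field name here is illustrative only.
const user = {
  _id: 1,
  name: "Asha",
  // One-to-one: a single embedded document stored in one field.
  address: { city: "Jaipur", state: "Rajasthan" },
  // One-to-many: an array of embedded documents.
  emails: [
    { label: "work", value: "asha@example.com" },
    { label: "home", value: "asha.home@example.com" }
  ]
};

// A single read of this document already contains all of the related data.
console.log(user.address.city);              // Jaipur
console.log(user.emails.map(e => e.label));  // [ 'work', 'home' ]
```

Because everything lives in one document, there is nothing else to look up after the document has been read.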

References

References capture a relationship by including links, or references, from one document to another. MongoDB or the application can use these references to retrieve the related data.

Embedding can lead to duplicated data, whereas references avoid duplication. References are a good fit when embedding would duplicate data without giving enough of a read-performance advantage to justify that duplication. References are also used to model many-to-many relationships.
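
A many-to-many relationship built on references might look like the following sketch. It is plain JavaScript with no database involved: the `students` and `courses` arrays are hypothetical stand-ins for two collections, and the `course_ids` field name is an assumption made for this example.

```javascript
// Two arrays stand in for two collections (hypothetical data).
const courses = [
  { _id: "c1", title: "Databases" },
  { _id: "c2", title: "Networks" }
];
const students = [
  { _id: "s1", name: "Asha", course_ids: ["c1", "c2"] },
  { _id: "s2", name: "Ravi", course_ids: ["c1"] }
];

// Resolving references takes a second lookup against the other collection.
function coursesOf(student) {
  return courses.filter(c => student.course_ids.includes(c._id));
}

// The same references can be followed in the opposite direction.
function studentsIn(courseId) {
  return students.filter(s => s.course_ids.includes(courseId));
}

console.log(coursesOf(students[0]).map(c => c.title)); // [ 'Databases', 'Networks' ]
console.log(studentsIn("c1").map(s => s.name));        // [ 'Asha', 'Ravi' ]
```

Each course is stored once, however many students refer to it, which is the duplication that references avoid.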

Data Model Examples and Patterns

MongoDB documents various data modeling patterns and common design considerations. This article covers the following two groups of patterns.

  • Model Relationship Between Documents
  • Model Tree Structure

Each pattern gives the data a specific structure. We can select whichever pattern suits the performance requirements of our application.

Model Relationship Between Documents

Relationships between documents can be modeled with either embedded documents or references. The following patterns are covered:

  • Model One-to-One Relationship with Embedded Documents
  • Model One-to-Many Relationship with Embedded Documents
  • Model One-to-Many Relationship with Document References

Model One-to-One Relationship with Embedded Documents: In this pattern, a one-to-one relationship between documents is represented with an embedded document.

Example:

We take the relationship between a district and its state as an example: each district belongs to one and only one state. We model this relationship with both references and embedding, and describe the advantages of embedding over referencing for a one-to-one relationship.

First, consider the model that uses references.

[Figure: district and state documents linked by a reference]

Now consider the embedded model:

[Figure: district document with the state embedded]

Comparing the two examples, we can see that the embedded model requires extra space compared to the reference model. But if our application frequently retrieves a district together with its state name, the reference model requires an extra query to fetch the state. For a one-to-one relationship, embedded documents are the better choice because we can retrieve and update the complete record in a single query.
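
The difference in the number of lookups can be sketched in plain JavaScript. The documents below are assumptions made for illustration (the original example documents are not reproduced here), the arrays stand in for collections, and the `findOne` helper is a hypothetical counter-wrapped lookup, not the MongoDB method of the same name.

```javascript
// Reference model: the district points at a separate state document.
const states = [{ _id: "S1", name: "Rajasthan" }];
const districtsRef = [{ _id: "Alwar", state_id: "S1" }];

// Embedded model: the state data lives inside the district document.
const districtsEmb = [{ _id: "Alwar", state: { name: "Rajasthan" } }];

// Hypothetical helper that counts how many lookups the application makes.
let lookups = 0;
function findOne(collection, predicate) {
  lookups += 1;
  return collection.find(predicate) || null;
}

// Reference model: one lookup for the district, one more for its state.
lookups = 0;
const districtA = findOne(districtsRef, d => d._id === "Alwar");
const stateA = findOne(states, s => s._id === districtA.state_id);
const refLookups = lookups;

// Embedded model: a single lookup returns the district and its state.
lookups = 0;
const districtB = findOne(districtsEmb, d => d._id === "Alwar");
const embLookups = lookups;

console.log(stateA.name, refLookups);           // Rajasthan 2
console.log(districtB.state.name, embLookups);  // Rajasthan 1
```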

Model One-to-Many Relationship with Embedded Documents:
In this pattern, a one-to-many relationship between documents is represented with embedded documents.

Example:

We again take the state and district relationship as an example, this time from the state's side: a state contains multiple districts. We model this relationship with both references and embedding, and describe the advantages of embedding over referencing for a one-to-many relationship.

First, consider the model that uses references.

[Figure: state and district documents linked by references]

Now consider the embedded model:

[Figure: state document with its districts embedded]

If our application frequently retrieves district data together with the state name and we use the reference model, the application needs multiple queries to retrieve or update the documents. With the embedded model, the application can retrieve the complete data in a single query.
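
A minimal sketch of the embedded one-to-many shape, in plain JavaScript. The `Districts` field name and the contents of the `state` document are assumptions made for this illustration.

```javascript
// Hypothetical state document with its districts embedded as an array.
const state = {
  _id: "Rajasthan",
  Type: "State",
  Districts: [
    { name: "Jaipur" },
    { name: "Alwar" }
  ]
};

// One document is enough to list every district with its state name.
const labels = state.Districts.map(d => `${d.name}, ${state._id}`);
console.log(labels); // [ 'Jaipur, Rajasthan', 'Alwar, Rajasthan' ]
```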

Model One-to-Many Relationship with Document References: The embedded model is not always suitable for a one-to-many relationship. In some cases a reference model is the better choice.

Example:

We take the relationship between books and a publisher as an example: a publisher publishes multiple books. We model this relationship with both embedding and references, and describe the advantages of referencing over embedding for this one-to-many relationship.

First, consider the embedded model.

[Figure: embedded model for books and publisher]

Now consider the reference model.

If the number of books per publisher is small and bounded, we can store the book references in the publisher's document, like the following:

[Figure: publisher document holding an array of book references]

If the number of books per publisher is large and unbounded, this approach produces a mutable, ever-growing array. In that case we store a reference to the publisher in each book document instead.

[Figure: book documents each holding a reference to the publisher]

If a document contains many embedded documents and each embedded document is large, the parent document itself becomes large: it occupies more space and is more costly to read and update. In such a scenario we should use references instead of the embedded model.
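
Storing the publisher reference in each book can be sketched as follows. This is plain JavaScript with arrays standing in for collections; the titles, ids, and the `publisher_id` field name are all hypothetical.

```javascript
// Hypothetical data: each book carries a reference to its publisher.
const publishers = [{ _id: "pub1", name: "Example Press" }];
const books = [
  { _id: "b1", title: "First Title", publisher_id: "pub1" },
  { _id: "b2", title: "Second Title", publisher_id: "pub1" }
];

// All books of a publisher are found by matching on the reference field.
function booksBy(publisherId) {
  return books.filter(b => b.publisher_id === publisherId);
}

// Adding a book inserts one new document; the publisher document is untouched.
books.push({ _id: "b3", title: "Third Title", publisher_id: "pub1" });

console.log(booksBy("pub1").length);      // 3
console.log(Object.keys(publishers[0]));  // [ '_id', 'name' ]
```

The publisher document stays the same size no matter how many books are added, which is the point of placing the reference on the "many" side.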

Model Tree Structure

In the tree structure models, each node of the tree is stored as a document, and MongoDB allows various ways of using a tree data structure to represent large hierarchical or nested relationships. The following patterns can be used to model a tree structure:

  • Parent Reference
  • Child Reference
  • Array of Ancestors
  • Materialized Paths
  • Nested Sets

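We apply each of the patterns above to the following tree.

    India (Country)
      Rajasthan (State)
        Jaipur (District)
        Alwar (District)
          Rajgarh (City)
          Bhiwadi (City)
      Haryana (State)
        Chandigarh (District)

Now we will look at each of the five patterns mentioned above for this tree. We use the “Map_Information” collection to store it.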

Parent Reference

In the parent reference pattern, each node of the tree is stored in a document, and each node contains a reference to its parent node. This pattern provides a simple solution for tree storage but requires multiple queries to retrieve subtrees.

We store each node of the above tree as a document in the “Map_Information” collection using the parent reference pattern. The collection for the above tree looks like the following:

    { _id : "India", Type : "Country", Parent : null }
    { _id : "Rajasthan", Type : "State", Parent : "India" }
    { _id : "Haryana", Type : "State", Parent : "India" }
    { _id : "Jaipur", Type : "District", Parent : "Rajasthan" }
    { _id : "Alwar", Type : "District", Parent : "Rajasthan" }
    { _id : "Chandigarh", Type : "District", Parent : "Haryana" }
    { _id : "Rajgarh", Type : "City", Parent : "Alwar" }
    { _id : "Bhiwadi", Type : "City", Parent : "Alwar" }

Consider some queries:

    Query: Find the parent of “Haryana”.

    db.Map_Information.findOne({_id:"Haryana"}).Parent

    Output:

    India

    Query: Find the children of “Alwar”.

    db.Map_Information.find({Parent:"Alwar"})

    Output:

    { _id : "Rajgarh", Type : "City", Parent : "Alwar" }
    { _id : "Bhiwadi", Type : "City", Parent : "Alwar" }
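
The remark that subtrees need multiple queries can be sketched in plain JavaScript. This is only an illustration under assumptions: the `nodes` array stands in for the “Map_Information” collection, the `descendants` helper is hypothetical, and each pass of the loop corresponds to one query of the form find({Parent: {$in: [...]}}).

```javascript
// In-memory stand-in for the parent reference documents.
const nodes = [
  { _id: "India", Type: "Country", Parent: null },
  { _id: "Rajasthan", Type: "State", Parent: "India" },
  { _id: "Haryana", Type: "State", Parent: "India" },
  { _id: "Jaipur", Type: "District", Parent: "Rajasthan" },
  { _id: "Alwar", Type: "District", Parent: "Rajasthan" },
  { _id: "Chandigarh", Type: "District", Parent: "Haryana" },
  { _id: "Rajgarh", Type: "City", Parent: "Alwar" },
  { _id: "Bhiwadi", Type: "City", Parent: "Alwar" }
];

// Collect a whole subtree level by level; every round is one "query".
function descendants(rootId) {
  const ids = [];
  let frontier = [rootId];
  let rounds = 0;
  while (frontier.length > 0) {
    rounds += 1;
    const next = nodes.filter(n => frontier.includes(n.Parent));
    ids.push(...next.map(n => n._id));
    frontier = next.map(n => n._id);
  }
  return { ids, rounds };
}

const result = descendants("Rajasthan");
console.log(result.ids);    // [ 'Jaipur', 'Alwar', 'Rajgarh', 'Bhiwadi' ]
console.log(result.rounds); // 3
```

The number of rounds grows with the depth of the subtree, which is the cost that the later patterns try to avoid.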

     


Child Reference

In the child reference pattern, each node of the tree is stored in a document, and each node contains references to its child nodes in an array. This pattern is also suitable for graphs in which a node may have multiple parents. It is a good solution for tree storage as long as no operations on subtrees are required. We store each node of the above tree as a document in the “Map_Information” collection using the child reference pattern. The collection for the above tree looks like the following:

    { _id : "India", Type : "Country", Children : [ "Haryana", "Rajasthan" ] }
    { _id : "Rajasthan", Type : "State", Children : [ "Jaipur", "Alwar" ] }
    { _id : "Haryana", Type : "State", Children : [ "Chandigarh" ] }
    { _id : "Jaipur", Type : "District", Children : [ ] }
    { _id : "Chandigarh", Type : "District", Children : [ ] }
    { _id : "Alwar", Type : "District", Children : [ "Rajgarh", "Bhiwadi" ] }
    { _id : "Rajgarh", Type : "City", Children : [ ] }
    { _id : "Bhiwadi", Type : "City", Children : [ ] }

Consider some queries:

    Query: Find the children of “Rajasthan”.

    db.Map_Information.find({_id:"Rajasthan"})

    Output:

    { _id : "Rajasthan", Type : "State", Children : [ "Jaipur", "Alwar" ] }

    Query: Find the parent of “Bhiwadi”.

    db.Map_Information.find({Children:"Bhiwadi"},{_id:1})

    Output:

    { _id : "Alwar" }

    We can create an index on the Children field to enable fast search.

    db.Map_Information.createIndex({Children:-1})
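
The two lookups above can be mirrored in plain JavaScript. As before, this is a sketch under assumptions: the `nodes` array stands in for the collection, and `parentOf` and `childrenOf` are hypothetical helper names.

```javascript
// In-memory stand-in for the child reference documents.
const nodes = [
  { _id: "India", Type: "Country", Children: ["Haryana", "Rajasthan"] },
  { _id: "Rajasthan", Type: "State", Children: ["Jaipur", "Alwar"] },
  { _id: "Haryana", Type: "State", Children: ["Chandigarh"] },
  { _id: "Jaipur", Type: "District", Children: [] },
  { _id: "Chandigarh", Type: "District", Children: [] },
  { _id: "Alwar", Type: "District", Children: ["Rajgarh", "Bhiwadi"] },
  { _id: "Rajgarh", Type: "City", Children: [] },
  { _id: "Bhiwadi", Type: "City", Children: [] }
];

// The parent is the node whose Children array contains the given id.
function parentOf(id) {
  const parent = nodes.find(n => n.Children.includes(id));
  return parent ? parent._id : null;
}

// The children are read directly from the node's own document.
function childrenOf(id) {
  const node = nodes.find(n => n._id === id);
  return node ? node.Children : [];
}

console.log(parentOf("Bhiwadi"));     // Alwar
console.log(childrenOf("Rajasthan")); // [ 'Jaipur', 'Alwar' ]
console.log(parentOf("India"));       // null
```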

     

Array of Ancestors

In the array of ancestors pattern, each node is stored in a document that also contains the id of its parent and an array of its ancestors. This pattern is useful for tree storage because we can efficiently find both the ancestors and the descendants of a node. With an index on the “Ancestor” field, these queries are fast.

The “Map_Information” collection looks like the following for the array of ancestors pattern.

    { _id : "India", Type : "Country", Ancestor : [ ], Parent : null }
    { _id : "Rajasthan", Type : "State", Ancestor : [ "India" ], Parent : "India" }
    { _id : "Haryana", Type : "State", Ancestor : [ "India" ], Parent : "India" }
    { _id : "Jaipur", Type : "District", Ancestor : [ "India", "Rajasthan" ], Parent : "Rajasthan" }
    { _id : "Alwar", Type : "District", Ancestor : [ "India", "Rajasthan" ], Parent : "Rajasthan" }
    { _id : "Chandigarh", Type : "District", Ancestor : [ "India", "Haryana" ], Parent : "Haryana" }
    { _id : "Rajgarh", Type : "City", Ancestor : [ "India", "Rajasthan", "Alwar" ], Parent : "Alwar" }
    { _id : "Bhiwadi", Type : "City", Ancestor : [ "India", "Rajasthan", "Alwar" ], Parent : "Alwar" }

Consider some queries:

    Query: Find the ancestors of “Alwar”.

    db.Map_Information.find({_id:"Alwar"},{Ancestor:1})

    Output:

    { _id : "Alwar", Ancestor : [ "India", "Rajasthan" ] }

    Query: Find the descendants of “Rajasthan”.

    db.Map_Information.find({Ancestor:"Rajasthan"},{_id:1})

    Output:

    { _id : "Jaipur" }
    { _id : "Alwar" }
    { _id : "Rajgarh" }
    { _id : "Bhiwadi" }

    We can create an index on the Ancestor field to retrieve results faster.

    db.Map_Information.createIndex({Ancestor:1})
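
One way to keep the Ancestor array consistent is to derive it from the parent when a node is created: the parent's ancestors followed by the parent itself. The sketch below shows this in plain JavaScript for one branch of the tree; the `makeNode` helper is hypothetical and nothing here talks to a database.

```javascript
// Hypothetical helper: builds a node document from its parent document.
function makeNode(id, type, parent) {
  return {
    _id: id,
    Type: type,
    // Ancestors of a node = ancestors of its parent + the parent itself.
    Ancestor: parent ? [...parent.Ancestor, parent._id] : [],
    Parent: parent ? parent._id : null
  };
}

const india = makeNode("India", "Country", null);
const rajasthan = makeNode("Rajasthan", "State", india);
const alwar = makeNode("Alwar", "District", rajasthan);
const bhiwadi = makeNode("Bhiwadi", "City", alwar);
const nodes = [india, rajasthan, alwar, bhiwadi];

// Descendants are the nodes whose Ancestor array contains the given id.
const underRajasthan = nodes
  .filter(n => n.Ancestor.includes("Rajasthan"))
  .map(n => n._id);

console.log(bhiwadi.Ancestor); // [ 'India', 'Rajasthan', 'Alwar' ]
console.log(underRajasthan);   // [ 'Alwar', 'Bhiwadi' ]
```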

Materialized Paths

In the materialized paths pattern, each node of the tree is stored in a document that also contains a string listing its ancestors. Where the array of ancestors pattern uses an array to store the ancestors of a node, the materialized paths pattern uses a string. The benefit of a string is that we can query it with regular expressions, which gives more flexibility in working with the path, such as finding nodes by partial paths. The Array of Ancestors pattern is slightly slower than the Materialized Paths pattern but more straightforward to use.

The “Map_Information” collection looks like the following for the materialized paths pattern.

    { _id : "India", "Path" : null }
    { _id : "Rajasthan", "Path" : ",India," }
    { _id : "Haryana", "Path" : ",India," }
    { _id : "Jaipur", "Path" : ",India,Rajasthan," }
    { _id : "Alwar", "Path" : ",India,Rajasthan," }
    { _id : "Chandigarh", "Path" : ",India,Haryana," }
    { _id : "Rajgarh", "Path" : ",India,Rajasthan,Alwar," }
    { _id : "Bhiwadi", "Path" : ",India,Rajasthan,Alwar," }

Consider some queries:

    Query: Find the descendants of “Rajasthan”.

    db.Map_Information.find({Path:/,Rajasthan,/})

    Output:

    { _id : "Jaipur", "Path" : ",India,Rajasthan," }
    { _id : "Alwar", "Path" : ",India,Rajasthan," }
    { _id : "Rajgarh", "Path" : ",India,Rajasthan,Alwar," }
    { _id : "Bhiwadi", "Path" : ",India,Rajasthan,Alwar," }

    Query: Find the descendants of “India”, where India is the topmost element of the hierarchy.

    db.Map_Information.find({Path:/^,India,/})

    Output:

    { _id : "Rajasthan", "Path" : ",India," }
    { _id : "Haryana", "Path" : ",India," }
    { _id : "Jaipur", "Path" : ",India,Rajasthan," }
    { _id : "Alwar", "Path" : ",India,Rajasthan," }
    { _id : "Chandigarh", "Path" : ",India,Haryana," }
    { _id : "Rajgarh", "Path" : ",India,Rajasthan,Alwar," }
    { _id : "Bhiwadi", "Path" : ",India,Rajasthan,Alwar," }

    We can create an index on the Path field to retrieve results faster.

    db.Map_Information.createIndex({Path:-1})
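
How a path string is built and matched can be sketched in plain JavaScript. The `nodes` array stands in for the collection, and `childPath` and `descendantsOf` are hypothetical helpers; the sketch also assumes that ids are plain words without regular expression metacharacters, since otherwise they would need escaping first.

```javascript
// In-memory stand-in for the materialized path documents.
const nodes = [
  { _id: "India", Path: null },
  { _id: "Rajasthan", Path: ",India," },
  { _id: "Haryana", Path: ",India," },
  { _id: "Jaipur", Path: ",India,Rajasthan," },
  { _id: "Alwar", Path: ",India,Rajasthan," },
  { _id: "Chandigarh", Path: ",India,Haryana," },
  { _id: "Rajgarh", Path: ",India,Rajasthan,Alwar," },
  { _id: "Bhiwadi", Path: ",India,Rajasthan,Alwar," }
];

// Path for a new child: the parent's path followed by the parent's id.
function childPath(parent) {
  return (parent.Path === null ? "," : parent.Path) + parent._id + ",";
}

// Descendants: every node whose path contains ",<id>,".
// Assumes the id has no regex metacharacters (see the note above).
function descendantsOf(id) {
  const pattern = new RegExp("," + id + ",");
  return nodes
    .filter(n => n.Path !== null && pattern.test(n.Path))
    .map(n => n._id);
}

const alwarNode = nodes.find(n => n._id === "Alwar");
console.log(childPath(nodes[0]));        // ,India,
console.log(childPath(alwarNode));       // ,India,Rajasthan,Alwar,
console.log(descendantsOf("Rajasthan")); // [ 'Jaipur', 'Alwar', 'Rajgarh', 'Bhiwadi' ]
```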

     

Nested Sets

In the Nested Sets pattern we perform a round-trip traversal of the tree and treat each node as a stop in this traversal. Each node is visited twice: once on the initial trip and once on the return trip. The value of the initial stop is written on the left side of the node and the value of the return stop on the right side.

In the Nested Sets pattern each node is stored in a document that also contains the id of its parent node, the value of its initial stop, and the value of its return stop.

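If we number the above tree in this way, we get the following (each node is shown with its left and right values):

    India (1, 16)
      Rajasthan (2, 11)
        Jaipur (3, 4)
        Alwar (5, 10)
          Rajgarh (6, 7)
          Bhiwadi (8, 9)
      Haryana (12, 15)
        Chandigarh (13, 14)

The “Map_Information” collection looks like the following for the Nested Sets pattern.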

    { _id : "India", Parent : 0, "Left" : 1, "Right" : 16 }
    { _id : "Rajasthan", Parent : "India", "Left" : 2, "Right" : 11 }
    { _id : "Haryana", Parent : "India", "Left" : 12, "Right" : 15 }
    { _id : "Jaipur", Parent : "Rajasthan", "Left" : 3, "Right" : 4 }
    { _id : "Alwar", Parent : "Rajasthan", "Left" : 5, "Right" : 10 }
    { _id : "Chandigarh", Parent : "Haryana", "Left" : 13, "Right" : 14 }
    { _id : "Rajgarh", Parent : "Alwar", "Left" : 6, "Right" : 7 }
    { _id : "Bhiwadi", Parent : "Alwar", "Left" : 8, "Right" : 9 }

Consider some queries:

    Query: Find the descendants of “India”.

    var ParentNode=db.Map_Information.findOne({_id:"India"})
    db.Map_Information.find({Left:{$gt:ParentNode.Left},Right:{$lt:ParentNode.Right}},{_id:1})

    Output:

    { _id : "Rajasthan" }
    { _id : "Haryana" }
    { _id : "Jaipur" }
    { _id : "Alwar" }
    { _id : "Chandigarh" }
    { _id : "Rajgarh" }
    { _id : "Bhiwadi" }

    Query: Find the ancestors of “Bhiwadi”.

    var ChildNode=db.Map_Information.findOne({_id:"Bhiwadi"})
    db.Map_Information.find({Left:{$lt:ChildNode.Left},Right:{$gt:ChildNode.Right}},{_id:1})

    Output:

    { _id : "India" }
    { _id : "Rajasthan" }
    { _id : "Alwar" }
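
The round-trip numbering can be reproduced with a short depth-first traversal. The sketch below is plain JavaScript: the nested `tree` object and the `number` helper are hypothetical and exist only to show how the Left and Right values arise; the root's Parent is set to 0 to match the documents above.

```javascript
// The example tree as a nested structure (illustrative representation).
const tree = {
  id: "India",
  children: [
    {
      id: "Rajasthan",
      children: [
        { id: "Jaipur", children: [] },
        {
          id: "Alwar",
          children: [
            { id: "Rajgarh", children: [] },
            { id: "Bhiwadi", children: [] }
          ]
        }
      ]
    },
    { id: "Haryana", children: [{ id: "Chandigarh", children: [] }] }
  ]
};

// Assign Left on the way down and Right on the way back up.
function number(node, parentId, counter, out) {
  const doc = { _id: node.id, Parent: parentId, Left: counter.next++, Right: 0 };
  out.push(doc);
  for (const child of node.children) {
    number(child, node.id, counter, out);
  }
  doc.Right = counter.next++;
  return out;
}

const docs = number(tree, 0, { next: 1 }, []);
const bhiwadi = docs.find(d => d._id === "Bhiwadi");

// Ancestors enclose the node: smaller Left and larger Right.
const ancestors = docs
  .filter(d => d.Left < bhiwadi.Left && d.Right > bhiwadi.Right)
  .map(d => d._id);

console.log(docs[0]);   // { _id: 'India', Parent: 0, Left: 1, Right: 16 }
console.log(ancestors); // [ 'India', 'Rajasthan', 'Alwar' ]
```

Because every insertion or move changes the numbering of other nodes, this pattern suits trees that are read often and modified rarely.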

     

Today we looked at several types of data models. Choosing an appropriate data model can improve the performance of an application, while a poor choice can reduce it. So we should choose the data model that best matches our application and its performance requirements.

Thanks for reading this article.