Introduction To MongoDB

Introduction

The name MongoDB is derived from the word "humongous". MongoDB is a scalable, high-performance, open source database designed for document-oriented storage. It is written in C++ and is a stand-alone product. Development started in 2007 and the first public release followed in 2009. Version 1.4 was released in March 2010. At the time of writing, the latest version of MongoDB is 2.6, released on April 8, 2014.

The topics to be covered in this chapter are:

  • What MongoDB is
  • What a NoSQL Database is
  • What Big Data is 
  • The need for a Document Oriented Database
  • Differences between an RDBMS and MongoDB
  • How to install MongoDB in Windows
     

MongoDB

MongoDB is an open source database. (Open source software is software whose source code is meant to be freely shared and redistributed by others; the Open Source Initiative (OSI) maintains the definition. MongoDB is available for free under the GNU Affero General Public License, and the language drivers are available under the Apache License.) The database uses a document-oriented data model. MongoDB was first developed by 10gen (now MongoDB Inc.). It is built on an architecture of collections and documents. A document is a set of key-value pairs and is the basic unit of data in MongoDB. A collection contains a set of documents and is the equivalent of a relational database table.


MongoDB supports a dynamic schema design (a schema, pronounced SKEE-mah, is the organization or structure of a database), which allows the documents in a collection to have different fields and structures. The database stores documents in BSON, a binary representation of JSON. MongoDB can distribute collections across multiple systems for horizontal scalability as the data volume increases.


We can summarize MongoDB in the following simple terms:

  • Open source database that uses a Document Oriented Data Model
  • NoSQL
  • Follows an architecture of Collections and Documents instead of the tables and rows of an RDBMS
  • A Document contains a set of key-value pairs and is the basic unit of data in MongoDB
  • A Collection contains a set of documents and is the equivalent of a relational database table
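
The points above can be pictured with a minimal Python sketch. It uses only plain dicts and lists to stand in for documents and a collection, so it is an illustration of the data model rather than MongoDB itself. The collection name, field names and values are all made up for the example.

```python
import json

# A "collection" is sketched here as a plain list; each "document" is a dict
# of key-value pairs. All names and values are hypothetical.
users = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    # Same collection, different shape: extra fields and a nested document.
    {"_id": 2, "name": "Ravi", "tags": ["admin", "ops"],
     "address": {"city": "Pune", "zip": "411001"}},
]

# Documents in one collection do not have to share the same set of fields.
field_sets = [sorted(doc.keys()) for doc in users]

# JSON is the human-readable form; MongoDB stores the binary BSON form.
as_json = json.dumps(users[1], sort_keys=True)

print(field_sets)
print(as_json)
```

A relational table would need a fixed set of columns for both users; here the second document simply carries more fields than the first.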

NoSQL Databases

A NoSQL database is also called a "Not Only SQL" database. NoSQL is an approach to database management and design that is useful for large sets of distributed data. It is non-relational, although it does not necessarily rule out the use of SQL (Structured Query Language). It avoids selected relational functionality such as fixed table schemas and join operations. NoSQL is a leading alternative to relational databases, offering scalability and fault tolerance. Its data model is flexible and schema-less, it scales horizontally and it uses a distributed architecture. (NoSQL databases are sometimes referred to as cloud databases, Big Data databases or non-relational databases, and are often used to store and analyze user-generated and machine-generated data.)

Types of NoSQL Databases

There are four main types of NoSQL database, each with its own specific attributes:

  • Key-Value Store
  • Column Store
  • Document Database
  • Graph Database

Key-Value Store

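These databases store data in a schema-less way: every item consists of an indexed key and a value. Examples include DynamoDB, Azure Table Storage (ATS), Riak and Berkeley DB. Cassandra is sometimes listed here too, although it is more commonly classified as a wide-column store.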
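
As a rough illustration of the key-value idea, here is a toy in-memory store in Python. The class name, method names and keys are invented for this sketch and do not correspond to the API of any of the products named above.

```python
class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}  # a dict gives hash-indexed lookup by key

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is silently ignored in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b"\x01\x02opaque-bytes")
store.put("user:7:name", "Asha")
print(store.get("user:7:name"))
```

The store never looks inside a value, so the only way to find data is by its key. That restriction is what keeps the model simple and easy to distribute.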

Column Store

These databases are also known as wide-column stores. This kind of database is designed to store data tables as sections of columns of data rather than as rows of data. Wide-column stores offer very high performance and highly scalable tables. Some databases that use this architecture are HBase, BigTable and Hypertable.
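
The difference between storing rows and storing columns can be sketched in a few lines of Python. This is a simplified picture of the layout idea only, not of how HBase, BigTable or Hypertable are implemented, and the sample data is made up.

```python
# Row-oriented layout: one record per row, all fields kept together.
rows = [
    {"id": 1, "city": "Pune", "temp_c": 31},
    {"id": 2, "city": "Oslo", "temp_c": 4},
    {"id": 3, "city": "Lima", "temp_c": 19},
]

# Column-oriented layout: one list per column, values kept in row order.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column only has to touch that column's values.
avg_temp = sum(columns["temp_c"]) / len(columns["temp_c"])

print(columns)
print(avg_temp)
```

With the column layout, computing the average temperature reads a single list and never touches the city names or ids.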

Document Database

This type of database builds on key-value pairs: each "document" contains complex data and is assigned a unique key that is used to retrieve it. The main features of this type of database are storing, retrieving and managing documents. The data it holds is often described as semi-structured. Examples include MongoDB and CouchDB.
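
The following Python sketch shows what sets a document store apart from a plain key-value store: each document still gets a unique key, but the store can also look inside documents and match on their fields. The class and method names are hypothetical and chosen only for this illustration.

```python
import itertools


class DocumentStore:
    """Toy document store: unique key per document plus field queries."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)  # simple source of unique keys

    def insert(self, doc):
        key = next(self._ids)
        self._docs[key] = dict(doc)  # store a copy of the document
        return key

    def get(self, key):
        return self._docs.get(key)

    def find(self, **criteria):
        # Return every document whose fields equal all given criteria.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(f) == v for f, v in criteria.items())
        ]


books = DocumentStore()
first_key = books.insert({"title": "Dune", "year": 1965, "genre": "sci-fi"})
books.insert({"title": "Emma", "year": 1815})
sci_fi = books.find(genre="sci-fi")
print(first_key, sci_fi)
```

Note that the second document has no "genre" field at all, and the query simply skips it.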

Graph Database

This type of database uses graph theory. It works by building a graph from the data and the relationships within it. The data items are interconnected, with an undetermined number of relationships between them. A well-known example is Neo4j.
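
To make the idea of data plus relationships concrete, here is a small Python sketch that stores labeled relationships as an adjacency list and walks them with a breadth-first search. The node names and relationship labels are invented, and a real graph database such as Neo4j offers a dedicated query language instead of hand-written traversal.

```python
from collections import deque

# Each edge is (source, relationship, destination); all values are made up.
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("alice", "WORKS_WITH", "dave"),
]

# Adjacency list: node -> list of (relationship, neighbour).
graph = {}
for src, rel, dst in edges:
    graph.setdefault(src, []).append((rel, dst))


def reachable(start):
    """Return every node reachable from start, following edge direction."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _rel, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


print(reachable("alice"))
```

The number of relationships per node is not fixed in advance, which matches the "undetermined number of relationships" described above.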

Big Data

Big Data is a term for a voluminous amount of structured, semi-structured and unstructured data that can be mined for information. It does not refer to any specific quantity. Big Data is one of the key factors that made NoSQL popular. Whenever we face a seemingly limitless stream of data, we are dealing with Big Data. The following characteristics complete the definition:

  • Velocity: a huge amount of data arrives from different locations, and it arrives very quickly.
  • Variety: the data may be structured, semi-structured or unstructured.
  • Volume: data can flow into the database in huge volumes, possibly terabytes or petabytes in size.
  • Data complexity: the data or database may need to be replicated across different locations or different databases.


The need for a Document Oriented Database

Some of the reasons for choosing MongoDB over an RDBMS are the following:

  • Document-Oriented Storage
  • Continuous Data Availability
  • Real Location Independence
  • Flexible Data Models
  • Full Index Support
  • Replication and High Availability
  • Auto-Sharding

Document-Oriented Storage

A document-oriented storage architecture follows the paradigm of a document-oriented database, a newer breed of database designed for storing, retrieving and managing document-oriented information. The main objective of such a database is to store data in some standard format or encoding. The encodings in use include XML, YAML, JSON and BSON, as well as binary forms such as PDF and Microsoft Office documents (Word and Excel).

Continuous Data Availability

Any database can suffer downtime or hardware failure, and either one can paralyze a website or an application. In those scenarios a NoSQL database plays a vital role. If one database server or node goes down, another server or node can take over so that the website or web application keeps running. In this respect a NoSQL database helps us more than a typical RDBMS setup does.

Real Location Independence

Location independence means the ability to read from and write to a database regardless of where the I/O operation physically occurs, and to have writes propagated out from that location so that they become available to users and machines at other sites. This kind of functionality is generally not built into a traditional RDBMS.

MongoDB can maintain copies of our database on separate servers in different geographical regions to improve access times. For users close to one of those servers, the response is as good as that of a local database.

Flexible Data Models

An RDBMS is based on defined relationships between tables with columns, and its schema is strict and uniform. A NoSQL data model, by contrast, is schema-less: it can accept structured, semi-structured or unstructured data and can relate items to one another easily. The data in MongoDB has a flexible schema. This flexibility means we can map a document to an entity or an object, and each document can match the data fields of the entity it represents. In practice, the documents in a collection usually share a similar structure.
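
Mapping an object to a document can be sketched with Python dataclasses. The `Customer` and `Address` classes are hypothetical, and the resulting dict is only the shape that a document would take; actually saving it would require a MongoDB driver, which is not shown here.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Address:
    city: str
    zip_code: str


@dataclass
class Customer:
    name: str
    address: Address                      # nested object -> nested document
    phones: list = field(default_factory=list)


customer = Customer("Asha", Address("Pune", "411001"), ["555-0100"])

# asdict() walks nested dataclasses and lists, giving a document-shaped dict.
document = asdict(customer)
print(document)
```

In a relational design the address and the phone numbers would typically live in separate tables joined by keys; in the document model they travel inside the customer document.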

Full Index Support

As we know, indexes provide high performance when fetching data. MongoDB supports indexes on any field of a document, including fields inside embedded documents and arrays, and it also offers compound, geospatial and text indexes.
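
Why an index speeds up fetching can be shown with a small Python sketch that compares a full scan with a lookup in a prebuilt index. A dict stands in for the real index structure here, and the function names and sample documents are made up for the illustration.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "city": "Pune"},
    {"_id": 2, "city": "Oslo"},
    {"_id": 3, "city": "Pune"},
]


def scan(field, value):
    """Without an index: inspect every document."""
    return [d["_id"] for d in docs if d.get(field) == value]


def build_index(field):
    """With an index: map each field value to the ids that carry it."""
    index = defaultdict(list)
    for d in docs:
        if field in d:
            index[d[field]].append(d["_id"])
    return index


city_index = build_index("city")
print(scan("city", "Pune"), city_index["Pune"])
```

Both approaches return the same ids, but the scan does work proportional to the number of documents on every query, while the index pays that cost once when it is built.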

Replication and High Availability

A replica (an exact copy or model of something) set in MongoDB is a group of mongod processes that maintain the same data set. A replica set provides redundancy and high availability.
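
The idea of redundancy and failover can be sketched with a toy model in Python. This is a deliberate simplification under stated assumptions: writes are copied to every live node immediately and the first live node is promoted on failure, whereas a real replica set replicates asynchronously and chooses a new primary through an election. All class and node names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicaSet:
    """Toy replica set: one primary, the rest are secondaries."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, key, value):
        # Simplification: apply the write to every live node at once.
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def read(self, key):
        return self.primary.data.get(key)

    def fail_primary(self):
        self.primary.alive = False
        # Simplification: promote the first live node (no election).
        self.primary = next(n for n in self.nodes if n.alive)


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write("greeting", "hello")
rs.fail_primary()
value_after_failover = rs.read("greeting")
print(rs.primary.name, value_after_failover)
```

Because every node held a copy of the data, the value is still readable after the original primary fails.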

Auto-Sharding


Sharding is the process of storing data across multiple machines. As the data grows, MongoDB balances it across those machines so that no single machine has to hold or serve all of it. Spreading the data out also limits how much is affected if one machine is damaged. Sharding addresses growth through horizontal scaling: we can add more machines to support the growing data and the demands of read and write operations.
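
A minimal way to picture sharding is to route each record to a machine based on a hash of its key. The Python sketch below does exactly that with three made-up shard names. It is only one possible routing scheme and leaves out what a real sharded cluster adds, such as splitting data into chunks and rebalancing them as shards are added.

```python
import hashlib

# Hypothetical shard names for the illustration.
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(key):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Distribute 1000 made-up user ids across the shards.
placement = {}
for user_id in range(1000):
    placement.setdefault(shard_for(user_id), []).append(user_id)

for shard in SHARDS:
    print(shard, len(placement.get(shard, [])))
```

The same key always lands on the same shard, so a read knows where to look. One known weakness of this simple modulo scheme is that changing the number of shards moves most keys, which is part of why real systems use more elaborate placement.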

Difference between RDBMS and MongoDB

Some of the differences are listed in the following table. They give us a clear view of how a document database and a relational database differ from each other.

RDBMS | MongoDB
Good for structured data. | Suited to write-once, read-many workloads on unstructured data.
Tightly structured with a schema. Performs faster for small amounts of data, but performance is slower (in other words, latency is higher) with huge, growing data. | Faster than an RDBMS for growing data on a cluster or cloud, in the TB or PB range.
Transactions are supported. | Does not support multi-document transactions (as of version 2.6); operations on a single document are atomic.


How to install MongoDB in Windows

Installation of MongoDB is a three-step process, as shown in the following image.

setup

First step: download the installer from the MongoDB website, choosing the proper file (32-bit or 64-bit) for your operating system. MongoDB supports several operating systems, including Windows, Linux, Mac OS and Solaris, and downloads for each are available on the website. After downloading, start the installation by double-clicking the .msi file.

selectOS

Second step: if we downloaded the Zip file, we extract it. Otherwise we have the setup file, which we simply double-click to start the setup.

SteupWizard1

Third step: this is the final step, where we click the Next button to start the installation process, as in the following image.

SteupWizard2
This is the End User License Agreement window, where we need to accept the terms and conditions of MongoDB.
After selecting the option to accept the terms and conditions, click the Next button as shown.

SteupWizard3

At the end of the installation process, a directory named MongoDB is created in Program Files. After installation, move the installed folder to "c:\mongodb". If moving the folder causes any problem, use the move command from the command prompt instead, for example: [move c:\example1 C:\example2]. Then create a folder for the database files inside the MongoDB folder.


Set up the MongoDB environment

MongoDB requires a data directory to store all its data. The default data directory path is "\data\db". We need to create this folder ourselves, or create a different folder and pass it to mongod with the --dbpath option, for example:

c:\mongodb\bin\mongod.exe --dbpath d:\testing\mongodb\data
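
The same setup can be scripted. The Python sketch below creates the data directory if it is missing and builds the mongod command line with the --dbpath option shown above. The helper name `mongod_command` is made up, and the demo uses a temporary directory instead of d:\testing\mongodb\data so that it runs on any machine. It only builds the command and does not start MongoDB.

```python
import tempfile
from pathlib import Path


def mongod_command(mongod_exe, data_dir):
    """Ensure data_dir exists and return the mongod argument list."""
    data_path = Path(data_dir)
    # parents=True creates missing parent folders; exist_ok avoids an
    # error when the directory is already there.
    data_path.mkdir(parents=True, exist_ok=True)
    return [str(mongod_exe), "--dbpath", str(data_path)]


# Demo with a throwaway directory (an assumption made for this example).
demo_dir = Path(tempfile.mkdtemp()) / "mongodb" / "data"
cmd = mongod_command(r"c:\mongodb\bin\mongod.exe", demo_dir)
print(cmd)
```

On a machine where MongoDB is installed at that path, the returned list could be passed to `subprocess.run(cmd)` to start the server.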


Summary


In this article we learned the basics of MongoDB, NoSQL databases and Big Data, and why there is a need for document-oriented databases. We also covered the differences between an RDBMS and MongoDB, and the basic process of installing MongoDB on the Windows operating system.