An Overview Of Cosmos DB - Start From Scratch

Debasis Saha
Jun 22, 2020

Introduction

Microsoft Azure offers several database services, such as Azure SQL and Azure Cosmos DB. This article gives an overview of Azure Cosmos DB, including its purpose and architecture. By the end of the article, the reader should be able to do the following:

Understand the concepts behind document, columnar, graph, and key-value store databases.
Explain the process of creating an Azure Cosmos DB instance.
Explain how to transfer application data into Azure Cosmos DB with the help of the MongoDB API or the Azure Table Storage API.

Concept of NoSQL Databases

NoSQL is commonly read as "Not Only SQL". NoSQL databases are an alternative to SQL databases such as Microsoft SQL Server, and they support their own query operations, although not always the full feature set of an RDBMS. Broadly, the term covers all databases that are not part of the traditional relational database management systems (RDBMS). The main goals of a NoSQL database are a simple design, easier scaling (horizontal scaling in particular), and above all easy operational control over the available data. A NoSQL database breaks away from the fixed data structure of the relational model and lets developers store data in the shape their programs require. In simple words, a NoSQL database can structure data in ways that a traditional database cannot.

NoSQL databases are normally divided into four categories. The sections below briefly describe each one.

Key-Value Pair

In its simplest form, a key-value database represents an array of values, each of which has a defined key. Such an array is sometimes called a dictionary or a hash. This type of database is schemaless, so it can store data of different types and shapes. Users retrieve data by key name. Each key must be unique across the whole database, because the key is the only way to retrieve a value. The table below shows some sample data as key-value pairs:

Key	Value
K1	123
K2	ABCD
K3	ABC,123
K4	{ id:2, name: Amit, city: New Delhi }

Unlike a traditional relational database, a key-value database has no concept of relationships, stored procedures, secondary indexes, or foreign keys. Data is normally stored in a denormalized way, which makes fetching by key much faster than in a relational database. The main advantages of this type of database are:

It is much simpler to scale. Inserting data takes about the same time for an empty table as for a table with millions of records.
It is well suited to distributed systems.
It can store both unstructured and semi-structured data.
Reads are very fast when the exact key name is known.

Despite the above benefits, it also has some drawbacks:

Relationships between data items need to be maintained externally, outside the database.
Sorting or filtering on non-key data is quite complicated, because it requires a full table scan.
It maintains only one index, on the key.
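
The trade-off above can be sketched in a few lines of Python. This is only an illustration, not how any particular key-value database is implemented: a plain dict stands in for the store, the keys and values mirror the sample table, and the helper names get and find_by_city are made up for this example.

```python
# A plain dict stands in for a key-value store (an assumption for illustration).
# Keys and values mirror the sample table above.
store = {
    "K1": 123,
    "K2": "ABCD",
    "K3": "ABC,123",
    "K4": {"id": 2, "name": "Amit", "city": "New Delhi"},
}


def get(key):
    """Direct lookup by key: the fast path, using the only index there is."""
    return store.get(key)


def find_by_city(city):
    """Filtering on a non-key field has to inspect every value (a full scan)."""
    return [
        key
        for key, value in store.items()
        if isinstance(value, dict) and value.get("city") == city
    ]


print(get("K2"))
print(find_by_city("New Delhi"))
```

The lookup touches one entry, while the filter walks the whole store, which is why non-key queries become expensive as the data grows.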

Document Database

Document databases store data in a self-describing structure that can contain sub-documents and collections of other documents. A document normally represents a collection of related data encapsulated together, most often defined as JSON or XML. Some sample documents are shown in the image below.

[Image: sample documents]

The main advantages of this type of database are:

Unlike a key-value database, we can query and index the structure and content of the value data.
Applications can store objects directly as documents. There is no need to translate between an object model and a relational model.
A document database is also a schema-less database.
The document database can be easily scaled up or down across multiple servers.
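
As a rough illustration of a self-describing document, the sketch below builds one order document with a sub-document and a nested collection, then queries inside it. The field names (customer, items, sku, qty) and the helper functions are hypothetical and chosen only for this example.

```python
import json

# One self-describing document: a sub-document ("customer") and a nested
# collection ("items"). All field names are hypothetical.
order = {
    "id": "order-1001",
    "customer": {"name": "Amit", "city": "New Delhi"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def total_quantity(doc):
    """Sum a field across the nested collection."""
    return sum(item["qty"] for item in doc["items"])


def orders_from_city(docs, city):
    """Query on the content of a sub-document, which a key-value store cannot do."""
    return [d["id"] for d in docs if d.get("customer", {}).get("city") == city]


# The in-memory object maps straight to JSON and back, with no relational mapping.
round_tripped = json.loads(json.dumps(order))

print(total_quantity(order))
print(orders_from_city([order], "New Delhi"))
```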

Columnar Database

As the name indicates, a columnar database is built around column families, which sets it apart from the other types of NoSQL databases. A column family is a collection of columns that hold the data for a set of entities. Conceptually, a column family resembles a table in a relational database, since it has a set of rows and columns. But unlike a relational table, the columns in a column family do not have to follow the same rigid schema for every row. Put simply, a column family is a map of name/value pairs whose contents vary from row to row.
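
The "map of name/value pairs" idea can be pictured with nested dictionaries. This is a conceptual sketch only, not the storage layout of any real column store; the row keys, column names, and helper functions are invented for the example.

```python
# Row key -> {column name: value}. Each row may carry a different set of columns.
# All names and values here are hypothetical.
users = {
    "user-1": {"name": "Amit", "city": "New Delhi"},
    "user-2": {"name": "Sara", "email": "sara@example.com", "phone": "555-0100"},
}


def column_names(family):
    """Collect every column name that appears in at least one row."""
    names = set()
    for columns in family.values():
        names.update(columns)
    return sorted(names)


def read_column(family, row_key, column, default=None):
    """Read one column of one row; a missing column is simply absent, not NULL-padded."""
    return family.get(row_key, {}).get(column, default)


print(column_names(users))
print(read_column(users, "user-1", "email"))
```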

Graph Database

The main idea behind a graph database is that any data item can have a relationship with any other. Relational databases also model relationships in their data: in an RDBMS, a relationship is implemented with a foreign key. In a graph database, relationships are much more fluid and can connect many different types of data. As a result, there is no concept of referential integrity between the connected entities, and we are free from the rigidity of the relational model.

In a graph database, the terminology used is "vertices" and "edges". Vertices are discrete items or objects in the database—for example a person, a product, or an address. Edges, which are the connections between vertices, describe how the vertices are related to each other. A person lives in a house, a product is owned by a person. Both vertices and edges will have properties and, when all these are combined, they’re known as a property graph.
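
A tiny in-memory property graph makes the vocabulary concrete. The sketch below is an assumption-laden illustration rather than a graph engine: the Vertex and Edge classes, the ids, and the neighbours helper are all hypothetical, and the data follows the person, house, and product examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    source: str  # id of the vertex the edge starts from
    target: str  # id of the vertex the edge points to
    properties: dict = field(default_factory=dict)


# Vertices are discrete items; edges describe how they relate.
vertices = {
    "p1": Vertex("p1", "person", {"name": "Amit"}),
    "h1": Vertex("h1", "house", {"city": "New Delhi"}),
    "pr1": Vertex("pr1", "product", {"name": "Laptop"}),
}
edges = [
    Edge("livesIn", "p1", "h1", {"since": 2015}),
    Edge("owns", "p1", "pr1"),
]


def neighbours(vertex_id, edge_label):
    """Follow outgoing edges with the given label and return the target vertices."""
    return [
        vertices[e.target]
        for e in edges
        if e.source == vertex_id and e.label == edge_label
    ]


print([v.label for v in neighbours("p1", "owns")])
```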

An Overview of Cosmos DB

Azure Cosmos DB is a multi-model, globally distributed NoSQL database service hosted on the Azure platform. Cosmos DB is schema-agnostic by design and automatically indexes all the data stored as documents. Microsoft built Cosmos DB on the experience gained from Azure DocumentDB. It can distribute data globally, scale performance horizontally, and support multiple levels of consistency. Users state how much throughput they need from the database, and that throughput is measured in request units (RUs).

Azure Cosmos DB provides several APIs for working with data. With them, Cosmos DB can act as a key-value store through the Azure Table Storage API, a document database through the MongoDB or DocumentDB APIs, a graph database through the Apache Gremlin API (part of the TinkerPop framework), or a column-store database through the Cassandra API. Behind the scenes, Cosmos DB uses the same storage engine for every API. Thanks to these APIs, developers do not need to make too many changes to their application code to adopt Cosmos DB.

Sometimes, the only change needed in the application is the connection string.

Hierarchy of a Cosmos DB

Cosmos DB uses a consistent hierarchical model to store and process data. At the highest level is the Azure Cosmos DB account, under which a user can add databases. Every database can contain several users, each of whom can be granted permissions. Besides users, every database contains a combination of collections, documents, and attachments. Collections can also contain related stored procedures, triggers, and user-defined functions. Each of these resources is directly addressable through a logical and stable URI, and these URIs can be accessed over a highly available and efficient TCP protocol. The following are examples of addresses:

Databases: /dbs/{id}
Collections: /colls/{id}
Documents: /docs/{id}
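
Because the hierarchy is nested, the address segments above are normally chained: a collection lives under a database, and a document under a collection. The helper below is a minimal sketch under that assumption. The function name resource_link and the sample ids are hypothetical and not part of any SDK.

```python
def resource_link(database_id, collection_id=None, document_id=None):
    """Build a hierarchical address by chaining the /dbs, /colls and /docs segments."""
    if document_id is not None and collection_id is None:
        raise ValueError("a document address needs a collection id")
    parts = ["dbs", database_id]
    if collection_id is not None:
        parts += ["colls", collection_id]
    if document_id is not None:
        parts += ["docs", document_id]
    return "/" + "/".join(parts)


print(resource_link("SampleDb"))
print(resource_link("SampleDb", "People"))
print(resource_link("SampleDb", "People", "2"))
```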

In Cosmos DB, every resource is categorized as either system or user-defined. System resources, such as accounts, databases, collections, users, and permissions, always have a fixed schema. User-defined resources are the documents and attachments added to the database by the user, either through the Azure Portal or through applications, and there are no restrictions on their schema. Both categories of resources are defined and managed as standard-compliant JSON. All resources, whether system or user-defined, must contain a common set of properties:

_rid: unique hierarchical identifier generated by the system.
_etag: generated by the system to enable optimistic concurrency control.
_ts: last updated timestamp maintained by the system.
_self: unique addressable URI generated by the system.
id: user or system specified unique name for a resource.
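
To show what optimistic concurrency control with _etag means in practice, here is a small in-memory simulation. It does not call Cosmos DB. The InMemoryCollection class, its methods, and the ConflictError exception are hypothetical stand-ins that only mimic the idea: a write succeeds only if the caller still holds the latest _etag.

```python
import uuid


class ConflictError(Exception):
    """Raised when the caller's _etag no longer matches the stored document."""


class InMemoryCollection:
    """Hypothetical stand-in for a collection; it only simulates _etag handling."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc):
        # Every successful write gets a fresh, system-generated _etag.
        stored = dict(doc, _etag=uuid.uuid4().hex)
        self._docs[doc["id"]] = stored
        return dict(stored)

    def read(self, doc_id):
        return dict(self._docs[doc_id])

    def replace(self, doc, if_match):
        # Reject the write if someone else changed the document in the meantime.
        if self._docs[doc["id"]]["_etag"] != if_match:
            raise ConflictError("document was modified by another writer")
        return self.upsert(doc)


people = InMemoryCollection()
people.upsert({"id": "2", "name": "Amit", "city": "New Delhi"})

first = people.read("2")   # two clients read the same version
second = people.read("2")

first["city"] = "Mumbai"
people.replace(first, if_match=first["_etag"])  # succeeds, _etag changes

second["city"] = "Pune"
try:
    people.replace(second, if_match=second["_etag"])  # stale _etag
except ConflictError as error:
    print("rejected:", error)
```

The second writer has to re-read the document and retry, which avoids silently overwriting the first writer's change.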

Implement a Cosmos DB Database using Azure Portal

There are many ways to create a database in Cosmos DB. The primary mechanism is the Azure Portal, but we could also use PowerShell cmdlets, the Azure CLI, Azure Storage Explorer, or the REST API.

Step 1

Sign in to the Azure Portal with your credentials.

Step 2

On the Azure Portal menu, click on Create a Resource.

Step 3

Now select Databases, and then click on Azure Cosmos DB.

Step 4

On the Create Azure Cosmos DB Account page, enter the settings for the new Azure Cosmos DB account, including the account name, location, etc.

Step 5

Now, click on the Review + Create button.

Step 6

After the settings are validated, click on the Create button to create the account.

Step 7

Once the deployment has succeeded, click on the Go to Resource button.

Step 8

Now, click on the Data Explorer option in the left panel.

Step 9

Now click on the New Database option to create the database.

Step 10

Now provide the database name and then click on the OK button.

Step 11

After the database has been created, click on the New Container button to create a new collection. Here we need to provide the collection ID and the partition key, as shown in the image below.

Step 12

After the container (collection) has been created, expand the database icon to see the container name along with other user-defined resources such as stored procedures, triggers, etc.

Step 13

Now, click on the Items option and then on the New Item option to insert data into the collection. The data must be in JSON format.
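
The database, container, and item steps can also be scripted. The sketch below uses the azure-cosmos Python package and is only one possible approach, not part of the portal walkthrough. It assumes the account already exists, that its endpoint and key are exposed through the hypothetical environment variables COSMOS_ENDPOINT and COSMOS_KEY, and that the names SampleDb and People and the /city partition key are acceptable placeholders.

```python
import os


def provision(client, database_id, container_id, partition_key, item):
    """Create the database and container if they are missing, then upsert one item.

    `client` is expected to behave like azure.cosmos.CosmosClient.
    """
    database = client.create_database_if_not_exists(id=database_id)
    container = database.create_container_if_not_exists(
        id=container_id, partition_key=partition_key
    )
    return container.upsert_item(item)


def main():
    # Imported here so the helper above can be read and reused without the SDK.
    from azure.cosmos import CosmosClient, PartitionKey

    # Hypothetical environment variable names; use whatever your setup provides.
    client = CosmosClient(
        os.environ["COSMOS_ENDPOINT"], credential=os.environ["COSMOS_KEY"]
    )
    # The item must be JSON-serializable and carry an "id" plus the partition key field.
    item = {"id": "2", "name": "Amit", "city": "New Delhi"}
    saved = provision(client, "SampleDb", "People", PartitionKey(path="/city"), item)
    print(saved["id"])
```

Calling main() performs the equivalent of Steps 9 to 13 against a real account. Keeping provision separate from the client setup makes it easy to try the flow against a test double first.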

How to Migrate Data to Cosmos DB

In this section, we will discuss how to move application data from an existing database to a Cosmos DB database. The available options depend on where the current data resides and on which Cosmos DB API has been chosen. Cosmos DB provides several options for importing data for each API.

SQL API

With the help of the Data Migration tool (dtui.exe), a Cosmos DB database created with the SQL API can import data from a wide range of sources, including:

JSON flat files
MongoDB Database
SQL Server
CSV Files
Azure Table Storage
DynamoDB
HBase
Azure DocumentDB

Besides the above-mentioned tool, we can also develop a custom program to transfer data from any source to Cosmos DB, using a technology such as .NET, Java, Node.js, Python, or Xamarin.

Mongo

MongoDB provides utilities such as mongoimport and mongorestore that can transfer data from an existing MongoDB database to a Cosmos DB database. mongoimport connects to a Cosmos DB database created with the MongoDB API and imports data from a file created by the mongoexport utility. To restore a full MongoDB database backup, first export the data with the mongodump utility and then restore it with mongorestore.

Gremlin

A Cosmos DB database created with the Gremlin API has only limited options for importing data. One of the most common is to export the current graph database as 2 MB GraphJSON files and then upload those files using the upload option under Data Explorer in the Azure Portal. We can also use the Gremlin console to execute add statements against the Azure Cosmos DB graph.

Table

Microsoft provides a command-line version of the Data Migration tool (dt.exe). We can use this tool to import Azure Table storage data into a Cosmos DB database created with the Table API. The steps are similar to those for the other APIs.

Cassandra

For a Cosmos DB database created with the Cassandra API, there are several options for importing data from an existing Cassandra wide-column database. The easiest and most common is to use the cqlsh COPY command to import data that has been exported to a CSV file. A sample of the cqlsh COPY syntax is:

COPY columnStore.importTable FROM 'localFolder/*.csv';
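
When the source data is not already in CSV form, a short script can produce the file that COPY reads. This is a minimal sketch using only the Python standard library. The rows, column names, and the to_csv helper are hypothetical. Because it writes a header row, the COPY command would also need the WITH HEADER = TRUE option, or the header line can be left out.

```python
import csv
import io

# Hypothetical rows to import; the keys must match the target table's columns.
rows = [
    {"id": "1", "name": "Amit", "city": "New Delhi"},
    {"id": "2", "name": "Sara", "city": "Pune"},
]


def to_csv(records, columns):
    """Serialize records to CSV text with a header row, in a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, ["id", "name", "city"])
print(csv_text)
```

Writing csv_text to a file inside the import folder makes it available to the COPY command shown above.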

Conclusion

In this article, we discussed the basic concepts of NoSQL databases, along with an overview of Cosmos DB and its resource hierarchy. We also discussed how to create a Cosmos DB database using the Azure Portal, and the data migration steps that help us transfer data from an existing database to the newly created Cosmos DB database. Any suggestions, feedback, or queries related to this article are most welcome.
