Learning Azure Cosmos DB

Introduction

This is the introductory article in the Azure learning series. In this article, we are going to learn about Cosmos DB, the Azure NoSQL offering, along with its real-life use cases and available features. The target audience is developers who are new to Cosmos DB and want to explore, learn, and use it in their projects.

What are we covering in this article?

  • Cosmos DB Basics
  • Resource Architecture and data model
  • Supported APIs
  • Container basics

Cosmos DB Basics

Azure Cosmos DB is a fully managed NoSQL database offered by Microsoft as a platform as a service (PaaS). It is a horizontally scalable, globally distributed, low-latency, multi-model database with multiple query APIs for managing data at large scale.

Advantages of Cosmos DB

1. Massive and Horizontal scaling

Cosmos DB can be scaled horizontally while serving reads and writes with single-digit millisecond latency. It handles increased load by adding more servers to the cluster, and both throughput and storage can be scaled elastically.

2. Globally Distributed and High Availability

It provides an efficient way to replicate data globally. Read and write regions can be added or removed dynamically. This keeps the data near the consumer and reduces read latency. It is designed to provide low latency, elastic scalability of throughput, well-defined semantics for data consistency, and high availability.

3. Multi Model /Multi API support

Cosmos DB supports the following data models:

  • Key-value pairs
  • Column family
  • Document
  • Graph

It supports multiple APIs:

  • Core (SQL) API
  • Cassandra API
  • MongoDB API
  • Table API
  • Gremlin API

4. Multiple consistency levels

Azure Cosmos DB offers five consistency levels, which can be chosen according to the application's needs.

Strong

Strong consistency guarantees that a read always returns the latest durably committed version of an item. This is loosely comparable to the read committed feature of SQL Server. It is not available for accounts configured with multiple write regions.

Bounded-staleness

With bounded-staleness consistency, reads may lag behind writes, but only by a bounded number of versions or a bounded time interval. It guarantees global ordering and is not scoped to a single region.

Session

This is the default consistency level. Within a single client session, it guarantees that reads see that session's own writes in order, while offering better throughput than the stronger levels.

Consistent Prefix

With consistent prefix, the application reads data in the same sequence in which it was committed and never sees out-of-order writes. However, a read is not guaranteed to return the most recent update.

Eventual

With eventual consistency, data written to the primary replica is propagated to globally distributed read-only secondary replicas with no ordering guarantee, so a read may return stale data until the replicas converge. This makes it unsuitable for mission-critical apps that always need the latest value, but it gives the lowest latency and the best performance.
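The ordering of the five levels can be sketched in a few lines of Python. This is only an illustration: the enum and the can_override helper are hypothetical names, not part of any SDK, and the rule they encode (a request may keep or relax the account's default level, but not strengthen it) is stated here as an assumption to verify against the official documentation.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    """The five levels, ordered from strongest (0) to weakest (4)."""
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    CONSISTENT_PREFIX = 3
    EVENTUAL = 4


def can_override(account_default: ConsistencyLevel,
                 requested: ConsistencyLevel) -> bool:
    """Return True if a request may use `requested` given the account default.

    Assumption: a request can keep or relax the default, never strengthen it.
    """
    return requested >= account_default


if __name__ == "__main__":
    default = ConsistencyLevel.SESSION
    for level in ConsistencyLevel:
        print(f"{level.name:<18} allowed with {default.name} default: "
              f"{can_override(default, level)}")
```

Modelling the levels as an ordered enum keeps the "stronger versus weaker" comparison to a single integer check.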

Resource Architecture and Data Model

The resource architecture is a hierarchical, logical representation of data storage and database operations. A Cosmos DB account can have more than one database, and each database has underlying users, permissions, and containers. A container can be a collection, graph, or table. An item can be a document, edge/vertex, or row; basically, items make up the data content inside a container. An Azure Cosmos DB container is a schema-agnostic container of arbitrary user-generated entities, stored procedures, triggers, and user-defined functions (UDFs).
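The hierarchy described above (account, database, container, item) can be pictured with a small Python sketch. The classes are purely illustrative and are not SDK types, and the container id "articles" is a made-up example. The path format mirrors the dbs/.../colls/.../docs/... layout visible in the _self property later in this article, using names instead of internal resource ids.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    id: str
    items: dict = field(default_factory=dict)  # item id -> JSON-like dict


@dataclass
class Database:
    id: str
    containers: dict = field(default_factory=dict)  # container id -> Container


@dataclass
class Account:
    name: str
    databases: dict = field(default_factory=dict)  # database id -> Database


def item_link(database_id: str, container_id: str, item_id: str) -> str:
    """Build a name-based resource path for an item."""
    return f"dbs/{database_id}/colls/{container_id}/docs/{item_id}"


if __name__ == "__main__":
    account = Account("articledemodb")
    db = account.databases.setdefault("ArticleDemo", Database("ArticleDemo"))
    # "articles" is a hypothetical container id used only for this sketch.
    articles = db.containers.setdefault("articles", Container("articles"))
    articles.items["1"] = {"id": "1", "articletopic": "cosmos DB"}
    print(item_link(db.id, articles.id, "1"))
```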

Supported APIs

Azure Cosmos DB supports multiple APIs which can interact with all the data regardless of data model.

1. SQL

Azure Cosmos DB's core (native) API for working with documents. It supports fast, flexible development with a familiar SQL query language and client libraries for .NET, JavaScript, Python, and Java.

2. Cassandra

Fully managed Cassandra database service for apps written for Apache Cassandra. Recommended when migrating existing Cassandra workloads to Azure Cosmos DB.

3. MongoDB

Fully managed database service for apps written for MongoDB. Recommended when migrating existing MongoDB workloads to Azure Cosmos DB.

4. Gremlin (Graph)

Fully managed graph database service using the Gremlin query language, based on the Apache TinkerPop project. Recommended for new workloads that need to store relationships between data.

5. Azure Table

Fully managed database service for apps written for Azure Table storage. Recommended if you have existing Azure Table storage workloads that you plan to migrate to Azure Cosmos DB, but do not want to re-write your application to use the SQL API.

Cosmos Account

An Azure Cosmos account is the basic unit of global distribution and high availability. To distribute our data and throughput globally across multiple Azure regions, we can add or remove Azure regions from our Azure Cosmos account at any time.

I am choosing Core (SQL) for this article; you can choose the API that fits your needs.

Fill in the required basic details: subscription, resource group, and location.

The capacity mode is an important choice at this step.

Provisioned throughput

This Cosmos DB setting reserves a fixed amount of throughput, measured in request units per second (RU/s), for the application's read and write operations. You are charged for the RUs provisioned in each region, for example 5000 RU/s, whether or not they are consumed.

There are two modes with different offerings. Choose between them by looking at the usage pattern and the application's idle time, so that you save cost while still serving the required load.

  • Standard
  • Autoscale

Serverless

This is a newer, purely consumption-based offering where you are charged for the RUs consumed by read and write operations.

It is good for small-scale applications and for testing and development workloads where performance needs are minimal.

This setting cannot simply be switched later: changing the capacity mode requires recreating the account, so be careful while choosing it.
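To make the difference between the two capacity modes concrete, here is a minimal Python sketch of how each bill could be estimated. It assumes provisioned throughput is priced per 100 RU/s per hour and serverless per million RUs consumed; the function names are hypothetical and the prices in the example are placeholders, NOT real Azure rates, so check the official pricing page before drawing conclusions.

```python
def provisioned_cost(ru_per_sec: float, hours: float,
                     price_per_100_ru_hour: float, regions: int = 1) -> float:
    """Cost of provisioned throughput: paid for every hour, used or not."""
    return (ru_per_sec / 100.0) * price_per_100_ru_hour * hours * regions


def serverless_cost(total_ru_consumed: float,
                    price_per_million_ru: float) -> float:
    """Cost of serverless: paid only for the request units consumed."""
    return (total_ru_consumed / 1_000_000.0) * price_per_million_ru


if __name__ == "__main__":
    # Placeholder prices for illustration only, NOT real Azure rates.
    placeholder_price_per_100_ru_hour = 0.01
    placeholder_price_per_million_ru = 0.30
    hours_in_month = 730

    provisioned = provisioned_cost(400, hours_in_month,
                                   placeholder_price_per_100_ru_hour)
    serverless = serverless_cost(20_000_000, placeholder_price_per_million_ru)
    print(f"provisioned: {provisioned:.2f}, serverless: {serverless:.2f}")
```

With a light workload the consumption-based total stays small, while the provisioned total is the same whether the application is busy or idle.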

Use the recommended settings and create an account for your application. In my case, I have created articledemodb for this article. Once it is created, the overview page shows all the required information about the account for review.

Now that you have an account, you need a container to store your application's documents.

You can add a new container from this overview page.

Container: An Azure Cosmos container is the unit of scalability for both provisioned throughput and storage of items. A container is horizontally partitioned and then replicated across multiple regions.

You can also go to the Data Explorer option to add a new container or create a new database.

When you click on the New Container option, you will be taken to a page where you can create a container with different settings.

To create a new container, you need to fill in a few details.

Database id

ArticleDemo

This is like a database name in an RDBMS.

Option to share throughput (check box)

When this is checked, throughput is provisioned at the database level and shared across the containers in that database.

Database throughput (auto scale)

This sets the RUs (request units).

RU: A request unit is a performance currency abstracting the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Azure Cosmos DB. The cost to do a point read (fetching a single item by its ID and partition key value) for a 1-KB item is 1 Request Unit (or 1 RU).
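As a rough illustration of how request units add up, the following Python sketch estimates the RU/s a workload of 1-KB items might need. The point-read cost of 1 RU comes from the definition above. The write cost of 5 RU is an assumed placeholder, not a figure from this article, and estimate_ru_per_sec is a hypothetical helper, so measure the actual charge for your own items.

```python
import math


def estimate_ru_per_sec(point_reads_per_sec: float,
                        writes_per_sec: float,
                        ru_per_point_read: float = 1.0,
                        ru_per_write: float = 5.0) -> int:
    """Rough RU/s needed for a workload of 1-KB items.

    ru_per_point_read=1.0 follows the article; ru_per_write=5.0 is an
    assumed placeholder, so measure the real charge for your own items.
    """
    total = (point_reads_per_sec * ru_per_point_read
             + writes_per_sec * ru_per_write)
    return math.ceil(total)


if __name__ == "__main__":
    # 300 point reads/s and 20 writes/s of 1-KB items.
    print(estimate_ru_per_sec(300, 20))
```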

1. Autoscale

With autoscale enabled, database throughput scales automatically with usage between 10% of the maximum RU/s and the maximum. For example, with a maximum of 4000 RU/s, it scales between 400 RU/s and 4000 RU/s.

2. Manual

With manual throughput, you are always charged for the provisioned throughput, for example 400 RU/s for one region.
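The two throughput options above can be compared with a small Python sketch. The 10% floor comes from the example above; the billing rule (autoscale charged for the throughput it scaled to, clamped between the floor and the maximum) is a simplified assumption, and both function names are hypothetical.

```python
from typing import Tuple


def autoscale_range(max_ru_per_sec: int) -> Tuple[int, int]:
    """Return (min, max) RU/s for autoscale: the floor is 10% of the max."""
    if max_ru_per_sec <= 0:
        raise ValueError("max_ru_per_sec must be positive")
    return max_ru_per_sec // 10, max_ru_per_sec


def billed_ru_per_sec(used_ru_per_sec: int, max_ru_per_sec: int,
                      autoscale: bool) -> int:
    """RU/s that would be billed under this simplified model."""
    if not autoscale:
        return max_ru_per_sec  # manual: always the provisioned value
    low, high = autoscale_range(max_ru_per_sec)
    return min(max(used_ru_per_sec, low), high)


if __name__ == "__main__":
    print(autoscale_range(4000))
    print(billed_ru_per_sec(100, 4000, autoscale=True))
    print(billed_ru_per_sec(100, 400, autoscale=False))
```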

Microsoft provides a capacity planner where we can estimate the required RUs and the associated cost by entering the workload parameters.

Here is the link for planning the capacity your app needs: https://cosmos.azure.com/capacitycalculator/

You can play with the values and see how much throughput your application actually needs for reads and writes to perform well. Otherwise, requests will be throttled and the app's performance will suffer.

Container Id

A unique identifier for the container, like a table name in a database.

Partition key

This key is used to automatically distribute data across partitions, which helps scale individual containers in a database to meet the performance needs of your application. I will cover these topics in separate articles later.
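The idea of distributing data by partition key can be sketched in Python as hashing the key value and mapping the hash to a partition. This is illustrative only: the service uses its own internal hashing scheme and decides the number of partitions itself, and assign_partition and the sample key values are hypothetical.

```python
import hashlib
from collections import Counter


def assign_partition(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index with a stable hash.

    Illustrative only: the service uses its own internal hashing scheme.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    # Many distinct key values spread across partitions; a single hot
    # value would always land on the same one.
    authors = [f"author-{n}" for n in range(1000)]
    spread = Counter(assign_partition(a, 4) for a in authors)
    print(sorted(spread.items()))
```

The sketch shows why a key with many distinct values spreads load well, while a key with only a few values concentrates it on a few partitions.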

Once a new container is created, you can browse it through the Data Explorer tab.

Click on Items, then click on New Item to add a new document to Cosmos DB manually for testing.

Using this JSON, a new item is added to the container:

{
    "id": "2",
    "articletopic": "Azure Data bricks ",
    "author": "Abhsihek kumar "
}
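The same item could also be added from code. The Python sketch below builds the document and hands it to a container object through create_item(body=...), which is the call shape I understand the azure-cosmos Python SDK's container client to expose; treat that as an assumption to verify. InMemoryContainer is a hypothetical stand-in so the sketch runs without an Azure account, and build_article is an illustrative helper.

```python
import json


def build_article(item_id: str, topic: str, author: str) -> dict:
    """Build an item shaped like the article's sample document."""
    if not isinstance(item_id, str) or not item_id:
        raise ValueError("id must be a non-empty string")
    return {"id": item_id, "articletopic": topic, "author": author}


class InMemoryContainer:
    """Hypothetical stand-in for a real container, used so the sketch runs."""

    def __init__(self):
        self.items = {}

    def create_item(self, body: dict) -> dict:
        # Simplification: ids are treated as unique across the container.
        if body["id"] in self.items:
            raise ValueError(f"item {body['id']!r} already exists")
        self.items[body["id"]] = dict(body)
        return self.items[body["id"]]


if __name__ == "__main__":
    container = InMemoryContainer()
    created = container.create_item(
        body=build_article("2", "Azure Data bricks ", "Abhsihek kumar "))
    print(json.dumps(created, indent=4))
```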

Then run the query below to retrieve the documents in the container. So far, I have created the 2 documents below manually for the demo.

SELECT * FROM c

[{
    "id": "1",
    "articletopic": "cosmos DB",
    "author": "Abhsihek kumar",
    "_rid": "JtNaAINOl8gBAAAAAAAAAA==",
    "_self": "dbs/JtNaAA==/colls/JtNaAINOl8g=/docs/JtNaAINOl8gBAAAAAAAAAA==/",
    "_etag": "\"170024ad-0000-2000-0000-62ff6f460000\"",
    "_attachments": "attachments/",
    "_ts": 1660907334
}, {
    "id": "2",
    "articletopic": "Azure Data bricks ",
    "author": "Abhsihek kumar ",
    "_rid": "JtNaAINOl8gCAAAAAAAAAA==",
    "_self": "dbs/JtNaAA==/colls/JtNaAINOl8g=/docs/JtNaAINOl8gCAAAAAAAAAA==/",
    "_etag": "\"17003fad-0000-2000-0000-62ff702c0000\"",
    "_attachments": "attachments/",
    "_ts": 1660907564
}]

You can see that Cosmos DB has added a few system properties (the fields starting with an underscore), which the Azure Cosmos DB service uses for different purposes.
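When reading such a document in application code, it can help to separate your own fields from the service-generated ones. The Python sketch below does that by key prefix and converts _ts, which holds the last-modified time in epoch seconds, to a readable UTC timestamp. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def split_system_properties(item: dict) -> tuple:
    """Separate user fields from system fields (keys starting with '_')."""
    user = {k: v for k, v in item.items() if not k.startswith("_")}
    system = {k: v for k, v in item.items() if k.startswith("_")}
    return user, system


def last_modified(item: dict) -> datetime:
    """Convert the _ts value (epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(item["_ts"], tz=timezone.utc)


if __name__ == "__main__":
    doc = {
        "id": "1",
        "articletopic": "cosmos DB",
        "author": "Abhsihek kumar",
        "_etag": "\"170024ad-0000-2000-0000-62ff6f460000\"",
        "_ts": 1660907334,
    }
    user, system = split_system_properties(doc)
    print(user)
    print(sorted(system))
    print(last_modified(doc).isoformat())  # 2022-08-19T11:08:54+00:00
```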

From Query Stats, you can see all the metrics related to query execution, such as execution time and RUs consumed.

Conclusion

This article gives a basic idea of Azure Cosmos DB and how to start using this PaaS offering from Azure for your application needs.

I will cover the topics below on Cosmos DB in the next article, which will be a bit more advanced.

  • Partitioning strategy
  • Data modelling and design with Cosmos DB
  • Sample application with the .NET SDK

Keep learning and keep smiling.


