Partitioning In Azure Cosmos DB

Introduction

  • Cosmos DB allows you to store a huge amount of data
  • Querying such a large volume of data can hurt performance
  • Partitioning groups the data into partitions, which gives better performance.

Partition Key

  • The partition key is the JSON property (or path) within your documents that Cosmos DB uses to distribute data among multiple partitions
  • The partition key decides the placement of documents
  • All documents that share the same partition key value are grouped together into a single logical partition
  • Once you set the partition key for a collection, you cannot change it
  • It's a best practice to have a partition key with many distinct values (hundreds to thousands at a minimum).
  • For example, let's say that you're storing JSON data about employees and your partition key is "department." Then all documents with the value of "department" equal to "engineering" will be stored in the same partition. Similarly, all documents with "department" of "marketing" will be stored in the same partition.
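
The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Cosmos DB is implemented; the sample documents and the helper name group_by_partition_key are made up for this example.

```python
from collections import defaultdict

# Hypothetical employee documents; the field names are illustrative only.
employees = [
    {"id": "1", "name": "Asha", "department": "engineering"},
    {"id": "2", "name": "Ben", "department": "marketing"},
    {"id": "3", "name": "Chen", "department": "engineering"},
]

def group_by_partition_key(documents, key):
    """Group document ids into logical partitions by partition key value."""
    partitions = defaultdict(list)
    for doc in documents:
        partitions[doc[key]].append(doc["id"])
    return dict(partitions)

logical_partitions = group_by_partition_key(employees, "department")
print(logical_partitions)
# {'engineering': ['1', '3'], 'marketing': ['2']}
```

Both "engineering" documents end up in one group and the "marketing" document in another, which mirrors how documents with the same partition key value share a logical partition.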

Partition

  • Azure Cosmos DB stores data in a number of physical partitions
  • A collection is a logical container of physical partitions
  • Every partition in Azure Cosmos DB has a fixed amount of SSD-backed storage associated with it and is replicated for high availability.
  • Partitions are fully managed by Azure Cosmos DB, so you do not need to write any code to manage them.
  • Each physical partition hosts the data for one or more partition key values

How Does Partitioning Work?

  • By default, Azure Cosmos DB creates a single physical partition
  • When a new document is inserted, Azure Cosmos DB hashes the partition key value and uses the hashed result to determine which partition to store the item in.
  • Once a partition reaches its size threshold, Azure Cosmos DB creates another physical partition and moves large logical partitions to the newly created partition
  • The developer can provide a partition key while performing CRUD operations to optimize query performance.
  • Data that shares the same partition key value is always grouped together logically and stored in the same physical partition.
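
The hash-based placement described above can be illustrated with a small sketch. It is a simplification under stated assumptions: SHA-256 and a modulo over a fixed partition count stand in for Cosmos DB's internal hash function and its assignment of hash ranges to physical partitions, neither of which is shown here. The function name physical_partition_for is hypothetical.

```python
import hashlib

def physical_partition_for(partition_key_value: str, partition_count: int) -> int:
    """Map a partition key value to a partition index by hashing it.

    Assumption: SHA-256 plus modulo is a stand-in for the service's real
    hashing scheme; it only demonstrates that placement is deterministic.
    """
    digest = hashlib.sha256(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# The same key value always maps to the same partition index.
for department in ["engineering", "marketing", "sales"]:
    print(department, "->", physical_partition_for(department, 4))
```

Because the mapping depends only on the key value, every document with "department" equal to "engineering" lands in the same place, and several different key values can share one physical partition.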

How to Choose the Right Partition Key?

  • The choice of partition key depends on the structure of your data
  • It is important to choose a partition key property that has a large number of distinct values
  • An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.
  • If the chosen partition key has few distinct values, requests concentrate on a small number of partitions (in the worst case a single one), which may slow down performance.
  • If you are working on a multi-tenant application, then TenantId is a good choice of partition key.
  • If you are creating an application for families, then the zip code is a good choice of partition key
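
One way to compare candidate keys before committing to one is to measure their cardinality and skew on a sample of your documents. The sketch below is a hypothetical helper, not part of any Cosmos DB tooling; the sample documents and the field names tenantId and country are invented for illustration.

```python
from collections import Counter

def partition_key_stats(documents, key):
    """Report how many distinct values a candidate key has and how skewed it is."""
    counts = Counter(doc[key] for doc in documents)
    total = sum(counts.values())
    largest_value, largest_count = counts.most_common(1)[0]
    return {
        "distinct_values": len(counts),
        "largest_value": largest_value,
        # Share of all documents that would fall into the biggest logical partition.
        "largest_share": largest_count / total,
    }

# Hypothetical sample of documents from a multi-tenant application.
sample = [
    {"tenantId": "t1", "country": "US"},
    {"tenantId": "t2", "country": "US"},
    {"tenantId": "t3", "country": "US"},
    {"tenantId": "t4", "country": "IN"},
]

print(partition_key_stats(sample, "tenantId"))
print(partition_key_stats(sample, "country"))
```

In this sample, tenantId has four distinct values with an even spread, while country has only two values and puts three quarters of the documents under "US", which makes it the weaker candidate.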

How to Create a Partitioned Collection

  1. Log in to the Azure Portal
  2. Go to your Cosmos DB account
  3. Set the storage capacity to Unlimited (partitioning is not allowed for fixed storage)
  4. Enter the partition key path (e.g. /address/zipcode)
  5. Select the throughput
  6. Click OK
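
The same setup can also be done from code instead of the portal. The following is a minimal sketch that assumes the azure-cosmos Python SDK (version 4) is installed; the container name "families", the throughput value 400, and the helper names container_settings and create_partitioned_container are hypothetical choices for this example. The SDK import happens inside the function so the rest of the snippet runs without the package.

```python
def container_settings(container_id, partition_key_path, throughput):
    """Build and validate the settings for a partitioned container."""
    if not partition_key_path.startswith("/"):
        raise ValueError("partition key path must start with '/', e.g. /address/zipcode")
    return {
        "id": container_id,
        "partition_key_path": partition_key_path,
        "offer_throughput": throughput,
    }

def create_partitioned_container(database, settings):
    """Create the container through the SDK; `database` is a DatabaseProxy."""
    from azure.cosmos import PartitionKey  # requires: pip install azure-cosmos

    return database.create_container_if_not_exists(
        id=settings["id"],
        partition_key=PartitionKey(path=settings["partition_key_path"]),
        offer_throughput=settings["offer_throughput"],
    )

settings = container_settings("families", "/address/zipcode", 400)
print(settings)

# Usage against a real account (endpoint, key, and database name are placeholders):
# from azure.cosmos import CosmosClient
# client = CosmosClient("<account-endpoint>", credential="<account-key>")
# database = client.create_database_if_not_exists(id="<database-name>")
# container = create_partitioned_container(database, settings)
```

Validating the path up front catches a common mistake, since the partition key is given as a path such as /address/zipcode rather than as a plain property name.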
 
Monitor Partitioned Data
 
You can monitor how data is distributed across partitions
  1. Log in to the Azure Portal
  2. Go to your Cosmos DB account
  3. Click the Metrics option
  4. Select the Storage tab
  5. Select the name of the collection whose data you want to view
 
 
As shown in the above image, the product data is partitioned using departmentId as the partition key.