Be Sure With Azure .NET – Azure Table Storage (Part 1)

This article introduces Microsoft Azure's Table Storage service, covering how to work with NoSQL databases, how they differ from relational databases, how to design and create your Azure Table, and the operations available for persisting data.

Introduction

This is the second article in an ongoing series on Microsoft Azure features and services. If you're just joining us, you might want to have a look at the previous article on Microsoft Azure Blob Storage, which covers getting started with an Azure Storage account, development tools, using the Azure Storage Client SDK, and the pricing and performance expectations that we won't be repeating here.

For this next article in the series, we're going to look at Microsoft Azure's Table Storage service. I have decided to break it into two parts, since we'll be covering a lot. In Part 1, we'll cover working with NoSQL databases, the differences from relational databases, designing and creating your Azure Table, as well as all the available operations for persisting data. In Part 2, we'll cover writing queries against our tables, retrying failed operations, concurrency and, of course, security.

Let's get down to business and see what we can learn about working with Azure Tables:

  1. What do you Mean This Isn't a Relational Database?
  2. Firing Up Our Tables
  3. Most Important Question: How to Design Your Table?
  4. Working With Azure Tables (Table Operations)
  5. Batch Operations with Entity Group Transactions
  6. Queries: Efficient and Inefficient
  7. Rats! We Hit a Snag (Working with Retry Policies)
  8. Concurrency
  9. Security

The Right Tool for the Right Job

As with most tools, whether it's the screwdriver in your toolbox, the lawnmower in your garage, or the software development IDE you're using, each has a specific use. The same is true of Azure Table Storage. There is a specific set of cases that makes Azure Table Storage a good fit (read: it's not a perfect fit for every job).

When we have the potential of dealing with large amounts of data (read: terabytes) that don't have complex relationships, and we want extremely fast retrieval and persistence along with the ability to easily scale out, Azure Tables shine. This is because of a combination of the design features of Azure Tables that, if used correctly, provide out-of-the-box, highly scalable tables that are partitioned under heavy load, as well as extremely efficient data retrieval and persistence throughput.

Before getting into the specific details of Azure Tables, let's take a quick stroll through a comparison of Azure Tables and the familiar relational database.

What do you Mean This Isn't a Relational Database?

There is a high probability that you are familiar with Relational Database Management Systems (RDBMSs) and think in those terms when it comes to storing and structuring data in a database. However, Azure Table Storage is not a relational database but a NoSQL database, so a completely different approach needs to be taken when we set out to host our data in Azure's Table Storage.

When we think about relational databases, we think about single-table schemas, table relationships with foreign keys and constraints, stored procedures, and columns and rows. We aren't burdened with any of these when dealing with Azure Tables. This is one of the positive characteristics of NoSQL databases: it allows us to focus solely on the data. Of course, those same characteristics are what give a relational database its strengths.

Since Azure Tables are not a perfect fit for every data-persistence need, let's look at some of the major differences between Azure Tables and relational databases to help draw that defining line.

Azure Tables vs. Relational Databases

This is not an exhaustive list of differences, but the important differences that you need to be aware of to make the right decisions when working with Azure Tables and ultimately will play a significant role when we get to designing Azure Tables:

  • A Different Vocabulary: In the traditional relational database vocabulary we refer to a table's columns and rows. However, in the world of NoSQL, we store entities (rows) with properties (columns) in our tables.

  • No schema-defined relationships among tables: This lack of relationships means you are not going to be drawing up relationships among other tables and storing their primary identifiers. This is because of, or a by-product of, the fact that you can't do joins on other NoSQL tables. There is, however, a way to store entities with different schemas in the same table in order to keep related entities close at hand.

  • All schemas are created equal: Unlike a relational database table, which has a defined schema that all rows must adhere to, a single Azure Table can store entities that have different schemas. To take it a step further, we can view an Azure Table as a container of entities, while entities are simply bags of name/value pairs of data. Each entity within a table can contain multiple name/value pairs. The only common denominator shared by a table's entities is the three required properties (Partition Key, Row Key and Timestamp), two of which we'll be looking at in depth.

  • Indices are limited: In a relational database we have the luxury of designing indices, such as secondary indexes, to help with query efficiency. Out of the box, we don't have this luxury in Azure Tables, and querying by entity properties other than the required properties mentioned earlier can be very inefficient. We'll talk more about this when looking at the important factors in designing our tables.

If you don't have a relational database background, maybe much of the preceding won't make a lot of sense; or, if you come from a NoSQL background, I am preaching to the choir. Without further burdening you with technical details, let's dive in and get started with Azure Tables. When we get there, we'll cover the most important aspect of Azure Tables: design.

Firing Up Our Tables

As a reminder, see the previous Azure Blob Storage article for getting a storage account set up. Once set up with an Azure Storage account, we can show how easy it is to persist data to an Azure Table in the following two simple steps.

Step 1

Create the table as in the following:

  CloudTable table = _client.GetTableReference(tableName);
  table.CreateIfNotExists();

Step 2

Create and persist a dynamic Table Entity as in the following:

  var dynamicEntity = new DynamicTableEntity
  {
      PartitionKey = "Games",
      RowKey = "Outside",
      Properties = new Dictionary<string, EntityProperty> { { "Name", new EntityProperty("Corn Hole") } }
  };
  var tableOperation = TableOperation.Insert(dynamicEntity);
  var result = table.Execute(tableOperation);


Now, this was just for demonstration purposes and we're getting ahead of ourselves. But you can see how easy it is to save data to an Azure Table by utilizing the tools already available, without needing to predefine custom entities to host the data we want to store. Starting with the next section we'll go over creating tables and persisting data. But this brings us to the most important point when dealing with Azure Tables: design.

 

Most Important Question: How to Design Your Table

If you have had anything to do with developing an application with a database backend, you probably have seen how the requirements generally start off with having some data that needs to be persisted. Therefore, the table is designed to house that data. It's only afterwards that people think about how that information needs to be retrieved and utilized. Sound familiar?

We don't have the luxury of answering this question after we're already persisting data to Table Storage. This is because, in order to achieve the highly efficient retrieval speeds and out-of-the-box scale-out capabilities afforded by Azure Table Storage, we must first make important decisions about how we will structure the data in an Azure Table. Remember, the entities we are storing are un-schematized, but that doesn't mean they aren't structured. So, let's break down the important decisions we need to make for structuring our data.

Partition Key and Row Key

I mentioned earlier that with a NoSQL database we are storing entities that are simply bags of name/value pairs of data. In addition, there is a common denominator among all entities in all tables: the required properties referred to as the Partition Key, Row Key and Timestamp. For now, the most important of these are the Partition Key and Row Key.

Partition Key

So, how do Azure Tables automatically afford out-of-the-box scale-out capabilities when data is under a heavy load? This is where the Partition Key becomes probably the more important of the two properties we are currently evaluating. The partition key is a string property that is required for all entities. Unlike a primary key on a relational database table, it isn't unique. But it is the key (no pun intended) that allows Azure to determine where divisions can be made in a single Azure Table, so that the entities belonging to a given partition can be served from their own partition server.

Every partition is served by a partition server (which can be responsible for multiple partitions), but a partition under heavy load can be designated its own partition server. It is this distribution of load across partitions that allows your Azure Table Storage to be highly scalable.

So let's stop for a second and think about this: suppose you design a table that stores store product information, but you decide to make your partition key "products", so that all entities fall under this single partition. How can Azure partition your data so that it can automatically scale out your table for efficient performance? It can't. All 50,000,000 shoe products you store fall under the same partition.

Unfortunately, partition boundaries also directly affect performance. Therefore, in contrast to the all-in-one partition approach, creating unique partitions for every entity is a pattern that will cost you the ability to do batch operations (discussed later) and incur performance penalties for insert throughput, as well as whenever queries cross partition boundaries.

Finally, sorting is not something that is controlled after the data has been persisted. Data in your table will be sorted in ascending order first by the partition key, then in ascending order by the row key. Therefore, if sort order is important, you will need to consider how partition keys and row keys are defined. A good example is how "11" would sort before "2" unless the values are padded with zeros.
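Because keys sort lexicographically as strings, numeric values need fixed-width zero padding to sort in numeric order. The following is a minimal sketch of the effect; the six-digit width is just an assumption for illustration:

```csharp
using System;
using System.Linq;

class PaddingDemo
{
    static void Main()
    {
        // Unpadded numeric keys sort lexicographically: "100" < "11" < "2".
        var unpadded = new[] { "11", "2", "100" };
        Array.Sort(unpadded, StringComparer.Ordinal);
        Console.WriteLine(string.Join(", ", unpadded)); // 100, 11, 2

        // Zero-padding to a fixed width makes lexicographic order match numeric order.
        var padded = new[] { 11, 2, 100 }.Select(n => n.ToString("D6")).ToArray();
        Array.Sort(padded, StringComparer.Ordinal);
        Console.WriteLine(string.Join(", ", padded)); // 000002, 000011, 000100
    }
}
```

The same reasoning applies to dates and any other non-string values you encode into partition or row keys.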

Row Key

The row key is also a string property, and it must be unique within the partition. As mentioned earlier, secondary indexes are a feature lacking in Azure Tables. However, the combination of the Partition Key and Row Key forms an entity's primary key and acts as a single clustered index within a table.

Row keys also provide a second ascending sort order, applied after the ascending sort order of the partition key. Therefore, depending on your circumstances, further thought might be required about how you want data to be sorted when retrieved.

Therefore, the decision you need to make is: how will the data be queried, and what are the common queries you expect to be made? Based on that answer, you need to determine how the data can be grouped into partitions. The following list of guidelines is not exhaustive, but it can help make table design decisions easier.

  1. Determine what the common queries will be against the data.
  2. Based on those common queries, determine how the data can be grouped (partitions).
  3. Avoid over-sized partitions that would hinder scalability.
  4. Avoid extremely small (single-entity) partitions that would negate the ability to use batch operations and hinder insert throughput.
  5. Consider your sort order requirements when determining exact partition and row naming.

We will continue to look at all these points as we cover table operations to persist and query the data. There is more to consider than just these starting points, such as how inefficient queries are when based on non-key properties, which force a full table scan. Since this article is not strictly about designing tables, I would recommend reading this helpful MSDN article for more on how various partition sizes affect queries, as well as for more information on table design.

Working With Azure Tables (Table Operations)

So now that you have decided how you want to structure your tables and you're ready to start persisting data, let's get down to business. We are utilizing the .NET Azure Storage Client SDK that we fully covered in the previous Azure Blob Storage article; there, you can learn about the various options for acquiring the Storage Client SDK. Despite having covered the details of utilizing your Storage Account's access keys, I will cover that small part again here.

Therefore, assuming you are up and running with the .NET Storage Client SDK, let's start with the minimal requirements to get started with working with an Azure Table.

Minimal Requirements

The table itself is associated with a specific Azure Storage Account. Therefore, if we want to perform CRUD and query operations on a specific table in our storage account, we will, roughly, be required to instantiate objects that represent our storage account, a table client within our storage account and, finally, a reference to the table.

Keeping that in mind, the minimal requirements to work with a table would go something like the following:

  1. First, create a storage account object that represents your Azure Storage Account.
  2. Second, through the storage account object, create a Table Client object.
  3. Third, through the Table Client object, obtain an object that references a table within your storage account.
  4. Finally, through the specific table reference, you can execute table operations.

  CloudStorageAccount account = new CloudStorageAccount(new StorageCredentials("your-storage-account-name", "your-storage-account-access-key"), true);
  CloudTableClient tableClient = account.CreateCloudTableClient();
  CloudTable table = tableClient.GetTableReference(tableName);

You can see that for the CloudStorageAccount object we are creating a StorageCredentials object and passing two strings: the storage account name and a storage account access key (more on that in a moment). In essence, we create each required parent object until we eventually get a reference to a table. Despite having covered setting up a CloudStorageAccount object in the previously linked article, the following is a refresher on utilizing your storage access keys.

Storage Keys

As mentioned earlier, when creating the StorageCredentials object you need to provide your storage account name and either the primary or secondary base64 key. This is all information we covered in the article on Blob Storage, but you can obtain it from the Azure Portal by selecting "Manage Access Keys" under "Storage", where the storage accounts you have created are listed.

Manage Access Keys

Yes, as you might have guessed, the access keys are the keys to the kingdom, so you would not want to pass this information out. However, for the sake of simplicity, I am demonstrating the basis for what we need in order to access a specific table storage account. We will take a closer look at security later on in this article.

A better approach to acquiring a CloudStorageAccount is the "Windows Azure Configuration Manager" NuGet package, which can help abstract away some of the overhead of creating an account.
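As a rough sketch of that approach, the connection string can live in configuration rather than in code; the setting name "StorageConnectionString" here is just an assumption for illustration:

```csharp
using Microsoft.Azure;                 // CloudConfigurationManager, from the configuration manager package
using Microsoft.WindowsAzure.Storage;  // CloudStorageAccount

// Reads the setting from app.config/web.config or the cloud service
// configuration, depending on where the code is running.
string connectionString = CloudConfigurationManager.GetSetting("StorageConnectionString");

// Parse builds the account from the connection string, so credentials
// never need to be hard-coded.
CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
CloudTableClient tableClient = account.CreateCloudTableClient();
```

This keeps the access key out of source control and lets each environment supply its own value.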

Creating Tables

So, as we saw earlier, in order to create a table we need a reference to it, and Azure has some table naming rules that we will need to follow. Once we have a reference to a table, there are a couple of ways to create it.

  CloudTable table = tableClient.GetTableReference("footwear");
  table.Create();

Or, if you aren't sure whether or not it has already been created:

  CloudTable table = tableClient.GetTableReference("footwear");
  table.CreateIfNotExists();

Insert

In this article, we will look at two of the main ways to persist data to your table. One is by explicitly defining a class in your application that derives from the TableEntity class. A second, more implicit way to persist data to your table is through the use of the DynamicTableEntity.

We can start off by defining a class to represent our table entities that derives from TableEntity, creating a new instance of an entity we want to persist, followed by the table operation we want to carry out (an insert) and, finally, executing it against the table reference we have, as in the following:

  public class Footwear : TableEntity
  {
    public double Size { get; set; }
    public string Brand { get; set; }
    public string Name { get; set; }
    public string Gender { get; set; }
    public int Model { get; set; }
  }

  Footwear athleticShoe = new Footwear()
  {
    PartitionKey = "Athletics",
    RowKey = "038389_7_women",
    Brand = "AeroSpeed",
    Size = 7.5,
    Gender = "women",
    Model = 38389
  };

  TableOperation tableOperation = TableOperation.Insert(athleticShoe);
  TableResult result = table.Execute(tableOperation);

As you might have noticed, we're using a combination of the model number, size and gender for the Row Key. Depending on which common queries you determine will be used for data retrieval from your table, this key can have a significant impact on the available unique keys within a partition and on sort order.
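One way to keep such composite keys consistent is to build them in a single helper. This is a hypothetical sketch; the field widths and separator are assumptions for illustration, not part of the article's schema:

```csharp
static class RowKeys
{
    // Builds "038389_7_women"-style keys: a zero-padded model number,
    // then the size, then the gender, separated by underscores.
    // Centralizing this prevents inconsistent keys across insert paths.
    public static string ForFootwear(int model, double size, string gender) =>
        $"{model:D6}_{size}_{gender}";
}

// Example: RowKeys.ForFootwear(38389, 7, "women") yields "038389_7_women".
```

A helper like this also gives you one place to change padding widths if sort-order requirements evolve.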

De-normalization (A Short Commercial Break)

Since we are on the topic of inserting data into our tables, this is as good a time as any to talk about a common theme with NoSQL databases: de-normalization. We have already pointed out the significance of understanding how your table data will be queried and making the correct table design decisions ahead of time. If you decide that you will have more than one dominant query against the data, a common solution is to insert the data multiple times with multiple row keys within a partition. This allows multiple efficient queries to be made against the same data.

A simple example using our Footwear table, where we might want to query by size or by model number, is to prefix the Row Key and insert the same data twice. The prefix allows our application to distinguish among the various values:

Partition Key   Row Key
Athletics       size:07_38389_women
Athletics       model:038389_7_women

The preceding example is just one possible solution, not a de facto method for creating row keys; it is only a means for you to think about various scenarios for persisting entities. But as said earlier, this is a case where we have de-normalized our table by duplicating the data. This isn't the only case where de-normalization of your data might be required or might be a possible solution to a problem.

Another common problem is the need for our application to work with multiple entities together, where the data is different but closely related. If we can't define relationships among entities, we can use the un-schematized nature of NoSQL entities to save associated entities in the same table. A common example would be an application that works with a Contact entity and its related Address entities. Saving them all in the same table under the same partition provides the ability to do a batch transaction when saving a Contact and its Addresses (we'll talk about batch transactions shortly).
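To make this concrete, here is a hedged sketch of two differently-shaped entities sharing a partition. The Contact/Address classes and the row key values are hypothetical, not from the article:

```csharp
using Microsoft.WindowsAzure.Storage.Table;

// Two differently-shaped entities can live in the same table.
public class Contact : TableEntity
{
    public string Name { get; set; }
}

public class Address : TableEntity
{
    public string Street { get; set; }
    public string City { get; set; }
}

public static class ContactStore
{
    // Sharing a PartitionKey keeps both entities eligible for a single
    // entity group transaction (covered later in this article).
    public static void SaveContactWithAddress(CloudTable table)
    {
        var contact = new Contact { PartitionKey = "contact_42", RowKey = "contact", Name = "Jane Doe" };
        var address = new Address { PartitionKey = "contact_42", RowKey = "address:home", Street = "1 Main St", City = "Springfield" };

        var batch = new TableBatchOperation
        {
            TableOperation.Insert(contact),
            TableOperation.Insert(address)
        };
        table.ExecuteBatch(batch);
    }
}
```

The row key prefixes ("contact", "address:") let the application tell the entity kinds apart when reading the partition back.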

Updates

Updating existing properties on an existing entity in our table isn't the only kind of update that can occur. Because entities are no more than a bag of key/value pairs, and there is no master schema that an entity must follow, there is no reason why an update to an entity whose PartitionKey and RowKey match might not contain a completely different set of properties and values. Because of this, there are a few ways we might want to handle the update of an entity. This is where Merge and Replace are relevant.

Merge

If we make changes to existing properties on an existing table entity, Merge will update the entity where the PartitionKey and RowKey exist. But in the case where we have an entity with different properties and want to update an existing table entity without losing its existing properties, we can merge the two entities. This results in a table entity with the combined properties, as well as any changes to shared properties.

Before any changes we might have the following table entity:

Table Entity

  TableOperation query = TableOperation.Retrieve<Footwear>("Athletics", "038389_7_women");
  TableResult results = table.Execute(query);
  Footwear footwear = (Footwear)results.Result;

  DynamicTableEntity newFootwear = new DynamicTableEntity()
  {
    PartitionKey = footwear.PartitionKey,
    RowKey = footwear.RowKey,
    ETag = footwear.ETag,
    Properties = new Dictionary<string, EntityProperty>
    {
      {"PrimaryColor", new EntityProperty("Red")},
      {"SecondaryColor", new EntityProperty("White")}
    }
  };
  TableResult result = table.Execute(TableOperation.Merge(newFootwear));

In this example, we are loading an existing entity by its PartitionKey and RowKey. I use a DynamicTableEntity to show how we can merge new changes into an existing entity by performing a Merge TableOperation. We can see the final result is the combination of the existing properties as well as the new PrimaryColor and SecondaryColor properties.


PrimaryColor and SecondaryColor

Just a note: the previous DynamicTableEntity wasn't required to make changes to an existing entity. But your application's entity POCOs might change while you want to retain existing property information, or your application might have a split persistence model that needs to merge the data in an entity.

Replace

Perhaps we don't want to retain previous information and instead want to completely alter an existing entity. Replace lets us completely replace the existing entity where the PartitionKey and RowKey exist. Taking the previous example, we conduct a Replace TableOperation as in the following:

  TableOperation replaceOperation = TableOperation.Replace(newFootwear);
  TableResult result = table.Execute(replaceOperation);

And we can see how it has completely altered the structure and data for the existing table entity as in the following:

Existing Table Entity

Because I have seen some confusion out there about the differences, just remember: Merge retains any current properties and data on an existing entity (data retention), while Replace does just what the name implies and replaces the entity completely with the new entity (data loss).

Delete

Using the same Footwear entity example from above, we can simply delete an entity doing the following:

  TableOperation deleteOperation = TableOperation.Delete(footwear);
  table.Execute(deleteOperation);

Be aware that though it might appear you can simply pass in a newly constructed entity with the correct PartitionKey and RowKey, you will be required to provide a value for the ETag property, which we haven't talked about yet and which is covered under the Concurrency section. You can force the delete by using the wildcard "*", or first load the entity from the table and pass it as the entity in the delete operation.
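A minimal sketch of the wildcard approach, assuming the `table` reference and the keys from the earlier Footwear example:

```csharp
using Microsoft.WindowsAzure.Storage.Table;

// Deleting without loading the entity first: supply the keys and the
// wildcard ETag "*" to skip the optimistic-concurrency check.
var stub = new TableEntity("Athletics", "038389_7_women") { ETag = "*" };
table.Execute(TableOperation.Delete(stub));
```

Skipping the concurrency check means the delete wins even if another writer has modified the entity since you last saw it, so use the wildcard deliberately.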

Hey Wait! There's More

The Azure Table REST API also provides a couple of combined "upsert" operations for when we aren't sure whether an entity exists. Through the TableOperation class we can either InsertOrMerge, as in the following:

  TableOperation insertOrMergeOperation = TableOperation.InsertOrMerge(newFootwear);
  TableResult result = table.Execute(insertOrMergeOperation);

Or we have the option to InsertOrReplace as in the following:

  TableOperation insertOrReplaceOperation = TableOperation.InsertOrReplace(newFootwear);
  TableResult result = table.Execute(insertOrReplaceOperation);

In all the preceding operations, we were only ever working with one table operation at a time. Azure Tables provide a way to do transactions by means of Entity Group Transactions.

Batch Operations with Entity Group Transactions

A tip I pointed out earlier, as well as in the last article on Blob Storage, is that there is a cost associated with each transaction. You might observe that all the previous operations have been single operations involving one table entity. In the case where our application has a relationship among entities being stored in the same table under the same partition key, we might want to do a batch operation that allows a number of operations to either all occur successfully or not occur at all.

Entity Group Transactions (EGTs) are Azure's answer to such a scenario, in which we want to perform a series of operations atomically and ensure they all succeed or all fail. There are a number of restrictions on EGTs. The primary restriction is that all entities involved in a group transaction must have the same Partition Key, as we noted earlier when discussing how to design our tables. Let's quickly look at the restrictions when using EGTs:

  1. All entities involved must share the same Partition Key.
  2. No more than 100 entities can be involved in a single group transaction.
  3. An entity can only appear once within a group transaction and there can only be one operation against the entity.
  4. A group transaction payload can total no more than 4MB.

A TableBatchOperation is actually a collection of the single atomic TableOperations that we went over earlier.

  TableBatchOperation batchOperations = new TableBatchOperation
  {
    TableOperation.Insert(new Footwear
    {
      PartitionKey = "Athletics",
      RowKey = "model:038389_7_women",
      Brand = "AeroSpeed",
      Size = 7,
      Gender = "women",
      Model = 38389
    }),

    TableOperation.Insert(new Footwear
    {
      PartitionKey = "Athletics",
      RowKey = "size:07_38389_women",
      Brand = "AeroSpeed",
      Size = 7,
      Gender = "women",
      Model = 38389
    })
  };

  table.ExecuteBatch(batchOperations);

This was a simple operation. A more common example might be that we have a number of entities to persist, and we need to ensure that each TableBatchOperation contains only entities sharing the same PartitionKey and does not exceed 100 operations. Suppose we had more than 100 Footwear entities to process:

  IEnumerable<Footwear> footwears = GetFootwear(); // Get some unknown number of footwear objects

  foreach (var footwearGroup in footwears.GroupBy(f => f.PartitionKey))
  {
    TableBatchOperation batchOperations = new TableBatchOperation();

    foreach (var footwear in footwearGroup)
    {
      batchOperations.Add(TableOperation.InsertOrReplace(footwear));

      // A single batch cannot exceed 100 operations, so flush and start a new one.
      if (batchOperations.Count == 100)
      {
        table.ExecuteBatch(batchOperations);
        batchOperations = new TableBatchOperation();
      }
    }

    // Execute any remaining operations for this partition group.
    if (batchOperations.Count > 0)
    {
      table.ExecuteBatch(batchOperations);
    }
  }

Here, we group all the entities by partition key, then process each group in batches of up to 100 operations, ensuring that any leftover operations in a partially filled batch are also executed.

Be Sure with Azure .NET – Azure Table Storage (Part 1) first appeared on LockMeDown.com.

Conclusion

We have managed to cover everything from understanding Azure's NoSQL Table Storage to the differences from relational databases, designing tables, persistence operations and, finally, how to take advantage of batch operations. This was a lot to cover, but it's far from over. The next installment, due out very soon, will cover the second half of the agenda listed at the beginning of this article: writing queries against the data, retrying failed table operations, concurrency and security. So stay tuned.


References 

  1. Understanding the Table Data Model
  2. Getting the most out of Azure Tables
  3. Designing a Scalable Table Structure
  4. Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency