Azure Synapse Analytics Security - Dynamic Data Masking

Introduction

Dynamic data masking is a feature that is available in Azure Synapse Analytics. It restricts the exposure of sensitive data to end users. We can configure data masking to hide sensitive data in the result sets that are queries from users. Using data masking, we can not only restrict, also specify the amount of sensitive data that can be revealed with a lower impact on the application layer.

Dynamic Data Masking – Key points

The dynamic data masking feature works by replacing sensitive data in transit without touching the source data (data at rest). This provides great benefits for data management teams, as they can always preserve the original data.

  • Works in near-real time and throws out the results without too much delay.
  • Useful in protecting data, particularly in reporting/read-only scenarios.
  • No need to process a copy of original data and setup processing to mask all data in advance.
  • Acts as a layer of security for protecting sensitive data stored in the tables.
  • Some performance overhead will be unavoidable, as all traffic to the database will be inspected.
  • Users can bypass the layer by directly connecting to the database, and can query the data unmasked, thereby rendering the purpose useless.

Steps

I have created a dedicated pool in synapse and created a table with few rows for demo. Remember that you can create only an external table, and cannot create a user table, in a built-in pool. So, make sure you create a dedicated pool before proceeding.

On the left side stack in your synapse workspace, select SQL Pools and then open your dedicated pool.

After that, you will see a similar page, which is the dedicated pool settings page. On the left side column, click on the Dynamic Data Masking option under Security tab.

In the next option, you can add your tables into masking using the given options. The numerical banners in the screenshot are explained below.

  1. Add the tables into masking - more about this in the next section.
  2. The masking rules that already exist, that you have created previously.
  3. You can exclude any user, apart from users with administrator roles, from masking. They will see the data unmasked when they try to query the data.

Let us see how to select the tables using an +Add Mask button.

Basically, I am trying to mask two columns from the Employees table. As shown below, once you have chosen the column, you will have the option to select the masking format.

In the above image, I selected the default value format for the column Job and set the masking condition to have the contents on its own for all column values. Similarly, I added one more column, Salary, and set it to be the random number range format so it will randomly fill the data with digits from 0 to 1.

For testing purposes, you can also use a synapse query window, synapse notebook, Azure data studio or SSMS to run queries. For this demo, I connected the dedicated pool in SQL Server management studio to show you how well a synapse database can be handled from SSMS. For SQL database users, this could be especially easy to grasp.

When I selected all the rows from employee table, we see that both the columns Job and Salary do not mask the data. Instead, because the login I did this from has administrator privilege, and by default, all logins with admin privileges can query and view the data without masking. So, I will create a new login with data_reader permission to check.

  1. Create the login from the master database.
  2. Change the database context to our user database to create a user account corresponding to the login we created.
  3. In the same database context, run the add role command to add the privilege to the created user.

In a new window connected by the new user account created, test the command.

Now, you can see how the database is being masked for both columns.

Summary

This is the basic practical implementation of dynamic data masking feature in the azure synapse. This feature is common for all Microsoft data products, such as SQL Server, Azure SQL, Azure SQL managed instance and Azure synapse analytics. I hope the live demo will come handy, in case you want to implement such methods.