Masking Sensitive Data With SQL Server 2016

We'll see how the sensitive data in SQL Server 2016 can be masked from the outer world using Dynamic Data Masking.

Introduction

SQL Server

SQL Server 2016 has many features which you can use to make your data and application more secure, more maintained and easily retrieved as per your requirements. It has introduced both client-side security such as Always Encrypted as well as server-side security, such as Data masking etc. In this post, we’ll examine how Dynamic Data Masking can help secure our data.

Earlier we were masking the sensitive data by applying logic at application end or by replacing characters at SQL Server end and then returning the masked data to the application. In the former case, the risk was at,

  1.  When application fails to mask data
  2. Sensitive data transferred over network
  3. All the users in SQL Server with SELECT permission still can access the unmasked data

In the latter case, the first two risks are eliminated but still, there is a chance that the sensitive data can be revealed by the SQL Server user through which the application retrieves data.

  1. When application fails to mask data
  2. Sensitive data transferred over network
  3. All the users in SQL Server with SELECT permission still can access the unmasked data

So, let’s quickly jump into the next section to understand what Dynamic Data masking is and how it helps to eliminate all these chances of unmasked data getting revealed.

Data Masking

If you search for a definition of Data Masking over the web, you’ll end up with varied descriptions. The simplest and the straightforward definition can be –

“Data masking is the process of hiding original data with random characters or data.”
                                                                                                                   --Wikipedia

If we look further, there are various data masking techniques available such as -

  • Shuffling: Shuffling order of the characters in the value.
    Ex: 12345 -----> 35312

  • Nulling: Substituting characters in the value with null (hash) symbol.
    Ex: 12345 -----> ###45

  • Substitution: Substituting value with another value from a substitution table.
    Ex: Suvendu Giri -----> John Ptak

  • Masking Out: Masks the complete data or a selective part of the data.
    Ex: suvendu@mydomain.com -----> suvendu@xxxx.xxx

Considering many techniques of data masking they can be categorized into 3 types,

  • Static Data Masking: Original data in production changed to masked data.
  • On-the-fly Data masking: Copies data from one source to another source and masking done on the later.
  • Dynamic data masking: Masking done dynamically at runtime.

Dynamic Data Masking

“Dynamic data masking limits sensitive data exposure by masking it to non-privileged users. Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a data protection feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.”
--- MSDN

So, dynamic data masking alters the result for non-privileged users while streaming and not with data in the production database.

Data Masking in MySQL

As far I know, until the recent version of MySQL, there is no inbuilt mechanism to provide dynamic data masking capabilities. The possible options for MySQL users seems to be -

  1. Use 3rdparty tools
  2. Use custom scripts/functions

Examples

  1. SELECT '****' AS PhoneNumber FROM `Employees`  

--Input: 1234567890        Output: ****

  1. SELECT REPEAT('*', CHAR_LENGTH(PhoneNumber) - 6) AS PhoneNumber FROM `Employees`  

--Input: 1234567890                        Output: **********

  1. SELECT CONCAT(SUBSTR(PhoneNumber, 1, 4), REPEAT('*', CHAR_LENGTH(PhoneNumber) - 4)) AS PhoneNumber FROM `Employees`  

--Input: 1234567890                        Output: 1234******

Data Masking in PostgreSQL

Same as MySQL, it seems they are still lacking this feature. However, we can make use of some custom scripts or user-defined functions.

The Azure SQL Database too supports the dynamic data masking similar to SQL Server and I have heard from my friends that Oracle provides some kind of mechanism for data masking, although I haven’t ever used it.

Dynamic Data Masking in SQL Server 2016

There are four masking functions available in SQL Server 2016 to provide separate ways to mask your data when they are returned by a query. These are,

Default

When you want all the characters to be masked, you can use Default masking function. The characters for masking and the number of masked characters to be returned by a query is determined by the following logic-

  • XXXX – four Xs if the data length is more than or equals to four.
  • 0 in case of numeric datatype
  • 0 in case of the binary data type
  • 01/01/1900 for DateTime datatypes

Syntax

  1. MASKED WITH (FUNCTION = 'default()')  

Creating table

  1. CREATE TABLE [dbo].[Employee](  
  2. [SecretCode] VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),  

-- other fields

Modifying column –

  1. ALTER TABLE [dbo].[Employees]  
  2. ALTER COLUMN [SecretCode] ADD MASKED WITH (FUNCTION = 'default()');  

Example

Let’s check these functionalities using a Table variable.

  1. DECLARE @Employees AS TABLE  
  2. (  
  3.      SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),  
  4.      NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),  
  5.      DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()')                 
  6. )  
  7.   
  8. INSERT INTO @Employees  
  9. SELECT 'ABCDEFGHIJ', 80000,'1990-05-21'  
  10.   
  11. SELECT * FROM @Employees  

Result

SecretCode        NetSalary            DateOfBirth

xxxx              0.00                 1900-01-01 00:00:00.000

NOTE

You need to check this with a non-admin user as admin users are granted to see all the data. So, if you will execute this using an admin user then you’ll see no difference. Let’s say your non-admin user is ‘demouser’, then add the following line at the top of the query.

  1. EXECUTE AS USER='DemoUser'  

Email

It masks partially the characters of an email id. The interesting thing to notice is, it masks characters as well as the length of the email id making it impossible to predict the email id from the few displayed characters.

Syntax

  1. MASKED WITH (FUNCTION = 'email()')  

Example

Let’s modify the same example to add an emailed column as follows.

  1. DECLARE @Employees AS TABLE  
  2. (  
  3.      SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),  
  4.      NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),  
  5.      DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),  
  6.      EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()')  
  7. )  
  8.   
  9.   
  10. INSERT INTO @Employees  
  11. SELECT 'ABCDEFGHIJ', 80000,'1990-05-21''suvendugiri@mydomain.in'  
  12.   
  13. SELECT * FROM @Employees  

Result

SecretCode    NetSalary     DateOfBirth                EmailID

xxxx          0.00          1900-01-01 00:00:00.000    sXXX@XXXX.com

Partial

It masks the data based on custom inputs to the function. It masks the data partially. Three parameters need to be passed to this masked function such as

  1. Prefix: Number of characters that have to be shown from the start.
  2. Padding: Characters that you need to show in between prefix and suffix
  3. Suffix: Number of characters that have to be shown from the end.

Syntax

  1. MASKED WITH (FUNCTION = 'partial(prefix,padding,suffix)')  

Example

Further adding a new column to the same example as,

  1. DECLARE @Employees AS TABLE  
  2. (  
  3.       SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),  
  4.       NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),  
  5.       DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),  
  6.       EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()'),  
  7.       FullName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"***",2)')  
  8. )  
  9.   
  10. INSERT INTO @Employees  
  11. SELECT 'ABCDEFGHIJ', 80000,'1990-05-21''suvendugiri@mydomain.in','Suvendu Shekhar Giri'  
  12.   
  13. SELECT * FROM @Employees  

Result

SecretCode   NetSalary   DateOfBirth              EmailID         FullName

xxxx         0.00        1900-01-01 00:00:00.000  sXXX@XXXX.com   Su***ri

Random

Masks any numeric value with a random value based on the provided upper and lower boundary values as parameters to the masked function.

Syntax

  1. MASKED WITH (FUNCTION = ‘random(lower_bound,upper_bound)  

Example

  1. DECLARE @Employees AS TABLE  
  2. (  
  3.      SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),  
  4.      NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),  
  5.      DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),  
  6.      EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()'),  
  7.      FullName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"***",2)'),  
  8.      Age INT MASKED WITH (FUNCTION = 'random(90,100)')  
  9. )  
  10.   
  11. INSERT INTO @Employees  
  12. SELECT 'ABCDEFGHIJ', 80000,'1990-05-21''suvendugiri@mydomain.in','Suvendu Shekhar Giri',32  
  13.   
  14. SELECT * FROM @Employees  

Result

SecretCode  NetSalary  DateOfBirth              EmailID        FullName  Age

xxxx        0.00       1900-01-01 00:00:00.000  sXXX@XXXX.com  Su***ri   93

If you want a user to see the values stored in database same as admin, you can grant access to the user using the following command.

  1. GRANT UNMASK TO DemoUser  

You can again apply the restriction to see masked the data by the following command.

  1. REVOKE UNMASK TO DemoUser  

To remove the masked function from a column you can drop it similar to other constraints like -

  1. ALTER TABLE Employees  
  2. ALTER COLUMN FullName DROP MASKED  

Conclusion

Dynamic Data Masking in SQL Server 2016 is really a promising feature and helps in avoiding to write so many custom scripts so as to mask the sensitive data. The role/access based viewing data makes perfect sense and is very useful. Currently, the access for a user can be given on database. Would be great if there can be a mechanism to give these access based on table or schema.

I would like to know about your feedback, comments or suggestions.

Thanks for reading :)