Ingest Data To Azure SQL Database Using Azure Databricks

Introduction

In this tutorial, we are going to discuss multiple ways to connect to Azure SQL Databases from Azure Databricks. We will also go through the code for each method.

Azure SQL Database connectivity with Azure Databricks

There are multiple ways to set up connectivity from Azure Databricks to Azure SQL Database.

SQL Database Connectivity using pyodbc with local user authentication and password

We can use pyodbc to establish connectivity and ingest data to Azure SQL Database using pyodbc.

Source code looks as follows,

import pyodbc
# Setup the connection string for Azure SQL Database
connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER={servername};DATABASE={databasename};Trusted_Connection=yes;'
# Create a cursor object from the connection string and execute the SQL commands
cursor=connection.cursor()
cursor.execute("{define your sql query}")
# Get the Query Result
while 1:
    row = cursor.fetchone()
    if not row:
        break
    print(row.version)

In order to use the pyodbc to connect and work with Azure SQL Database, we have to first import the pyodbc python package. Once the python package is imported, we have to define the SQL Database connection string using pyodc using SQL Server Name, Azure SQL Database Name. Once the connection configurations are created, we have to create a cursor object to set up the connection strings and execute the SQL commands. We can access the cursor query execution output directly or write a loop to read each line one by one.

SQL DataBase connectivity using pyodbc with Service Principal Authentication

In order to use Azure Service Principal using pyodbc with Azure SQL Database, there are a few pre-requisites,

Azure Service Principal should be created as a user of the Azure SQL Database.

Relevant roles to Azure Service Principal user.

context = adal.AuthenticationContext(authority)
token = context.acquire_token_with_client_credentials(resource_app_id_url, service_principal_id, service_principal_secret)
access_token = token["accessToken"]

jdbc_db = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("accessToken", access_token) \
        .option("encrypt", "true") \
        .option("hostNameInCertificate", "*.database.windows.net") \
        .load()

Here, we have to provide Azure AD Service Principal Name and password to generate the Azure AD access token and use this token to connect and query Azure SQL Database.

Using Pyspark to connect to Azure SQL Database

Apache Spark also enables us to easily read and write Parquet files to Azure SQL Database.

df.write
  .mode("overwrite")
  .format("jdbc")
  .option("url", f"jdbc:sqlserver://{servername}.database.windows.net;databaseName={databasename};")
  .option("dbtable", "{tablename}")
  .option("user", {localusername})
  .option("password", {localpassword})
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("customSchema", "sqlschemadetails")
  .save()

We have to specify the JDBC connection string including SQL user name and password along with the schema name. 

But while setting up the connectivity from other services to the Azure SQL database, It is better to follow the below recommendations to configure Azure SQL Database securely.

Create firewall rules for Azure SQL Database

Once we enable Azure firewall rules for Azure SQL Database, it will basically disable all the connectivity from the outside world to Azure SQL Database. Only the connection IP mentioned in the firewall list will be able to connect to the Azure SQL server and database.

Additionally, also set Allow Access to off for the SQL server configurations. All the communication to the SQL server is done via port 1433.

Access to Azure SQL Database using Azure AD Groups

We can create local users inside Azure SQL Database and use this username and password for the authentication to Azure SQL DB. But this is not the recommended way to authenticate Azure SQL DB is via Azure AD Integration using a single sign-on.

Enable Security Features for Azure SQL Database

Azure Defender for SQL

Azure SQL database has an inbuilt feature that will detect all potential threats and provide security alerts,

Ingest Data To Azure SQL Database Using Azure Databricks

Auditing

The auditing feature will automatically track all the logs for various operations that happen in the Azure SQL Database. Audit logs can be stored in the Azure Storage, Monitor logs, or Azure Event Hub.

Ingest Data To Azure SQL Database Using Azure Databricks

Enable Dynamic Data Masking for sensitive data

In order to avoid exposing sensitive data to the end-users, we can enable dynamic data masking.

Ingest Data To Azure SQL Database Using Azure Databricks

Transparent Data Encryption

This feature automatically encrypts all the data at rest and it doesn't require any manual action to read the encrypted database.

Conclusion

So, in this article, we explored multiple ways to connect to the Azure SQL database and we have also explored code for the same. Additionally, we have also explored what are the best security practices to configure Azure SQL server and SQL database from the configuration perspective.