External Tables In SQL Server 2016 Or 2017 And Working With PolyBase

In this article, we will see how to create an external data source in SQL Server 2016/2017 and then, we will create an external table with Parquet file format. Later, we will push the data to the external table. We get the data in our mapped Parquet file of Blob storage automatically.

External tables in SQL Server 2016 are used to set up the new PolyBase feature with SQL Server. With this new feature (PolyBase), you can connect to Azure Blob storage or Hadoop to query relational or non-relational data from SSMS and integrate it with SQL Server relational tables.

PolyBase is a technology that accesses data outside of the database via the T-SQL language. In SQL Server 2016, it allows you to run queries on external data in Hadoop or to import/export data from Azure Blob Storage. Queries are optimized to push computation to Hadoop. In Azure SQL Data Warehouse, you can import/export data from Azure Blob Storage and Azure Data Lake Store.

PolyBase can:

  • Query the data stored in Hadoop from SQL Server or PDW. Users prefer storing the data in cost-effective distributed and scalable systems, such as Hadoop. PolyBase makes it easy to query the data by using T-SQL.

  • Query the data stored in Azure Blob Storage. Azure blob storage is a convenient place to store the data for use by Azure services. PolyBase makes it easy to access the data by using T-SQL.

  • Import the data from Hadoop, Azure Blob Storage, or Azure Data Lake Store. Leverage the speed of Microsoft SQL Server's columnstore technology and analysis capabilities by importing the data from Hadoop, Azure Blob Storage, or Azure Data Lake Store into relational tables. There is no need for a separate ETL or import tool.

  • Export the data to Hadoop, Azure Blob Storage, or Azure Data Lake Store. Archive the data to Hadoop, Azure Blob Storage, or Azure Data Lake Store to achieve cost-effective storage and keep it online for easy access.

  • Integrate with BI tools. Use PolyBase with Microsoft's business intelligence and analysis stack, or use any third-party tools that are compatible with SQL Server.
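As a sketch of the import scenario above, once an external table exists, pulling its data into a regular relational table is plain T-SQL. The table names here are hypothetical, not part of this article's example:

```sql
-- Import data from a hypothetical external table into a new local table;
-- PolyBase reads the external (Hadoop/blob) data behind the scenes.
SELECT id, name
INTO dbo.Customers_Local
FROM dbo.ext_Customers;
```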

Performance

  • Push computation to Hadoop: The query optimizer makes a cost-based decision to push computation to Hadoop when doing so will improve query performance. It uses statistics on external tables to make that decision. Pushing computation creates MapReduce jobs and leverages Hadoop's distributed computational resources.

  • Scale compute resources: To improve query performance, you can use SQL Server PolyBase scale-out groups. This enables parallel data transfer between SQL Server instances and Hadoop nodes, and it adds computing resources for operating on the external data.
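For a specific query against Hadoop-backed data, you can also override the optimizer's pushdown decision with a query hint; the external table name below is hypothetical:

```sql
-- Force the predicate evaluation to be pushed down to Hadoop as a MapReduce job
SELECT id, name
FROM dbo.ext_Customers
WHERE id > 100
OPTION (FORCE EXTERNALPUSHDOWN);
```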

PolyBase is supported only from SQL Server 2016 onwards. While installing SQL Server, you must choose the PolyBase option in the installer.

PolyBase needs Java Runtime. You must install the Java Runtime in your system before installing SQL Server 2016/2017. Otherwise, the validation will fail.

I have already installed Java 8 on my system, so validation succeeded for me. Now, we can continue the installation; it will take some time to finish. Once it is done, open Windows Services and you will find two new services in the list.

You must ensure that both of these services are running along with the SQL Server service.
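You can also check from T-SQL whether PolyBase was installed on the instance:

```sql
-- Returns 1 if the PolyBase feature is installed
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;
```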

In this article, we will create an external table mapping with one Parquet file which will be hosted in Azure Blob storage. We will transfer some sample data to this Parquet file.

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model, or programming language.

We must create a Blob Storage in Azure.

I already have a storage account with one container inside it.

We will use this storage account and container for external table creation.

Let’s start.

Step 1

Enable the configuration for allowing PolyBase export.
  EXEC sp_configure 'allow polybase export', 1;
  RECONFIGURE;
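Depending on the external system you connect to, you may also need to set the 'hadoop connectivity' configuration option; value 7 covers Azure Blob Storage in SQL Server 2016/2017, and changing it requires restarting the SQL Server and PolyBase services:

```sql
-- Value 7 enables connectivity to Azure Blob Storage (and Hortonworks HDP)
EXEC sp_configure 'hadoop connectivity', 7;
RECONFIGURE;
```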

Step 2

Create a symmetric master key. It is used to protect the credential secret in the next step.

  --Creating Master Key
  CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Password!';
Step 3

Create a database scoped credential. We must give the storage account access key here as the secret.
  CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
  WITH
      IDENTITY = 'user',
      SECRET = 'cSl2qSNy8OTMIPS3H0xS9Wf5G2moM2kZ9FzyoZhZegfTxhTr8Eh+FHss+tK0oZCjFTGUdrCqKHe48ndRuLJZJA=='; -- Please change the key to your storage account key
Step 4

Create an external data source. We must give our Azure Blob Storage account and container here.
  -- Creating External Data Source
  CREATE EXTERNAL DATA SOURCE PolybaseDS
  WITH (
      TYPE = HADOOP,
      LOCATION = 'wasbs://parquetcontainer@polybaseexternaldata.blob.core.windows.net', -- Please change to your container and storage account
      CREDENTIAL = AzureStorageCredential);
Step 5

Create an external file format. Here, we are giving Parquet as the format.
  --Creating External File Format
  IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'parquetformat')
  BEGIN
      CREATE EXTERNAL FILE FORMAT parquetformat
      WITH (
          FORMAT_TYPE = PARQUET,
          DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
      );
  END
If you check the external resources in the database, you can see that one external data source and one external file format have now been created.
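You can also confirm this from T-SQL through the catalog views:

```sql
-- List the external data sources and file formats defined in the current database
SELECT name, type_desc, location FROM sys.external_data_sources;
SELECT name, format_type FROM sys.external_file_formats;
```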

Step 6

Create an external table using your existing external data source.
  --Creating External Table with Parquet file mapping
  CREATE EXTERNAL TABLE [dbo].[tbl_External] (
      [id] [int] NULL,
      [name] [varchar](50)
  )
  WITH (LOCATION = '/tbl_Created_FromSQL.parquet', DATA_SOURCE = PolybaseDS, FILE_FORMAT = parquetformat, REJECT_TYPE = VALUE, REJECT_VALUE = 0);
This will create an external table in the dbo schema, which serves as the bridge for transferring our data from SQL Server to the Parquet file.

Step 7

Insert some sample data into the external table.
  --Insert values to External Table
  INSERT INTO tbl_External (id, name) VALUES (1, 'Sarath Lal')
  INSERT INTO tbl_External (id, name) VALUES (1, 'Anil Kumar')
  INSERT INTO tbl_External (id, name) VALUES (1, 'Satheesh Mohan')
The data will be transferred to our Parquet file automatically through PolyBase. If you check the blob storage, you can find the new Parquet file created.
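You can also query the external table from SSMS like any other table; PolyBase reads the rows back from the Parquet file in blob storage:

```sql
-- Read the data back through PolyBase
SELECT id, name FROM dbo.tbl_External;
```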

Parquet is a multi-file format, so the output contains many files.

Currently, PolyBase supports the following file formats.
  • Delimited Text (CSV)
  • Hive RCFile
  • Hive ORC
  • Parquet
We faced some limitations and issues while working with PolyBase. These issues have already been raised with Microsoft, and I will cover those details in another article. There are many other features available with PolyBase as well; those are a matter for another article.

I have attached the entire SQL script as a zip file. You can create a storage account in Azure and try it out with your SQL Server 2016/2017.