Data Warehouse Units (DWH) In Synapse Dedicated Pool

Azure Synapse Pools

Basically, there are two types of pools in Azure Synapse analytics: Serverless SQL Pool and Dedicated SQL Pool. In the serverless model, as you might be aware, the costing is based on the pay-per-usage model and calculated per TB or processing consumed on the run queries. Whereas the cost of Dedicated SQL pools is based on Data Warehousing Units or simply called DWU, based on which the resources will be provisioned. This setting can be configured based on your requirement or the load you are running. The more DWU you set, the more the cost will be incurred. Let’s dive a bit deeper to know what exactly DWU is.

What is Data Warehousing Units (DWU)?

A DWU is a collection of compute or analytic resources allocated for a dedicated pool. It is a combination of three resources - CPU, Memory and IO bundled together. The term DWU represents a measure of compute resources and their performance that you are using for your dedicated pool. The DWU setting can be increased for higher performance and decreased when less performance is required. Apart from scaling up and down, the system can also be paused when there is no activity. Once after the DWU is increased for a larger workload, for instance, we can see the changes in performance based on a few workload metrics and how it behaves post increase of DWUs as below.

  • Ingestion speed from ADLS and storage blobs as this involves network and CPU.
    • Involves Polybase operations with an increase in the number of readers and writers.
  • Query scans and aggregations/transformations for a large number of rows as this involves CPU and IO.
    • Change in performance for scans, aggregations and CTAS commands.
  • CTAS commands for copying and creating a new table since it involves reading from storage, distributing into nodes, and writing back again as heavy CPU, IO and Network are utilized.
    • Accommodating max number of concurrent queries.

Synapse comes with service levels. Gen2 is the latest tier with updated hardware & drives with networked premium storage, which provides five times more memory per query than Gen1 and with unlimited storage capacity. Gen2 also provides a ten times better individual query execution time than the Gen1 tier. One can differentiate both metrics by DWU, which is Gen1 & cDWU belongs to Gen2.

https://azure.microsoft.com/en-in/updates/name-changes-azure-sql-data-warehouse-gen-1-and-gen-2/

How to set the right amount of DWH for your workloads?

The DWU configuration is workload-specific, and it completely depends on your workload performance and how you want the system to handle it. The best approach to set a baseline is to start small and do enough testing by increasing gradually until you hit the right parameter. From basic up to DW500c, are all single compute nodes with varying compute power, which means you are running your load on a single node server. Scale up and down to see the performance levels and adjust the DW setting accordingly and use DW1000c or above for testing any production loads, POC’s on less than 2 nodes won’t be fit enough to prove the loads for production. The best practice is to scale out DWUs before a heavy data load, transformation operations or before peak business hours when you expect a large number of concurrent queries.

Performance level Compute nodes Distributions per Compute node Memory per data warehouse (GB)
DW100c 1 60 60
DW200c 1 60 120
DW300c 1 60 180
DW400c 1 60 240
DW500c 1 60 300
DW1000c 2 30 600
DW1500c 3 20 900
DW2000c 4 15 1200
DW2500c 5 12 1500
DW3000c 6 10 1800
DW5000c 10 6 3000
DW6000c 12 5 3600
DW7500c 15 4 4500
DW10000c 20 3 6000
DW15000c 30 2 9000
DW30000c 60 1 18000

What is the DWU setting of my current database?

SELECT d.name AS DatabaseName, Edition AS DatabaseEdition, service_objective AS ServiceObjective FROM sys.database_service_objectives AS dso INNER JOIN sys.databases AS d ON dso.database_id = d.database_id

How can I change the DWUs of my existing databases?

ALTER DATABASE <<’database_name’>>
MODIFY (SERVICE_OBJECTIVE = 'DW1000c');

To check the status of the DWU changes that you have made:

SELECT * FROM sys.dm_operation_status
WHERE resource_type_desc = <<'database_name'>>
AND major_resource_id =<<’database_name’>>;

Summary

Today we learned about Azure synapse analytics dedicated pool, what DWU is and how it can be assessed and configured for extracting better performance.

Reference

Microsoft official documentation on Azure Synapse dedicated pool and performance management.