Cost Implications of Mirroring Azure Databricks Unity Catalog Data into Microsoft Fabric

Abiola David
Jul 18

As organizations increasingly adopt Microsoft Fabric as their unified data platform, a common integration scenario involves mirroring data from Azure Databricks Unity Catalog (UC) into Microsoft Fabric. While this integration simplifies analytics workflows, it raises an important question: Who pays for the data movement, and how is it billed?

This article explains the cost implications of mirroring Unity Catalog data from Azure Databricks to Microsoft Fabric, helping you plan and optimize data architecture with minimal surprises.

What Is Mirroring in Microsoft Fabric?

Mirroring in Fabric is a mechanism that connects to an external data source (like Azure Databricks), syncs its metadata and optionally its data, and makes it available within OneLake, Microsoft Fabric's unified data lake.

Fabric supports read-only mirroring and continuous sync for Unity Catalog-managed Delta tables.

How the Data Flow Works

When you mirror data from Azure Databricks UC into Fabric:

Fabric connects to Azure Databricks using a secure integration (e.g., Managed Private Endpoint).
Metadata (schema, partitions) is synced.
Data is optionally read from the source and copied into OneLake in Delta format.

Cost Responsibility Breakdown

1. Ingress Cost (Microsoft Fabric)

Microsoft Fabric ingests data into OneLake.
Fabric does not charge separately for data ingress.
Instead, usage is counted toward Fabric Capacity and OneLake storage.

Verdict: Fabric incurs the cost. You pay via your Fabric license (F SKU capacity).

2. Egress Cost (Azure / Databricks Side)

This is a crucial part to understand.

If Fabric and Databricks UC are in the same Azure region and the connection stays on the Azure backbone (private connectivity):
- No outbound data egress charges apply.
If the traffic crosses regions or uses the public internet:
- Azure data egress charges may apply, based on the volume of data transferred.
- Egress rates vary by region and usage tier.

Best Practice: Always deploy Fabric and Databricks workspaces in the same Azure region and use Private Endpoints to avoid data egress fees.
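
To make the egress rule above concrete, here is a minimal Python sketch. It is not an official pricing calculator. The function name `estimate_egress_cost` and its parameters are hypothetical, and the `rate_per_gb` value in the example is a placeholder. Look up the actual rate for your region and usage tier before relying on any figure.

```python
def estimate_egress_cost(data_gb: float, same_region: bool,
                         private_link: bool, rate_per_gb: float) -> float:
    """Estimate the egress charge for mirroring `data_gb` of data.

    `rate_per_gb` is an assumed, user-supplied price per GB; it is not a
    published Azure rate.
    """
    if data_gb < 0 or rate_per_gb < 0:
        raise ValueError("data_gb and rate_per_gb must be non-negative")
    # Same region over the Azure backbone: no outbound egress charge.
    if same_region and private_link:
        return 0.0
    # Cross-region or public internet: charge scales with data volume.
    return data_gb * rate_per_gb


# Example with a placeholder rate of 0.05 per GB (assumption, not a real price).
print(estimate_egress_cost(500, same_region=True, private_link=True, rate_per_gb=0.05))
print(estimate_egress_cost(500, same_region=False, private_link=True, rate_per_gb=0.05))
```

The sketch uses a single flat rate. Because real egress pricing is tiered, treat the result as a rough upper-level estimate rather than a bill forecast.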

3. OneLake Storage Cost

Once data is mirrored, it resides in OneLake as Delta tables.

Storage is billed by the amount consumed.
Delta tables are stored as compressed Parquet files, which reduces the storage footprint.

Tip: Monitor OneLake usage in the Fabric Admin Portal.
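
As a rough illustration of how storage billing scales, the sketch below estimates a monthly OneLake storage charge from the source data size. The function name, the `compression_ratio` parameter, and the rate in the example are all assumptions made for illustration. Real compression depends on your data, and the real per-GB rate depends on your region.

```python
def estimate_onelake_storage_cost(source_gb: float, compression_ratio: float,
                                  rate_per_gb_month: float, months: int = 1) -> float:
    """Estimate the storage charge for mirrored data.

    compression_ratio: assumed ratio of source size to stored size
                       (e.g. 4.0 means the data shrinks to a quarter).
    rate_per_gb_month: assumed, user-supplied price per GB per month.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if source_gb < 0 or rate_per_gb_month < 0 or months < 0:
        raise ValueError("sizes, rates, and months must be non-negative")
    stored_gb = source_gb / compression_ratio
    return stored_gb * rate_per_gb_month * months


# Example: 1 TB of source data, placeholder ratio and rate, over 12 months.
print(estimate_onelake_storage_cost(1000, compression_ratio=4.0,
                                    rate_per_gb_month=0.02, months=12))
```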

4. Compute Cost for Syncing

The actual mirroring and metadata sync in Fabric:

Is handled by Fabric’s internal compute engine.
Uses the capacity assigned to your workspace (F SKU).
Incurs no additional charge unless you scale up capacity.
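
Since sync compute draws on the capacity you already pay for, the practical question is whether that capacity has room for it. The sketch below is a simplified headroom check. It assumes that an F SKU name encodes its capacity units (F8 means 8 CUs, F64 means 64 CUs), that you have measured your own baseline and sync load, and that a 20% safety margin is appropriate. The function names and the margin are hypothetical, and the check does not model how Fabric actually meters or smooths usage.

```python
def capacity_units(sku: str) -> int:
    """Parse the capacity units from an F SKU name, e.g. 'F64' -> 64."""
    name = sku.strip().upper()
    if not name.startswith("F") or not name[1:].isdigit():
        raise ValueError(f"not an F SKU name: {sku!r}")
    return int(name[1:])


def sync_fits(sku: str, baseline_cu: float, sync_cu: float,
              headroom: float = 0.2) -> bool:
    """Return True if baseline plus sync load stays under the usable capacity.

    baseline_cu and sync_cu are user-measured averages (assumptions here);
    headroom is the fraction of capacity kept free as a safety margin.
    """
    usable = capacity_units(sku) * (1 - headroom)
    return baseline_cu + sync_cu <= usable


print(sync_fits("F8", baseline_cu=4.0, sync_cu=2.0))
print(sync_fits("F8", baseline_cu=5.0, sync_cu=2.0))
```

If the check fails, the options are to reduce what you mirror or to scale the capacity, and scaling is the point at which the sync starts to cost more.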

Summary Table

Cost Component	Incurred By	Notes
Data Ingress to Fabric	Microsoft Fabric	Included in capacity licensing
OneLake Storage	Microsoft Fabric	Billed per GB stored after mirroring
Fabric Compute (Sync Engine)	Microsoft Fabric	Covered by F SKU capacity
Azure Data Egress	Azure (Databricks side)	Avoided when a same-region, private connection is used

Recommendations

To optimize cost and performance:

Deploy Fabric and Databricks in the same Azure region.
Use Private Endpoints or VNet integration to avoid egress.
Mirror only required datasets to minimize storage and sync overhead.
Monitor Fabric capacity and storage usage for accurate forecasting.
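
The recommendations above can be turned into a simple pre-deployment checklist. The following Python sketch is one possible way to do that. The `MirrorSetup` class, its fields, and `review_setup` are hypothetical names introduced for illustration; nothing here calls a Fabric or Databricks API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MirrorSetup:
    """Hand-entered description of a planned mirroring setup (hypothetical)."""
    fabric_region: str
    databricks_region: str
    private_connectivity: bool
    mirrored_tables: int
    required_tables: int


def review_setup(setup: MirrorSetup) -> List[str]:
    """Return a list of cost-related findings; an empty list means none."""
    findings = []
    if setup.fabric_region.strip().lower() != setup.databricks_region.strip().lower():
        findings.append("Regions differ: cross-region egress charges may apply.")
    if not setup.private_connectivity:
        findings.append("No private connectivity: traffic may use the public internet.")
    if setup.mirrored_tables > setup.required_tables:
        extra = setup.mirrored_tables - setup.required_tables
        findings.append(f"{extra} mirrored table(s) are not required: extra storage and sync overhead.")
    return findings


plan = MirrorSetup("westeurope", "eastus", False, mirrored_tables=12, required_tables=10)
for finding in review_setup(plan):
    print(finding)
```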

Final Thoughts

Mirroring Databricks UC data to Microsoft Fabric is a powerful integration that enables unified analytics without compromising data governance. With careful planning, especially around network topology and region alignment, you can achieve this with minimal or no data egress charges while keeping total cost of ownership predictable within the Fabric ecosystem.

Have more questions or want to see an architecture diagram for your use case? Feel free to reach out!