Data Synchronization Between Cloud and On-Prem SQL Databases

Rajesh Gami
Nov 12
3.2k
0
0

Article

Introduction

In modern enterprise systems, it’s common to have data distributed across multiple environments — such as on-premises SQL Servers and cloud-hosted databases like Azure SQL Database or Amazon RDS for SQL Server.
This hybrid setup provides scalability, performance, and security — but it also introduces a crucial challenge: keeping the data synchronized between both environments.

This article explores why synchronization is needed, various strategies to achieve it, and a technical implementation guide using SQL Server, Azure Data Sync, and custom APIs in ASP.NET Core.

Why Synchronization Is Needed

Many organizations are in the process of moving from on-prem systems to the cloud. During this migration — or even after partial migration — both environments often remain active.

Some common scenarios include:

Scenario	Description
Hybrid Applications	Some modules run on the cloud, others still rely on on-prem SQL.
Disaster Recovery (DR)	Keeping cloud data synchronized with on-prem for failover.
Reporting and Analytics	Cloud systems run analytics while on-prem systems handle transactions.
Gradual Cloud Adoption	Businesses that can’t migrate all systems at once.

Hence, synchronization ensures data consistency, integrity, and availability across both sides.

Challenges in Synchronization

While the concept seems simple — “keep data the same in two databases” — the technical execution can get complex due to:

Latency and Network Constraints: Cloud connectivity may not always be stable.
Conflict Resolution: Changes can occur simultaneously on both sides.
Schema Differences: Column types or names may differ across databases.
Data Volume: Handling millions of rows can lead to performance bottlenecks.
Security and Compliance: Data transfer must be encrypted and compliant.

To handle these challenges, we need a robust synchronization strategy.

Synchronization Strategies

There are several proven strategies to synchronize SQL data between on-prem and cloud databases. Let’s explore them in detail.

1. SQL Server Replication

SQL Server supports various types of replication:

Transactional Replication: Real-time updates from on-prem to cloud.
Merge Replication: Bi-directional sync, suitable when both systems can update data.
Snapshot Replication: Periodic full sync, good for static or reference data.

Pros

Natively supported by SQL Server.
Configurable conflict resolution.

Cons

Requires setup and maintenance of replication agents.
Not ideal for large schema or high-latency networks.

2. Azure SQL Data Sync

Azure SQL Data Sync is a Microsoft-managed service for synchronizing between Azure SQL Database and SQL Server (on-prem or Azure VM).

It uses a hub-and-spoke model:

One database acts as the Hub (usually in Azure).
Other databases act as Members.
Sync groups define tables, direction, and frequency.

Pros

Easy to configure via Azure Portal.
Handles conflict detection.
Secure with built-in encryption.

Cons

Works only with Azure SQL (not AWS RDS or other clouds).
Limited fine-grained control over sync logic.

3. Custom Sync via Change Tracking / Change Data Capture (CDC)

If you need more flexibility, you can build a custom synchronization service using SQL Server features like:

Change Tracking – Lightweight, tracks which rows changed.
Change Data Capture (CDC) – Tracks both old and new values for changed rows.

This allows your application to:

Fetch deltas since the last sync.
Push updates via APIs or direct DB connection.
Handle conflicts in custom logic.

Pros

Full control over sync logic.
Works across any SQL Server versions or environments.
Extensible for custom conflict handling.

Cons

Requires manual implementation.
More code maintenance.

4. API-Based Synchronization (ASP.NET Core)

In some architectures, data synchronization is done via secure REST APIs instead of direct DB-to-DB communication.

Flow

On-prem application detects change.
It pushes updated data via API to cloud.
Cloud acknowledges and stores the record.

This approach is ideal when:

Security policies restrict direct DB connections.
You want an audit trail and retry mechanism.
You need partial data synchronization.

Technical Workflow (Flowchart)

Below is the technical flow for a custom synchronization system using SQL Change Tracking + ASP.NET Core APIs.

 ┌────────────────────┐
 │ On-Prem SQL Server │
 └───────┬────────────┘
         │ (Detect Changes using Change Tracking)
         ▼
 ┌────────────────────┐
 │ Sync Agent Service │
 │ (Windows Service / │
 │ Background Job)    │
 └───────┬────────────┘
         │ (Push JSON payload)
         ▼
 ┌────────────────────┐
 │ ASP.NET Core API   │
 │ /api/sync/upload   │
 └───────┬────────────┘
         │ (Validate + Map Data)
         ▼
 ┌────────────────────┐
 │ Cloud SQL Database │
 │ (Azure SQL / RDS)  │
 └────────────────────┘

Implementation Example

Let’s go through a step-by-step implementation using SQL Change Tracking and ASP.NET Core APIs.

Step 1: Enable Change Tracking in SQL Server

ALTER DATABASE [MyOnPremDB]
SET CHANGE_TRACKING = ON  
(CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.Customer  
ENABLE CHANGE_TRACKING  
WITH (TRACK_COLUMNS_UPDATED = ON);

This will start tracking inserts, updates, and deletes.

Step 2: Create a Sync Log Table

CREATE TABLE SyncLog (
    Id INT IDENTITY PRIMARY KEY,
    TableName NVARCHAR(100),
    LastSyncVersion BIGINT,
    SyncDate DATETIME DEFAULT GETDATE()
);

You’ll store the last synchronized version number here.

Step 3: Fetch Changes Since Last Sync

DECLARE @last_sync BIGINT = 
    (SELECT ISNULL(LastSyncVersion, 0) FROM SyncLog WHERE TableName = 'Customer');

DECLARE @current_version BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

SELECT c.CustomerId, c.Name, c.Email, c.LastModified
FROM dbo.Customer AS c
JOIN CHANGETABLE(CHANGES dbo.Customer, @last_sync) AS CT
    ON c.CustomerId = CT.CustomerId;

Step 4: Push Data to Cloud via ASP.NET Core API

API Controller (ASP.NET Core)

[ApiController]
[Route("api/[controller]")]
public class SyncController : ControllerBase
{
    private readonly IConfiguration _config;
    private readonly SqlConnection _connection;

    public SyncController(IConfiguration config)
    {
        _config = config;
        _connection = new SqlConnection(_config.GetConnectionString("CloudDB"));
    }

    [HttpPost("upload")]
    public async Task<IActionResult> UploadChanges([FromBody] List<Customer> customers)
    {
        foreach (var customer in customers)
        {
            using var cmd = new SqlCommand("spUpsertCustomer", _connection)
            {
                CommandType = CommandType.StoredProcedure
            };
            cmd.Parameters.AddWithValue("@CustomerId", customer.CustomerId);
            cmd.Parameters.AddWithValue("@Name", customer.Name);
            cmd.Parameters.AddWithValue("@Email", customer.Email);
            await _connection.OpenAsync();
            await cmd.ExecuteNonQueryAsync();
            await _connection.CloseAsync();
        }

        return Ok(new { Status = "Success", Count = customers.Count });
    }
}

Step 5: Cloud Database Stored Procedure

CREATE PROCEDURE spUpsertCustomer
    @CustomerId INT,
    @Name NVARCHAR(100),
    @Email NVARCHAR(100)
ASBEGIN
    IF EXISTS (SELECT 1 FROM Customer WHERE CustomerId = @CustomerId)
        UPDATE Customer
        SET Name = @Name, Email = @Email
        WHERE CustomerId = @CustomerId;
    ELSE
        INSERT INTO Customer (CustomerId, Name, Email)
        VALUES (@CustomerId, @Name, @Email);
END;

Step 6: Update Sync Log

After every successful sync:

UPDATE SyncLog
SET LastSyncVersion = @current_version, SyncDate = GETDATE()
WHERE TableName = 'Customer';

Conflict Resolution Strategy

If both sides can update data, you need a conflict resolution rule.
Common strategies include:

Last Write Wins (LWW): The most recent update overwrites older data.
Source Priority: Cloud or on-prem always wins.
Merge Policy: Combine field-level changes (e.g., append new comments).
Manual Review: Log conflicting records for admin intervention.

Performance Optimization Tips

Use batching: Send records in small chunks (e.g., 500 at a time).
Compress payloads: Use GZip or Brotli in HTTP requests.
Track only necessary columns: Avoid full-table comparisons.
Run sync off-peak hours: Schedule jobs during low user activity.
Use async APIs: Avoid blocking sync agents.

Security Considerations

Always use HTTPS for API communication.
Encrypt sensitive data (AES, TLS).
Implement API Key / OAuth2 authentication.
Maintain audit trails for sync operations.
Restrict access using firewall and VPN connections.

Testing and Monitoring

Before going live:

Validate both sides’ schema and indexes.
Test with sample data changes (inserts, updates, deletes).
Monitor sync frequency, data latency, and conflict logs.
Use SQL Profiler and Application Insights for diagnostics.

Conclusion

Data synchronization between on-prem and cloud SQL databases is critical for hybrid and enterprise applications.
By combining SQL Change Tracking, secure APIs, and automated background jobs, you can build a reliable, scalable, and maintainable synchronization pipeline.

While tools like Azure Data Sync simplify configuration, custom sync services offer unmatched flexibility — especially when integrating with ASP.NET Core APIs and custom business rules.

The right approach depends on your data size, change frequency, and compliance requirements, but with careful design, real-time hybrid data synchronization is absolutely achievable.