Introduction
Modern enterprise systems such as collaboration platforms, document management, CAD repositories and content delivery networks need robust file synchronization that works at scale. Users expect near-real-time sync across many devices, efficient use of bandwidth, reliable version history, and safe conflict handling. A naive upload-everything approach breaks down quickly: it wastes bandwidth, inflates storage costs and forces long sync delays.
This article describes a production-ready architecture and implementation patterns for a High-Scale File Sync Service that:
Detects local and server-side changes efficiently
Transfers only deltas (changed chunks) instead of whole files
Tracks versions, parents, and history reliably
Resolves conflicts safely and provides clear UX for users
Works offline and supports resumable syncs
Scales horizontally and secures data at rest and in transit
Examples and code snippets use Angular for the client and .NET (ASP.NET Core) for the backend.
Goals and Non-Goals
Goals
Minimise uploaded bytes using chunking and delta/diff transfer
Provide exact version history with straightforward replay and rollback
Support concurrent edits with deterministic conflict strategies
Keep sync latency low and resource usage predictable
Offer robust retry, resume and partial-sync behaviour
Non-Goals
High-Level Architecture
┌──────────────────────────────┐
│       Angular Client(s)      │
│  (detect change, chunking,   │
│   delta upload, conflict UI) │
└───────────────┬──────────────┘
                │  HTTPS / WebSocket / SignalR
                ▼
┌───────────────┴──────────────┐
│       Sync Gateway API       │
│  (auth, request validation,  │
│   routing, tenant resolution)│
└───────┬──────────────┬───────┘
        │              │
        │              ▼
        │      ┌───────────────┐
        │      │  Sync Engine  │
        │      │ (delta, queue,│
        │      │  versioning)  │
        │      └───────┬───────┘
        │              │
┌───────▼──────┐ ┌─────▼───────┐
│ Chunk Storage│ │ Metadata DB │
│ (object store│ │ (versions,  │
│   S3/Blob)   │ │ locks, refs)│
└──────┬───────┘ └──────┬──────┘
       │                │
       ▼                ▼
 CDN / Edge /      Monitoring,
    Backup       Metrics, Audit
Core Components
1. Change Detector (Client Side)
Detect file changes using file system watchers or application-level events. For robust delta transfer, compute chunk-level hashes:
Split the file into fixed-size chunks (e.g., 4 MiB) or use variable-size rolling chunks (rsync-style; better for insertions).
Compute a SHA-256 hash (recommended) per chunk and keep the ordered list chunkHashes[].
Client sends minimal metadata: file id (or path), file size, last-modified, chunkHashes, and root hash.
Why chunk hashing on the client?
Hashing locally lets the server compute the set of missing chunks from metadata alone, so unchanged bytes never leave the device. It also spreads the CPU cost of hashing across clients and enables deduplication against chunks the server already stores.
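A minimal sketch of that client-side step, assuming fixed-size 4 MiB chunks and the browser's Web Crypto API (the constant and function names are illustrative):

// Split a File into fixed-size chunks and hash each one with SHA-256.
const CHUNK_SIZE = 4 * 1024 * 1024; // 4 MiB

async function computeChunkHashes(file: File): Promise<string[]> {
  const hashes: string[] = [];
  for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
    const buffer = await file.slice(offset, offset + CHUNK_SIZE).arrayBuffer();
    const digest = await crypto.subtle.digest('SHA-256', buffer);
    hashes.push(
      Array.from(new Uint8Array(digest))
        .map(b => b.toString(16).padStart(2, '0'))
        .join('')
    );
  }
  return hashes;
}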
2. Delta Engine (Server Side)
The Delta Engine compares the submitted chunkHashes against chunks already present in the chunk store and responds with an upload plan: the list of missing chunk hashes, a pre-signed upload URL for each, and a target manifest id to use when finalizing.
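As a sketch, the exchange can be described with two TypeScript shapes that mirror the request/response fields used in the protocol section below (names are illustrative, not a fixed contract):

interface SyncPlanRequest {
  fileId: string;
  size: number;
  chunkHashes: string[];  // ordered client-side chunk hashes
  versionHint?: string;   // version the client believes is current
}

interface SyncPlanResponse {
  missingChunks: string[];             // hashes the server does not have
  uploadUrls: Record<string, string>;  // chunkHash -> pre-signed PUT URL
  targetManifestId: string;            // handle for the finalize call
}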
3. Chunk Storage / Blob Store
Store chunks in an object store (S3, Azure Blob, GCS) using content-addressable keys: sha256(chunk) or fileId/version/chunkIndex. Content-addressable storage deduplicates identical chunks across tenants and versions.
Keep two stores conceptually:
Chunk store: immutable chunk blobs
File manifests: metadata that lists chunk sequence for a particular file version (Merkle root, chunk list)
4. Version Manager / Manifest
A version manifest records:
fileId, versionId, parentVersionId, chunkHashes[], createdBy, createdAt, changeType
rootHash (Merkle root computed over chunk hashes)
The manifest is a small JSON document; write it to the metadata DB atomically once all missing chunks are uploaded.
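A sketch of the manifest as a TypeScript shape (fields follow the list above; the changeType values are illustrative):

interface VersionManifest {
  fileId: string;
  versionId: string;
  parentVersionId: string | null;  // null for the first version
  chunkHashes: string[];           // ordered chunk sequence
  rootHash: string;                // Merkle root over chunkHashes
  createdBy: string;
  createdAt: string;               // ISO 8601 timestamp
  changeType: 'create' | 'update' | 'rename' | 'delete';
}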
5. Conflict Resolver
If two clients modify the same base version concurrently:
Detect the conflict at finalize time: the incoming parentVersionId no longer matches the file's current head version.
Prefer preserving both versions rather than deleting or silently overwriting; commit the later write as a conflict branch (or a renamed conflict copy) and surface it to the user.
Chunking Strategies and Delta Algorithms
Fixed-size chunking
Simple, fast, predictable offsets.
Poor for insertions (shifts all subsequent chunks).
Rolling hash / Content-defined chunking
Uses Rabin fingerprinting to find boundaries that remain stable with insert/delete.
Popularized by rsync and widely used in sync and backup tools. Better delta detection for many real-world edits.
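A minimal content-defined chunker sketch using a Gear-style rolling hash (table values, size limits and the mask are illustrative assumptions; production systems typically use Rabin or FastCDC):

// Each byte's influence on the 32-bit hash is shifted out after 32 steps,
// giving an implicit sliding window, so boundaries realign after insertions.
const GEAR = new Uint32Array(256);
let seed = 0x9e3779b9;
for (let i = 0; i < 256; i++) {
  seed = (Math.imul(seed, 2654435761) + 0x6d2b79f5) >>> 0; // deterministic PRNG
  GEAR[i] = seed;
}

const MIN_SIZE = 512 * 1024;      // avoid tiny chunks
const MAX_SIZE = 8 * 1024 * 1024; // cap chunk size
const MASK = (1 << 20) - 1;       // ~1 MiB average chunk

function chunkBoundaries(data: Uint8Array): number[] {
  const cuts: number[] = [];
  let hash = 0;
  let start = 0;
  for (let i = 0; i < data.length; i++) {
    hash = ((hash << 1) + GEAR[data[i]]) >>> 0;
    const len = i + 1 - start;
    if ((len >= MIN_SIZE && (hash & MASK) === 0) || len >= MAX_SIZE) {
      cuts.push(i + 1); // cut position: chunk covers data[start .. i]
      start = i + 1;
      hash = 0;
    }
  }
  if (start < data.length) cuts.push(data.length); // trailing chunk
  return cuts;
}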
Binary diffs
If both sides have similar versions, bsdiff/xdelta produce compact binary patches.
Require both versions locally or server-assisted chunk comparison. Good for very specific binary formats.
Merkle Tree for Integrity
Compute rootHash as a Merkle root over the ordered chunk hashes. Equal roots guarantee identical content, so comparing two versions is O(1); when roots differ, walking the tree pinpoints exactly which chunks diverged or were corrupted in transit.
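A sketch of the root computation over hex chunk hashes (the pairing convention is an assumption; here an unpaired node is promoted unchanged to the next level):

async function sha256Hex(bytes: Uint8Array): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}

async function merkleRoot(chunkHashes: string[]): Promise<string> {
  if (chunkHashes.length === 0) return sha256Hex(new Uint8Array(0));
  let level = chunkHashes;
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      if (i + 1 === level.length) {
        next.push(level[i]); // odd node: promote unchanged
      } else {
        next.push(await sha256Hex(new TextEncoder().encode(level[i] + level[i + 1])));
      }
    }
    level = next;
  }
  return level[0];
}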
Protocol: Sync Session and APIs
Typical Sync Flow
1. Client computes chunkHashes[] and requests POST /sync/plan with { fileId, size, chunkHashes, versionHint }.
2. Server responds with { missingChunks[], uploadUrls[], targetManifestId }.
3. Client uploads missing chunks directly to the blob store using PUT to a pre-signed URL.
4. Client calls POST /sync/complete to signal all chunks are uploaded; server validates chunks, writes the manifest, and increments the version.
5. Server notifies other devices via WebSocket/SignalR push or a queued notification.
Example API Signatures (REST)
POST /api/sync/plan → returns upload plan
PUT /api/sync/chunk/{fileId}/{chunkHash} → upload chunk (or via pre-signed URL)
POST /api/sync/complete → finalize version (manifest commit)
GET /api/files/{fileId}/versions → list versions
GET /api/files/{fileId}/manifest/{versionId} → get manifest
Design the APIs to be idempotent. PUT chunk can safely be retried.
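A client-side sketch tying these calls together, assuming fixed-size chunks and the SyncPlanResponse shape sketched in the Delta Engine section (endpoints as listed above; error handling omitted):

const CHUNK_SIZE = 4 * 1024 * 1024; // must match the server's chunking scheme

async function syncFile(fileId: string, file: Blob, chunkHashes: string[], versionHint?: string) {
  // 1. Ask the server which chunks it is missing.
  const planRes = await fetch('/api/sync/plan', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileId, size: file.size, chunkHashes, versionHint }),
  });
  const plan: SyncPlanResponse = await planRes.json();

  // 2. Upload only the missing chunks via their pre-signed URLs (idempotent PUTs).
  for (const hash of plan.missingChunks) {
    const index = chunkHashes.indexOf(hash);
    const chunk = file.slice(index * CHUNK_SIZE, (index + 1) * CHUNK_SIZE);
    await fetch(plan.uploadUrls[hash], { method: 'PUT', body: chunk });
  }

  // 3. Finalize: the server validates chunks and commits the manifest atomically.
  await fetch('/api/sync/complete', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileId, targetManifestId: plan.targetManifestId, chunkHashes }),
  });
}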
.NET Backend Patterns
Metadata Model (SQL)
CREATE TABLE files (
file_id UUID PRIMARY KEY,
current_version UUID NULL,
created_at timestamptz,
tenant_id UUID
);
CREATE TABLE file_versions (
version_id UUID PRIMARY KEY,
file_id UUID REFERENCES files(file_id),
parent_version UUID NULL,
root_hash text NOT NULL,
created_by UUID,
created_at timestamptz DEFAULT now()
);
CREATE TABLE version_chunks (
version_id UUID REFERENCES file_versions(version_id),
chunk_index int,
chunk_hash text,
PRIMARY KEY(version_id, chunk_index)
);
Finalize Version (pseudo .NET)
public async Task FinalizeVersion(Guid fileId, Guid versionId, List<string> chunkHashes, Guid? parentVersion)
{
    // Commit everything in one transaction so a crash cannot leave a partial version.
    using var tx = await _db.BeginTransactionAsync();

    await _db.InsertFileVersion(versionId, fileId, parentVersion, ComputeRootHash(chunkHashes));

    for (int i = 0; i < chunkHashes.Count; i++)
        await _db.InsertVersionChunk(versionId, i, chunkHashes[i]);

    // Move the file's head pointer only after every chunk row is written.
    await _db.UpdateFileCurrentVersion(fileId, versionId);
    await tx.CommitAsync();
}
Commit the manifest atomically to prevent partial versions.
Chunk Uploading
Provide pre-signed URLs so large chunk uploads bypass the API server.
Validate uploaded chunk by server-side checksum before manifest commit.
Keep a garbage collection policy to delete orphaned chunks after a retention window.
Client (Angular) Implementation
Client Responsibilities
Compute chunk hashes (Web Crypto API)
Request upload plan
Upload missing chunks with retry and concurrency
Notify finalize, show progress
Subscribe to push notifications for remote changes
Example chunk hashing in Angular:
// Hash one chunk with the Web Crypto API; returns a lowercase hex digest.
async function hashChunk(buffer: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-256', buffer);
  return Array.from(new Uint8Array(digest))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');
}
Upload Manager
Use a concurrent upload queue (e.g., 4-8 parallel uploads).
Persist sync session locally to resume after crash.
Expose progress per-chunk and aggregated.
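A sketch of such a queue with bounded concurrency and exponential backoff (the concurrency and retry limits are illustrative):

async function uploadChunks(
  chunks: { hash: string; data: Blob; url: string }[],
  concurrency = 6,
  maxRetries = 3,
  onProgress?: (done: number, total: number) => void
): Promise<void> {
  const queue = [...chunks];
  let done = 0;

  async function worker(): Promise<void> {
    for (let chunk = queue.shift(); chunk; chunk = queue.shift()) {
      for (let attempt = 0; ; attempt++) {
        try {
          const res = await fetch(chunk.url, { method: 'PUT', body: chunk.data });
          if (!res.ok) throw new Error(`chunk ${chunk.hash}: HTTP ${res.status}`);
          break; // uploaded
        } catch (err) {
          if (attempt >= maxRetries) throw err;
          await new Promise(r => setTimeout(r, 2 ** attempt * 1000)); // backoff
        }
      }
      onProgress?.(++done, chunks.length);
    }
  }

  // N workers drain the shared queue in parallel.
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
}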
Offline Support and Resumability
Avoid recomputing full-file hashes on every sync pass; cache chunk hashes for unchanged files and invalidate the cache when size or last-modified changes, as sketched below.
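A sketch of that cache, keyed by path plus size and last-modified time (localStorage is used for brevity; IndexedDB scales better for large trees):

interface HashCacheEntry {
  size: number;
  mtime: number;
  chunkHashes: string[];
}

function getCachedHashes(path: string, size: number, mtime: number): string[] | null {
  const raw = localStorage.getItem(`chunk-cache:${path}`);
  if (!raw) return null;
  const entry: HashCacheEntry = JSON.parse(raw);
  // Stale entries (size or mtime changed) force a re-hash.
  return entry.size === size && entry.mtime === mtime ? entry.chunkHashes : null;
}

function putCachedHashes(path: string, size: number, mtime: number, chunkHashes: string[]): void {
  localStorage.setItem(`chunk-cache:${path}`, JSON.stringify({ size, mtime, chunkHashes }));
}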
Conflict Detection and UI
Show a clear “conflict” state with both versions listed and timestamps.
Provide merge helpers:
For text, show 3-way diff and allow merge in UI.
For binary, allow preview of both and choose one or upload merged version.
Provide automatic conflict policies per tenant: last-writer-wins, manual-merge, version-branching.
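A sketch mapping those tenant policies to an action when a finalize arrives whose parent no longer matches the current head (policy names as above; the action set is an illustrative assumption):

type ConflictPolicy = 'last-writer-wins' | 'manual-merge' | 'version-branching';
type ConflictAction = 'overwrite' | 'ask-user' | 'branch';

function conflictAction(policy: ConflictPolicy): ConflictAction {
  switch (policy) {
    case 'last-writer-wins':
      return 'overwrite'; // newest write becomes head; the old head stays in history
    case 'manual-merge':
      return 'ask-user';  // surface the conflict UI (3-way diff for text)
    case 'version-branching':
      return 'branch';    // keep both writes as sibling versions of the same parent
  }
}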
Security and Compliance
Encrypt in transit (TLS 1.2+).
Encrypt chunks at rest (server-side encryption or client-side encryption with tenant key).
Use signed URLs with short expiry for direct uploads.
Authenticate and authorize per-tenant operations.
Audit writes, uploads, deletes and version changes for compliance.
For BYOK-sensitive tenants, support client-side envelope encryption: client encrypts chunk payload with tenant key before upload; server stores ciphertext (server never holds raw key).
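A client-side sketch of the encrypt-before-upload step using AES-GCM via the Web Crypto API (obtaining and unwrapping the tenant key is out of scope here and assumed):

async function encryptChunk(plain: ArrayBuffer, tenantKey: CryptoKey): Promise<Uint8Array> {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // unique 96-bit IV per chunk
  const cipher = await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, tenantKey, plain);

  // Prepend the IV so the blob is self-describing for later decryption.
  const out = new Uint8Array(iv.length + cipher.byteLength);
  out.set(iv, 0);
  out.set(new Uint8Array(cipher), iv.length);
  return out;
}

Note that chunk hashing must then run over the ciphertext, which trades cross-tenant deduplication for confidentiality.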
Scalability, Performance and Cost Optimizations
Use a distributed object store (S3 / Blob) that scales horizontally.
Use CDN for serving frequently read versions.
Deduplicate chunks globally by content-addressable storage.
Compress chunks when beneficial; choose compression tradeoffs (CPU vs bandwidth).
Store small files (e.g., <16 KB) inline in the metadata DB to skip object-store round-trips.
Use batched manifest commits and apply backpressure to notification publishers under load.
Garbage Collection & Retention
Chunks become orphaned when a sync session is abandoned before finalize or when every version referencing them is deleted. Sweep unreferenced chunks with a periodic background job, but only after the retention window has elapsed, so in-flight sessions and recently deleted versions remain recoverable.
Monitoring, Observability and SLOs
Track:
Sync success/failure rates
Average upload latency per chunk
Percentage of bytes saved by delta vs full upload
Number of conflicts per tenant
Storage growth and orphaned chunk rate
Throughput: chunks/sec, manifests/sec
Expose metrics to Prometheus and tracing via OpenTelemetry. Use traceId correlation for debugging complex sync flows.
Testing Strategy
Unit tests for chunk hashing, manifest computation and conflict detection.
Integration tests with a real object store and database.
Performance tests with synthetic large files and concurrent clients.
Network-chaos tests: packet loss, slow links, reconnects.
End-to-end tests for resume and offline scenarios.
Operational Playbook
Onboard: provision storage and default retention per tenant.
Throttling: protect the system with per-tenant rate limits on chunk uploads.
Backfill: support background re-sync jobs for clients that missed pushes.
Recovery: if manifest commit fails due to missing chunks, provide clear remediation steps (re-upload missing chunks or roll back).
Backups: snapshot metadata DB and ensure object store versioning is enabled or use cross-region replication.
Example End-to-End Scenario
A user edits a 1 GB CAD file, changing a 5 MB subsection. The client detects the changed chunks (say, two 4 MiB chunks). Instead of uploading 1 GB, it uploads only those two chunks (~8 MiB) plus a small manifest commit. The server composes the new version by referencing the existing chunks plus the new ones; other devices are notified and download only the two changed chunks.
Summary
A high-scale file sync service must balance correctness, efficiency and usability. The architecture described provides:
Efficient delta detection via chunking and content hashing
Safe, atomic version commits with manifests and Merkle roots
Robust conflict handling with clear UX patterns
Offline support and resumable uploads
Scalable storage and deduplication patterns
Implement this with Angular for client hashing/upload orchestration and .NET for the Sync Gateway, metadata DB and background processes. Focus on idempotency, observability and security. Start with fixed-size chunking and add rolling chunk logic as you observe real workloads — that is often the fastest path to production.