Introduction
In large-scale applications, especially those following the CQRS (Command Query Responsibility Segregation) pattern, performance bottlenecks often emerge as data grows. One common issue is slow read or write operations due to massive table sizes.
In one of my projects, we faced a similar challenge. Our system used PostgreSQL as the write database and a NoSQL database as the read store. To ensure data consistency, we maintained a recovery endpoint that could rebuild the read store from the write database.
However, as the dataset grew to millions of rows, this process started failing due to database timeouts.
In this article, you will learn how PostgreSQL table partitioning solved this problem, along with a practical implementation using the pg_partman extension.
Understanding Partitioning
Partitioning is the process of dividing a large table into smaller, more manageable pieces called partitions. This improves query performance and reduces load on the database.
There are three common types of partitioning:
Range partitioning – splits data based on ranges (e.g., dates, IDs)
List partitioning – splits data based on categories (e.g., region)
Hash partitioning – distributes data evenly using a hash function
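As a sketch, the three strategies can be declared with PostgreSQL's native partitioning syntax (the table and column names here are illustrative, not from the project):

```sql
-- Range: one partition per interval of the key
CREATE TABLE events (id bigint, region text, created_at timestamptz)
    PARTITION BY RANGE (id);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES FROM (1) TO (200001);

-- List: one partition per category value
CREATE TABLE orders (id bigint, region text) PARTITION BY LIST (region);
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('EU');

-- Hash: rows spread evenly across a fixed number of partitions
CREATE TABLE sessions (id bigint) PARTITION BY HASH (id);
CREATE TABLE sessions_p0 PARTITION OF sessions
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
```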
Partitioning vs Sharding
Partitioning occurs within a single database instance, while sharding distributes data across multiple database instances.
Sharding introduces additional complexity such as:
Request routing layers
Distributed transactions
Cross-shard joins
Because of this, partitioning should be the first approach. Sharding should only be considered when a single database can no longer handle the load.
Problem Statement
Our system needed to process data in batches efficiently. However, querying millions of rows from a single table caused significant delays and timeouts.
The goal was to:
Break the table into manageable chunks
Align partitions with batch processing
Ensure even data distribution
Choosing the Right Strategy
We evaluated multiple approaches and settled on range partitioning based on the primary key ("Id"), because it produced predictable, evenly sized partitions that aligned naturally with our batch processing.
Implementation Using pg_partman
First, enable the required extension:
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman WITH SCHEMA partman;
Next, create a new partitioned table:
CREATE TABLE public."Entities"
(
    "Id" bigint GENERATED ALWAYS AS IDENTITY,
    "UserName" varchar(255) NOT NULL,
    "TimeStamp" timestamp with time zone NOT NULL,
    "Entity" jsonb,
    "OrganizationInfo" text,
    CONSTRAINT "PK_Entities" PRIMARY KEY ("Id")
) PARTITION BY RANGE ("Id");
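For reference, each child partition pg_partman later creates for this table is equivalent to a manually declared range partition. A sketch (the child table name and bounds are illustrative; pg_partman applies its own naming scheme):

```sql
CREATE TABLE public."Entities_p1" PARTITION OF public."Entities"
    FOR VALUES FROM (1) TO (200001);
```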
Now, calculate how many partitions are required:
DO $$
DECLARE
    v_row_count bigint;
    v_partition_size bigint := 200000;  -- rows per partition
    v_premake integer;
BEGIN
    -- Size the partition set from the existing (unpartitioned) table
    SELECT count(*) INTO v_row_count FROM public."Entities_old";

    -- One partition per 200k rows, plus one spare for incoming data
    v_premake := ceil(v_row_count::numeric / v_partition_size) + 1;

    PERFORM partman.create_parent(
        p_parent_table := 'public.Entities',
        p_control      := 'Id',
        p_interval     := v_partition_size::text,
        p_premake      := v_premake
    );
END $$;
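To verify the result, the created child partitions can be listed with a plain catalog query:

```sql
SELECT inhrelid::regclass AS partition_name
FROM pg_inherits
WHERE inhparent = 'public."Entities"'::regclass
ORDER BY 1;
```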
Important Insight: p_premake Parameter
The p_premake parameter defines how many partitions are created in advance.
If this value is too low, new data may go into a default partition, defeating the purpose of partitioning.
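One way to catch this early is to check whether the default partition is accumulating rows. Assuming the default partition follows the usual "<parent>_default" naming convention:

```sql
SELECT count(*) AS misplaced_rows
FROM public."Entities_default";
```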
Automating Partition Maintenance
PostgreSQL does not automatically create new partitions. To solve this, we use pg_cron:
CREATE EXTENSION pg_cron;
SELECT cron.schedule('0 * * * *', $$
CALL partman.run_maintenance_proc();
$$);
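The registered job can be inspected through pg_cron's metadata table:

```sql
SELECT jobid, schedule, command, active
FROM cron.job;
```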
This ensures that new partitions are always created ahead of incoming data, so rows never accumulate in the default partition.
Data Migration
Finally, migrate the data into the new partitioned table. Because "Id" is declared GENERATED ALWAYS, OVERRIDING SYSTEM VALUE is required to carry the original key values over:

INSERT INTO public."Entities" ("Id", "UserName", "TimeStamp", "Entity", "OrganizationInfo")
OVERRIDING SYSTEM VALUE
SELECT "Id", "UserName", "TimeStamp", "Entity", "OrganizationInfo"
FROM public."Entities_old"
ORDER BY "Id";

After the copy, remember to resynchronize the identity sequence (for example with setval) so that newly inserted rows continue from the highest migrated "Id".
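A quick sanity check after the migration confirms that no rows were lost:

```sql
SELECT
    (SELECT count(*) FROM public."Entities_old") AS old_count,
    (SELECT count(*) FROM public."Entities")     AS new_count;
```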
Before vs After
Before partitioning: rebuilding the read store meant scanning a single table with millions of rows, and the recovery endpoint regularly failed with database timeouts.
After partitioning: each batch touches a single partition of roughly 200,000 rows, the planner prunes all other partitions, and the rebuild completes without timeouts.
When NOT to Use Partitioning
Partitioning is not always necessary. Avoid it when:
The table is small and queries already perform well
Queries rarely filter on a suitable partition key
The operational overhead of partition maintenance outweighs the gains
Conclusion
PostgreSQL partitioning is a powerful technique for improving performance in large-scale systems, especially in CQRS architectures.
By breaking large tables into smaller partitions, you can reduce query latency, keep batch operations within timeout limits, and make maintenance tasks such as archiving and rebuilding data far more manageable.
Start small by identifying performance bottlenecks, and gradually introduce partitioning where it provides the most value.