How to Improve PostgreSQL Performance Using Table Partitioning in CQRS Systems

Bohdan Stupak
2d
233
0
2

Article

Introduction

In large-scale applications, especially those following the CQRS (Command Query Responsibility Segregation) pattern, performance bottlenecks often emerge as data grows. One common issue is slow read or write operations due to massive table sizes.

In one of my projects, we faced a similar challenge. Our system used PostgreSQL as the write database and a NoSQL database as the read store. To ensure data consistency, we maintained a recovery endpoint that could rebuild the read store from the write database.

However, as the dataset grew to millions of rows, this process started failing due to database timeouts.

In this article, you will learn how PostgreSQL table partitioning helped solve this problem, along with practical implementation using pg_partman.

Understanding Partitioning

Partitioning is the process of dividing a large table into smaller, more manageable pieces called partitions. This improves query performance and reduces load on the database.

There are three common types of partitioning:

Range partitioning – splits data based on ranges (e.g., dates, IDs)
List partitioning – splits data based on categories (e.g., region)
Hash partitioning – distributes data evenly using a hash function

Partitioning vs Sharding

Partitioning occurs within a single database instance, while sharding distributes data across multiple database instances.

Sharding introduces additional complexity such as:

Request routing layers
Distributed transactions
Cross-shard joins

Because of this, partitioning should be the first approach. Sharding should only be considered when a single database can no longer handle the load.

Problem Statement

Our system needed to process data in batches efficiently. However, querying millions of rows from a single table caused significant delays and timeouts.

The goal was to:

Break the table into manageable chunks
Align partitions with batch processing
Ensure even data distribution

Choosing the Right Strategy

We evaluated multiple approaches:

List partitioning → uneven distribution
Time-based partitioning → skewed data growth

We selected range partitioning based on the primary key (Id) because it ensured predictable and balanced partitions.

Implementation Using pg_partman

First, enable the required extension:

CREATE SCHEMA partman;
CREATE EXTENSION pg_partman WITH SCHEMA partman;

Next, create a new partitioned table:

CREATE TABLE public."Entities"
(
    "Id" bigint GENERATED ALWAYS AS IDENTITY,
    "UserName" varchar(255) NOT NULL,
    "TimeStamp" timestamp with time zone NOT NULL,
    "Entity" jsonb,
    "OrganizationInfo" text,
    CONSTRAINT "PK_Entities" PRIMARY KEY ("Id")
) PARTITION BY RANGE ("Id");

Now, calculate how many partitions are required:

DO $$
DECLARE
    v_row_count bigint;
    v_partition_size bigint := 200000;
    v_premake integer;
BEGIN
    SELECT count(*) INTO v_row_count FROM public."Entities_old";

    v_premake := ceil(v_row_count::numeric / v_partition_size) + 1;

    PERFORM partman.create_parent(
        p_parent_table := 'public.Entities',
        p_control      := 'Id',
        p_interval     := v_partition_size::text,
        p_premake      := v_premake
    );
END $$;

Important Insight: p_premake Parameter

The p_premake parameter defines how many partitions are created in advance.

If this value is too low, new data may go into a default partition, defeating the purpose of partitioning.

Automating Partition Maintenance

PostgreSQL does not automatically create new partitions. To solve this, we use pg_cron:

CREATE EXTENSION pg_cron;

SELECT cron.schedule('@hourly', $$
CALL partman.partition_data_proc('public.Entities');
$$);

This ensures:

New partitions are created automatically
Data is moved from the default partition

Data Migration

Finally, migrate data to the new partitioned table:

INSERT INTO public."Entities" ("UserName", "TimeStamp", "Entity", "OrganizationInfo")
SELECT "UserName", "TimeStamp", "Entity", "OrganizationInfo"
FROM public."Entities_old"
ORDER BY "Id";

Before vs After

Before partitioning:

Single large table (~9 million rows)
Query time: up to 30 seconds

After partitioning:

Multiple smaller partitions
Query time: milliseconds

When NOT to Use Partitioning

Partitioning is not always necessary. Avoid it when:

Dataset is small
Query performance is already acceptable
Operational overhead outweighs benefits

Conclusion

PostgreSQL partitioning is a powerful technique for improving performance in large-scale systems, especially in CQRS architectures.

By breaking large tables into smaller partitions, you can:

Improve query performance
Reduce timeouts
Enable efficient batch processing

Start small by identifying performance bottlenecks, and gradually introduce partitioning where it provides the most value.