On November 27, 2025, commit 0ca3b16 landed in the PostgreSQL master branch. Authored by Cary Huang (Highgo) and David Rowley (Microsoft), it introduces Parallel TID Range Scans.
For most application developers, this won’t change how you write queries. But for anyone maintaining data migration tools, replication slots, or custom ETL scripts, this patch resolves a specific, annoying bottleneck in how Postgres handles large-scale data movement.
Here is a look at what changed in the planner and executor, and why manual chunking logic might finally be obsolete.
The Problem: The Planner’s Trade-off
Postgres has supported TID Range Scans since version 14. This allows you to scan a specific slice of a table based on physical block numbers:
SELECT * FROM my_large_table WHERE ctid >= '(0,0)' AND ctid < '(10000,0)';
This is the standard way tools like AWS DMS or logical replication initializers break up massive tables. The problem was that, until now, this scan node was strictly serial: it could only ever run in a single process.
This forced the Postgres Query Planner into a difficult spot. When you ran a query on a large dataset, the planner had to choose between:
- TID Range Scan: I/O efficient (reads only the blocks you asked for) but single worker.
- Parallel Seq Scan: CPU efficient (uses all cores) but I/O wasteful (might read blocks outside your range just to filter them out later).
Often, the planner would incorrectly choose the Parallel Seq Scan because the math suggested the CPU gains outweighed the I/O penalty. The result: the database read more data than necessary just to keep the available worker processes busy.
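A toy cost model makes the trade-off concrete. The numbers and formulas below are purely illustrative and are not PostgreSQL's real cost functions:

```python
# Toy cost model (illustrative only, not PostgreSQL's actual costing).
# Scenario: the ctid predicate covers 90% of a 1M-block table,
# and 3 processes (1 leader + 2 workers) are available.

total_blocks = 1_000_000
range_blocks = 900_000          # blocks covered by the ctid predicate
io_cost = 1.0                   # cost per block read
cpu_cost = 0.5                  # per-block processing cost
workers = 3

# Serial TID Range Scan: reads only the requested blocks, one process.
tid_range = range_blocks * (io_cost + cpu_cost)

# Parallel Seq Scan: reads *every* block, but CPU work is split 3 ways.
parallel_seq = total_blocks * io_cost + (total_blocks * cpu_cost) / workers

# Parallel seq scan comes out cheaper despite reading 100,000 extra blocks.
print(tid_range, parallel_seq)
```

Under this toy model the parallel plan wins on total cost even though it reads blocks outside the requested range, which mirrors the behavior the author describes.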
The Fix: Parallelism and Variable Chunking
Commit 0ca3b16 solves this by allowing TID Range Scans to participate in parallel query execution.
The implementation (~500 lines of code) reuses the “block chunking” logic found in Parallel Sequential Scans. A naive approach would divide the block range evenly among the workers up front, but that can lead to skew: if one part of the table is denser than another, one worker ends up with far more tuples than its peers.
Instead, it uses a decaying chunk size strategy:
- Large Start: Workers start by claiming large chunks of blocks to minimize locking overhead on the shared state.
- Tapering Down: As the scan progresses, the chunk size shrinks.
- Granular Finish: By the end of the scan, workers are claiming 1 block at a time.
This “slow reduction” ensures that no worker gets stuck processing a massive final chunk while the others sit idle. It forces all workers to cross the finish line at roughly the same time.
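The three steps above can be sketched in a few lines of Python. This is a toy model: the constants and the exact taper rule are illustrative, not PostgreSQL's actual implementation.

```python
# Toy model of the decaying chunk-size allocation described above.
# max_chunk and ramp_fraction are made-up illustrative values.

def allocate_chunks(total_blocks, max_chunk=64, ramp_fraction=0.25):
    """Yield (start_block, n_blocks) chunks: large at first,
    shrinking toward single blocks near the end of the scan."""
    next_block = 0
    chunk = max_chunk
    ramp_start = int(total_blocks * (1 - ramp_fraction))  # begin tapering here
    while next_block < total_blocks:
        if next_block >= ramp_start and chunk > 1:
            chunk = max(1, chunk // 2)  # taper: halve the chunk size
        n = min(chunk, total_blocks - next_block)
        yield (next_block, n)
        next_block += n

chunks = list(allocate_chunks(1000))
print(chunks[0])    # first claim is large
print(chunks[-1])   # final claims are single blocks
```

In a real parallel scan, each chunk claim happens under a lock on shared state, which is why starting with large chunks (fewer claims) and finishing with single blocks (even finish times) is a good trade.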
Benchmark Analysis
To verify the impact, the authors ran a comparison on a table containing 40 million rows.
1. The “Before” State (Serial)
With parallelism disabled, we see the classic PostgreSQL behavior. A single process undertakes the entire task.
set max_parallel_workers_per_gather=0;
EXPLAIN (ANALYZE, BUFFERS)
select count(a) from test where ctid >= '(0,0)' and ctid < '(216216,40)';
QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate (actual time=12931.695..12931.696 rows=1 loops=1)
   Buffers: shared read=216217
   ->  Tid Range Scan on test (actual time=0.079..6800.482 rows=39999999 loops=1)
         TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid))
         Buffers: shared read=216217
 Planning Time: 0.917 ms
 Execution Time: 12932.348 ms
- Observation: The Tid Range Scan shows loops=1. One process read 216,217 buffers sequentially.
- Total Time: ~12.9 seconds.
2. The “After” State (Parallel)
With parallelism enabled in PG19, the planner fundamentally changes its approach.
set max_parallel_workers_per_gather=2;
EXPLAIN (ANALYZE, BUFFERS)
select count(a) from test where ctid >= '(0,0)' and ctid < '(216216,40)';
QUERY PLAN
----------------------------------------------------------------------------------------------------
 Finalize Aggregate (actual time=4842.512..4847.863 rows=1 loops=1)
   Buffers: shared read=216217
   ->  Gather (actual time=4842.020..4847.851 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=216217
         ->  Partial Aggregate (actual time=4824.730..4824.731 rows=1 loops=3)
               Buffers: shared read=216217
               ->  Parallel Tid Range Scan on test (actual time=0.098..2614.108 rows=13333333 loops=3)
                     TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid))
                     Buffers: shared read=216217
 Planning Time: 4.124 ms
 Execution Time: 4847.992 ms
- Observation: We now see a Gather node managing two workers. Crucially, the scan node has changed to Parallel Tid Range Scan.
- Work Distribution: Notice loops=3 on the scan node (1 Leader + 2 Workers). The rows processed per loop average ~13.3 million (40M total / 3 processes), so the work was evenly distributed.
- Total Time: ~4.8 seconds (a ~2.7x speedup).
What This Means for Tooling
If you maintain internal scripts that move data between Postgres instances, you have likely written code that manually calculates block ranges to divide a huge table into chunks and spawns threads to run them.
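Such scripts usually contain something like the following sketch. The table name and page count are placeholders; a real script would read the page count from pg_class.relpages and run each query on its own connection or thread.

```python
# Hedged sketch of typical manual ctid-chunking logic.
# "my_large_table" and relpages=216217 are placeholder values.

def ctid_chunk_queries(table, relpages, n_chunks):
    """Split a table's block range [0, relpages) into roughly
    n_chunks ctid-range queries."""
    step = max(1, relpages // n_chunks)
    queries = []
    lo = 0
    while lo < relpages:
        hi = min(lo + step, relpages)
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE ctid >= '({lo},0)' AND ctid < '({hi},0)'"
        )
        lo = hi  # next chunk starts where this one ended
    return queries

# Each generated query would then run on a separate connection/thread.
for q in ctid_chunk_queries("my_large_table", relpages=216217, n_chunks=4):
    print(q)
```

Note that when relpages doesn't divide evenly, this naive split produces a small trailing chunk, which is exactly the kind of edge case the new in-core chunking makes unnecessary to handle yourself.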
With PostgreSQL 19, that complexity can likely be deleted. You can issue broader TID range queries and trust the planner to distribute the work across the cluster’s I/O and CPU resources efficiently.
Commit reference: 0ca3b16
Founder @Hornetlabs | Open Source Dev @Highgo | IvorySQL & SynchDB | PostgreSQL China Association | PostgresConf Asia Liaison