On November 27, 2025, commit 0ca3b16 landed in the PostgreSQL master branch. Authored by Cary Huang (Highgo) and David Rowley (Microsoft), it introduces Parallel TID Range Scans.
For most application developers, this won’t change how you write queries. But for anyone maintaining data migration tools, replication slots, or custom ETL scripts, this patch resolves a specific, annoying bottleneck in how Postgres handles large-scale data movement.
Here is a look at what changed in the planner and executor, and why manual chunking logic might finally be obsolete.
The Problem: The Planner’s Trade-off
Postgres has supported TID Range Scans since version 14. This allows you to scan a specific slice of a table based on physical block numbers:
SELECT * FROM my_large_table WHERE ctid >= '(0,0)' AND ctid < '(10000,0)';
This is the standard way tools like AWS DMS or logical replication initializers break up massive tables. The problem was that, until now, this scan node was strictly serial: it could only ever run in a single process.
This forced the Postgres Query Planner into a difficult spot. When you ran a query on a large dataset, the planner had to choose between:
- TID Range Scan: I/O efficient (reads only the blocks you asked for) but single worker.
- Parallel Seq Scan: CPU efficient (uses all cores) but I/O wasteful (might read blocks outside your range just to filter them out later).
Often, the planner would incorrectly choose the Parallel Seq Scan because the math suggested the CPU gains outweighed the I/O penalty. The result: the database read more data than necessary just to keep the available worker processes busy.
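A toy cost model makes the trade-off concrete. The numbers and formulas below are purely illustrative and are not PostgreSQL's real cost functions:

```python
# Toy cost model (illustrative only, not PostgreSQL's actual costing).
# Scenario: the ctid predicate covers 90% of a 1M-block table,
# and 3 processes (1 leader + 2 workers) are available.

total_blocks = 1_000_000
range_blocks = 900_000          # blocks covered by the ctid predicate
io_cost = 1.0                   # cost per block read
cpu_cost = 0.5                  # per-block processing cost
workers = 3

# Serial TID Range Scan: reads only the requested blocks, one process.
tid_range = range_blocks * (io_cost + cpu_cost)

# Parallel Seq Scan: reads *every* block, but CPU work is split 3 ways.
parallel_seq = total_blocks * io_cost + (total_blocks * cpu_cost) / workers

# Parallel seq scan comes out cheaper despite reading 100,000 extra blocks.
print(tid_range, parallel_seq)
```

Under this toy model the parallel plan wins on total cost even though it reads blocks outside the requested range, which mirrors the behavior the author describes.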
The Fix: Parallelism and Variable Chunking
Commit 0ca3b16 solves this by allowing TID Range Scans to participate in parallel query execution.
The implementation (~500 lines of code) reuses the “block chunking” logic found in Parallel Sequential Scans. A naive approach would divide the block range evenly among the workers up front, but that can lead to skew: if one part of the table is denser than another, one worker ends up with far more tuples than its peers.
Instead, it uses a decaying chunk size strategy:
- Large Start: Workers start by claiming large chunks of blocks to minimize locking overhead on the shared state.
- Tapering Down: As the scan progresses, the chunk size shrinks.
- Granular Finish: By the end of the scan, workers are claiming 1 block at a time.
This “slow reduction” ensures that no worker gets stuck processing a massive final chunk while the others sit idle. It forces all workers to cross the finish line at roughly the same time.
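The three steps above can be sketched in a few lines of Python. This is a toy model: the constants and the exact taper rule are illustrative, not PostgreSQL's actual implementation.

```python
# Toy model of the decaying chunk-size allocation described above.
# max_chunk and ramp_fraction are made-up illustrative values.

def allocate_chunks(total_blocks, max_chunk=64, ramp_fraction=0.25):
    """Yield (start_block, n_blocks) chunks: large at first,
    shrinking toward single blocks near the end of the scan."""
    next_block = 0
    chunk = max_chunk
    ramp_start = int(total_blocks * (1 - ramp_fraction))  # begin tapering here
    while next_block < total_blocks:
        if next_block >= ramp_start and chunk > 1:
            chunk = max(1, chunk // 2)  # taper: halve the chunk size
        n = min(chunk, total_blocks - next_block)
        yield (next_block, n)
        next_block += n

chunks = list(allocate_chunks(1000))
print(chunks[0])    # first claim is large
print(chunks[-1])   # final claims are single blocks
```

In a real parallel scan, each chunk claim happens under a lock on shared state, which is why starting with large chunks (fewer claims) and finishing with single blocks (even finish times) is a good trade.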
Benchmark Analysis
To verify the impact, the authors ran a comparison on a table containing 40 million rows.
1. The “Before” State (Serial)
With parallelism disabled, we see the classic PostgreSQL behavior. A single process undertakes the entire task.
set max_parallel_workers_per_gather=0;
EXPLAIN (ANALYZE, BUFFERS)
select count(a) from test where ctid >= '(0,0)' and ctid < '(216216,40)';
QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate (actual time=12931.695..12931.696 rows=1 loops=1)
   Buffers: shared read=216217
   ->  Tid Range Scan on test (actual time=0.079..6800.482 rows=39999999 loops=1)
         TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid))
         Buffers: shared read=216217
 Planning Time: 0.917 ms
 Execution Time: 12932.348 ms
- Observation: The Tid Range Scan shows loops=1. One process read 216,217 buffers sequentially.
- Total Time: ~12.9 seconds.
2. The “After” State (Parallel)
With parallelism enabled in PG19, the planner fundamentally changes its approach.
set max_parallel_workers_per_gather=2;
EXPLAIN (ANALYZE, BUFFERS)
select count(a) from test where ctid >= '(0,0)' and ctid < '(216216,40)';
QUERY PLAN
----------------------------------------------------------------------------------------------------
 Finalize Aggregate (actual time=4842.512..4847.863 rows=1 loops=1)
   Buffers: shared read=216217
   ->  Gather (actual time=4842.020..4847.851 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=216217
         ->  Partial Aggregate (actual time=4824.730..4824.731 rows=1 loops=3)
               Buffers: shared read=216217
               ->  Parallel Tid Range Scan on test (actual time=0.098..2614.108 rows=13333333 loops=3)
                     TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid))
                     Buffers: shared read=216217
 Planning Time: 4.124 ms
 Execution Time: 4847.992 ms
- Observation: We now see a Gather node managing two workers. Crucially, the scan node has changed to Parallel Tid Range Scan.
- Work Distribution: Notice loops=3 on the scan node (1 Leader + 2 Workers). The rows processed per loop average ~13.3 million (40M total / 3 processes), so the work was evenly distributed.
- Total Time: ~4.8 seconds (a ~2.7x speedup).
What This Means for Tooling
If you maintain internal scripts that move data between Postgres instances, you have likely written code that manually calculates block ranges to divide a huge table into chunks and spawns threads to run them.
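Such scripts usually contain something like the following sketch. The table name and page count are placeholders; a real script would read the page count from pg_class.relpages and run each query on its own connection or thread.

```python
# Hedged sketch of typical manual ctid-chunking logic.
# "my_large_table" and relpages=216217 are placeholder values.

def ctid_chunk_queries(table, relpages, n_chunks):
    """Split a table's block range [0, relpages) into roughly
    n_chunks ctid-range queries."""
    step = max(1, relpages // n_chunks)
    queries = []
    lo = 0
    while lo < relpages:
        hi = min(lo + step, relpages)
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE ctid >= '({lo},0)' AND ctid < '({hi},0)'"
        )
        lo = hi  # next chunk starts where this one ended
    return queries

# Each generated query would then run on a separate connection/thread.
for q in ctid_chunk_queries("my_large_table", relpages=216217, n_chunks=4):
    print(q)
```

Note that when relpages doesn't divide evenly, this naive split produces a small trailing chunk, which is exactly the kind of edge case the new in-core chunking makes unnecessary to handle yourself.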
With PostgreSQL 19, that complexity can likely be deleted. You can issue broader TID range queries and trust the planner to distribute the work across the cluster’s I/O and CPU resources efficiently.
Commit reference: 0ca3b16
Founder @Hornetlabs | Open Source Dev @Highgo | IvorySQL & SynchDB | PostgreSQL China Association | PostgresConf Asia Liaison