Partial Index Implementation: Execution Plan Diagnostics & Query Tuning #

A partial index covers only the rows that satisfy a WHERE predicate, letting you index a focused subset of the table rather than every row.

When the Optimizer Chooses a Partial Index #

The PostgreSQL planner selects a partial index when three conditions hold simultaneously: the query’s WHERE clause contains a predicate that is implied by the index predicate, the indexed column appears in a subsequent condition or the ORDER BY, and the planner’s cost estimate for the index path is lower than for a sequential scan.

Within the broader Index Tuning & Strategy workflow, partial indexes are the right tool when a consistent minority of rows drives the bulk of query traffic. The key cardinality triggers are:

For workloads involving JSONB containment or full-text search predicates, evaluate whether specialized index types such as GIN or GiST offer better selectivity before committing to a partial B-tree index.

Unlike a standard B-tree index that stores an entry for every row, the partial index physically excludes rows that do not satisfy the predicate — so write amplification, storage cost, and the index’s working-set size all shrink in proportion to the fraction of rows indexed.


Partial Index vs Full-Table Index: Storage and Scan Path Diagram showing a users table split into active rows (20%) and inactive rows (80%). A full-table B-tree index covers all rows; a partial index covers only active rows. The query path through the partial index is shorter and touches fewer pages. users table active rows ~20% of table inactive / deleted ~80% of table Full B-tree index covers every row large; cold pages fetched on each scan Partial index WHERE status = 'active' Query scan path Seq Scan / full Index Scan reads all 100% of index pages Filter: status = 'active' (post-scan) Index Scan (partial index) reads only active-row pages Index Cond pushed down; no Filter

Annotated EXPLAIN Node Breakdown #

Run the baseline diagnostic before creating any index. The key fields to capture are Rows Removed by Filter, shared_blks_read, and whether the plan uses a heap filter versus an index condition.

-- Baseline: no partial index yet
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email
FROM users
WHERE status = 'active'
  AND last_login > NOW() - INTERVAL '30 days';

Typical pre-index output:

Seq Scan on users  (cost=0.00..4821.00 rows=412 width=36)
                   (actual time=0.042..38.7 rows=389 loops=1)
  Filter: ((status = 'active') AND (last_login > '2026-05-22'::timestamp))
  Rows Removed by Filter: 99611        -- ← 99k rows discarded after heap read
  Buffers: shared hit=142 read=2103    -- ← 2103 cold page reads
Planning Time: 0.8 ms
Execution Time: 39.1 ms

Diagnostic signals:

Field What it tells you
Rows Removed by Filter: 99611 99% of fetched rows discarded — the index would avoid reading them
Buffers: shared read=2103 Most pages came from disk (cold cache); hot-path latency will be high
Filter: (not Index Cond:) Both predicates evaluated post-scan against the heap

After creating and analysing the partial index:

CREATE INDEX idx_users_active_recent
  ON users (last_login DESC)
  WHERE status = 'active';   -- partial predicate

ANALYZE users;               -- refresh pg_statistic immediately
-- Post-index validation
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email
FROM users
WHERE status = 'active'
  AND last_login > NOW() - INTERVAL '30 days';
Index Scan Backward using idx_users_active_recent on users
       (cost=0.28..31.4 rows=389 width=36)
       (actual time=0.041..0.93 rows=389 loops=1)
  Index Cond: (last_login > '2026-05-22'::timestamp)
  -- Note: no Filter node — planner knows the index is restricted to active rows
  Buffers: shared hit=8 read=4      -- ← cold reads dropped from 2103 → 4
Planning Time: 0.6 ms
Execution Time: 1.1 ms

The absence of a Filter: line after the Index Scan node is the definitive signal that the partial predicate is being pushed down to the index level, not re-evaluated against the heap.

Algorithm Internals: Predicate Pushdown and Filter Logic #

PostgreSQL’s planner evaluates partial index eligibility in indxpath.c. For the planner to treat a partial index as usable, the query’s WHERE clause must imply the index predicate. Implication is syntactic: status = 'active' in the query implies WHERE status = 'active' in the index definition. Variations that break implication:

Before/after predicate alignment:

-- BROKEN: query predicate does not imply index predicate
-- Index: WHERE status = 'active'
-- Query: WHERE status = 'active' OR status = 'trial'
--        → planner cannot guarantee index covers 'trial' rows

-- FIXED: separate partial index per value, or composite partial index
CREATE INDEX idx_users_active  ON users (last_login DESC) WHERE status = 'active';
CREATE INDEX idx_users_trial   ON users (last_login DESC) WHERE status = 'trial';

-- Or consolidate with a broader predicate if both are always queried together:
CREATE INDEX idx_users_nondorm ON users (last_login DESC, status)
  WHERE status IN ('active', 'trial');

The [covering index](/index-tuning-strategy/covering-index-design/) design principle applies here too: if the query’s SELECT list only references id, email, last_login, and status, adding those columns to the index definition lets the planner use an Index Only Scan, avoiding the heap entirely.

-- Partial covering index: zero heap fetches for the target query
CREATE INDEX idx_users_active_cover
  ON users (last_login DESC, id, email)
  WHERE status = 'active';

With a sufficiently up-to-date visibility map (maintained by regular VACUUM), this yields an Index Only Scan with Heap Fetches: 0.

Storage, I/O, and Write Amplification #

Partial index size is proportional to the fraction of rows in the predicate. A users table where 20% of rows are active produces an index roughly one-fifth the size of a full-table index on the same column. The practical effects on I/O:

Monitor write overhead via pg_stat_user_indexes:

SELECT indexrelname,
       idx_scan,
       idx_tup_read,
       idx_tup_fetch
FROM   pg_stat_user_indexes
WHERE  relname = 'users'
ORDER  BY idx_scan DESC;

If idx_scan on the partial index is zero despite queries matching the predicate, the planner is bypassing it — run EXPLAIN to diagnose why.

Step-by-Step Tuning Workflow #

1. Capture the baseline plan

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email
FROM   users
WHERE  status = 'active'
  AND  last_login > NOW() - INTERVAL '30 days';
-- Record: Rows Removed by Filter, shared_blks_read, Execution Time

2. Confirm predicate selectivity

-- Fraction of rows matching the partial predicate
SELECT COUNT(*) FILTER (WHERE status = 'active')::numeric
     / COUNT(*)                                  AS active_fraction
FROM   users;
-- Aim for < 0.30; above that, a full-table index may be more cost-effective

3. Check pg_stats for column distribution

SELECT attname,
       n_distinct,
       correlation,
       most_common_vals,
       most_common_freqs
FROM   pg_stats
WHERE  tablename = 'users'
  AND  attname   IN ('status', 'last_login');
-- Low n_distinct on status confirms a good partial index candidate
-- High correlation on last_login favours an Index Scan Backward

4. Create the partial index

CREATE INDEX CONCURRENTLY idx_users_active_recent
  ON users (last_login DESC)
  WHERE status = 'active';
-- CONCURRENTLY avoids an ACCESS SHARE lock on a live table

5. Refresh statistics

ANALYZE users;
-- Re-runs after index creation to give the planner accurate cardinality
-- for the filtered subset; without this, cost estimates can be stale

6. Validate and compare

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email
FROM   users
WHERE  status = 'active'
  AND  last_login > NOW() - INTERVAL '30 days';
-- Check: Index Cond present, no residual Filter, shared_blks_read << baseline

7. Monitor in production

SELECT query,
       calls,
       total_exec_time / calls AS avg_ms,
       shared_blks_read / calls AS avg_blks_read
FROM   pg_stat_statements
WHERE  query ILIKE '%users%active%'
ORDER  BY avg_blks_read DESC
LIMIT  10;
-- avg_blks_read should be dramatically lower post-index

Understanding cost estimation models helps interpret why the planner’s pre-ANALYZE costs may remain high even after the index exists — statistics lag is the most common cause of post-creation sequential scan fallback.

Common Pitfalls #

Predicate mismatch. Diagnostic: EXPLAIN shows Seq Scan or Index Scan with a residual Filter: node after creation. Fix: compare the index’s WHERE clause in pg_indexes against the query’s WHERE clause character by character. Even a type cast difference (status = 'active' vs status::text = 'active') breaks implication.

Stale statistics causing sequential scan fallback. Diagnostic: EXPLAIN (ANALYZE) shows rows=... estimates far from actual rows, or cost= on the index path exceeds the sequential scan cost despite the index being usable. Fix: run ANALYZE users immediately after CREATE INDEX. Consider lowering default_statistics_target for the predicate column to 500 or higher for skewed distributions.

Volatile function in the index predicate. Diagnostic: CREATE INDEX succeeds but returns a WARNING, or the index is never selected despite correct predicate alignment. Fix: replace NOW() or CURRENT_TIMESTAMP with a fixed literal or an immutable function. Partial index predicates must be immutable — they are evaluated once at creation time.

ORM injecting unexpected conditions. Diagnostic: generated SQL includes AND deleted_at IS NULL or AND "users"."type" = 'User' that does not appear in the index predicate. Fix: add those conditions to the index predicate to match the ORM’s generated SQL, or configure the ORM to suppress automatic scope injection for this query path.

Over-filtering with too selective a predicate. Diagnostic: index is used but idx_scan in pg_stat_user_indexes is very low; the index exists for an edge-case query that runs rarely. Fix: weigh creation cost and maintenance overhead against query frequency. Use pg_stat_statements to confirm the query is high-frequency before keeping the index.

High-churn predicate column causing write amplification. Diagnostic: pg_stat_user_indexes.idx_blks_written is high relative to table write volume; replication lag increases after index creation. Fix: if rows frequently toggle the predicate value (e.g., a job queue where status cycles through pending → processing → done), a partial index on status = 'pending' may generate more index writes than a full-table index. Profile with pgbench before committing.

Frequently Asked Questions #

How do I verify if the query planner is using my partial index? #

Run EXPLAIN (ANALYZE, BUFFERS) and inspect for an Index Scan or Index Only Scan node. The Index Cond field must reference the indexed column. If you see a Seq Scan or a Filter: line immediately after the index scan node, the planner bypassed or did not fully utilise the partial index — most commonly because of predicate mismatch, outdated statistics, or a cost estimate that still favours the sequential path. Compare the index predicate in pg_indexes against the query’s WHERE clause and run ANALYZE if statistics are stale.

Why does my ORM query bypass the partial index even with matching WHERE clauses? #

Many ORMs automatically inject conditions such as deleted_at IS NULL, soft-delete scopes, or STI type columns. These additions break exact predicate matching because the query’s implied predicate no longer matches the index’s stored predicate. Disable automatic condition injection for critical read paths, or adjust the partial index definition to incorporate the ORM’s injected conditions — then verify with EXPLAIN that Index Cond appears and no residual Filter: node remains.

When should I prefer a full B-tree or covering index over a partial index? #

Avoid partial indexes when the target subset exceeds roughly 30–40% of the table, when queries frequently alternate between filtered and unfiltered access patterns, or when high-churn updates continuously toggle rows in and out of the predicate scope. In those cases a full B-tree or covering index delivers more consistent performance across diverse query patterns. Also consider reading plan bottlenecks to confirm the index scan itself is the hot node rather than an upstream join or sort.

Can I use a partial index together with an index-only scan? #

Yes. If the index definition includes all columns referenced in the SELECT list and any additional WHERE conditions beyond the index predicate, PostgreSQL can satisfy the query with an Index Only Scan — no heap fetch required. The visibility map must show the target pages as all-visible; if EXPLAIN (ANALYZE) reports high Heap Fetches, run VACUUM users to advance the visibility map, then re-run the query.