Diagnosing Query Slowdowns from Soft Deletes: Optimizing with Partial Indexes #
Soft deletes are a common architectural pattern, but they frequently degrade query performance by filling indexes with entries for logically deleted rows. This forces the query planner to scan past inactive rows during standard lookups. When execution plans reveal excessive rows scanned or post-index filtering on is_deleted = false predicates, the root cause is typically index bloat. This guide details how to diagnose execution plan degradation caused by soft deletes and resolve it using targeted partial indexes. For broader context on predicate-driven indexing, see our guide on Index Tuning & Strategy.
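For reference, the examples in this guide assume a minimal schema along the following lines. This is a hypothetical layout for illustration; your column names and types may differ.

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    status      text NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now(),
    -- Soft-delete flag: rows are marked rather than removed.
    is_deleted  boolean NOT NULL DEFAULT false
);

-- A generic index of the kind that bloats as soft-deleted rows accumulate.
CREATE INDEX idx_orders_status ON orders (status);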
Root-Cause Analysis: Execution Plan Bloat from Soft Deletes #
When a table accumulates a high ratio of soft-deleted records, standard B-tree indexes on frequently queried columns become inefficient. The planner must traverse index pages containing both active and inactive tuples. It then applies a Filter node to discard is_deleted = true rows.
In EXPLAIN (ANALYZE, BUFFERS) output, this manifests as large Rows Removed by Filter counts and actual row counts that diverge from the planner's estimates. You will also see elevated shared hit block counts. The I/O overhead stems from fetching dead tuples into shared buffers, which pollutes the cache and adds work during MVCC cleanup.
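Before reading plans, it helps to quantify the problem. The following sketch, assuming the orders schema above, estimates the soft-deleted ratio; a high value means standard indexes carry substantial dead weight.

-- Fraction of rows that are soft-deleted; NULLIF guards against an empty table.
SELECT count(*) FILTER (WHERE is_deleted)::numeric
       / NULLIF(count(*), 0) AS deleted_ratio
FROM orders;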
EXPLAIN Node Behavior & Diagnostic Steps #
To isolate the performance bottleneck, run EXPLAIN (ANALYZE, BUFFERS, TIMING) against the target query. Look for an Index Scan (or the Bitmap Heap Scan above a Bitmap Index Scan) carrying a Filter: (is_deleted = false) condition. If the filter removes more than 30% of scanned rows, the index is suboptimal.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, created_at
FROM orders
WHERE status = 'processing' AND is_deleted = false;
This baseline execution plan typically shows an Index Scan on a generic idx_orders_status. It is immediately followed by Filter: (is_deleted = false). High Rows Removed by Filter indicates severe index bloat.
Examine the row estimates and actual time metrics closely. High actual time despite a modest number of returned rows indicates row-by-row filtering overhead. The loops metric should remain at 1 for single-table scans; higher values mean the scan runs inside a nested loop, which multiplies the cost of the bloated index.
Next, check the Index Cond line. If it lacks the soft-delete predicate, the planner is filtering rows after the index traversal rather than within it. For an Index Only Scan, watch the Heap Fetches metric: a high count means visibility checks are forcing heap lookups, including for deleted rows. Step-by-step resolution begins by capturing the baseline plan, verifying table statistics, and preparing a targeted index replacement.
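To verify table statistics as part of that baseline, you can check the analyze timestamps in pg_stat_user_tables (table name assumed from the examples above):

-- Missing or stale timestamps mean the planner is working from outdated estimates.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';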
Implementing the Partial Index #
A partial index restricts index entries to only rows matching a WHERE clause. This effectively excludes soft-deleted records from the B-tree structure. The result is reduced index size, improved cache hit ratios, and planner bypass of the Filter node entirely.
When designing the index, ensure the predicate exactly matches the query’s is_deleted = false condition so the planner can prove the index covers every qualifying row. Proper Partial Index Implementation requires aligning the index predicate with application-level query patterns. This avoids planner fallbacks to other indexes or sequential scans.
CREATE INDEX CONCURRENTLY idx_orders_active_processing
ON orders (status, created_at)
WHERE is_deleted = false;
This partial index restricts B-tree entries exclusively to active records. Using CONCURRENTLY avoids the lock that would otherwise block writes during the build. After creation, run ANALYZE to refresh statistics, then re-run the execution plan to verify the Filter node has been eliminated, replaced by a direct Index Cond or an Index Only Scan.
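A concrete sketch of those follow-up steps, using the index names from the examples above:

-- Refresh planner statistics so the new partial index is considered.
ANALYZE orders;

-- The partial index should be substantially smaller than the generic one.
SELECT pg_size_pretty(pg_relation_size('idx_orders_status'))            AS old_index_size,
       pg_size_pretty(pg_relation_size('idx_orders_active_processing')) AS new_index_size;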
Step-by-Step Resolution & Validation #
Follow this exact workflow to deploy and validate the optimization:
- Capture baseline EXPLAIN (ANALYZE, BUFFERS) output.
- Identify the bloated index using system catalog queries (pg_stat_user_indexes); see the sketch after this list.
- Drop the bloated index, or mark it invalid if using concurrent creation.
- Create the partial index with the exact soft-delete predicate.
- Run ANALYZE on the target table to update planner statistics.
- Re-execute the query with EXPLAIN (ANALYZE, BUFFERS).
- Validate success by confirming Rows Removed by Filter drops to zero.
- Verify shared hit blocks decrease proportionally to the deleted row ratio.
- Ensure query latency stabilizes under concurrent load.
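For the index-identification step, a catalog query along these lines surfaces large, rarely scanned indexes (a sketch; adjust the table name as needed):

-- Large indexes with low idx_scan counts are prime bloat candidates.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;

With the replacement index created and statistics refreshed, re-run the baseline query: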
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, created_at
FROM orders
WHERE status = 'processing' AND is_deleted = false;
The post-optimization plan shows an Index Only Scan with a direct Index Cond. The Filter node is completely eliminated. shared hit blocks drop significantly, and heap fetches approach zero. Execution time stabilizes as the planner reads only active tuples.
Common Pitfalls #
- Stale planner statistics causing the optimizer to ignore the new partial index despite correct syntax.
- ORM query builders dynamically injecting OR is_deleted IS NULL predicates that bypass the exact WHERE clause match (see the sketch after this list).
- Frequent UPDATE statements toggling is_deleted, causing index fragmentation and increased WAL volume.
- Over-reliance on Index Only Scan without accounting for visibility map updates on heavily updated tables.
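To illustrate the ORM pitfall, compare a predicate the planner can match against the partial index with one it cannot. Both queries are illustrative sketches against the assumed schema.

-- Matches the index predicate exactly; the partial index is usable.
SELECT id FROM orders
WHERE status = 'processing' AND is_deleted = false;

-- The OR branch widens the predicate beyond the index's WHERE clause,
-- so the planner cannot prove the partial index covers all qualifying rows.
SELECT id FROM orders
WHERE status = 'processing'
  AND (is_deleted = false OR is_deleted IS NULL);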
Frequently Asked Questions #
Why does the query planner ignore my partial index after creation?
The planner ignores partial indexes when table statistics are outdated. It also ignores them when the query predicate does not exactly match the index WHERE clause. Run ANALYZE to refresh statistics. Verify that the query’s is_deleted = false condition matches the index definition verbatim. This includes operator precedence and data type casting.
How do I handle queries that need both active and recently deleted records?
Partial indexes only cover the predicate defined at creation. For queries requiring mixed states, maintain a separate full index or use a composite index without a WHERE clause. Alternatively, implement a time-bound soft-delete strategy. Hard-delete records older than a threshold to allow the partial index to cover the majority of active queries.
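A minimal sketch of the second option, maintaining a full composite index alongside the partial one (index name hypothetical):

-- Full index: serves lookups that span both active and soft-deleted rows.
CREATE INDEX CONCURRENTLY idx_orders_status_all
ON orders (status, created_at);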
Does a partial index reduce VACUUM overhead?
Yes. By excluding soft-deleted rows from the index structure, VACUUM processes fewer index entries. This reduces I/O during dead tuple cleanup. However, the heap still requires standard VACUUM cycles to reclaim space from deleted rows. Monitor the n_dead_tup counter in pg_stat_user_tables for dead tuple accumulation separately from index maintenance, as in the sketch below.
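A monitoring sketch, assuming the orders table from the examples above:

-- Track dead tuple accumulation and recent vacuum activity on the heap.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';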
Can I combine partial indexes with covering indexes for soft-deleted tables?
Absolutely. You can define a partial covering index by adding INCLUDE columns to the partial index definition. This allows Index Only Scan execution for queries that select non-key columns, provided the visibility map confirms the heap tuples are all-visible. Ensure the WHERE clause remains is_deleted = false to maintain the performance benefits; see the sketch below.
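A sketch of such an index, assuming PostgreSQL 11 or later for INCLUDE support (index name hypothetical):

-- Non-key columns in INCLUDE enable Index Only Scans for wider SELECT lists,
-- while the WHERE clause keeps soft-deleted rows out of the B-tree.
CREATE INDEX CONCURRENTLY idx_orders_active_covering
ON orders (status, created_at)
INCLUDE (id)
WHERE is_deleted = false;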