Covering Index Design: Eliminate Heap Lookups #

A covering index stores every column a query needs directly in the index structure, so the database engine never has to touch the table heap. When the design is correct, execution plans shift from Index Scan (random I/O against heap pages) to Index Only Scan with Heap Fetches: 0 — often cutting query time by an order of magnitude on read-heavy workloads.

When the Optimizer Chooses an Index Only Scan #

The planner selects Index Only Scan when two conditions are simultaneously true: the index contains all columns referenced anywhere in the query (both filter predicates and projection columns), and the visibility map for the scanned pages is sufficiently current for the planner to trust tuple visibility without touching the heap.

Within the broader Index Tuning & Strategy framework, covering indexes occupy a specific niche alongside B-Tree Index Optimization and Partial Index Implementation. Where a partial index shrinks index size by restricting rows, and a B-tree index optimizes key structure for traversal, a covering index optimizes projection — eliminating the lookup step that follows traversal.

The cost model triggers Index Only Scan over Index Scan when the planner estimates that heap fetches would dominate the plan cost. This threshold depends on:

The sibling topic Specialized Index Types (GIN/GiST) covers cases where B-tree covering indexes are structurally unsuitable — full-text search, JSONB, and geometric predicates require access methods that do not support INCLUDE.

Annotated EXPLAIN Node Breakdown #

The key fields to inspect when diagnosing covering index behavior are Heap Fetches, Buffers, and the ratio of actual rows to loops.

Before: Index Scan with heap lookups

Index Scan using idx_orders_status on orders
  (cost=0.43..124.55 rows=100 width=64)     -- planner estimate
  (actual time=0.085..1.420 rows=100 loops=1)
  Index Cond: (status = 'shipped'::text)
  Heap Fetches: 100                          -- one heap page visit per row
  Buffers: shared hit=4 read=98             -- 98 disk reads from heap
Planning Time: 0.210 ms
Execution Time: 1.580 ms

Every row returned by the index causes one heap page access. shared read=98 shows most of those pages were not in the buffer cache — random I/O that accumulates under concurrent load.

After: Index Only Scan with zero heap access

Index Only Scan using idx_orders_status_covering on orders
  (cost=0.43..12.10 rows=100 width=64)      -- planner estimate
  (actual time=0.042..0.115 rows=100 loops=1)
  Index Cond: (status = 'shipped'::text)
  Heap Fetches: 0                            -- no table I/O at all
  Buffers: shared hit=4                      -- only index pages in cache
Planning Time: 0.180 ms
Execution Time: 0.145 ms                     -- ~11x faster execution

Heap Fetches: 0 is the definitive confirmation. Buffers: shared hit=4 shows the entire result set was served from four index pages in the buffer pool.

The field actual rows vs the planner’s rows estimate is worth cross-checking here too. If estimates diverge significantly, outdated statistics may cause the planner to choose a less efficient plan under different parameter values — run ANALYZE orders after structural index changes.

What to look for in each field #

Field Healthy signal Problem signal
Node type Index Only Scan Index Scan or Seq Scan
Heap Fetches 0 Any positive value
Buffers: shared read Low or zero High — indicates disk I/O
actual rows vs rows Within 3× Off by 10× or more — stale stats

How the Index Structure Works #

The diagram below shows how a covering index with INCLUDE differs structurally from a plain B-tree index. Understanding the structure explains why column ordering matters and why INCLUDE columns do not participate in traversal.

Covering index structure: key columns traverse the tree, INCLUDE columns live only in leaf pages Diagram showing a B-tree index where internal (non-leaf) nodes hold only the key columns used for traversal, while leaf nodes hold both key columns and the INCLUDE payload columns. A query touches only leaf nodes to serve projection, never the heap. Root (internal node) key: status, created_at (key columns only) Leaf page status='active' + customer_id, total (INCLUDE payload) Leaf page status='shipped' + customer_id, total (INCLUDE payload) Leaf page status='pending' + customer_id, total (INCLUDE payload) (key + INCLUDE columns) Query satisfied from leaf pages Heap Fetches: 0 — no table access needed

Internal nodes hold only the key columns (status, created_at) for B-tree traversal. Leaf nodes carry both the key columns and the INCLUDE payload (customer_id, total_amount). A query projecting those columns is fully answered at the leaf level — the heap is never reached.

Column Ordering and the INCLUDE Clause #

Effective covering index design follows a strict two-tier column hierarchy:

Tier 1 — Key columns (leading index columns): These satisfy WHERE, JOIN, and ORDER BY predicates. They participate in B-tree traversal from root to leaf. Their order matters: the leftmost column must match the most selective equality predicate, followed by range or sort columns.

Tier 2 — INCLUDE columns: These satisfy SELECT projection only. PostgreSQL stores them exclusively in leaf pages. They do not affect sort order, tree depth, or traversal logic. The INCLUDE clause keeps internal node pages compact, reducing tree depth and improving cache locality for traversal.

-- Leading key columns: handle WHERE and ORDER BY
-- INCLUDE columns: handle SELECT projection only
CREATE INDEX idx_orders_status_covering
  ON orders (status, created_at DESC)
  INCLUDE (customer_id, total_amount, shipping_address);

The leading key (status, created_at DESC) drives the index scan: the planner traverses the tree to the first matching status = 'shipped' entry, then walks leaf pages in descending created_at order. The INCLUDE clause pays no traversal cost and adds no B-tree branching overhead.

Guidelines for INCLUDE column selection:

Memory, I/O, and Resource Behavior #

Covering indexes convert random heap I/O into sequential index leaf-page reads, which are far more buffer-pool-friendly. Several resource dimensions change:

Buffer pool efficiency: Index-only scans serve results from a much smaller set of pages than a full heap scan would require. For a orders table with 10 million rows, a plain Index Scan returning 10,000 rows may touch 10,000 distinct heap pages. A covering index returns those rows from perhaps 50–100 contiguous leaf pages, all likely to be cache-resident under any realistic working set.

Write amplification: Every INSERT and UPDATE must maintain both the heap and the covering index. Wide INCLUDE payloads increase the per-row write cost. For tables where writes exceed 25–30% of total operations, measure WAL generation (pg_stat_bgwriter.buffers_checkpoint) before and after adding a covering index.

Visibility map dependency: The Index Only Scan path depends on the visibility map being current. Pages not marked all-visible in the visibility map force the planner to visit the heap to confirm tuple visibility — these show up as non-zero Heap Fetches even with a correctly designed index. Run VACUUM on the table immediately after heavy write batches, and ensure autovacuum autovacuum_vacuum_scale_factor is tuned low enough on high-churn tables.

Temporary file impact: Covering indexes do not use work_mem during scanning. However, if the query includes a Sort node above the Index Only Scan (because the sort columns do not match the index key order), work_mem spill behavior applies normally. Eliminate the redundant sort by aligning ORDER BY with the leading key columns.

Step-by-Step Tuning Workflow #

Follow these steps in sequence. Each step produces evidence that informs the next.

Step 1 — Capture a baseline with full buffer diagnostics.

EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, total_amount
FROM orders
WHERE status = 'shipped'
ORDER BY created_at DESC
LIMIT 100;

Record: node type (Index Scan vs Index Only Scan), Heap Fetches, Buffers: shared hit and shared read, and Execution Time.

Step 2 — Check index usage statistics to distinguish one-off vs sustained problem.

SELECT
  indexrelname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch          -- heap fetches; should approach 0 for Index Only Scans
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan DESC;

If idx_tup_fetch significantly exceeds zero relative to idx_scan, the current index is not covering the projection.

Step 3 — Identify the missing projection columns.

-- Check which columns the query needs vs which are in the current index
SELECT
  a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_attribute a
JOIN pg_class c ON c.oid = a.attrelid
WHERE c.relname = 'orders'
  AND a.attnum > 0
  AND a.attname IN ('customer_id', 'total_amount', 'shipping_address');

Step 4 — Drop the old index and create the covering index.

DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status;

CREATE INDEX CONCURRENTLY idx_orders_status_covering
  ON orders (status, created_at DESC)
  INCLUDE (customer_id, total_amount, shipping_address);

Use CONCURRENTLY on production tables to avoid locking.

Step 5 — Update the visibility map.

VACUUM (ANALYZE) orders;

This updates the visibility map so the planner can trust index-only tuple visibility, and refreshes statistics so the planner has accurate row estimates for the new index.

Step 6 — Verify the plan and confirm heap fetch elimination.

SET LOCAL enable_indexscan = off;   -- force planner to use only index-only or seq scan path
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, total_amount
FROM orders
WHERE status = 'shipped'
ORDER BY created_at DESC
LIMIT 100;

Confirm Index Only Scan appears, Heap Fetches: 0, and Buffers: shared read is near zero. Reset the session parameter afterward.

Step 7 — Monitor ongoing health.

SELECT
  schemaname,
  relname,
  indexrelname,
  idx_blks_read,    -- index pages read from disk
  idx_blks_hit      -- index pages served from buffer pool
FROM pg_statio_user_indexes
WHERE relname = 'orders';

A healthy covering index shows idx_blks_hit dominating idx_blks_read. A sustained high idx_blks_read after warming indicates the index does not fit the buffer pool — consider raising shared_buffers or narrowing the INCLUDE payload.

ORM and Application Layer Alignment #

ORMs routinely emit SELECT *, which immediately disqualifies Index Only Scan. The planner cannot serve unindexed columns from the index, so it falls back to Index Scan with heap fetches — silently, with no error.

The covering index then provides no execution plan benefit until projection is controlled at the query layer:

-- ORM default — breaks Index Only Scan
SELECT * FROM orders WHERE status = 'active';

-- Explicit projection — enables Index Only Scan
SELECT id, customer_id, total_amount
FROM orders
WHERE status = 'active';

For detailed implementation patterns across Django ORM, ActiveRecord, and SQLAlchemy, see Building Effective Covering Indexes.

Practical application-layer steps:

Common Pitfalls #

Over-provisioning INCLUDE columns. Diagnostic: pg_relation_size(indexname) is large relative to the base table, and write throughput has measurably dropped. Fix: audit which INCLUDE columns are actually projected by live queries using pg_stat_statements, and remove columns with zero query coverage.

Non-zero Heap Fetches despite correct column coverage. Diagnostic: EXPLAIN (ANALYZE, BUFFERS) shows Heap Fetches greater than zero even though the index includes all required columns. Fix: run VACUUM (ANALYZE) on the table to update the visibility map. Tuning autovacuum_vacuum_scale_factor lower on high-churn tables prevents recurrence.

Sort node appears above Index Only Scan. Diagnostic: plan contains SortIndex Only Scan rather than a pure Index Only Scan. The sort columns do not match the leading key order. Fix: reorder the index key columns to match the ORDER BY clause.

ORM SELECT * silently disabling Index Only Scan. Diagnostic: pg_stat_user_indexes.idx_tup_fetch grows despite a correct covering index. Fix: intercept ORM-generated SQL at the application layer and enforce explicit projection.

Write amplification on high-churn tables. Diagnostic: pg_stat_bgwriter.buffers_checkpoint increases noticeably after adding the index; checkpoint duration or WAL volume grows. Fix: measure the read/write ratio before adding a covering index. On tables with more than 25% write operations, evaluate Partial Index Implementation to limit index scope, reducing write overhead.

Assuming Index Only Scan during high-concurrency updates. Diagnostic: Heap Fetches spikes during write-heavy windows even on a well-vacuumed table. Fix: pages being actively updated temporarily lose their all-visible status. This is expected behavior — visibility map pages are updated asynchronously by vacuum. Reduce autovacuum_vacuum_cost_delay on the affected table to allow vacuum to keep pace.

Frequently Asked Questions #

How do I verify an Index Only Scan is actually covering the query? #

Run EXPLAIN (ANALYZE, BUFFERS). If it shows Index Only Scan with Heap Fetches: 0, the index fully covers the query. Any non-zero Heap Fetches count indicates either projection columns are missing from the index or the visibility map is stale — run VACUUM on the table and recheck.

Does the INCLUDE clause work across all major relational databases? #

Syntax varies by engine. PostgreSQL uses INCLUDE since version 11. SQL Server also uses INCLUDE in CREATE INDEX. MySQL achieves covering index behavior by including projected columns in the secondary index key itself — there is no separate INCLUDE syntax through MySQL 8.4. Always verify behavior against your specific database version.

How does autovacuum affect covering index performance? #

Autovacuum updates the visibility map, which allows the planner to trust that index leaf pages contain only visible tuples. When autovacuum lags, the planner is forced to fetch heap pages to confirm row visibility, breaking the Index Only Scan optimization even when the index column set is correct. Tune autovacuum_vacuum_scale_factor and autovacuum_vacuum_cost_delay on high-write tables to keep the visibility map current.

When should I prefer a covering index over a materialized view? #

Choose a covering index for real-time, high-frequency read queries that need up-to-date data with minimal latency. Use a materialized view when queries involve complex multi-table aggregations, expensive window functions, or when occasional staleness is acceptable and the refresh overhead is manageable compared to the query cost savings.


Up: Index Tuning & Strategy