How do I prevent disk spills during large sort operations?

Reduce the working set with tighter predicates, add a covering index that satisfies sort order without an explicit Sort node, and tune session-level work_mem. Verify the change with EXPLAIN ANALYZE: the Sort Method line should change from 'external merge Disk: XkB' to 'quicksort' or 'top-N heapsort'.

Debugging Unexpected Sort Operations in PostgreSQL #

A Sort node appearing in an execution plan where you did not write ORDER BY is one of the clearest signals that the query optimizer is compensating for a structural mismatch — a missing index, stale statistics, or an implicit ordering requirement buried inside a join or aggregation. Diagnosing it precisely, rather than guessing, prevents the two most common wrong turns: blindly adding memory and blindly adding indexes.

Why the Planner Injects a Sort Node #

The planner injects a Sort node under four distinct conditions, each with a different cure. Sort and Hash Node Analysis covers the full cost model; this page focuses on the diagnostic path once you have spotted the node.

Condition 1 — No index satisfies the required order. When the query needs rows in a specific order and no index provides that order, the planner retrieves rows unordered and sorts them afterward. This is the most common source of a sort that appears without an ORDER BY in your SQL: an upstream operator (merge join, DISTINCT, window function) required sorted input and the optimizer could not find a pre-sorted path.

Condition 2 — Merge join requires sorted inputs. A merge join can only compare rows from two streams if both arrive sorted on the join key. If neither side is pre-sorted via an index, the planner adds an explicit Sort node to each input.

Condition 3 — Sort-based aggregation. When the planner estimates that sort-based GROUP BY or DISTINCT is cheaper than hash aggregation — typically because cost estimation models predict a small output set — it will sort the input rather than hash it.

Condition 4 — Window function frame ordering. ROW_NUMBER(), RANK(), LAG(), and similar window functions require rows ordered within their partition. That ordering manifests as a Sort node even when the outer query has no ORDER BY.

Understanding which condition applies dictates the fix.

Annotated EXPLAIN Evidence #

The two fields that confirm the problem and guide the fix are Sort Method and the estimated-versus-actual row count on the Sort node itself.

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, c.customer_name, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'PENDING'
ORDER BY o.created_at DESC;

Sort  (cost=38420.15..38795.15 rows=150000 width=48)
      (actual time=312.4..341.7 rows=150000 loops=1)
  Sort Key: o.created_at DESC
  Sort Method: external merge  Disk: 4096kB        -- ← spill to disk; work_mem too low
  Buffers: shared hit=3200 read=8100, temp read=512 written=512
  ->  Hash Join  (cost=3.25..12000.00 rows=150000 width=48)
                 (actual time=0.8..211.3 rows=150000 loops=1)
        Hash Cond: (o.customer_id = c.customer_id)
        Buffers: shared hit=3200 read=8100
        ->  Seq Scan on orders  (cost=0.00..9500.00 rows=150000 width=36)
                                (actual time=0.1..98.4 rows=150000 loops=1)
              Filter: ((status)::text = 'PENDING')   -- ← full seq scan, no index
              Rows Removed by Filter: 350000
        ->  Hash  (cost=2.00..2.00 rows=100 width=24) ...

Key observations:

Sort Method: external merge Disk: 4096kB — the sort spilled to temporary files. This single line tells you that the working set exceeded work_mem at sort time.
rows=150000 on the Seq Scan with Rows Removed by Filter: 350000 — the planner is correctly estimating the row count here, so the issue is not statistics: it is a missing index on (status, created_at DESC).
The Hash Join itself is fine; the sort is the hot node.

Compare this to a statistics-driven problem, where rows on the Sort node would be far smaller than actual rows:

Sort  (cost=820.15..822.65 rows=1000 width=48)
      (actual time=312.4..341.7 rows=150000 loops=1)
  -- rows=1000 vs actual rows=150000 → cardinality misestimate

That pattern points to stale statistics as the root cause, not a missing index.

The inline SVG below shows the two diagnostic paths and how they branch from a single Sort node observation:

Step-by-Step Resolution Workflow #

Step 1 — Capture the baseline #

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, c.customer_name, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'PENDING'
ORDER BY o.created_at DESC;

Record: total execution time, Sort Method (in-memory vs disk), estimated vs actual rows on the Sort node, and the Buffers: temp lines. These four numbers are your before-state for validation.

Step 2 — Classify the sort #

Is the sort explicit (driven by your ORDER BY) or implicit (injected for GROUP BY, DISTINCT, merge join, or a window function)?

If implicit, the sort’s Sort Key tells you which operator upstream demanded the ordering. A Sort Key: o.created_at DESC preceding a Merge Join means the join algorithm required that order.
If explicit, the question is whether an index could provide that order, making the sort node unnecessary.

Step 3 — Check cardinality accuracy #

-- Compare planner estimate against table statistics
SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders'
  AND attname IN ('status', 'created_at', 'customer_id');

A correlation near 1.0 or -1.0 means the column is physically ordered on disk — an index scan in that direction is very cheap. A correlation near 0 means random access, which increases index scan cost and makes the planner more likely to choose a sequential scan followed by a sort.

If rows on the Sort node is far below actual rows, run:

ANALYZE orders;

Then re-run EXPLAIN (ANALYZE, BUFFERS) and check whether the plan changes.

Step 4 — Add a targeted composite index #

When statistics are accurate but no index satisfies the ordering, create one that covers the filter predicate and the sort column together. Column order matters: the leading columns should eliminate the most rows, and the sort column comes last with the correct direction.

-- Covers: WHERE status = 'PENDING' AND join on customer_id, ORDER BY created_at DESC
CREATE INDEX CONCURRENTLY idx_orders_status_cust_created
  ON orders (status, customer_id, created_at DESC);

CONCURRENTLY avoids a table lock in production. The created_at DESC suffix matches the query’s ORDER BY created_at DESC, allowing the planner to use Index Scan Backward and provide rows already in the required order.

Step 5 — Tune `work_mem` for residual spills #

If a sort remains but spills to disk, increase work_mem at session scope first to validate before making it permanent:

SET LOCAL work_mem = '64MB';  -- try 2–4× the reported Disk: XkB value

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, c.customer_name, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'PENDING'
ORDER BY o.created_at DESC;

If Sort Method changes from external merge Disk: XkB to quicksort, the spill was the bottleneck. Make the change role-specific rather than global:

ALTER ROLE reporting_user SET work_mem = '64MB';

Never increase work_mem globally without accounting for concurrent sessions — each sort or hash operation in every session can claim up to work_mem, so a 64 MB setting under 50 concurrent users with two sorts each can require 6.4 GB of RAM.

Step 6 — Validate #

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, c.customer_name, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'PENDING'
ORDER BY o.created_at DESC;

Confirm the Sort node is absent (replaced by Index Scan Backward) or that Sort Method now reads quicksort rather than external merge. Check Buffers: temp is gone. Total execution time should drop.

Before/After Plan Comparison #

-- BEFORE: sort spills to disk, full sequential scan
Sort  (actual time=312.4..341.7 rows=150000)  Sort Method: external merge  Disk: 4096kB
  -> Seq Scan on orders  (actual rows=150000)  Filter: (status = 'PENDING')

-- AFTER: sort node eliminated, index provides order directly
Index Scan Backward using idx_orders_status_cust_created on orders
  (actual time=0.1..18.2 rows=150000)  Index Cond: (status = 'PENDING')

The key change: Sort and Seq Scan replaced by a single Index Scan Backward that delivers rows pre-ordered, with zero temporary file usage.

Common Pitfalls #

Fixing the wrong root cause. Adding work_mem when the problem is a missing index delays spill onset but does not eliminate the sort. Conversely, adding an index when the problem is stale statistics may not change the plan at all because the optimizer still believes the row count is small enough to prefer a hash join.

Index column order does not match query predicates. An index on (created_at DESC, status) will not help a query filtering on status first — the optimizer cannot efficiently skip to status = 'PENDING' when created_at is the leading column. Filter predicates must come before sort columns in a composite index.

Increasing work_mem globally. A global work_mem increase multiplies across every concurrent sort and hash operation. Use ALTER ROLE to scope the change to the users or workloads that need it, and monitor pg_stat_activity for concurrent query counts before committing to any global change.

Ignoring implicit sorts from window functions. A window function like ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) requires a sort that does not appear in the outer ORDER BY. The fix is to ensure an index exists on (customer_id, created_at DESC) — the same pattern as for explicit sorts.

Frequently Asked Questions #

Why does my query show a Sort node without an ORDER BY clause? Implicit sorting is required to satisfy DISTINCT operations, GROUP BY aggregations, window functions, or merge join algorithms. The optimizer injects a Sort to guarantee deterministic output or to align data streams for a downstream operator — none of these appear as ORDER BY in the SQL you wrote.

How do I prevent disk spills during sort operations? Reduce the working set with tighter predicates, add a covering index that satisfies sort order without an explicit Sort node, and tune session-level work_mem. Verify with EXPLAIN ANALYZE: the Sort Method line must change from external merge Disk: XkB to quicksort or top-N heapsort.

Can stale statistics cause unexpected Sort nodes? Yes. When cardinality estimates are wrong, the optimizer may prefer sort-merge joins or explicit sorts over hash joins or index scans. Running ANALYZE on affected tables recalculates statistics and typically resolves these regressions. Query pg_stats to confirm n_distinct and correlation values are plausible before and after.

Related

Sort and Hash Node Analysis — parent: full cost model, hash vs sort strategy selection, and work_mem thresholds
Reading & Interpreting Query Plans — grandparent pillar: execution plan fundamentals and node taxonomy
Covering Index Design — eliminate sort nodes by building indexes that satisfy both filter and order requirements
Identifying Plan Bottlenecks — locate the hot node in a multi-operator plan before deciding where to intervene