Skip to content

Indexing Strategies

Choose the right index type for your queries and data patterns.

Index Types Overview

Index Type Best For Example Use Case
B-tree (default) Equality, range, sorting WHERE id = 1, ORDER BY date
Hash Equality only WHERE uuid = '...' (rare)
GIN Arrays, JSONB, full-text WHERE tags @> ARRAY['a']
GiST Geometric, full-text, ranges PostGIS, range types
BRIN Large sorted tables Time-series data

B-tree Indexes

The default and most common index type.

Basic Index

-- Single column
CREATE INDEX idx_users_email ON users(email);

-- Multi-column (order matters!)
CREATE INDEX idx_bookings_user_status ON bookings(user_id, status);

Column Order in Multi-Column Indexes

The leftmost column(s) must be in the WHERE clause for the index to be used.

CREATE INDEX idx_bookings_user_status_date
ON bookings(user_id, status, created_at);

-- Uses index (matches prefix)
WHERE user_id = 1
WHERE user_id = 1 AND status = 'active'
WHERE user_id = 1 AND status = 'active' AND created_at > '2024-01-01'

-- Does NOT use index (missing user_id)
WHERE status = 'active'
WHERE created_at > '2024-01-01'

Rule: Put most selective (unique) columns first, unless sorting needs differ.

Index for Sorting

-- Query with ORDER BY
SELECT * FROM bookings
WHERE user_id = 1
ORDER BY created_at DESC;

-- Index supports both filter AND sort
CREATE INDEX idx_bookings_user_date
ON bookings(user_id, created_at DESC);

Covering Indexes (INCLUDE)

Include extra columns to enable Index Only Scans.

-- Query
SELECT email, name FROM users WHERE status = 'active';

-- Covering index
CREATE INDEX idx_users_status_covering
ON users(status)
INCLUDE (email, name);

-- Now: Index Only Scan (never touches table)

Partial Indexes

Index only rows matching a condition. Smaller and faster.

-- Only index active users
CREATE INDEX idx_users_email_active
ON users(email)
WHERE status = 'active';

-- Only index unprocessed jobs
CREATE INDEX idx_jobs_pending
ON jobs(created_at)
WHERE status = 'pending';

-- Unique constraint on non-deleted records only
CREATE UNIQUE INDEX idx_users_email_not_deleted
ON users(email)
WHERE deleted_at IS NULL;

When to Use Partial Indexes

  • Hot/cold data split — Index only "hot" rows
  • Skewed distributions — Most rows have one value
  • Soft deletes — Exclude deleted rows
  • Unique with exceptions — Allow duplicates in some cases

Expression Indexes

Index the result of an expression.

-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));

-- Query must use same expression
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Date extraction
CREATE INDEX idx_bookings_year
ON bookings(EXTRACT(YEAR FROM created_at));

-- JSONB field
CREATE INDEX idx_users_preferences_theme
ON users((preferences->>'theme'));

GIN Indexes

For arrays, JSONB, and full-text search.

Array Containment

-- Users with tags
CREATE INDEX idx_users_tags_gin ON users USING GIN(tags);

-- Queries that use the index
SELECT * FROM users WHERE tags @> ARRAY['premium'];  -- contains
SELECT * FROM users WHERE tags && ARRAY['a', 'b'];   -- overlaps

JSONB Indexing

-- Index all keys (supports @>, ?, ?&, ?|)
CREATE INDEX idx_users_metadata_gin
ON users USING GIN(metadata);

-- Supports
WHERE metadata @> '{"type": "premium"}'
WHERE metadata ? 'verified'
WHERE metadata ?& ARRAY['key1', 'key2']

-- Index specific path (smaller, faster)
CREATE INDEX idx_users_metadata_type
ON users USING GIN((metadata->'type'));

jsonb_path_ops

More efficient for @> containment queries.

CREATE INDEX idx_products_attrs
ON products USING GIN(attributes jsonb_path_ops);

-- Faster for
WHERE attributes @> '{"color": "red"}'

-- But doesn't support ? (key exists) operators

Full-Text Search Indexes

See Full-Text Search for complete coverage.

-- GIN index on tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;

CREATE INDEX idx_articles_search
ON articles USING GIN(search_vector);

-- Or expression index
CREATE INDEX idx_articles_search_expr
ON articles USING GIN(to_tsvector('english', title || ' ' || body));

BRIN Indexes

Block Range INdex — tiny indexes for naturally sorted data.

-- Time-series table (rows inserted in time order)
CREATE INDEX idx_events_time_brin
ON events USING BRIN(created_at);

-- Check correlation (should be close to 1.0 or -1.0)
SELECT correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';

When to use BRIN: - Append-only tables (logs, events) - Data physically ordered by index column - Very large tables where B-tree would be huge

Limitations: - Less precise than B-tree - Poor for random access patterns - Bad if data is not physically ordered

Unique Indexes

Enforce uniqueness and provide fast lookups.

-- Simple unique
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Partial unique (unique among non-deleted)
CREATE UNIQUE INDEX idx_users_email_active
ON users(email)
WHERE deleted_at IS NULL;

-- Multi-column unique
CREATE UNIQUE INDEX idx_booking_slots
ON booking_slots(date, time_slot, resource_id);

Composite Primary Keys

-- Junction table with composite PK
CREATE TABLE user_roles (
    user_id INTEGER REFERENCES users(id),
    role_id INTEGER REFERENCES roles(id),
    granted_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id)
);

-- Note: PK creates an index on (user_id, role_id)
-- You may still need a separate index on (role_id) for reverse lookups
CREATE INDEX idx_user_roles_role ON user_roles(role_id);

Index Maintenance

Check Index Size

SELECT
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    idx_scan AS scans
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Find Unused Indexes

SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

Find Duplicate Indexes

SELECT
    a.indrelid::regclass AS table,
    a.indexrelid::regclass AS index1,
    b.indexrelid::regclass AS index2
FROM pg_index a
JOIN pg_index b ON (
    a.indrelid = b.indrelid
    AND a.indexrelid < b.indexrelid
    AND a.indkey = b.indkey
);

Rebuild Bloated Indexes

-- Concurrent rebuild (no locks)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY users;

Best Practices

Do

  1. Index foreign keys — PostgreSQL doesn't auto-index them
  2. Use covering indexes — For frequently-run queries
  3. Use partial indexes — For skewed data distributions
  4. Match index order to query order — For multi-column indexes
  5. Monitor usage — Remove unused indexes

Don't

  1. Over-index — Each index slows writes
  2. Index low-cardinality columns alonestatus with 3 values isn't selective
  3. Ignore column order(a, b) doesn't help WHERE b = ?
  4. Use indexes for small tables — Sequential scan is faster

Index Sizing Guidelines

Table Size Indexing Strategy
< 1K rows Often no indexes needed beyond PK
1K - 100K Index filtered columns
100K - 1M Careful index design, covering indexes
> 1M Partial indexes, BRIN for time-series

Decision Tree

Need to query by column X?
├── Equality only?
│   ├── High cardinality? → B-tree
│   ├── Array/JSONB? → GIN
│   └── Low cardinality? → Maybe no index (or partial)
├── Range queries?
│   ├── Time-series (ordered inserts)? → BRIN
│   └── Random access? → B-tree
├── Sorting?
│   └── Include in B-tree with correct order
├── Full-text search?
│   └── GIN on tsvector
└── Multiple conditions?
    └── Multi-column B-tree (selective columns first)