Indexing Strategies¶

Choose the right index type for your queries and data patterns.

Index Types Overview¶

Index Type	Best For	Example Use Case
B-tree (default)	Equality, range, sorting	`WHERE id = 1`, `ORDER BY date`
Hash	Equality only	`WHERE uuid = '...'` (rare)
GIN	Arrays, JSONB, full-text	`WHERE tags @> ARRAY['a']`
GiST	Geometric, full-text, ranges	PostGIS, range types
BRIN	Large sorted tables	Time-series data

B-tree Indexes¶

The default and most common index type.

Basic Index¶

-- Single column
CREATE INDEX idx_users_email ON users(email);

-- Multi-column (order matters!)
CREATE INDEX idx_bookings_user_status ON bookings(user_id, status);

Column Order in Multi-Column Indexes¶

The leftmost column(s) must be in the WHERE clause for the index to be used.

CREATE INDEX idx_bookings_user_status_date
ON bookings(user_id, status, created_at);

-- Uses index (matches prefix)
WHERE user_id = 1
WHERE user_id = 1 AND status = 'active'
WHERE user_id = 1 AND status = 'active' AND created_at > '2024-01-01'

-- Does NOT use index (missing user_id)
WHERE status = 'active'
WHERE created_at > '2024-01-01'

Rule: Put most selective (unique) columns first, unless sorting needs differ.

Index for Sorting¶

-- Query with ORDER BY
SELECT * FROM bookings
WHERE user_id = 1
ORDER BY created_at DESC;

-- Index supports both filter AND sort
CREATE INDEX idx_bookings_user_date
ON bookings(user_id, created_at DESC);

Covering Indexes (INCLUDE)¶

Include extra columns to enable Index Only Scans.

-- Query
SELECT email, name FROM users WHERE status = 'active';

-- Covering index
CREATE INDEX idx_users_status_covering
ON users(status)
INCLUDE (email, name);

-- Now: Index Only Scan (never touches table)

Partial Indexes¶

Index only rows matching a condition. Smaller and faster.

-- Only index active users
CREATE INDEX idx_users_email_active
ON users(email)
WHERE status = 'active';

-- Only index unprocessed jobs
CREATE INDEX idx_jobs_pending
ON jobs(created_at)
WHERE status = 'pending';

-- Unique constraint on non-deleted records only
CREATE UNIQUE INDEX idx_users_email_not_deleted
ON users(email)
WHERE deleted_at IS NULL;

When to Use Partial Indexes¶

Hot/cold data split — Index only "hot" rows
Skewed distributions — Most rows have one value
Soft deletes — Exclude deleted rows
Unique with exceptions — Allow duplicates in some cases

Expression Indexes¶

Index the result of an expression.

-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));

-- Query must use same expression
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';

-- Date extraction
CREATE INDEX idx_bookings_year
ON bookings(EXTRACT(YEAR FROM created_at));

-- JSONB field
CREATE INDEX idx_users_preferences_theme
ON users((preferences->>'theme'));

GIN Indexes¶

For arrays, JSONB, and full-text search.

Array Containment¶

-- Users with tags
CREATE INDEX idx_users_tags_gin ON users USING GIN(tags);

-- Queries that use the index
SELECT * FROM users WHERE tags @> ARRAY['premium'];  -- contains
SELECT * FROM users WHERE tags && ARRAY['a', 'b'];   -- overlaps

JSONB Indexing¶

-- Index all keys (supports @>, ?, ?&, ?|)
CREATE INDEX idx_users_metadata_gin
ON users USING GIN(metadata);

-- Supports
WHERE metadata @> '{"type": "premium"}'
WHERE metadata ? 'verified'
WHERE metadata ?& ARRAY['key1', 'key2']

-- Index specific path (smaller, faster)
CREATE INDEX idx_users_metadata_type
ON users USING GIN((metadata->'type'));

jsonb_path_ops¶

More efficient for @> containment queries.

CREATE INDEX idx_products_attrs
ON products USING GIN(attributes jsonb_path_ops);

-- Faster for
WHERE attributes @> '{"color": "red"}'

-- But doesn't support ? (key exists) operators

Full-Text Search Indexes¶

See Full-Text Search for complete coverage.

-- GIN index on tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;

CREATE INDEX idx_articles_search
ON articles USING GIN(search_vector);

-- Or expression index
CREATE INDEX idx_articles_search_expr
ON articles USING GIN(to_tsvector('english', title || ' ' || body));

BRIN Indexes¶

Block Range INdex — tiny indexes for naturally sorted data.

-- Time-series table (rows inserted in time order)
CREATE INDEX idx_events_time_brin
ON events USING BRIN(created_at);

-- Check correlation (should be close to 1.0 or -1.0)
SELECT correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';

When to use BRIN: - Append-only tables (logs, events) - Data physically ordered by index column - Very large tables where B-tree would be huge

Limitations: - Less precise than B-tree - Poor for random access patterns - Bad if data is not physically ordered

Unique Indexes¶

Enforce uniqueness and provide fast lookups.

-- Simple unique
CREATE UNIQUE INDEX idx_users_email ON users(email);

-- Partial unique (unique among non-deleted)
CREATE UNIQUE INDEX idx_users_email_active
ON users(email)
WHERE deleted_at IS NULL;

-- Multi-column unique
CREATE UNIQUE INDEX idx_booking_slots
ON booking_slots(date, time_slot, resource_id);

Composite Primary Keys¶

-- Junction table with composite PK
CREATE TABLE user_roles (
    user_id INTEGER REFERENCES users(id),
    role_id INTEGER REFERENCES roles(id),
    granted_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, role_id)
);

-- Note: PK creates an index on (user_id, role_id)
-- You may still need a separate index on (role_id) for reverse lookups
CREATE INDEX idx_user_roles_role ON user_roles(role_id);

Index Maintenance¶

Check Index Size¶

SELECT
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    idx_scan AS scans
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Find Unused Indexes¶

SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

Find Duplicate Indexes¶

SELECT
    a.indrelid::regclass AS table,
    a.indexrelid::regclass AS index1,
    b.indexrelid::regclass AS index2
FROM pg_index a
JOIN pg_index b ON (
    a.indrelid = b.indrelid
    AND a.indexrelid < b.indexrelid
    AND a.indkey = b.indkey
);

Rebuild Bloated Indexes¶

-- Concurrent rebuild (no locks)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY users;

Best Practices¶

Do¶

Index foreign keys — PostgreSQL doesn't auto-index them
Use covering indexes — For frequently-run queries
Use partial indexes — For skewed data distributions
Match index order to query order — For multi-column indexes
Monitor usage — Remove unused indexes

Don't¶

Over-index — Each index slows writes
Index low-cardinality columns alone — status with 3 values isn't selective
Ignore column order — (a, b) doesn't help WHERE b = ?
Use indexes for small tables — Sequential scan is faster

Index Sizing Guidelines¶

Table Size	Indexing Strategy
< 1K rows	Often no indexes needed beyond PK
1K - 100K	Index filtered columns
100K - 1M	Careful index design, covering indexes
> 1M	Partial indexes, BRIN for time-series

Decision Tree¶

Need to query by column X?
├── Equality only?
│   ├── High cardinality? → B-tree
│   ├── Array/JSONB? → GIN
│   └── Low cardinality? → Maybe no index (or partial)
├── Range queries?
│   ├── Time-series (ordered inserts)? → BRIN
│   └── Random access? → B-tree
├── Sorting?
│   └── Include in B-tree with correct order
├── Full-text search?
│   └── GIN on tsvector
└── Multiple conditions?
    └── Multi-column B-tree (selective columns first)