Indexing Strategies¶
Choose the right index type for your queries and data patterns.
Index Types Overview¶
| Index Type | Best For | Example Use Case |
|---|---|---|
| B-tree (default) | Equality, range, sorting | WHERE id = 1, ORDER BY date |
| Hash | Equality only | WHERE uuid = '...' (rare) |
| GIN | Arrays, JSONB, full-text | WHERE tags @> ARRAY['a'] |
| GiST | Geometric, full-text, ranges | PostGIS, range types |
| BRIN | Large sorted tables | Time-series data |
B-tree Indexes¶
The default and most common index type.
Basic Index¶
-- Single column
CREATE INDEX idx_users_email ON users(email);
-- Multi-column (order matters!)
CREATE INDEX idx_bookings_user_status ON bookings(user_id, status);
Column Order in Multi-Column Indexes¶
The leftmost column(s) must be in the WHERE clause for the index to be used.
CREATE INDEX idx_bookings_user_status_date
ON bookings(user_id, status, created_at);
-- Uses index (matches prefix)
WHERE user_id = 1
WHERE user_id = 1 AND status = 'active'
WHERE user_id = 1 AND status = 'active' AND created_at > '2024-01-01'
-- Does NOT use index (missing user_id)
WHERE status = 'active'
WHERE created_at > '2024-01-01'
Rule: Put most selective (unique) columns first, unless sorting needs differ.
Index for Sorting¶
-- Query with ORDER BY
SELECT * FROM bookings
WHERE user_id = 1
ORDER BY created_at DESC;
-- Index supports both filter AND sort
CREATE INDEX idx_bookings_user_date
ON bookings(user_id, created_at DESC);
Covering Indexes (INCLUDE)¶
Include extra columns to enable Index Only Scans.
-- Query
SELECT email, name FROM users WHERE status = 'active';
-- Covering index
CREATE INDEX idx_users_status_covering
ON users(status)
INCLUDE (email, name);
-- Now: Index Only Scan (never touches table)
Partial Indexes¶
Index only rows matching a condition. Smaller and faster.
-- Only index active users
CREATE INDEX idx_users_email_active
ON users(email)
WHERE status = 'active';
-- Only index unprocessed jobs
CREATE INDEX idx_jobs_pending
ON jobs(created_at)
WHERE status = 'pending';
-- Unique constraint on non-deleted records only
CREATE UNIQUE INDEX idx_users_email_not_deleted
ON users(email)
WHERE deleted_at IS NULL;
When to Use Partial Indexes¶
- Hot/cold data split — Index only "hot" rows
- Skewed distributions — Most rows have one value
- Soft deletes — Exclude deleted rows
- Unique with exceptions — Allow duplicates in some cases
Expression Indexes¶
Index the result of an expression.
-- Case-insensitive email lookup
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));
-- Query must use same expression
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Date extraction
CREATE INDEX idx_bookings_year
ON bookings(EXTRACT(YEAR FROM created_at));
-- JSONB field
CREATE INDEX idx_users_preferences_theme
ON users((preferences->>'theme'));
GIN Indexes¶
For arrays, JSONB, and full-text search.
Array Containment¶
-- Users with tags
CREATE INDEX idx_users_tags_gin ON users USING GIN(tags);
-- Queries that use the index
SELECT * FROM users WHERE tags @> ARRAY['premium']; -- contains
SELECT * FROM users WHERE tags && ARRAY['a', 'b']; -- overlaps
JSONB Indexing¶
-- Index all keys (supports @>, ?, ?&, ?|)
CREATE INDEX idx_users_metadata_gin
ON users USING GIN(metadata);
-- Supports
WHERE metadata @> '{"type": "premium"}'
WHERE metadata ? 'verified'
WHERE metadata ?& ARRAY['key1', 'key2']
-- Index specific path (smaller, faster)
CREATE INDEX idx_users_metadata_type
ON users USING GIN((metadata->'type'));
jsonb_path_ops¶
More efficient for @> containment queries.
CREATE INDEX idx_products_attrs
ON products USING GIN(attributes jsonb_path_ops);
-- Faster for
WHERE attributes @> '{"color": "red"}'
-- But doesn't support ? (key exists) operators
Full-Text Search Indexes¶
See Full-Text Search for complete coverage.
-- GIN index on tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
CREATE INDEX idx_articles_search
ON articles USING GIN(search_vector);
-- Or expression index
CREATE INDEX idx_articles_search_expr
ON articles USING GIN(to_tsvector('english', title || ' ' || body));
BRIN Indexes¶
Block Range INdex — tiny indexes for naturally sorted data.
-- Time-series table (rows inserted in time order)
CREATE INDEX idx_events_time_brin
ON events USING BRIN(created_at);
-- Check correlation (should be close to 1.0 or -1.0)
SELECT correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';
When to use BRIN: - Append-only tables (logs, events) - Data physically ordered by index column - Very large tables where B-tree would be huge
Limitations: - Less precise than B-tree - Poor for random access patterns - Bad if data is not physically ordered
Unique Indexes¶
Enforce uniqueness and provide fast lookups.
-- Simple unique
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Partial unique (unique among non-deleted)
CREATE UNIQUE INDEX idx_users_email_active
ON users(email)
WHERE deleted_at IS NULL;
-- Multi-column unique
CREATE UNIQUE INDEX idx_booking_slots
ON booking_slots(date, time_slot, resource_id);
Composite Primary Keys¶
-- Junction table with composite PK
CREATE TABLE user_roles (
user_id INTEGER REFERENCES users(id),
role_id INTEGER REFERENCES roles(id),
granted_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (user_id, role_id)
);
-- Note: PK creates an index on (user_id, role_id)
-- You may still need a separate index on (role_id) for reverse lookups
CREATE INDEX idx_user_roles_role ON user_roles(role_id);
Index Maintenance¶
Check Index Size¶
SELECT
indexrelname AS index,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan AS scans
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Find Unused Indexes¶
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
idx_scan AS times_used,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;
Find Duplicate Indexes¶
SELECT
a.indrelid::regclass AS table,
a.indexrelid::regclass AS index1,
b.indexrelid::regclass AS index2
FROM pg_index a
JOIN pg_index b ON (
a.indrelid = b.indrelid
AND a.indexrelid < b.indexrelid
AND a.indkey = b.indkey
);
Rebuild Bloated Indexes¶
-- Concurrent rebuild (no locks)
REINDEX INDEX CONCURRENTLY idx_users_email;
-- Rebuild all indexes on a table
REINDEX TABLE CONCURRENTLY users;
Best Practices¶
Do¶
- Index foreign keys — PostgreSQL doesn't auto-index them
- Use covering indexes — For frequently-run queries
- Use partial indexes — For skewed data distributions
- Match index order to query order — For multi-column indexes
- Monitor usage — Remove unused indexes
Don't¶
- Over-index — Each index slows writes
- Index low-cardinality columns alone —
statuswith 3 values isn't selective - Ignore column order —
(a, b)doesn't helpWHERE b = ? - Use indexes for small tables — Sequential scan is faster
Index Sizing Guidelines¶
| Table Size | Indexing Strategy |
|---|---|
| < 1K rows | Often no indexes needed beyond PK |
| 1K - 100K | Index filtered columns |
| 100K - 1M | Careful index design, covering indexes |
| > 1M | Partial indexes, BRIN for time-series |
Decision Tree¶
Need to query by column X?
├── Equality only?
│ ├── High cardinality? → B-tree
│ ├── Array/JSONB? → GIN
│ └── Low cardinality? → Maybe no index (or partial)
├── Range queries?
│ ├── Time-series (ordered inserts)? → BRIN
│ └── Random access? → B-tree
├── Sorting?
│ └── Include in B-tree with correct order
├── Full-text search?
│ └── GIN on tsvector
└── Multiple conditions?
└── Multi-column B-tree (selective columns first)