Python Profiling

Find and fix performance bottlenecks in Python code.

CPU Profiling

cProfile (Built-in)

# Profile script
python -m cProfile -s cumtime script.py

# Save to file
python -m cProfile -o profile.prof script.py

# View results
python -m pstats profile.prof

Reading output:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10000    0.500    0.000    2.500    0.000 slow_function
    10000    2.000    0.000    2.000    0.000 inner_function

  • ncalls: Number of calls
  • tottime: Time spent in the function itself (excluding subcalls)
  • cumtime: Time in the function including all subcalls
  • percall: The preceding time column divided by ncalls (one column each for tottime and cumtime)
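
A saved profile can also be loaded and inspected from Python instead of the interactive pstats browser. A minimal sketch (generating a throwaway profile inline so the example is self-contained):

```python
import cProfile
import pstats

# Profile a small stand-in workload and save it to a file
cProfile.run("sum(i * i for i in range(10000))", "profile.prof")

# Load the saved profile and print the top entries by cumulative time
stats = pstats.Stats("profile.prof")
stats.sort_stats("cumtime").print_stats(5)
```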

Programmatic Profiling

import cProfile
import pstats
from io import StringIO

def profile_function(func):
    """Decorator to profile a function."""
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()

        stream = StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats('cumtime')
        stats.print_stats(20)
        print(stream.getvalue())

        return result
    return wrapper

@profile_function
def slow_function():
    # ... code to profile
    pass
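
On Python 3.8+, cProfile.Profile also works as a context manager, which replaces the explicit enable()/disable() pair in the decorator above:

```python
import cProfile
import pstats

# The context manager starts profiling on entry and stops on exit
with cProfile.Profile() as profiler:
    total = sum(i * i for i in range(10000))

pstats.Stats(profiler).sort_stats("cumtime").print_stats(10)
```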

py-spy (Sampling Profiler)

A sampling profiler that attaches to a running process: no code changes needed, minimal overhead.

# Install
pip install py-spy

# Live top-style view of a running process
# (attaching to another process may require sudo)
py-spy top --pid 12345

# Record to flamegraph
py-spy record -o profile.svg --pid 12345

# Profile a script
py-spy record -o profile.svg -- python script.py

line_profiler

Profile line-by-line.

pip install line_profiler

# script.py
@profile  # kernprof injects this decorator; no import needed
def slow_function():
    result = []
    for i in range(1000):
        result.append(i ** 2)
    return result

slow_function()

# Run under kernprof to collect and print line timings
kernprof -l -v script.py

Output:

Line #    Hits     Time  Per Hit   % Time  Line Contents
     4    1000     2000      2.0     20.0      result.append(i ** 2)

Async Profiling

import asyncio
import cProfile
import pstats

async def main():
    await slow_async_function()

# Profile async code
profiler = cProfile.Profile()
profiler.enable()
asyncio.run(main())
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats('cumtime')
stats.print_stats(20)

Memory Profiling

memory_profiler

pip install memory_profiler

from memory_profiler import profile

@profile
def memory_intensive():
    large_list = [i ** 2 for i in range(1000000)]
    return sum(large_list)

memory_intensive()

Output:

Line #    Mem usage    Increment  Line Contents
    4     50.0 MiB     0.0 MiB   def memory_intensive():
    5     88.5 MiB    38.5 MiB       large_list = [...]
    6     88.5 MiB     0.0 MiB       return sum(large_list)

tracemalloc (Built-in)

import tracemalloc

tracemalloc.start()

# Run code
result = memory_intensive_function()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("Top 10 memory allocations:")
for stat in top_stats[:10]:
    print(stat)
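
By default tracemalloc keeps only one frame per allocation. Passing a frame count to start() and grouping by 'traceback' shows the full call stack of the biggest allocator, which is usually what you need to locate the culprit. A sketch:

```python
import tracemalloc

# Keep up to 25 frames per allocation instead of the default 1
tracemalloc.start(25)

data = [bytes(1000) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("traceback")[0]

print(f"Largest allocator: {top.size / 1024:.1f} KiB in {top.count} blocks")
for line in top.traceback.format():
    print(line)
```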

objgraph

Visualize object references.

pip install objgraph

import objgraph

# Most common types
objgraph.show_most_common_types(limit=10)

# Growth between calls
objgraph.show_growth()

# Reference graph
obj = find_leaky_object()
objgraph.show_backrefs([obj], filename='refs.png')

Finding Memory Leaks

import gc
import tracemalloc

def find_leaks():
    tracemalloc.start()

    # Force garbage collection
    gc.collect()
    snapshot1 = tracemalloc.take_snapshot()

    # Run suspected leaky code
    for _ in range(100):
        leaky_function()

    gc.collect()
    snapshot2 = tracemalloc.take_snapshot()

    # Compare snapshots
    diff = snapshot2.compare_to(snapshot1, 'lineno')

    print("Memory growth:")
    for stat in diff[:10]:
        print(stat)

find_leaks()
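
Snapshot diffs often drown the real leak in allocations from the import machinery or tracemalloc itself; filter_traces can exclude that noise before comparing. A sketch:

```python
import tracemalloc

tracemalloc.start()
data = [str(i) for i in range(10000)]
snapshot = tracemalloc.take_snapshot()

# Drop allocations made by the import machinery and tracemalloc itself
snapshot = snapshot.filter_traces([
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
])

for stat in snapshot.statistics("lineno")[:5]:
    print(stat)
```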

Flame Graphs

Visual representation of where time is spent.

# With py-spy
py-spy record -o flamegraph.svg --pid 12345

# View in browser
open flamegraph.svg

Reading flame graphs:

  • Width = time spent
  • Stack grows upward (callers below, callees above)
  • Wide plateaus at the top = self time in that function
  • Wide frames = candidates worth optimizing

FastAPI Profiling

from fastapi import FastAPI, Request
import cProfile
import pstats
from io import StringIO

app = FastAPI()

@app.middleware("http")
async def profile_requests(request: Request, call_next):
    if request.query_params.get("profile"):
        profiler = cProfile.Profile()
        profiler.enable()
        response = await call_next(request)
        profiler.disable()

        stream = StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats('cumtime')
        stats.print_stats(30)
        print(stream.getvalue())

        return response

    return await call_next(request)

# Usage: GET /api/users?profile=1

Database Query Profiling

import logging
import time
from sqlalchemy import event
from sqlalchemy.engine import Engine

logger = logging.getLogger("sqlalchemy.engine")

@event.listens_for(Engine, "before_cursor_execute")
def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
    conn.info.setdefault('query_start_time', []).append(time.time())

@event.listens_for(Engine, "after_cursor_execute")
def after_cursor_execute(conn, cursor, statement, parameters, context, executemany):
    total = time.time() - conn.info['query_start_time'].pop()
    if total > 0.1:  # Log slow queries
        logger.warning(f"Slow query ({total:.2f}s): {statement[:100]}")
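
The same slow-query idea works at the driver level without an ORM. A stdlib-only sketch using sqlite3 and a timing wrapper (the threshold and helper name are illustrative, not a sqlite3 API):

```python
import sqlite3
import time

SLOW_THRESHOLD = 0.1  # seconds; tune per application

def timed_execute(cursor, statement, params=()):
    """Run a statement and warn when it exceeds the slow-query threshold."""
    start = time.perf_counter()
    result = cursor.execute(statement, params)
    elapsed = time.perf_counter() - start
    if elapsed > SLOW_THRESHOLD:
        print(f"Slow query ({elapsed:.2f}s): {statement[:100]}")
    return result

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
timed_execute(cur, "CREATE TABLE t (x INTEGER)")
timed_execute(cur, "INSERT INTO t VALUES (?)", (1,))
```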

Profiling Tips

What to Profile

  1. Slow endpoints — Start with user-reported issues
  2. High-traffic paths — Small improvements = big impact
  3. Background jobs — Often overlooked
  4. Startup time — Affects deployment

Common Bottlenecks

Symptom              Likely Cause
High CPU, low I/O    Algorithm inefficiency
Low CPU, high I/O    Database/network bound
Memory growth        Leaks, caching gone wrong
Spiky latency        GC pauses, locks
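
For the GC-pause case, the stdlib gc module can confirm whether collections are a plausible cause; a quick check:

```python
import gc

# Per-generation collection counts; a fast-growing count in the
# oldest generation suggests GC pressure worth investigating
for gen, stats in enumerate(gc.get_stats()):
    print(f"gen {gen}: {stats['collections']} collections, "
          f"{stats['collected']} objects collected")
```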

Optimization Checklist

  • Profile before optimizing
  • Identify the actual bottleneck
  • Make one change at a time
  • Measure after each change
  • Document what you changed and why