# Choosing the Right Tool

A decision framework for Python concurrency.
## Quick Decision Tree
```text
What type of work?
│
├── I/O-bound (network, disk, database)?
│   │
│   ├── High concurrency (1000s of connections)?
│   │   └── asyncio
│   │
│   ├── Moderate concurrency (10-100)?
│   │   ├── New code? → asyncio
│   │   └── Existing sync code? → threading
│   │
│   └── Legacy code that can't use async?
│       └── threading
│
├── CPU-bound (computation)?
│   │
│   ├── Pure Python?
│   │   └── multiprocessing
│   │
│   ├── NumPy/Pandas/C extensions?
│   │   └── threading (GIL is released inside C code)
│   │
│   └── Heavy computation that needs to scale?
│       └── multiprocessing or distributed (Celery, Dask)
│
└── Mixed (I/O + CPU)?
    └── asyncio + run_in_executor (ProcessPoolExecutor)
```
## Comparison Table
| Aspect | asyncio | threading | multiprocessing |
|---|---|---|---|
| Best for | I/O-bound | I/O-bound | CPU-bound |
| Concurrency | High (10K+) | Moderate (100s) | Limited by CPUs |
| Memory | Low | Medium | High |
| Startup | Fast | Fast | Slow |
| Communication | Easy | Shared memory | Serialization |
| GIL impact | None | Limited | None |
| Debugging | Medium | Hard | Medium |
## When to Use asyncio

### Good Fit
```python
import asyncio
import aiohttp

# HTTP requests
async def fetch_all_users():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_user(session, user_id) for user_id in user_ids]
        return await asyncio.gather(*tasks)

# Database queries
async def get_user_data(user_id: int):
    user, posts, comments = await asyncio.gather(
        db.get_user(user_id),
        db.get_posts(user_id),
        db.get_comments(user_id),
    )
    return {"user": user, "posts": posts, "comments": comments}

# WebSocket handling
async def websocket_handler(websocket):
    async for message in websocket:
        response = await process_message(message)
        await websocket.send(response)
```
### Indicators
- Network requests (HTTP, WebSocket, TCP)
- Database queries
- File I/O (with aiofiles)
- Many concurrent connections
- Event-driven architecture
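The fan-out pattern above can be run end to end without any external services; in this sketch `asyncio.sleep` stands in for real network I/O and `fetch_user` is a local stand-in, not a real API:

```python
import asyncio
import time

async def fetch_user(user_id: int) -> dict:
    # Stand-in for a network call; real code would await an HTTP client here
    await asyncio.sleep(0.1)
    return {"id": user_id}

async def fetch_all_users(user_ids: list[int]) -> list[dict]:
    # All requests run concurrently, so total time is roughly one
    # request's latency rather than the sum of all of them
    return await asyncio.gather(*(fetch_user(uid) for uid in user_ids))

start = time.perf_counter()
users = asyncio.run(fetch_all_users(list(range(50))))
elapsed = time.perf_counter() - start
print(len(users), round(elapsed, 2))
```

All fifty 0.1-second waits overlap, so the wall-clock time stays close to 0.1s instead of 5s.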
## When to Use Threading

### Good Fit
```python
from concurrent.futures import ThreadPoolExecutor

import requests

# Parallel file downloads with requests (a sync library)
def download(url):
    return requests.get(url).content

with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(download, urls))

# Blocking I/O in an existing sync codebase
def process_files(paths):
    with ThreadPoolExecutor() as executor:
        return list(executor.map(process_file, paths))
```
### Indicators
- Can't use async (legacy code, sync libraries)
- Moderate number of concurrent tasks
- Blocking I/O operations
- Need shared memory access
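The payoff of threads for blocking I/O can be shown without a network: here `time.sleep` stands in for a blocking call such as `requests.get`, and ten waits overlap inside one thread pool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io(task_id: int) -> int:
    # Stand-in for a blocking call (HTTP request, file read, DB query);
    # the GIL is released while the thread waits
    time.sleep(0.1)
    return task_id * 2

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    # map preserves input order in its results
    results = list(executor.map(blocking_io, range(10)))
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```

Ten sequential calls would take about 1s; with ten workers the total stays near 0.1s.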
## When to Use Multiprocessing

### Good Fit
```python
from concurrent.futures import ProcessPoolExecutor

# CPU-intensive data processing
def process_image(path):
    img = load_image(path)
    return apply_filters(img)

with ProcessPoolExecutor() as executor:
    results = list(executor.map(process_image, image_paths))

# Parallel computation
def compute_chunk(data):
    return [x ** 2 + 3 * x + 1 for x in data]

with ProcessPoolExecutor() as executor:
    results = list(executor.map(compute_chunk, data_chunks))
```
### Indicators
- Pure Python computation
- Need true parallelism
- CPU usage is bottleneck
- Tasks are independent
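A runnable version of the parallel-computation example. Two details matter in real scripts: worker functions must be defined at module level (they are pickled and sent to the workers, so lambdas and nested functions fail), and the pool must be created under an `if __name__ == "__main__"` guard on platforms that spawn rather than fork:

```python
from concurrent.futures import ProcessPoolExecutor

def compute_chunk(data: list[int]) -> int:
    # Pure-Python CPU work; each chunk runs in its own process,
    # so the GIL does not serialize the computation
    return sum(x ** 2 + 3 * x + 1 for x in data)

if __name__ == "__main__":  # required where workers are spawned, not forked
    chunks = [list(range(i, i + 1000)) for i in range(0, 4000, 1000)]
    with ProcessPoolExecutor(max_workers=4) as executor:
        totals = list(executor.map(compute_chunk, chunks))
    print(totals)
```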
## Mixed Workloads

### I/O with CPU-bound Processing
```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

import aiohttp

def cpu_intensive(data):
    # Heavy computation
    return process(data)

async def fetch_and_process(url):
    # I/O: fetch data
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            data = await response.json()

    # CPU: process in a separate process
    # (in real code, create the pool once and reuse it across calls)
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(pool, cpu_intensive, data)
    return result
```
### Async with Sync Libraries
```python
import asyncio

def sync_database_call():
    # Uses a sync database driver
    return db.execute("SELECT * FROM users")

async def main():
    # Run the sync call in asyncio's default thread pool
    result = await asyncio.to_thread(sync_database_call)
    return result
```
## Performance Considerations

### Task Granularity
```python
# Too fine-grained: per-item overhead outweighs the benefit
async def bad():
    results = []
    for i in range(1_000_000):
        result = await process_single_item(i)  # tiny task, big overhead
        results.append(result)
    return results

# Better: batch operations
async def good():
    chunks = [items[i:i + 1000] for i in range(0, len(items), 1000)]
    return await asyncio.gather(*(process_chunk(c) for c in chunks))
```
### Pool Sizing
```python
import multiprocessing

# CPU-bound: match the CPU count
cpu_workers = multiprocessing.cpu_count()

# I/O-bound: can exceed the CPU count
io_workers = 20  # or more for high-latency I/O

# Mixed: experiment to find the optimum.
# Start with the CPU count and adjust based on profiling.
```
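As a concrete starting point, both sizes can be derived from the core count. The 5x multiplier and the cap of 32 below are heuristics to tune, not rules; for comparison, `ThreadPoolExecutor`'s own default for `max_workers` is `min(32, os.cpu_count() + 4)`:

```python
import os

cpu_count = os.cpu_count() or 1

# CPU-bound pools: one worker per core
cpu_workers = cpu_count

# I/O-bound pools: workers mostly wait, so oversubscribe the cores;
# cap the pool to avoid unbounded thread creation
io_workers = min(32, cpu_count * 5)

print(cpu_workers, io_workers)
```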
### Memory Usage

- **multiprocessing**: each worker receives its own copy of the data it is sent; for large inputs, use shared memory or pass file paths instead.
- **threading**: memory is shared, which is cheap, but concurrent mutation must be guarded with locks (the GIL does not make your code thread-safe).
- **asyncio**: single process and the most memory-efficient option, but watch for leaks in long-running tasks.
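One way to avoid per-process copies is the stdlib `multiprocessing.shared_memory` module (Python 3.8+). A minimal single-process sketch of the create/attach round trip; in practice the attach-by-name step would happen in a worker process:

```python
from multiprocessing import shared_memory

# Create a named block of shared memory; other processes attach by name
# instead of receiving a pickled copy of the data
shm = shared_memory.SharedMemory(create=True, size=1024)
try:
    shm.buf[:5] = b"hello"
    # A worker would run this line in another process
    attached = shared_memory.SharedMemory(name=shm.name)
    data = bytes(attached.buf[:5])
    attached.close()
finally:
    shm.close()
    shm.unlink()  # free the block once every process has closed it
print(data)
```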
## Common Mistakes

### Using Threading for CPU-Bound Work
```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def cpu_work(_):
    return sum(i ** 2 for i in range(10_000_000))

# Wrong: threading won't parallelize CPU-bound Python code.
# The GIL lets only one thread run bytecode at a time, so this is no faster.
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(cpu_work, range(4)))

# Right: use multiprocessing for true parallelism.
# Note: the worker must be a module-level function, not a lambda,
# because arguments and callables are pickled to reach the workers.
with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(cpu_work, range(4)))
```
### Blocking in Async Code
```python
import asyncio
import time

import aiohttp
import requests

# Wrong: blocks the event loop, starving every other task
async def bad():
    time.sleep(1)               # blocks!
    result = requests.get(url)  # blocks!

# Right: use async alternatives
async def good():
    await asyncio.sleep(1)
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.json()
```
### Not Awaiting Coroutines
```python
# Wrong: the coroutine object is created but never runs
async def bad():
    fetch_data()  # RuntimeWarning: coroutine was never awaited

# Right: await the coroutine
async def good():
    result = await fetch_data()
    return result
```
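When you intentionally don't want to wait at the call site, the fix is to schedule the coroutine as a task rather than dropping it. A runnable sketch (`fetch_data` here is a stand-in built on `asyncio.sleep`):

```python
import asyncio

async def fetch_data() -> str:
    await asyncio.sleep(0.05)  # stand-in for real I/O
    return "payload"

async def main() -> str:
    # Schedule without blocking; keep a reference, or the task can be
    # garbage-collected before it finishes
    task = asyncio.create_task(fetch_data())
    # ... other work can run here while fetch_data is in flight ...
    # Eventually await the task, or its result and exceptions are lost
    return await task

result = asyncio.run(main())
print(result)
```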
## Summary
| Situation | Use | Example |
|---|---|---|
| HTTP requests | asyncio | aiohttp, httpx |
| Database queries | asyncio | asyncpg, databases |
| File downloads | asyncio or threading | aiohttp or requests |
| Image processing | multiprocessing | Pillow, OpenCV |
| Data crunching | multiprocessing | pandas, numpy |
| Background jobs | Celery, RQ | Email, reports |
| Legacy sync code | threading | Existing libraries |
| Web server | asyncio | FastAPI, Starlette |
Rule of thumb:

- Start with asyncio for I/O
- Use multiprocessing for CPU
- Use threading only when async isn't feasible