Common Concurrency Problems¶
Understand and avoid the classic pitfalls of concurrent programming.
Race Conditions¶
A race condition is a bug in which the result depends on the relative timing or interleaving of threads.
Example: Check-Then-Act¶
# Unsafe: race between the check and the update
if balance >= amount:
    # Another thread could withdraw here!
    balance -= amount
Timeline:
Thread A: if balance >= 100 (balance = 150, True)
Thread B: if balance >= 100 (balance = 150, True)
Thread A: balance -= 100 (balance = 50)
Thread B: balance -= 100 (balance = -50) ← Bug!
Solution: Atomic Check-and-Update¶
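One way to make the check and the update atomic is to perform both while holding the same lock. The sketch below assumes a hypothetical `Account` class (not part of the original example) and replays the timeline above with two competing withdrawals of 100 from a balance of 150.

```python
import threading

class Account:
    """Hypothetical account whose balance is guarded by a single lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The check and the update run under the same lock,
        # so no other thread can slip in between them.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False

account = Account(150)
threads = [threading.Thread(target=account.withdraw, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account.balance)  # 50: exactly one of the two withdrawals succeeds
```

Whichever thread takes the lock second sees the reduced balance and is refused, so the balance can no longer go negative.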
Example: Read-Modify-Write¶
# Unsafe: counter += 1 is not atomic
counter = 0

def increment():
    global counter
    counter += 1  # Read, add, write: can be interrupted between steps
Solution:
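A common fix is to guard the read-modify-write sequence with a lock. The following is a minimal sketch; the names `counter_lock` and `worker` are illustrative assumptions, not part of the original example.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment():
    global counter
    # The lock makes read, add, and write behave as one indivisible step.
    with counter_lock:
        counter += 1

def worker(n):
    for _ in range(n):
        increment()

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: no increments are lost
```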
Deadlocks¶
A deadlock occurs when threads wait on each other in a cycle, so none of them can proceed.
Classic Example: Two Locks¶
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

# do_work() is a placeholder for the real critical section
def thread_1():
    with lock_a:          # Holds A
        time.sleep(0.1)
        with lock_b:      # Waits for B (held by thread_2)
            do_work()

def thread_2():
    with lock_b:          # Holds B
        time.sleep(0.1)
        with lock_a:      # Waits for A (held by thread_1)
            do_work()

# Deadlock! Thread 1 waits for B, thread 2 waits for A
Conditions for Deadlock¶
All four must be true (Coffman conditions):
- Mutual exclusion — Resource held exclusively
- Hold and wait — Hold one, wait for another
- No preemption — Can't forcibly take resource
- Circular wait — A waits for B waits for A
Solutions¶
1. Lock Ordering
# Always acquire locks in the same order
def safe_transfer(from_acc, to_acc, amount):
    # Order by account ID
    first, second = sorted([from_acc, to_acc], key=lambda a: a.id)
    with first.lock:
        with second.lock:
            from_acc.balance -= amount
            to_acc.balance += amount
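To see lock ordering at work, the sketch below runs transfers in both directions at once. The `Account` class and the `many_transfers` helper are assumptions made for illustration; only `safe_transfer` comes from the text above.

```python
import threading

class Account:
    """Hypothetical account; `id` is used only to order lock acquisition."""

    def __init__(self, id, balance):
        self.id = id
        self.balance = balance
        self.lock = threading.Lock()

def safe_transfer(from_acc, to_acc, amount):
    # Both directions lock the lower-ID account first, so no cycle can form.
    first, second = sorted([from_acc, to_acc], key=lambda a: a.id)
    with first.lock:
        with second.lock:
            from_acc.balance -= amount
            to_acc.balance += amount

a = Account(1, 1000)
b = Account(2, 1000)

def many_transfers(src, dst):
    for _ in range(1000):
        safe_transfer(src, dst, 1)

t1 = threading.Thread(target=many_transfers, args=(a, b))
t2 = threading.Thread(target=many_transfers, args=(b, a))
t1.start()
t2.start()
t1.join()
t2.join()

print(a.balance, b.balance)  # 1000 1000
```

Note that this sketch assumes the two accounts are distinct. Transferring from an account to itself would try to take the same non-reentrant lock twice.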
2. Try-Lock with Timeout
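import random
import time

def try_transfer(from_acc, to_acc, amount):
    while True:
        if from_acc.lock.acquire(timeout=0.1):
            try:
                if to_acc.lock.acquire(timeout=0.1):
                    try:
                        # Both locks held: do the transfer
                        from_acc.balance -= amount
                        to_acc.balance += amount
                        return True
                    finally:
                        to_acc.lock.release()
            finally:
                from_acc.lock.release()
        # Could not get both locks: back off for a random interval and retry
        time.sleep(random.random() * 0.1)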
3. Lock-Free Design
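# Hand work between threads through a synchronized queue instead of sharing locks.
# queue.Queue uses a lock internally, so this removes explicit locks from your
# code rather than being lock-free in the strict sense.
from queue import Queue

# Producer-consumer without explicit locks
task_queue = Queue()  # Internally synchronized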
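A minimal producer-consumer sketch built on that queue might look like the following. The `SENTINEL` value and the `results` list are illustrative assumptions; the point is that neither function touches a lock directly.

```python
import threading
from queue import Queue

task_queue = Queue()
results = []          # only the consumer thread appends to this list
SENTINEL = None       # hypothetical marker that tells the consumer to stop

def producer():
    for i in range(5):
        task_queue.put(i)
    task_queue.put(SENTINEL)

def consumer():
    while True:
        item = task_queue.get()   # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * item)

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [0, 1, 4, 9, 16]
```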
Livelocks¶
Threads are active but make no progress.
# Two threads yielding to each other forever
def thread_1():
    while resource.is_used_by_other():
        release_resource()
        time.sleep(0)  # Yield
        acquire_resource()

def thread_2():
    while resource.is_used_by_other():
        release_resource()
        time.sleep(0)  # Yield
        acquire_resource()

# Both keep releasing, neither makes progress
Solution: Random Backoff
def thread_with_backoff():
    while resource.is_used_by_other():
        release_resource()
        time.sleep(random.random() * 0.1)  # Random wait breaks the lockstep
        acquire_resource()
Starvation¶
A thread never gets the resources it needs.
# If high-priority threads keep retaking the lock, a low-priority thread starves
def high_priority():
    while True:
        with lock:
            do_important_work()

def low_priority():
    with lock:  # May never acquire: high_priority keeps getting it first
        do_work()
Solutions:
- Fair locks (FIFO ordering)
- Priority inheritance
- Aging (increase priority over time)
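As an illustration of the first option, a FIFO lock can be sketched as a ticket lock built on `threading.Condition`. The `FairLock` class below is a hypothetical teaching example, not a standard-library type, and it favors clarity over performance.

```python
import threading

class FairLock:
    """Sketch of a ticket lock: threads are served in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket number to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until every earlier ticket has been served.
            while ticket != self._now_serving:
                self._cond.wait()

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()

fair_lock = FairLock()
total = 0

def worker():
    global total
    for _ in range(500):
        with fair_lock:
            total += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 2000
```

Because each waiter holds a numbered ticket, a thread that keeps re-acquiring the lock has to queue behind anyone who arrived before it.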
Priority Inversion¶
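A high-priority task waits for a lock held by a low-priority task, while a medium-priority task keeps preempting the low-priority one and so delays the release.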
High priority: ─────[waiting for lock...]────────[runs]
Medium priority: ─────[runs][runs][runs][runs]─────
Low priority: [has lock]───────────────────────[releases]
High waits for Low, but Medium keeps running
Solution: Priority Inheritance
The low-priority task temporarily inherits the priority of the high-priority waiter while it holds the lock, so medium-priority tasks can no longer preempt it.
Data Races¶
A data race occurs when two or more threads access the same memory concurrently, at least one access is a write, and nothing synchronizes them.
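# Data race: unsynchronized access
shared_data = []

def writer(value):
    shared_data.append(value)  # Write

def reader():
    return shared_data[-1]     # Read: raises IndexError if nothing was written yet

# In C or C++ this pattern is undefined behavior and can crash or return garbage.
# In CPython each list operation is effectively atomic, but the reader can still
# find the list empty or see a stale last element.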
Solution: Synchronize or use thread-safe types
from queue import Queue
from threading import Lock

# Option 1: Lock (readers must take the same lock)
shared_data = []
lock = Lock()

def safe_append(value):
    with lock:
        shared_data.append(value)

# Option 2: Thread-safe type
queue = Queue()

def safe_put(value):
    queue.put(value)
Memory Visibility Issues¶
Writes made by one thread may not be visible to another thread, or may become visible in a different order than they were made.
# Thread 1
value = 42
flag = True   # Intended to signal that value is ready

# Thread 2
if flag:      # Might see flag=True but value=0 due to reordering
    print(value)
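Cause: CPU caching, and reordering of memory operations by the compiler or the CPU
Solutions:
- Use locks (they provide memory barriers)
- Use atomic types
- Use volatile (in languages where it has visibility semantics, such as Java)
In CPython, the GIL keeps individual bytecode operations from corrupting interpreter state, but it does not make compound operations such as check-then-act atomic, so proper synchronization is still needed.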
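In Python, one way to publish a value safely is to replace the plain boolean flag with a `threading.Event`, which is a synchronized primitive. The sketch below is an illustrative assumption about how the flag example could be restructured; the names `ready` and `seen` are not from the original.

```python
import threading

value = 0
ready = threading.Event()   # synchronized replacement for the plain `flag`
seen = []

def writer():
    global value
    value = 42      # write the data first
    ready.set()     # then publish it

def reader():
    ready.wait()    # blocks until writer has called set()
    seen.append(value)

r = threading.Thread(target=reader)
w = threading.Thread(target=writer)
r.start()
w.start()
r.join()
w.join()

print(seen)  # [42]
```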
Detection and Debugging¶
Detecting Race Conditions¶
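import time

shared_state = 0

# Add delays to expose races
def suspicious_function():
    global shared_state
    value = shared_state
    time.sleep(0.1)  # Widens the race window so the bug triggers more often
    shared_state = value + 1

# Use thread sanitizers (C/C++/Rust)
# Java: -Xcheck:jni only validates JNI usage; it does not detect data races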
Detecting Deadlocks¶
import logging
import sys
import threading
import traceback

# Timeout-based detection (`lock` is an existing threading.Lock)
if not lock.acquire(timeout=5.0):
    logging.error("Possible deadlock!")

# Log stack traces of all threads
frames = sys._current_frames()
for thread in threading.enumerate():
    print(f"\n{thread.name}:")
    traceback.print_stack(frames[thread.ident])
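The timeout idea can be tried end to end with a small demo. The sketch below deliberately builds the two-lock cycle from earlier and uses a `threading.Barrier` (an assumption added here to make the cycle reproducible) so that both threads hold their first lock before trying the second.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
both_ready = threading.Barrier(2)   # forces the lock cycle for the demo
reports = []

def worker(name, first, second):
    with first:
        both_ready.wait()           # both threads now hold their first lock
        if second.acquire(timeout=0.2):
            try:
                reports.append(f"{name}: acquired both locks")
            finally:
                second.release()
        else:
            # Timing out turns a silent hang into a diagnosable event.
            reports.append(f"{name}: possible deadlock")

t1 = threading.Thread(target=worker, args=("thread_1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("thread_2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()

print(reports)
```

At least one thread always reports a possible deadlock, because neither can get its second lock until the other has given up. The other thread may then succeed once the first lock is released.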
Logging for Debugging¶
import threading
import logging

logging.basicConfig(level=logging.DEBUG)

def logged_acquire(lock, name):
    thread = threading.current_thread().name
    logging.debug(f"{thread}: Attempting to acquire {name}")
    lock.acquire()
    logging.debug(f"{thread}: Acquired {name}")

def logged_release(lock, name):
    thread = threading.current_thread().name
    logging.debug(f"{thread}: Releasing {name}")
    lock.release()
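The two helpers can also be folded into a context manager so that the release cannot be forgotten. This is one possible sketch; `logged_lock` is a hypothetical name, and it relies only on `contextlib.contextmanager` from the standard library.

```python
import logging
import threading
from contextlib import contextmanager

logging.basicConfig(level=logging.DEBUG)

@contextmanager
def logged_lock(lock, name):
    thread = threading.current_thread().name
    logging.debug("%s: Attempting to acquire %s", thread, name)
    lock.acquire()
    logging.debug("%s: Acquired %s", thread, name)
    try:
        yield lock
    finally:
        # Runs even if the body raises, so the lock is always released.
        logging.debug("%s: Releasing %s", thread, name)
        lock.release()

state_lock = threading.Lock()
with logged_lock(state_lock, "state_lock"):
    held_inside = state_lock.locked()
held_after = state_lock.locked()

print(held_inside, held_after)  # True False
```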
Prevention Guidelines¶
Design Principles¶
- Minimize shared state — Less sharing = fewer bugs
- Immutable data when possible — Can't race on read-only data
- Message passing over shared memory — Queues, channels
- Lock ordering conventions — Document and follow
- Keep critical sections small — Less time holding locks
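As a small illustration of the immutability principle, a frozen dataclass can be shared between threads without locking because nothing can modify it in place. The `Config` type below is a hypothetical example; updates produce a new object instead.

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Config:
    """Hypothetical read-only settings object shared across threads."""
    host: str
    retries: int

config = Config("localhost", 3)

try:
    config.retries = 5          # in-place mutation is rejected
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# To "change" a setting, build a new object and hand that to readers.
updated = replace(config, retries=5)

print(mutated, config.retries, updated.retries)  # False 3 5
```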
Code Review Checklist¶
- All shared mutable state is protected
- Locks are acquired in consistent order
- No lock held during I/O or slow operations
- Context managers used for lock release
- Thread-safe data structures where appropriate
- Timeouts on blocking operations
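The item about not holding a lock during slow operations is often handled by copying the data out under the lock and doing the slow work afterwards. The sketch below assumes a hypothetical `slow_send` function standing in for real I/O.

```python
import threading
import time

pending = ["a", "b", "c"]
pending_lock = threading.Lock()

def slow_send(item):
    """Hypothetical stand-in for network or disk I/O."""
    time.sleep(0.01)
    return item.upper()

def flush():
    # Hold the lock only long enough to take a snapshot and clear the list.
    with pending_lock:
        batch = list(pending)
        pending.clear()
    # The slow part runs without the lock, so other threads are not blocked.
    return [slow_send(item) for item in batch]

sent = flush()
print(sent, pending)  # ['A', 'B', 'C'] []
```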
Testing¶
import threading

# Stress test for race conditions
# (concurrent_operation and expected_invariant are placeholders for your code)
def stress_test():
    threads = []
    for _ in range(100):
        t = threading.Thread(target=concurrent_operation)
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    assert expected_invariant()

# Run many times
for i in range(1000):
    stress_test()
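A concrete version of this harness might check that a lock-protected counter never loses an update. The `SafeCounter` class and the parameter values below are assumptions chosen to keep the run short.

```python
import threading

class SafeCounter:
    """Hypothetical counter under test; increments are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

def stress_test(n_threads=8, n_increments=1000):
    counter = SafeCounter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Invariant: no increment was lost
    return counter.value == n_threads * n_increments

# Run many times: a race may only show up in a small fraction of runs
results = [stress_test() for _ in range(20)]
print(all(results))  # True
```

A passing stress test does not prove the absence of races, since a rare interleaving may simply not have occurred, so it complements review and sanitizers rather than replacing them.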
Summary¶
| Problem | Cause | Prevention |
|---|---|---|
| Race condition | Unsynchronized access | Locks, atomics |
| Deadlock | Circular lock waiting | Lock ordering, timeouts |
| Livelock | Active but no progress | Random backoff |
| Starvation | Never gets resources | Fair locks, aging |
| Data race | Unsynchronized read/write | Proper synchronization |