Python GIL: How It Works and When It Matters

Understanding Python's Global Interpreter Lock.

What is the GIL?

The Global Interpreter Lock (GIL) is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecode simultaneously.

Without GIL (hypothetical):
Thread 1: ─execute─execute─execute─
Thread 2: ─execute─execute─execute─  (true parallelism)

With GIL (reality):
Thread 1: ─execute─░░░░░░░─execute─
Thread 2: ─░░░░░░░─execute─░░░░░░░─  (interleaved)
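The interleaving above is driven by the interpreter's switch interval: CPython asks the running thread to drop the GIL every few milliseconds so waiting threads get a turn. A minimal check of this knob (the 5 ms default is CPython's current behavior):

```python
import sys

# CPython asks the running thread to release the GIL
# after this many seconds (default: 0.005, i.e. 5 ms)
interval = sys.getswitchinterval()
print(f"GIL switch interval: {interval * 1000:.1f} ms")

# Tunable at runtime, e.g. to reduce context-switch overhead:
sys.setswitchinterval(0.01)
```

Raising the interval reduces switching overhead but makes threads less responsive; it does not change what the GIL allows.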

Why Does the GIL Exist?

  1. Memory management safety — Reference counting is not thread-safe
  2. C extension compatibility — Many C extensions aren't thread-safe
  3. Simplicity — Easier implementation of the interpreter
# Reference counting example
a = "hello"  # refcount = 1
b = a        # refcount = 2
del b        # refcount = 1

# Without GIL, two threads could corrupt the refcount
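The counts in the comments above can be observed directly with sys.getrefcount, which reports one extra reference held temporarily by its own argument:

```python
import sys

a = [1, 2, 3]              # one reference: a
base = sys.getrefcount(a)  # +1 for getrefcount's own argument

b = a                      # second reference: b
assert sys.getrefcount(a) == base + 1

del b                      # back to one reference
assert sys.getrefcount(a) == base
```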

When Does the GIL Matter?

GIL Does NOT Limit:

  • I/O operations — GIL released during I/O
  • C extensions — Can release GIL (NumPy does this)
  • Multiprocessing — Each process has own GIL
  • Async/await — Cooperative multitasking, no threads

GIL DOES Limit:

  • CPU-bound Python code — Can't parallelize with threads
  • Pure Python loops — No speedup from threading

Demonstration

CPU-Bound: Threading Doesn't Help

import time
import threading

def cpu_work():
    total = 0
    for i in range(10_000_000):
        total += i * i
    return total

# Sequential
start = time.time()
cpu_work()
cpu_work()
print(f"Sequential: {time.time() - start:.2f}s")  # ~2.0s

# Threaded (NOT faster due to GIL)
start = time.time()
t1 = threading.Thread(target=cpu_work)
t2 = threading.Thread(target=cpu_work)
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Threaded: {time.time() - start:.2f}s")  # ~2.0s (same!)

I/O-Bound: Threading Helps

import time
import threading

def io_work():
    time.sleep(1)  # Simulates I/O, releases GIL

# Sequential
start = time.time()
io_work()
io_work()
print(f"Sequential: {time.time() - start:.2f}s")  # ~2.0s

# Threaded (IS faster)
start = time.time()
t1 = threading.Thread(target=io_work)
t2 = threading.Thread(target=io_work)
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Threaded: {time.time() - start:.2f}s")  # ~1.0s
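The manual start/join pattern above is more commonly written with ThreadPoolExecutor; the timing behaves the same way. A sketch using a shorter sleep:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_work():
    time.sleep(0.2)  # Simulates I/O, releases GIL

start = time.time()
with ThreadPoolExecutor(max_workers=2) as executor:
    executor.submit(io_work)
    executor.submit(io_work)
# Leaving the with-block waits for both tasks to finish
elapsed = time.time() - start
print(f"Threaded: {elapsed:.2f}s")  # ~0.2s, not ~0.4s
```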

GIL Release Points

The GIL is released during:

# I/O operations
open("file.txt").read()      # GIL released
socket.recv(1024)            # GIL released
time.sleep(1)                # GIL released

# C extensions (if they release it)
import numpy as np
np.dot(large_array, large_array)  # GIL released

# Explicit release (C API, not callable from Python):
# C extensions wrap blocking work in the Py_BEGIN_ALLOW_THREADS /
# Py_END_ALLOW_THREADS macros to release and re-acquire the GIL.

Working Around the GIL

Solution 1: Multiprocessing

from concurrent.futures import ProcessPoolExecutor
import multiprocessing

def cpu_work(n):
    return sum(i * i for i in range(n))

# True parallelism — each process has its own GIL.
# The __main__ guard is required where workers are spawned
# (Windows, and macOS by default).
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
        results = list(executor.map(cpu_work, [10_000_000] * 4))

Solution 2: NumPy/C Extensions

import numpy as np

# NumPy releases GIL for many operations
def vectorized_work():
    arr = np.arange(10_000_000)
    return np.sum(arr ** 2)  # Runs in C, no GIL

# Threading DOES help here
import threading
t1 = threading.Thread(target=vectorized_work)
t2 = threading.Thread(target=vectorized_work)
t1.start(); t2.start()
t1.join(); t2.join()
# Faster than sequential!

Solution 3: Cython with nogil

# compute.pyx — compile with OpenMP enabled (e.g. -fopenmp)
from cython.parallel import prange

def parallel_sum(int n):
    cdef long total = 0
    cdef int i

    with nogil:  # Release GIL
        for i in prange(n):  # OpenMP loop; += becomes a reduction
            total += i * i

    return total

Solution 4: Alternative Interpreters

  • PyPy — Has a GIL, but its JIT often makes single-threaded code much faster
  • Jython — No GIL (Python 2.7 only; no Python 3 release)
  • IronPython — No GIL (Python 3 support lags far behind CPython)
  • GraalPy — Experimental Python on GraalVM; JIT-compiled, threading model still evolving
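Which implementation (and, on CPython 3.13+, which build) is running can be checked at runtime. Note that sys._is_gil_enabled() is a private hook that exists only on free-threading-capable CPython builds, so this sketch falls back to assuming the GIL is present:

```python
import platform
import sys

print("Implementation:", platform.python_implementation())  # e.g. 'CPython'

# sys._is_gil_enabled() is private and only exists on CPython 3.13+
# builds with free-threading support; assume the GIL elsewhere.
check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = check() if check is not None else True
print("GIL enabled:", gil_enabled)
```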

Experimental

Sub-interpreters and no-GIL builds (Python 3.13t) are experimental. These APIs may change and should not be used in production code.

Python 3.12+ Sub-Interpreters

Python 3.12 introduced a per-interpreter GIL (PEP 684), initially exposed only through the C API; a public stdlib interpreters module is proposed separately (PEP 734):

# Each sub-interpreter has its own GIL (PEP 684).
# There is no public Python-level API yet; CPython ships a
# private wrapper used by its own test suite:
from test.support import interpreters  # private, subject to change

interp1 = interpreters.create()
interp2 = interpreters.create()

# CPU-bound code can run in parallel, one thread per interpreter
# (Still experimental)

Python 3.13+ Free-Threaded Mode

Python 3.13 introduces experimental GIL-free builds:

# Build Python without GIL
./configure --disable-gil
make

# Or use pre-built
python3.13t script.py  # 't' for threaded/no-GIL

Caveats:

  • Experimental
  • Some C extensions may not work
  • Performance implications vary

Practical Guidelines

When Threading Works

# I/O operations
from concurrent.futures import ThreadPoolExecutor

def download_files(urls):
    # download() is your own I/O function
    with ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(download, urls)  # GIL released during network I/O

# Mixed I/O and light CPU
def process_requests(requests):
    with ThreadPoolExecutor() as executor:
        executor.map(handle_request, requests)  # Mostly I/O

When to Use Multiprocessing

# CPU-intensive pure Python
from concurrent.futures import ProcessPoolExecutor

def compute_heavy(data):
    # Pure Python computation
    return sum(x ** 2 for x in data)

# Use processes, not threads (guard required where workers are spawned)
if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = executor.map(compute_heavy, data_chunks)

When to Use Asyncio

# Many concurrent I/O operations
import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def fetch_all(urls):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

# Single thread, no GIL concerns
asyncio.run(fetch_all(urls))

Summary

Situation           GIL Impact               Solution
I/O-bound           GIL released             Threading or asyncio
CPU-bound Python    GIL limits parallelism   Multiprocessing
CPU-bound NumPy     GIL released in C        Threading works
Many small tasks    GIL contention           Asyncio

Key Takeaway: The GIL prevents threads from parallelizing CPU-bound Python code, but doesn't affect I/O-bound code or code that releases the GIL (C extensions).