Computer Vision Techniques

This reference covers core computer vision concepts used across the Sartiq pipeline -- from edge detection and frequency analysis to compression and compositing techniques.

The sections below follow a deliberate progression. We start with edge detection in the spatial domain (Sobel), then generalize to frequency-domain analysis, where convolution becomes simple multiplication. Next, Singular Value Decomposition offers a data-adaptive alternative to fixed-basis frequency decomposition, with direct applications in compression. Blurring and feathering then bring these ideas together in the practical context of image compositing, and dithering closes the progression by showing how continuous tones survive quantization to discrete output levels.


Sobel Gradient

What is Edge Detection?

Edges are regions in an image where pixel intensity changes sharply. Detecting them is fundamental: edges encode the structural information of a scene (object boundaries, texture changes, depth discontinuities) while discarding uniform regions that carry less semantic value.

The Sobel operator approximates the first derivative of image intensity using two 3x3 convolution kernels -- one for horizontal changes and one for vertical changes.

The Sobel Kernels

The horizontal and vertical gradient kernels are:

\[ G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} \]

Each kernel is the outer product of a smoothing vector \([1, 2, 1]^T\) and a differencing vector \([-1, 0, 1]\). The smoothing component suppresses noise in the direction perpendicular to the derivative, which is what distinguishes Sobel from the simpler Prewitt operator (which uses \([1, 1, 1]^T\) instead).
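This separability can be checked directly in NumPy -- a quick illustrative sketch:

```python
import numpy as np

smooth = np.array([1, 2, 1])   # noise-suppressing component
diff = np.array([-1, 0, 1])    # central-difference component

gx = np.outer(smooth, diff)    # horizontal-gradient kernel G_x
gy = np.outer(diff, smooth)    # vertical-gradient kernel G_y

# gx == [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
# gy == [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
```

Separability also matters for performance: a separable 3x3 convolution can be applied as two 1D passes instead of one 2D pass.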

Gradient Magnitude and Direction

After convolving the image \(I\) with both kernels, the gradient magnitude and direction at each pixel are:

\[ G = \sqrt{G_x^2 + G_y^2} \qquad \theta = \arctan\!\left(\frac{G_y}{G_x}\right) \]

In practice, \(|G_x| + |G_y|\) is often used as a faster approximation of the magnitude.
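In NumPy, both quantities follow directly from the two gradient images; note that `np.arctan2` recovers the full \([-\pi, \pi]\) direction range, which a plain arctan of the ratio cannot. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def gradient_polar(gx: np.ndarray, gy: np.ndarray):
    """Convert gradient components to magnitude, direction, and L1 approximation."""
    magnitude = np.hypot(gx, gy)      # sqrt(gx^2 + gy^2), overflow-safe
    direction = np.arctan2(gy, gx)    # full [-pi, pi] range, handles gx == 0
    approx = np.abs(gx) + np.abs(gy)  # faster L1 approximation of magnitude
    return magnitude, direction, approx
```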

Comparison with Other Edge Detectors

| Operator | Kernel Size | Noise Handling | Multi-stage | Notes |
| --- | --- | --- | --- | --- |
| Prewitt | 3x3 | Low | No | Uniform smoothing weights |
| Sobel | 3x3 | Moderate | No | Weighted smoothing (center-emphasized) |
| Scharr | 3x3 | Better | No | Optimized rotational symmetry |
| Canny | Variable | High | Yes | Gaussian smoothing + non-max suppression + hysteresis thresholding |

Canny is generally superior for clean edge maps, but Sobel is preferred when you need raw gradient magnitude (e.g., for feeding into further processing stages) because it is a single, predictable convolution step.

Code Example

import cv2
import numpy as np

def compute_sobel_edges(image_path: str, ksize: int = 3) -> np.ndarray:
    """Compute Sobel edge magnitude from a grayscale image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    grad_x = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=ksize)
    grad_y = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=ksize)

    magnitude = np.sqrt(grad_x**2 + grad_y**2)
    max_val = magnitude.max()
    if max_val > 0:  # avoid division by zero on perfectly flat images
        magnitude = magnitude / max_val * 255
    return np.clip(magnitude, 0, 255).astype(np.uint8)

/// details | Manual convolution with NumPy

import numpy as np
from scipy.signal import convolve2d

def sobel_manual(img: np.ndarray) -> np.ndarray:
    """Sobel edge detection using raw convolution."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)

    gx = convolve2d(img, kx, mode="same", boundary="symm")
    gy = convolve2d(img, ky, mode="same", boundary="symm")
    return np.hypot(gx, gy)
///

Use Cases in the Pipeline

  • Quality assessment -- gradient magnitude statistics (mean, variance) serve as sharpness indicators for uploaded images.
  • Feature alignment -- edge maps are used as inputs to registration algorithms when aligning composited layers.
  • Mask refinement -- Sobel edges help detect where an auto-generated mask diverges from the true object boundary (see also Feathering for how these edges inform alpha blending).
  • Garment segmentation -- edge-refined masks feed into K-Means clustering, which partitions the masked region into coherent garment parts.

From spatial convolution to frequency analysis -- The Sobel operator works by convolving the image with a small kernel, which is equivalent to multiplying by a specific transfer function in the frequency domain. The next section generalizes this idea: instead of a single derivative kernel, we examine the full frequency spectrum of an image and show how arbitrary filters can be applied through element-wise multiplication.

Frequencies in Images

Spatial Frequency

An image can be decomposed into sinusoidal components of varying frequency, amplitude, and orientation -- just like a 1D signal. Low-frequency components correspond to smooth, slowly-varying regions (sky, skin), while high-frequency components correspond to rapid intensity changes (edges, textures, noise).

The 2D Discrete Fourier Transform

The DFT converts an image \(f(x, y)\) of size \(M \times N\) into its frequency-domain representation \(F(u, v)\):

\[ F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \, e^{-j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \]

The result is a complex-valued matrix. Its magnitude spectrum \(|F(u, v)|\) shows how much energy exists at each spatial frequency, and its phase spectrum \(\angle F(u, v)\) encodes the spatial position of those frequency components.

A key insight: phase carries more structural information than magnitude. Swapping the phase of two images while keeping their magnitudes produces results that look like the phase-donor, not the magnitude-donor.
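The phase-swap experiment is easy to reproduce with `np.fft` -- a hedged sketch on arbitrary same-shaped grayscale arrays:

```python
import numpy as np

def swap_phase(mag_donor: np.ndarray, phase_donor: np.ndarray) -> np.ndarray:
    """Combine the magnitude spectrum of one image with the phase of another."""
    mag = np.abs(np.fft.fft2(mag_donor))        # magnitude from first image
    phase = np.angle(np.fft.fft2(phase_donor))  # phase from second image
    hybrid = mag * np.exp(1j * phase)           # recombine in the complex plane
    return np.real(np.fft.ifft2(hybrid))
```

As a sanity check, combining an image with its own magnitude and phase is the identity; with two different images, the result resembles the phase donor.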

Frequency-Domain Filtering

Filtering in the frequency domain is element-wise multiplication:

\[ G(u, v) = H(u, v) \cdot F(u, v) \]

where \(H(u, v)\) is the filter transfer function. Common filters:

| Filter | Effect | Transfer Function |
| --- | --- | --- |
| Ideal low-pass | Remove all frequencies above cutoff \(D_0\) | \(H = 1\) if \(D(u,v) \leq D_0\), else \(0\) |
| Gaussian low-pass | Smooth blur, no ringing artifacts | \(H = e^{-D^2(u,v) / 2D_0^2}\) |
| Butterworth low-pass | Tunable rolloff via order \(n\) | \(H = \frac{1}{1 + [D(u,v)/D_0]^{2n}}\) |
| High-pass | Retain edges, remove smooth regions | \(H_{\text{HP}} = 1 - H_{\text{LP}}\) |
| Band-pass | Isolate specific frequency range | \(H_{\text{BP}} = H_{\text{LP},D_1} - H_{\text{LP},D_0}\) |

Here \(D(u,v) = \sqrt{(u - M/2)^2 + (v - N/2)^2}\) is the distance from the center of the shifted spectrum.

Code Example

import numpy as np
import cv2

def frequency_filter(
    image: np.ndarray,
    cutoff: float,
    filter_type: str = "low",
) -> np.ndarray:
    """Apply a Gaussian frequency-domain filter to a grayscale image."""
    rows, cols = image.shape
    crow, ccol = rows // 2, cols // 2

    # Compute centered DFT
    dft = np.fft.fft2(image.astype(np.float64))
    dft_shift = np.fft.fftshift(dft)

    # Build Gaussian transfer function
    u = np.arange(rows).reshape(-1, 1) - crow
    v = np.arange(cols).reshape(1, -1) - ccol
    d_sq = u**2 + v**2
    h = np.exp(-d_sq / (2 * cutoff**2))

    if filter_type == "high":
        h = 1.0 - h

    # Apply filter and inverse transform
    filtered = dft_shift * h
    result = np.fft.ifft2(np.fft.ifftshift(filtered))
    return np.clip(np.abs(result), 0, 255).astype(np.uint8)

Practical Applications

  • Denoising -- A low-pass filter removes high-frequency noise while preserving overall structure.
  • Sharpening -- Adding a scaled high-pass component back to the original image emphasizes edges: \(I_{\text{sharp}} = I + \alpha \cdot I_{\text{HP}}\).
  • Compression -- JPEG works by transforming 8x8 blocks into the frequency domain (via DCT, a real-valued relative of the DFT) and quantizing high-frequency coefficients more aggressively, since the human visual system is less sensitive to them.
  • Periodic noise removal -- Narrow notch filters in the frequency domain can surgically remove periodic patterns (e.g., moire artifacts from scanning) without affecting the rest of the image.
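The JPEG idea above can be sketched on a single 8x8 block with an orthonormal DCT-II built from scratch. This is a simplified illustration: the uniform quantization step `q` stands in for JPEG's actual frequency-dependent quantization tables.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows are cosine basis vectors)."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # DC row scaled for orthonormality
    return c

def jpeg_like_block(block: np.ndarray, q: float = 20.0) -> np.ndarray:
    """Transform, uniformly quantize, and reconstruct one square block."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T               # separable 2D DCT
    quantized = np.round(coeffs / q) * q   # coarse uniform quantization
    return c.T @ quantized @ c             # inverse 2D DCT
```

Larger `q` discards more high-frequency detail, trading quality for compressibility, analogous to lowering JPEG quality.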

From fixed bases to data-adaptive decomposition -- The Fourier transform decomposes an image into a fixed sinusoidal basis. SVD takes a different approach: it discovers a basis tailored to the specific image, ordering components by how much variance (energy) they capture. Where frequency-domain compression (e.g., JPEG's DCT) quantizes high-frequency coefficients, SVD achieves compression by discarding low-energy singular components -- a strategy that adapts naturally to image content.

Singular Value Decomposition (SVD)

Mathematical Foundation

Any real matrix \(A\) of size \(m \times n\) can be factored as:

\[ A = U \Sigma V^T \]

where:

  • \(U\) is an \(m \times m\) orthogonal matrix (left singular vectors -- column-space basis)
  • \(\Sigma\) is an \(m \times n\) diagonal matrix of singular values \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0\)
  • \(V^T\) is an \(n \times n\) orthogonal matrix (right singular vectors -- row-space basis)
  • \(r = \text{rank}(A)\)

Geometric Interpretation

SVD reveals that any linear transformation can be decomposed into three steps:

  1. Rotate (by \(V^T\)) -- align the input with the principal axes of the transformation
  2. Scale (by \(\Sigma\)) -- stretch or compress along each axis by the corresponding singular value
  3. Rotate (by \(U\)) -- orient the output into the final coordinate system

For images, each singular value represents the "importance" of a corresponding spatial pattern. The first few singular values typically capture the dominant structure; the remaining ones encode fine detail and noise.

Image Compression via Truncated SVD

A rank-\(k\) approximation keeps only the \(k\) largest singular values:

\[ A_k = \sum_{i=1}^{k} \sigma_i \, \mathbf{u}_i \, \mathbf{v}_i^T \]

By the Eckart-Young theorem, \(A_k\) is the closest rank-\(k\) matrix to \(A\) in both the Frobenius and spectral norms.

Storage analysis: The original \(m \times n\) image requires \(mn\) values. The rank-\(k\) approximation stores \(k(m + n + 1)\) values. Compression is achieved when \(k \ll \frac{mn}{m + n + 1}\).
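The Eckart-Young guarantee can be verified numerically: the spectral-norm error of the rank-\(k\) truncation equals \(\sigma_{k+1}\). A small sketch:

```python
import numpy as np

def rank_k_error(a: np.ndarray, k: int) -> tuple[float, float]:
    """Spectral-norm error of the rank-k truncation, alongside sigma_{k+1}."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    a_k = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]  # stores k*(m + n + 1) values
    return np.linalg.norm(a - a_k, ord=2), s[k]  # ord=2 is the largest singular value
```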

Rank vs. Reconstruction Quality

| Rank \(k\) (% of max) | Typical PSNR | Visual Quality |
| --- | --- | --- |
| 1-5% | 20-25 dB | Major structure only, heavy artifacts |
| 10-20% | 28-35 dB | Recognizable, some loss of detail |
| 30-50% | 35-42 dB | Good quality, minor softening |
| 80-100% | 42+ dB | Near-lossless to lossless |
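PSNR figures like these come from the mean squared reconstruction error; for 8-bit images the peak value is 255. A minimal sketch:

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB, assuming 8-bit (peak 255) images."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff**2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(255.0**2 / mse)
```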

Relationship to PCA

PCA on a data matrix \(X\) (centered, with samples as rows) computes the eigendecomposition of the covariance matrix \(X^T X\). SVD of \(X\) directly yields the same result: the right singular vectors \(V\) are the principal components, and the singular values relate to eigenvalues by \(\lambda_i = \sigma_i^2 / (n - 1)\). SVD is numerically more stable than explicitly forming \(X^T X\). Note the parallel with Frequencies in Images: both PCA/SVD and Fourier analysis decompose data into ordered components -- one using a data-driven basis, the other a fixed sinusoidal basis.
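The eigenvalue relationship \(\lambda_i = \sigma_i^2 / (n - 1)\) can be checked directly on random centered data (here \(n\) is the number of samples, i.e. rows):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 5))
x -= x.mean(axis=0)  # center each feature column

# Eigenvalues of the covariance matrix, sorted descending
cov = x.T @ x / (x.shape[0] - 1)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# Singular values of the centered data matrix (already descending)
s = np.linalg.svd(x, compute_uv=False)

assert np.allclose(eigvals, s**2 / (x.shape[0] - 1))
```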

Code Example

import numpy as np
from PIL import Image

def svd_compress(image_path: str, rank: int) -> np.ndarray:
    """Compress a grayscale image using truncated SVD."""
    img = np.array(Image.open(image_path).convert("L"), dtype=np.float64)

    u, s, vt = np.linalg.svd(img, full_matrices=False)

    # Truncate to rank-k
    compressed = u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]
    return np.clip(compressed, 0, 255).astype(np.uint8)


def svd_compress_color(image_path: str, rank: int) -> np.ndarray:
    """Compress a color image by applying truncated SVD per channel."""
    img = np.array(Image.open(image_path).convert("RGB"), dtype=np.float64)
    result = np.zeros_like(img)

    for c in range(3):
        u, s, vt = np.linalg.svd(img[:, :, c], full_matrices=False)
        result[:, :, c] = u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]

    return np.clip(result, 0, 255).astype(np.uint8)

From decomposition to compositing -- The previous sections explored how to analyze images by breaking them into components (gradients, frequencies, singular values). The final section turns to synthesis: how blurring and feathering -- both forms of controlled information reduction -- are used to produce seamless composites. Blurring operates on pixel intensities via spatial averaging (a low-pass filter, as described in Frequencies in Images), while feathering applies smooth transitions to the alpha channel, often guided by edge information from operators like Sobel.

Blurring vs Feathering

These two operations are often confused because both produce "soft" visual results, but they serve fundamentally different purposes and operate on different principles.

Blurring

Blurring replaces each pixel with a weighted average of its neighbors. It is a convolution operation applied to pixel intensities, reducing high-frequency content uniformly across the image (or a region of it).

Gaussian Blur

The most common blur. The kernel is derived from the 2D Gaussian function:

\[ G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \]

The parameter \(\sigma\) controls the blur radius. A larger \(\sigma\) produces a wider kernel and stronger smoothing. The kernel size is typically chosen as \(\lceil 6\sigma \rceil\) (rounded to an odd number) to capture 99.7% of the distribution.

import cv2

blurred = cv2.GaussianBlur(image, ksize=(0, 0), sigmaX=3.0)

Box Blur (Average Filter)

Every pixel in the kernel has equal weight. Fast to compute (separable, and can be implemented with running sums), but produces visible boxy artifacts on strong edges.

box_blurred = cv2.blur(image, ksize=(7, 7))

Median Blur

A non-linear filter: replaces each pixel with the median of its neighborhood. Uniquely effective at removing salt-and-pepper noise while preserving edges, because the median is robust to outliers.

median_blurred = cv2.medianBlur(image, ksize=5)

Motion Blur

Simulates camera movement by applying a directional 1D kernel. The kernel is a line of uniform weights at a given angle and length.

import numpy as np
import cv2

def motion_blur(image: np.ndarray, size: int, angle: float) -> np.ndarray:
    """Apply directional motion blur."""
    kernel = np.zeros((size, size), dtype=np.float64)
    kernel[size // 2, :] = 1.0 / size

    # Rotate kernel to desired angle
    center = (size // 2, size // 2)
    rot_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    kernel = cv2.warpAffine(kernel, rot_matrix, (size, size))
    kernel /= kernel.sum()

    return cv2.filter2D(image, -1, kernel)

Feathering

Feathering creates a smooth alpha (opacity) transition at the boundary of a mask or selection. Unlike blurring, which modifies pixel color values, feathering modifies the alpha channel -- it controls how opaque or transparent each pixel is, typically as a function of distance from an edge.

Purpose

When compositing a foreground object onto a background, a hard binary mask produces jarring visible seams. Feathering softens the mask edges so the foreground blends gradually into the background, producing natural-looking composites.

Types of Feathering

Gaussian feathering -- The alpha falloff follows a Gaussian curve based on the distance \(d \geq 0\) measured inward from the mask edge:

\[ \alpha(d) = 1 - e^{-\frac{d^2}{2\sigma^2}} \]

so alpha rises from 0 at the edge toward 1 in the mask interior, and is 0 outside the mask. This is the most common type and produces the most natural falloff.

import cv2
import numpy as np

def gaussian_feather(mask: np.ndarray, radius: float) -> np.ndarray:
    """Apply Gaussian feathering to a binary mask."""
    # Distance transform: each pixel's distance to the nearest zero (background) pixel
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    # Alpha rises from 0 at the mask edge toward 1 in the interior
    feathered = np.where(mask > 0, 1.0 - np.exp(-(dist**2) / (2 * radius**2)), 0.0)
    return (feathered * 255).astype(np.uint8)

Linear feathering -- Alpha increases linearly with distance from the edge up to a maximum feather width \(w\):

\[ \alpha(d) = \min\!\left(\frac{d}{w},\; 1\right) \]

Simple and predictable, but the transition can appear slightly abrupt at \(d = w\).

Cosine / sigmoid feathering -- Uses an S-curve for a perceptually smoother transition:

\[ \alpha(d) = \frac{1}{2}\left(1 + \cos\!\left(\pi \cdot \frac{\max(0, w - d)}{w}\right)\right) \]

or a logistic sigmoid. The S-curve avoids the hard cutoff of linear feathering and the long tails of Gaussian feathering.
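Both falloffs are one-liners given an unsigned distance-from-edge map (as produced by `cv2.distanceTransform` inside the mask); an illustrative sketch:

```python
import numpy as np

def linear_feather(dist: np.ndarray, width: float) -> np.ndarray:
    """Alpha rises linearly with distance from the edge, capped at 1."""
    return np.clip(dist / width, 0.0, 1.0)

def cosine_feather(dist: np.ndarray, width: float) -> np.ndarray:
    """S-curve alpha: 0 at the edge, 1 at distance >= width."""
    t = np.clip(dist, 0.0, width)
    return 0.5 * (1.0 + np.cos(np.pi * (width - t) / width))
```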

Directional feathering -- The feather width varies by direction (anisotropic). Useful when the compositing context requires different falloff rates -- for example, a shadow that should feather more horizontally than vertically.

Key Differences

| Aspect | Blurring | Feathering |
| --- | --- | --- |
| Operates on | Pixel color/intensity values | Alpha (opacity) channel |
| Purpose | Reduce detail, suppress noise | Smooth transitions at mask edges |
| Method | Convolution with averaging kernel | Distance-based alpha gradient |
| Affects | Entire image or selected region | Only the boundary of a selection |
| Result | Softer, less detailed image | Seamless composite blending |
| Reversible? | Not easily (information is lost) | Yes (alpha is separate from color data) |

Compositing with Feathering

The standard alpha-blending formula used when compositing a foreground \(F\) over a background \(B\) with feathered alpha \(\alpha\):

\[ C = \alpha \cdot F + (1 - \alpha) \cdot B \]

In the Sartiq pipeline, feathering is applied to auto-generated masks before compositing to prevent hard edges from appearing in the final output. The feather radius is typically scaled relative to the mask's bounding box dimensions to remain visually consistent across different image resolutions.
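The blending formula vectorizes directly in NumPy, with a 2D alpha map broadcast across color channels -- a minimal sketch:

```python
import numpy as np

def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Alpha-blend foreground over background: C = a*F + (1 - a)*B."""
    a = alpha.astype(np.float64)
    if fg.ndim == 3 and a.ndim == 2:
        a = a[:, :, None]  # broadcast alpha over the channel axis
    out = a * fg.astype(np.float64) + (1.0 - a) * bg.astype(np.float64)
    return np.clip(out, 0, 255).astype(np.uint8)
```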

Use Cases

  • Mask refinement -- Feathering the output of segmentation models to avoid aliased boundaries, often using edge maps from the Sobel operator to identify boundary regions.
  • Layer compositing -- Blending inpainted regions with the original image so edits are invisible.
  • Vignette effects -- Radial feathering of an elliptical mask to darken image corners.
  • Selective blur -- Combining both techniques: blurring a region and then feathering the blur mask so the effect fades gradually rather than stopping abruptly.

From continuous gradients to perceptual illusion -- Feathering produces smooth, continuous alpha gradients, but many output formats and devices require discrete levels (binary masks, limited color palettes, CMYK halftones). Dithering bridges this gap: by strategically distributing quantization error, it creates the perceptual illusion of continuous tone using only discrete values. The human visual system's low-pass filtering (see Frequencies in Images) averages nearby pixels, allowing carefully placed dots to simulate shades that don't physically exist in the output.

Dithering

What is Dithering?

Dithering is a technique for approximating continuous tones using a limited set of discrete values. When an image must be reduced to fewer colors (or even just two, as in binary output), simply rounding each pixel to the nearest available value produces visible banding -- smooth gradients become stair-stepped contours.

Dithering adds controlled noise or patterns to distribute quantization error spatially, trading amplitude resolution for spatial resolution. At normal viewing distances, the eye's spatial averaging produces the perception of intermediate tones that the output medium cannot directly represent.

The technique has roots in telecommunications (where it reduced quantization distortion in audio) and became essential in computer graphics for displaying photographic images on early displays with limited color depth.

Quantization and Error

When reducing bit depth, each pixel value \(x\) is mapped to the nearest representable level:

\[ Q(x) = \text{round}\!\left(\frac{x}{L}\right) \cdot L \]

where \(L\) is the step size between quantization levels. For \(n\) output levels spanning \([0, 255]\), we have \(L = 255 / (n - 1)\).

The quantization error at each pixel is:

\[ e = x - Q(x) \]

Without dithering, this error accumulates visually as banding. Dithering strategies differ in how they handle this error:

| Strategy | Error Handling |
| --- | --- |
| Random dithering | Add random noise before quantization; error is uncorrelated |
| Ordered dithering | Compare against spatially-varying threshold; error is patterned |
| Error diffusion | Propagate error to neighboring unprocessed pixels; error is redistributed |
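The quantizer and its per-pixel error from the formulas above can be sketched as:

```python
import numpy as np

def quantize(x: np.ndarray, levels: int) -> np.ndarray:
    """Round each value to the nearest of `levels` evenly spaced levels in [0, 255]."""
    step = 255.0 / (levels - 1)
    return np.round(x / step) * step

x = np.array([0.0, 60.0, 128.0, 200.0, 255.0])
q = quantize(x, levels=2)  # step = 255: every pixel snaps to 0 or 255
error = x - q              # the error that dithering redistributes
```

With only two levels, a smooth ramp collapses to a hard step; the three strategies differ in what they do with `error` before or after this rounding.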

Ordered Dithering (Bayer Matrix)

Ordered dithering compares each pixel against a spatially-varying threshold from a Bayer matrix (also called an index matrix or dither matrix). The matrix tiles across the image, creating a regular pattern that, when viewed from a distance, approximates continuous tone.

The Bayer Matrix

The 2x2 Bayer matrix is:

\[ M_2 = \frac{1}{4} \begin{bmatrix} 0 & 2 \\ 3 & 1 \end{bmatrix} \]

The 4x4 matrix is:

\[ M_4 = \frac{1}{16} \begin{bmatrix} 0 & 8 & 2 & 10 \\ 12 & 4 & 14 & 6 \\ 3 & 11 & 1 & 9 \\ 15 & 7 & 13 & 5 \end{bmatrix} \]

These matrices are constructed to maximize the spatial frequency of the threshold pattern, ensuring that quantization artifacts appear as fine texture rather than coarse bands.

Recursive Construction

Larger Bayer matrices can be built recursively:

\[ M_{2n} = \frac{1}{4} \begin{bmatrix} 4M_n & 4M_n + 2U_n \\ 4M_n + 3U_n & 4M_n + U_n \end{bmatrix} \]

where \(U_n\) is an \(n \times n\) matrix of ones. This produces matrices of size \(2^k \times 2^k\) with \(4^k\) distinct threshold levels.

Algorithm

For each pixel at position \((x, y)\) with normalized intensity \(I \in [0, 1]\):

  1. Look up the threshold: \(t = M[x \mod n, y \mod n]\)
  2. Output black if \(I < t\), white otherwise (for binary dithering)

For multi-level output, the threshold modulates the rounding decision:

\[ Q_{\text{dither}}(x) = \left\lfloor \frac{x}{L} + t - 0.5 \right\rfloor \cdot L \]

Code Example

import numpy as np
from PIL import Image
from typing import Optional


def bayer_matrix(n: int) -> np.ndarray:
    """Generate an n x n Bayer threshold matrix (n must be a power of 2)."""
    if n == 1:
        return np.array([[0]])

    smaller = bayer_matrix(n // 2)
    ones = np.ones_like(smaller)

    top_left = 4 * smaller
    top_right = 4 * smaller + 2 * ones
    bottom_left = 4 * smaller + 3 * ones
    bottom_right = 4 * smaller + ones

    result = np.block([[top_left, top_right], [bottom_left, bottom_right]])
    return result


def ordered_dither(
    image: np.ndarray,
    levels: int = 2,
    matrix_size: int = 4,
) -> np.ndarray:
    """Apply ordered dithering with a Bayer matrix."""
    img = image.astype(np.float64) / 255.0
    h, w = img.shape[:2]

    # Generate and normalize threshold matrix
    bayer = bayer_matrix(matrix_size)
    bayer = (bayer + 0.5) / (matrix_size * matrix_size)

    # Tile threshold matrix to image size
    threshold = np.tile(bayer, (h // matrix_size + 1, w // matrix_size + 1))
    threshold = threshold[:h, :w]

    # Apply dithering
    step = 1.0 / (levels - 1)
    dithered = np.floor((img + threshold * step - step / 2) / step) * step
    dithered = np.clip(dithered, 0, 1)

    return (dithered * 255).astype(np.uint8)

Error Diffusion (Floyd-Steinberg)

Unlike ordered dithering, which uses a fixed threshold pattern, error diffusion propagates the quantization error from each pixel to its unprocessed neighbors. This produces more natural-looking results with less visible patterning.

The Floyd-Steinberg Kernel

The most widely used error diffusion algorithm distributes error according to this kernel:

\[ \frac{1}{16} \begin{bmatrix} \cdot & \ast & 7 \\ 3 & 5 & 1 \end{bmatrix} \]

where \(\ast\) marks the current pixel. The error is distributed:

  • 7/16 to the right neighbor
  • 3/16 to the bottom-left
  • 5/16 to the bottom
  • 1/16 to the bottom-right

Serpentine Scanning

Processing strictly left-to-right on every row can introduce directional bias artifacts. Serpentine scanning alternates direction each row (left-to-right, then right-to-left), distributing error more evenly and reducing visible diagonal patterns.

Algorithm

import numpy as np
from PIL import Image


def floyd_steinberg_dither(
    image: np.ndarray,
    levels: int = 2,
    serpentine: bool = True,
) -> np.ndarray:
    """Apply Floyd-Steinberg error diffusion dithering."""
    img = image.astype(np.float64)
    h, w = img.shape[:2]

    # Quantization step
    step = 255.0 / (levels - 1)

    for y in range(h):
        # Determine scan direction
        if serpentine and y % 2 == 1:
            x_range = range(w - 1, -1, -1)
            right, left = -1, 1
        else:
            x_range = range(w)
            right, left = 1, -1

        for x in x_range:
            old_val = img[y, x]
            new_val = np.round(old_val / step) * step
            new_val = np.clip(new_val, 0, 255)
            img[y, x] = new_val

            error = old_val - new_val

            # Distribute error to neighbors
            if 0 <= x + right < w:
                img[y, x + right] += error * 7 / 16
            if y + 1 < h:
                if 0 <= x + left < w:
                    img[y + 1, x + left] += error * 3 / 16
                img[y + 1, x] += error * 5 / 16
                if 0 <= x + right < w:
                    img[y + 1, x + right] += error * 1 / 16

    return np.clip(img, 0, 255).astype(np.uint8)

Comparison of Dithering Algorithms

| Algorithm | Pattern | Error Handling | Speed | Best For |
| --- | --- | --- | --- | --- |
| Random | None (noise) | Per-pixel, uncorrelated | Fast | Quick previews, noise texturing |
| Ordered (Bayer) | Periodic grid | Threshold only | Fast | Real-time rendering, GPU/parallel |
| Floyd-Steinberg | Semi-random | Diffused to neighbors | Sequential | Print, high-quality output |
| Blue noise | Aperiodic | Per-pixel, spectrally shaped | Pre-gen costly | Highest perceptual quality |

When to Use Each

  • Ordered dithering is ideal when speed matters or when processing must be parallelizable (each pixel is independent). The visible pattern can be acceptable for artistic effects or when output will be viewed at small sizes.

  • Floyd-Steinberg produces superior results for print and archival purposes. The sequential dependency makes it unsuitable for GPU parallelization, but it remains the standard for quality-critical applications.

  • Blue noise dithering uses a pre-computed noise texture with energy concentrated at high spatial frequencies. This avoids both the visible patterns of ordered dithering and the "worms" sometimes produced by error diffusion. It's the gold standard for perceptual quality but requires generating or storing the noise texture.

/// details | Blue Noise Dithering (Void-and-Cluster)

Blue noise dithering uses a pre-computed threshold matrix with carefully distributed values that appear random but avoid both clustering and visible patterns. The void-and-cluster algorithm generates such matrices:

import numpy as np
from scipy.ndimage import gaussian_filter


def void_and_cluster(size: int, sigma: float = 1.5) -> np.ndarray:
    """Generate a blue noise threshold matrix using void-and-cluster."""
    # Initialize with random binary pattern (50% density)
    rng = np.random.default_rng(42)
    pattern = rng.random((size, size)) > 0.5

    # Iteratively remove clusters and fill voids
    threshold = np.zeros((size, size), dtype=np.float64)
    rank = 0

    # Phase 1: Remove clusters (initial pattern has ones)
    while pattern.sum() > 0:
        blurred = gaussian_filter(pattern.astype(np.float64), sigma, mode="wrap")
        blurred[~pattern] = -np.inf
        y, x = np.unravel_index(blurred.argmax(), pattern.shape)
        pattern[y, x] = False
        threshold[y, x] = rank
        rank += 1

    # Phase 2: Fill voids
    while rank < size * size:
        blurred = gaussian_filter(pattern.astype(np.float64), sigma, mode="wrap")
        blurred[pattern] = np.inf
        y, x = np.unravel_index(blurred.argmin(), pattern.shape)
        pattern[y, x] = True
        threshold[y, x] = rank
        rank += 1

    return threshold / (size * size)

In practice, blue noise textures are pre-computed and stored, as generation is expensive. Apply like ordered dithering:

def blue_noise_dither(image: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Apply blue noise dithering using a pre-computed threshold matrix."""
    h, w = image.shape[:2]
    nh, nw = noise.shape

    # Tile noise to image size
    threshold = np.tile(noise, (h // nh + 1, w // nw + 1))[:h, :w]

    # Normalize image and apply threshold
    normalized = image.astype(np.float64) / 255.0
    return ((normalized > threshold) * 255).astype(np.uint8)
///

Use Cases in the Pipeline

Dithering appears throughout the Sartiq pipeline wherever continuous values must be discretized:

  1. Mask anti-aliasing -- When exporting binary masks (e.g., for downstream models that expect hard edges), dithering the anti-aliased boundary region produces cleaner results than simple thresholding. This preserves the visual smoothness created by feathering even in a 1-bit output.

  2. Thumbnail generation -- Reduced color depth for web display can introduce visible banding on gradients. Ordered dithering adds texture that masks the banding while keeping file sizes small.

  3. Print-ready export -- CMYK printing inherently dithers continuous tone into halftone dots. Pre-dithering in software provides control over the pattern, especially for stylized or artistic output.

  4. Gradient mask binarization -- When a mask created by feathering must be converted to binary (e.g., for formats that don't support alpha), error diffusion preserves the edge softness perceptually.

  5. Limited color depth compositing -- GIF and palette PNG formats support only 256 colors. Dithering is essential for photographic content in these formats, and integrates with the compression strategies discussed in SVD for optimizing quality at a given color budget.