
Storage

Overview of Sartiq's storage infrastructure, including content storage (Cloudflare R2), dataset storage (AWS S3 + LakeFS), and local development (MinIO).


Storage Architecture

flowchart TB
    subgraph Application["Application Layer"]
        Backend[Backend API]
        Compute[Compute Server]
        Webapp[Web App]
    end

    subgraph ContentStorage["Content Storage · Cloudflare"]
        R2[(R2 Storage)]
        CDN[CDN + Image Transformations]
    end

    subgraph LocalDev["Local Development"]
        MinIO[(MinIO)]
    end

    subgraph DataStorage["Dataset Storage · AWS"]
        S3[(S3 Buckets)]
        LakeFS[LakeFS]
        DynamoDB[(DynamoDB)]
    end

    Webapp -->|presigned PUT| R2
    Backend -->|S3 API| R2
    Compute -->|S3 API| R2
    R2 --> CDN
    CDN --> Webapp

    Backend -.->|dev| MinIO
    Compute -.->|dev| MinIO

    LakeFS --> S3
    LakeFS --> DynamoDB

Cloudflare R2 (Content Storage)

All generated content and user uploads are stored in Cloudflare R2 and served through the CDN.

Why R2?

  • No egress fees — Cost-effective for high-traffic content delivery
  • S3-compatible — Works with existing S3 tooling and boto3
  • Global CDN — Automatic edge caching via Cloudflare
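Because R2 is S3-compatible, a standard boto3 S3 client can be pointed at it by overriding the endpoint. The following is a minimal sketch, not Sartiq's actual configuration: the helper name `r2_client_kwargs` and its parameters are hypothetical. The endpoint format and the `auto` region follow Cloudflare's documented conventions for R2.

```python
def r2_client_kwargs(account_id: str, access_key_id: str, secret_access_key: str) -> dict:
    """Build keyword arguments for boto3.client() that target Cloudflare R2.

    Hypothetical helper: the function name and parameter names are
    illustrative and not taken from the Sartiq codebase.
    """
    return {
        "service_name": "s3",
        # R2 exposes its S3-compatible API on a per-account endpoint.
        "endpoint_url": f"https://{account_id}.r2.cloudflarestorage.com",
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        # R2 uses the placeholder region "auto".
        "region_name": "auto",
    }


# Usage (requires boto3 to be installed):
#   import boto3
#   s3 = boto3.client(**r2_client_kwargs("my-account-id", "key", "secret"))
```

Keeping the settings in a plain dictionary makes it straightforward to swap the endpoint for a local MinIO URL in development.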

Bucket Structure

{R2_BUCKET_NAME}/
├── media/{resource_id}/                           # Canonical path (all new files)
│   └── file.{ext}
├── temp/                                          # Temporary presigned uploads (24h TTL)
│   └── {file_id}_{timestamp}_{safe_filename}
├── images/                                        # LEGACY: pre-existing data only
│   ├── products/{product_id}/
│   │   └── product_{product_id}_{sku}_{uuid}_{image_type}.{ext}
│   ├── styles/{style_id}/
│   │   └── style_{style_id}_{uuid}_{image_type}.{ext}
│   ├── shot_types/{shot_type_id}/
│   │   └── shot_type_{shot_type_id}_{uuid}_{image_type}.{ext}
│   ├── organizations/{organization_id}/
│   │   └── organization_{organization_id}_{uuid}_{image_type}.{ext}
│   ├── subjects/
│   │   └── subject_{slug}_{subject_id}_{uuid}_{image_type}.{ext}
│   ├── generations/{generation_id}/
│   │   └── prediction_{prediction_id}_{sku}_{uuid}_{image_type}.{ext}
│   ├── shots/{shot_id}/revisions/
│   │   └── revision_{revision_id}_{uuid}.{ext}
│   └── temp/
│       └── temp_{uuid}.{ext}
├── shooting_looks/                                # LEGACY: pre-existing data only
│   └── {look_id}/
│       └── outfit_{look_id}_{uuid}.{ext}
└── compute/                                       # Compute server workspace
    └── {task_type}s/{task_id}/
        └── result_{index}.webp

Key Naming Conventions

| Entity | Pattern | Example |
|---|---|---|
| MediaResource (canonical) | media/{resource_id}/file.{ext} | media/a1b2c3d4-e5f6-7890-abcd-ef1234567890/file.webp |
| Product (legacy) | images/products/{product_id}/product_{product_id}_{sku}_{uuid}_{image_type}.{ext} | images/products/abc123/product_abc123_SKU001_d4e5f6_cover.webp |
| Subject (legacy) | images/subjects/subject_{slug}_{subject_id}_{uuid}_{image_type}.{ext} | images/subjects/subject_anna_def456_a1b2c3_cover.png |
| Generation (legacy) | images/generations/{generation_id}/prediction_{prediction_id}_{sku}_{uuid}_{image_type}.{ext} | images/generations/gen123/prediction_pred456_SKU001_f7g8h9_result.webp |
| Shot Revision (legacy) | images/shots/{shot_id}/revisions/revision_{revision_id}_{uuid}.{ext} | images/shots/shot789/revisions/revision_rev012_a1b2c3.webp |
| Shooting Look (legacy) | shooting_looks/{look_id}/outfit_{look_id}_{uuid}.{ext} | shooting_looks/look345/outfit_look345_d4e5f6.webp |
| Compute Result | compute/{task_type}s/{task_id}/result_{index}.webp | compute/generations/task789/result_0.webp |
| Temp Upload | temp/{file_id}_{timestamp}_{safe_filename} | temp/abc123_1706000000_product-001.jpg |
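The non-legacy patterns above can be expressed as small key-building helpers. This is an illustrative sketch only: the function names are hypothetical, and the filename sanitization rule in `safe_filename` is an assumption, since the actual rule is not specified here.

```python
import re
import time
from typing import Optional


def media_key(resource_id: str, ext: str) -> str:
    """Canonical MediaResource key: media/{resource_id}/file.{ext}."""
    return f"media/{resource_id}/file.{ext.lstrip('.')}"


def compute_result_key(task_type: str, task_id: str, index: int) -> str:
    """Compute workspace key: compute/{task_type}s/{task_id}/result_{index}.webp."""
    return f"compute/{task_type}s/{task_id}/result_{index}.webp"


def safe_filename(name: str) -> str:
    """ASSUMPTION: replace anything outside [A-Za-z0-9._-] with a hyphen."""
    return re.sub(r"[^A-Za-z0-9._-]", "-", name)


def temp_upload_key(file_id: str, filename: str, timestamp: Optional[int] = None) -> str:
    """Temporary upload key: temp/{file_id}_{timestamp}_{safe_filename}."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"temp/{file_id}_{ts}_{safe_filename(filename)}"


print(media_key("a1b2c3d4-e5f6-7890-abcd-ef1234567890", "webp"))
print(compute_result_key("generation", "task789", 0))
print(temp_upload_key("abc123", "product-001.jpg", timestamp=1706000000))
```

With the inputs shown, the three calls reproduce the MediaResource, Compute Result, and Temp Upload examples from the table.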

Compute Server Isolation

The Compute Server stores all intermediate and final results under the compute/ key prefix. This provides clear separation between compute workspace and permanent content:

  • Compute Server writes to compute/{task_type}s/{task_id}/...
  • Backend ingests compute results via MediaResourceService.ingest_from_storage_key(relocate=True), copying to the canonical media/{resource_id}/file.{ext} path and marking the intermediate compute/ key for deferred deletion
  • The compute/ prefix is automatically added by the R2ImageStorage class in the Compute Server
  • See MediaResource Lifecycle for full ingestion details
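The relocation step described above (copy to the canonical path, then defer deletion of the compute key) can be sketched against an in-memory stand-in for the bucket. This is not the real `MediaResourceService` implementation: `InMemoryStore`, `relocate_to_media`, and the way the resource ID is generated are all assumptions made for illustration.

```python
import uuid


class InMemoryStore:
    """Stand-in for an S3-compatible bucket (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.objects: dict = {}            # object key -> bytes
        self.pending_deletion: set = set() # keys marked for deferred deletion


def relocate_to_media(store: InMemoryStore, source_key: str) -> str:
    """Copy a compute/ object to media/{resource_id}/file.{ext}.

    The source key is only *marked* for deletion; the object itself stays
    in place until a later cleanup pass removes it.
    """
    ext = source_key.rsplit(".", 1)[-1]
    resource_id = str(uuid.uuid4())  # ASSUMPTION: resource IDs are UUIDs
    canonical_key = f"media/{resource_id}/file.{ext}"
    store.objects[canonical_key] = store.objects[source_key]
    store.pending_deletion.add(source_key)
    return canonical_key


store = InMemoryStore()
src = "compute/generations/task789/result_0.webp"
store.objects[src] = b"fake-image-bytes"
new_key = relocate_to_media(store, src)
print(new_key)
```

Copying first and deleting later means a failure partway through leaves the original compute result intact.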

Lifecycle Policies

Object lifecycle policies are configured in the Cloudflare R2 web UI. Each bucket (staging and production) has its own retention times.

| Path | Retention | Notes |
|---|---|---|
| temp/* | Varies by environment | Presigned upload staging area, auto-deleted |
| compute/* | Varies by environment | Compute results area, auto-deleted after a longer retention period to preserve observability |
| media/* | Permanent | Customer content (canonical MediaResource paths; the orphan scanner may delete unattached resources after an age threshold) |
| images/* | Permanent | Customer content (legacy paths for pre-existing data) |
| shooting_looks/* | Permanent | Customer content (legacy paths for pre-existing data) |
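The prefix-to-retention mapping in the table can be checked with a simple lookup. This is a sketch for reasoning about which policy applies to a given key. The real policies are set in the R2 web UI, and the names `RETENTION_BY_PREFIX` and `retention_class` are hypothetical.

```python
# Mirrors the table above; "expiring" means auto-deleted after an
# environment-specific period, "permanent" means no lifecycle expiry.
RETENTION_BY_PREFIX = {
    "temp/": "expiring",
    "compute/": "expiring",
    "media/": "permanent",
    "images/": "permanent",
    "shooting_looks/": "permanent",
}


def retention_class(key: str) -> str:
    """Return the retention class for an object key, or 'unknown'."""
    for prefix, policy in RETENTION_BY_PREFIX.items():
        if key.startswith(prefix):
            return policy
    return "unknown"


print(retention_class("temp/abc123_1706000000_product-001.jpg"))
print(retention_class("media/a1b2c3d4-e5f6-7890-abcd-ef1234567890/file.webp"))
```

Note that the legacy `images/temp/` folder matches the `images/` prefix rather than the top-level `temp/` prefix, so under this table it is classified as permanent.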

Content Delivery

CDN Configuration

Content is served through Cloudflare CDN with aggressive caching:

| Content Type | Cache-Control | Notes |
|---|---|---|
| Generated images | public, max-age=31536000, immutable | 1 year, content-addressed URLs |
| Product images | public, max-age=31536000, immutable | 1 year, content-addressed URLs |
| All stored images | public, max-age=31536000, immutable | 1 year |
| API responses | No cache | Dynamic content |

Cloudflare Image Transformations

In production, the Backend uses Cloudflare Image Transformations for on-the-fly format conversion via the ?type= query parameter on the file serving endpoint:

| Parameter | Values | Description |
|---|---|---|
| type | webp, jpeg, png, avif, gif | Target format for conversion |

The Backend constructs transformation URLs in the format:

/cdn-cgi/image/format={type},quality=90/{file_path}

This is only active when CLOUDFLARE_IMAGE_TRANSFORMATIONS_ENABLED=true. In local development (MinIO), the file is served directly without transformation.
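The URL construction described above can be sketched as follows. The function name `build_transformation_path` and the `enabled` argument are hypothetical stand-ins for however the Backend reads `CLOUDFLARE_IMAGE_TRANSFORMATIONS_ENABLED`. Rejecting unsupported formats with `ValueError` is also an assumption.

```python
ALLOWED_FORMATS = {"webp", "jpeg", "png", "avif", "gif"}


def build_transformation_path(file_path: str, target_type: str, enabled: bool) -> str:
    """Return the path to serve for a given ?type= value.

    When transformations are disabled (e.g. local MinIO), the original
    file path is returned unchanged.
    """
    if not enabled:
        return file_path
    if target_type not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported target format: {target_type!r}")
    # Format documented above: /cdn-cgi/image/format={type},quality=90/{file_path}
    return f"/cdn-cgi/image/format={target_type},quality=90/{file_path.lstrip('/')}"


print(build_transformation_path("media/abc/file.webp", "avif", enabled=True))
print(build_transformation_path("media/abc/file.webp", "avif", enabled=False))
```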

URL Resolution

Files are stored with relative paths in the database (e.g., images/products/abc123/product_abc123_SKU001_d4e5f6_cover.webp). These are resolved to full CDN URLs at serving time:

| Environment | URL Pattern |
|---|---|
| Production | https://media.sartiq.com/{file_path} |
| Staging | https://staging-media.sartiq.com/{file_path} |
| Development | http://localhost:9002/shootify-media-dev/{file_path} |

The Backend's file serving endpoint (GET /files/{file_path}) returns a 302 redirect to the appropriate CDN/MinIO URL.
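Resolution of a stored relative path to a full URL amounts to prefixing it with the environment's base URL. The sketch below uses the base URLs listed above. The names `BASE_URLS` and `resolve_url` and the environment labels used as dictionary keys are hypothetical.

```python
BASE_URLS = {
    "production": "https://media.sartiq.com",
    "staging": "https://staging-media.sartiq.com",
    "development": "http://localhost:9002/shootify-media-dev",
}


def resolve_url(file_path: str, environment: str) -> str:
    """Turn a relative storage path into the URL the 302 redirect would target."""
    base = BASE_URLS[environment]  # raises KeyError for unknown environments
    return f"{base}/{file_path.lstrip('/')}"


path = "images/products/abc123/product_abc123_SKU001_d4e5f6_cover.webp"
print(resolve_url(path, "production"))
print(resolve_url(path, "development"))
```

Storing only relative paths in the database means the same record resolves correctly in every environment without a data migration.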

AWS S3 + LakeFS (Dataset Storage)

S3 is used exclusively for dataset storage and data versioning with LakeFS. This remains on AWS (eu-central-1) with DynamoDB as the LakeFS metadata store.

S3 Buckets (eu-central-1)

| Bucket | Purpose |
|---|---|
| garment-accuracy | LakeFS backing store for garment accuracy project |
| shootify-datasets | Training and evaluation datasets |

LakeFS

LakeFS provides Git-like versioning for data assets, backed by S3 for object storage and DynamoDB for metadata.

  • Version control for datasets
  • Branching for experiments
  • Rollback capabilities

| Resource | Details |
|---|---|
| UI/API | https://lakefs.sartiq.com |
| Backing Store | s3://garment-accuracy (eu-central-1) |
| Metadata Store | AWS DynamoDB (eu-central-1) |
| Host | AWS-LakeFS (18.199.125.185) |
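LakeFS addresses versioned objects with URIs of the form `lakefs://{repository}/{ref}/{path}`, where the ref is a branch, tag, or commit ID. The sketch below shows how the same dataset path is addressed on different branches. The repository name, branch names, and file path are hypothetical examples, and the helper names are not part of any LakeFS client library.

```python
def lakefs_uri(repository: str, ref: str, path: str) -> str:
    """Build a lakefs:// URI for an object at a given ref (branch, tag, or commit)."""
    return f"lakefs://{repository}/{ref}/{path.lstrip('/')}"


def parse_lakefs_uri(uri: str) -> tuple:
    """Split a lakefs:// URI into (repository, ref, path)."""
    prefix = "lakefs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a lakefs URI: {uri!r}")
    repository, ref, path = uri[len(prefix):].split("/", 2)
    return repository, ref, path


# Hypothetical repository and branch names, for illustration only.
main_uri = lakefs_uri("example-repo", "main", "train/labels.csv")
exp_uri = lakefs_uri("example-repo", "experiment-1", "train/labels.csv")
print(main_uri)
print(exp_uri)
print(parse_lakefs_uri(exp_uri))
```

Because only the ref segment changes, an experiment can read or write the same logical path on its own branch without touching the data on the main branch.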