Storage¶

Overview of Sartiq's storage infrastructure, including content storage (Cloudflare R2), dataset storage (AWS S3 + LakeFS), and local development (MinIO).

Storage Architecture¶

flowchart TB
    subgraph Application["Application Layer"]
        Backend[Backend API]
        Compute[Compute Server]
        Webapp[Web App]
    end

    subgraph ContentStorage["Content Storage · Cloudflare"]
        R2[(R2 Storage)]
        CDN[CDN + Image Transformations]
    end

    subgraph LocalDev["Local Development"]
        MinIO[(MinIO)]
    end

    subgraph DataStorage["Dataset Storage · AWS"]
        S3[(S3 Buckets)]
        LakeFS[LakeFS]
        DynamoDB[(DynamoDB)]
    end

    Webapp -->|presigned PUT| R2
    Backend -->|S3 API| R2
    Compute -->|S3 API| R2
    R2 --> CDN
    CDN --> Webapp

    Backend -.->|dev| MinIO
    Compute -.->|dev| MinIO

    LakeFS --> S3
    LakeFS --> DynamoDB

Cloudflare R2 (Content Storage)¶

All generated content and user uploads are stored in Cloudflare R2 and served through the CDN.

Why R2?¶

No egress fees — Cost-effective for high-traffic content delivery
S3-compatible — Works with existing S3 tooling and boto3
Global CDN — Automatic edge caching via Cloudflare
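
Because R2 speaks the S3 API, a boto3 client only needs a custom endpoint. The following is a minimal sketch, not the project's actual configuration: the `R2Settings` class and its field names are hypothetical, while the `https://{account_id}.r2.cloudflarestorage.com` endpoint and the `auto` region follow Cloudflare's documented S3-compatibility conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class R2Settings:
    # Hypothetical settings container; field names are illustrative only.
    account_id: str
    access_key_id: str
    secret_access_key: str


def r2_client_kwargs(settings: R2Settings) -> dict:
    """Build keyword arguments for boto3.client("s3", ...) pointed at R2."""
    return {
        "endpoint_url": f"https://{settings.account_id}.r2.cloudflarestorage.com",
        "aws_access_key_id": settings.access_key_id,
        "aws_secret_access_key": settings.secret_access_key,
        "region_name": "auto",  # R2 has no AWS-style regions
    }


def make_r2_client(settings: R2Settings):
    """Create the S3-compatible client (requires boto3 to be installed)."""
    import boto3  # imported lazily so r2_client_kwargs has no third-party dependency

    return boto3.client("s3", **r2_client_kwargs(settings))
```

For local development, the same client could presumably be pointed at MinIO by swapping `endpoint_url` for the MinIO address.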

Bucket Structure¶

{R2_BUCKET_NAME}/
├── media/{resource_id}/                           # Canonical path (all new files)
│   └── file.{ext}
│
├── temp/                                          # Temporary presigned uploads (24h TTL)
│   └── {file_id}_{timestamp}_{safe_filename}
│
├── images/                                        # LEGACY: pre-existing data only
│   ├── products/{product_id}/
│   │   └── product_{product_id}_{sku}_{uuid}_{image_type}.{ext}
│   ├── styles/{style_id}/
│   │   └── style_{style_id}_{uuid}_{image_type}.{ext}
│   ├── shot_types/{shot_type_id}/
│   │   └── shot_type_{shot_type_id}_{uuid}_{image_type}.{ext}
│   ├── organizations/{organization_id}/
│   │   └── organization_{organization_id}_{uuid}_{image_type}.{ext}
│   ├── subjects/
│   │   └── subject_{slug}_{subject_id}_{uuid}_{image_type}.{ext}
│   ├── generations/{generation_id}/
│   │   └── prediction_{prediction_id}_{sku}_{uuid}_{image_type}.{ext}
│   ├── shots/{shot_id}/revisions/
│   │   └── revision_{revision_id}_{uuid}.{ext}
│   └── temp/
│       └── temp_{uuid}.{ext}
│
├── shooting_looks/                                # LEGACY: pre-existing data only
│   └── {look_id}/
│       └── outfit_{look_id}_{uuid}.{ext}
│
└── compute/                                       # Compute server workspace
    └── {task_type}s/{task_id}/
        └── result_{index}.webp

Key Naming Conventions¶

Entity	Pattern	Example
MediaResource (canonical)	`media/{resource_id}/file.{ext}`	`media/a1b2c3d4-e5f6-7890-abcd-ef1234567890/file.webp`
Product (legacy)	`images/products/{product_id}/product_{product_id}_{sku}_{uuid}_{image_type}.{ext}`	`images/products/abc123/product_abc123_SKU001_d4e5f6_cover.webp`
Subject (legacy)	`images/subjects/subject_{slug}_{subject_id}_{uuid}_{image_type}.{ext}`	`images/subjects/subject_anna_def456_a1b2c3_cover.png`
Generation (legacy)	`images/generations/{generation_id}/prediction_{prediction_id}_{sku}_{uuid}_{image_type}.{ext}`	`images/generations/gen123/prediction_pred456_SKU001_f7g8h9_result.webp`
Shot Revision (legacy)	`images/shots/{shot_id}/revisions/revision_{revision_id}_{uuid}.{ext}`	`images/shots/shot789/revisions/revision_rev012_a1b2c3.webp`
Shooting Look (legacy)	`shooting_looks/{look_id}/outfit_{look_id}_{uuid}.{ext}`	`shooting_looks/look345/outfit_look345_d4e5f6.webp`
Compute Result	`compute/{task_type}s/{task_id}/result_{index}.webp`	`compute/generations/task789/result_0.webp`
Temp Upload	`temp/{file_id}_{timestamp}_{safe_filename}`	`temp/abc123_1706000000_product-001.jpg`
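
The non-legacy patterns in the table above can be expressed as small key-building helpers. This is an illustrative sketch: the function names are hypothetical and are not taken from the Backend or Compute Server code.

```python
def media_key(resource_id: str, ext: str) -> str:
    """Canonical MediaResource key: media/{resource_id}/file.{ext}."""
    return f"media/{resource_id}/file.{ext}"


def temp_upload_key(file_id: str, timestamp: int, safe_filename: str) -> str:
    """Temporary presigned-upload key: temp/{file_id}_{timestamp}_{safe_filename}."""
    return f"temp/{file_id}_{timestamp}_{safe_filename}"


def compute_result_key(task_type: str, task_id: str, index: int) -> str:
    """Compute workspace key: compute/{task_type}s/{task_id}/result_{index}.webp."""
    return f"compute/{task_type}s/{task_id}/result_{index}.webp"


# Reproduces the examples from the table above.
print(media_key("a1b2c3d4-e5f6-7890-abcd-ef1234567890", "webp"))
print(temp_upload_key("abc123", 1706000000, "product-001.jpg"))
print(compute_result_key("generation", "task789", 0))
```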

Compute Server Isolation¶

The Compute Server stores all intermediate and final results under the compute/ key prefix, which keeps the compute workspace clearly separated from permanent content:

Compute Server writes to compute/{task_type}s/{task_id}/...
Backend ingests compute results via MediaResourceService.ingest_from_storage_key(relocate=True), which copies the object to the canonical media/{resource_id}/file.{ext} path and marks the intermediate compute/ key for deferred deletion
The compute/ prefix is automatically added by the R2ImageStorage class in the Compute Server
See MediaResource Lifecycle for full ingestion details
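
The relocate behaviour described above (copy to the canonical path, then mark the source for deferred deletion) can be sketched against an in-memory store. This is only a model of the flow under stated assumptions: `InMemoryStore`, `ingest_sketch`, and the use of a fresh UUID as the resource ID are all hypothetical, and the real `MediaResourceService` may differ.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for the object store, used only for illustration."""

    def __init__(self) -> None:
        self.objects = {}              # key -> bytes
        self.pending_deletion = set()  # keys marked for deferred deletion

    def copy(self, src: str, dst: str) -> None:
        self.objects[dst] = self.objects[src]


def ingest_sketch(store: InMemoryStore, source_key: str) -> str:
    """Copy a compute/ result to a canonical media/ key and defer source deletion."""
    ext = source_key.rsplit(".", 1)[-1]
    resource_id = str(uuid.uuid4())  # assumption: resource IDs are UUIDs
    canonical_key = f"media/{resource_id}/file.{ext}"
    store.copy(source_key, canonical_key)
    # The source is not removed immediately; it is only marked for later cleanup.
    store.pending_deletion.add(source_key)
    return canonical_key


store = InMemoryStore()
store.objects["compute/generations/task789/result_0.webp"] = b"image-bytes"
new_key = ingest_sketch(store, "compute/generations/task789/result_0.webp")
```

Deferring the deletion means the intermediate object stays readable for a while after ingestion, which is consistent with the longer retention that compute/ keeps for observability.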

Lifecycle Policies¶

Object lifecycle policies are configured in the Cloudflare R2 web UI. Each bucket (staging and production) has its own retention periods.

Path	Retention	Notes
`temp/*`	Varies by environment	Presigned upload staging area, auto-deleted
`compute/*`	Varies by environment	Compute results area, auto-deleted after a longer period than `temp/*` to preserve observability
`media/*`	Permanent	Customer content (canonical MediaResource paths; orphan scanner may delete unattached resources after age threshold)
`images/*`	Permanent	Customer content (legacy paths for pre-existing data)
`shooting_looks/*`	Permanent	Customer content (legacy paths for pre-existing data)
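
The table above amounts to a prefix-based classification of storage keys. A minimal sketch of that mapping follows; the function name and the returned labels are hypothetical. Concrete retention durations are deliberately omitted because they differ per environment.

```python
AUTO_DELETED_PREFIXES = ("temp/", "compute/")
PERMANENT_PREFIXES = ("media/", "images/", "shooting_looks/")


def retention_class(key: str) -> str:
    """Classify a storage key by the lifecycle table's top-level prefixes."""
    if key.startswith(AUTO_DELETED_PREFIXES):
        return "auto-deleted"
    if key.startswith(PERMANENT_PREFIXES):
        return "permanent"
    return "unknown"


print(retention_class("temp/abc123_1706000000_product-001.jpg"))
print(retention_class("media/a1b2c3d4-e5f6-7890-abcd-ef1234567890/file.webp"))
```

Note that only the top-level prefix matters here: a legacy key under images/temp/ falls under `images/*` and is therefore classified as permanent.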

Content Delivery¶

CDN Configuration¶

Content is served through Cloudflare CDN with aggressive caching:

Content Type	Cache-Control	Notes
Generated images	`public, max-age=31536000, immutable`	1 year, content-addressed URLs
Product images	`public, max-age=31536000, immutable`	1 year, content-addressed URLs
All stored images	`public, max-age=31536000, immutable`	1 year
API responses	No cache	Dynamic content

Cloudflare Image Transformations¶

In production, the Backend uses Cloudflare Image Transformations for on-the-fly format conversion via the ?type= query parameter on the file serving endpoint:

Parameter	Values	Description
`type`	`webp`, `jpeg`, `png`, `avif`, `gif`	Target format for conversion

The Backend constructs transformation URLs in the format:

/cdn-cgi/image/format={type},quality=90/{file_path}

Transformations are applied only when CLOUDFLARE_IMAGE_TRANSFORMATIONS_ENABLED=true. In local development (MinIO), the file is served directly without transformation.
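
The URL construction can be sketched as follows. The helper names are hypothetical, and the fallback of returning the plain `/{file_path}` when transformations are disabled is an assumption about how "served directly" looks. Only the `/cdn-cgi/image/...` format, the allowed `type` values, and the environment variable name come from the text above.

```python
import os
from typing import Optional

ALLOWED_TYPES = {"webp", "jpeg", "png", "avif", "gif"}


def transformations_enabled() -> bool:
    """Read the feature flag from the environment (assumed to be a 'true' string)."""
    return os.environ.get("CLOUDFLARE_IMAGE_TRANSFORMATIONS_ENABLED", "").lower() == "true"


def transformation_path(file_path: str, target_type: Optional[str], enabled: bool) -> str:
    """Return the transformation path, or the plain file path when not transforming."""
    if not enabled or target_type is None:
        return f"/{file_path}"  # assumption: untransformed files use the plain path
    if target_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {target_type}")
    return f"/cdn-cgi/image/format={target_type},quality=90/{file_path}"


print(transformation_path("media/abc/file.png", "webp", enabled=True))
```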

URL Resolution¶

The database stores file paths as relative storage keys (e.g., images/products/abc123/product_abc123_SKU001_d4e5f6_cover.webp). These are resolved to full CDN URLs at serving time:

Environment	URL Pattern
Production	`https://media.sartiq.com/{file_path}`
Staging	`https://staging-media.sartiq.com/{file_path}`
Development	`http://localhost:9002/shootify-media-dev/{file_path}`

The Backend's file serving endpoint (GET /files/{file_path}) returns a 302 redirect to the appropriate CDN/MinIO URL.
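
Resolving a stored relative path to a full URL is a simple per-environment prefix lookup. The sketch below uses the base URLs from the table above; the function name and the environment labels used as dictionary keys are hypothetical.

```python
BASE_URLS = {
    "production": "https://media.sartiq.com",
    "staging": "https://staging-media.sartiq.com",
    "development": "http://localhost:9002/shootify-media-dev",
}


def resolve_file_url(environment: str, file_path: str) -> str:
    """Join the environment's base URL with a relative storage path."""
    base = BASE_URLS[environment]  # raises KeyError for unknown environments
    return f"{base}/{file_path.lstrip('/')}"


print(resolve_file_url("production", "media/a1b2c3d4-e5f6-7890-abcd-ef1234567890/file.webp"))
```

The result is the kind of URL the file serving endpoint would place in the `Location` header of its 302 response.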

AWS S3 + LakeFS (Dataset Storage)¶

S3 is used exclusively for dataset storage and data versioning with LakeFS. This remains on AWS (eu-central-1) with DynamoDB as the LakeFS metadata store.

S3 Buckets (eu-central-1)¶

Bucket	Purpose
`garment-accuracy`	LakeFS backing store for garment accuracy project
`shootify-datasets`	Training and evaluation datasets

LakeFS¶

LakeFS provides Git-like versioning for data assets, backed by S3 for object storage and DynamoDB for metadata.

Version control for datasets
Branching for experiments
Rollback capabilities

Resource	Details
UI/API	https://lakefs.sartiq.com
Backing Store	`s3://garment-accuracy` (eu-central-1)
Metadata Store	AWS DynamoDB (eu-central-1)
Host	AWS-LakeFS (`18.199.125.185`)
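
In LakeFS, objects are addressed by repository, ref (branch, tag, or commit), and path, written as `lakefs://{repository}/{ref}/{path}`. The sketch below builds and parses such URIs to show how the same dataset path can be read from different branches. The repository and branch names are hypothetical, since this page does not list them.

```python
from typing import NamedTuple


class LakeFSLocation(NamedTuple):
    repository: str
    ref: str   # branch name, tag, or commit ID
    path: str


def to_uri(loc: LakeFSLocation) -> str:
    """Format a location as lakefs://{repository}/{ref}/{path}."""
    return f"lakefs://{loc.repository}/{loc.ref}/{loc.path}"


def parse_uri(uri: str) -> LakeFSLocation:
    """Split a lakefs:// URI into repository, ref, and path."""
    prefix = "lakefs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a lakefs URI: {uri}")
    repository, ref, path = uri[len(prefix):].split("/", 2)
    return LakeFSLocation(repository, ref, path)


# Hypothetical names: the same file on the main branch and on an experiment branch.
main = LakeFSLocation("example-repo", "main", "train/labels.csv")
experiment = main._replace(ref="experiment-1")
print(to_uri(main))
print(to_uri(experiment))
```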

Infrastructure Overview — Cloud providers and architecture
Data Flows — How data moves through the system
Product Ingestion — Upload flow details