Storage
Overview of Sartiq's storage infrastructure, including content storage (Cloudflare R2), dataset storage (AWS S3 + LakeFS), and local development (MinIO).
Storage Architecture
```mermaid
flowchart TB
    subgraph Application["Application Layer"]
        Backend[Backend API]
        Compute[Compute Server]
        Webapp[Web App]
    end

    subgraph ContentStorage["Content Storage · Cloudflare"]
        R2[(R2 Storage)]
        CDN[CDN + Image Transformations]
    end

    subgraph LocalDev["Local Development"]
        MinIO[(MinIO)]
    end

    subgraph DataStorage["Dataset Storage · AWS"]
        S3[(S3 Buckets)]
        LakeFS[LakeFS]
        DynamoDB[(DynamoDB)]
    end

    Webapp -->|presigned PUT| R2
    Backend -->|S3 API| R2
    Compute -->|S3 API| R2
    R2 --> CDN
    CDN --> Webapp
    Backend -.->|dev| MinIO
    Compute -.->|dev| MinIO
    LakeFS --> S3
    LakeFS --> DynamoDB
```
Cloudflare R2 (Content Storage)
All generated content and user uploads are stored in Cloudflare R2 and served through the CDN.
Why R2?
- No egress fees — Cost-effective for high-traffic content delivery
- S3-compatible — Works with existing S3 tooling and boto3
- Global CDN — Automatic edge caching via Cloudflare
Bucket Structure
```
{R2_BUCKET_NAME}/
├── media/{resource_id}/                 # Canonical path (all new files)
│   └── file.{ext}
│
├── temp/                                # Temporary presigned uploads (24h TTL)
│   └── {file_id}_{timestamp}_{safe_filename}
│
├── images/                              # LEGACY: pre-existing data only
│   ├── products/{product_id}/
│   │   └── product_{product_id}_{sku}_{uuid}_{image_type}.{ext}
│   ├── styles/{style_id}/
│   │   └── style_{style_id}_{uuid}_{image_type}.{ext}
│   ├── shot_types/{shot_type_id}/
│   │   └── shot_type_{shot_type_id}_{uuid}_{image_type}.{ext}
│   ├── organizations/{organization_id}/
│   │   └── organization_{organization_id}_{uuid}_{image_type}.{ext}
│   ├── subjects/
│   │   └── subject_{slug}_{subject_id}_{uuid}_{image_type}.{ext}
│   ├── generations/{generation_id}/
│   │   └── prediction_{prediction_id}_{sku}_{uuid}_{image_type}.{ext}
│   ├── shots/{shot_id}/revisions/
│   │   └── revision_{revision_id}_{uuid}.{ext}
│   └── temp/
│       └── temp_{uuid}.{ext}
│
├── shooting_looks/                      # LEGACY: pre-existing data only
│   └── {look_id}/
│       └── outfit_{look_id}_{uuid}.{ext}
│
└── compute/                             # Compute server workspace
    └── {task_type}s/{task_id}/
        └── result_{index}.webp
```
Key Naming Conventions
| Entity | Pattern | Example |
|---|---|---|
| MediaResource (canonical) | `media/{resource_id}/file.{ext}` | `media/a1b2c3d4-e5f6-7890-abcd-ef1234567890/file.webp` |
| Product (legacy) | `images/products/{product_id}/product_{product_id}_{sku}_{uuid}_{image_type}.{ext}` | `images/products/abc123/product_abc123_SKU001_d4e5f6_cover.webp` |
| Subject (legacy) | `images/subjects/subject_{slug}_{subject_id}_{uuid}_{image_type}.{ext}` | `images/subjects/subject_anna_def456_a1b2c3_cover.png` |
| Generation (legacy) | `images/generations/{generation_id}/prediction_{prediction_id}_{sku}_{uuid}_{image_type}.{ext}` | `images/generations/gen123/prediction_pred456_SKU001_f7g8h9_result.webp` |
| Shot Revision (legacy) | `images/shots/{shot_id}/revisions/revision_{revision_id}_{uuid}.{ext}` | `images/shots/shot789/revisions/revision_rev012_a1b2c3.webp` |
| Shooting Look (legacy) | `shooting_looks/{look_id}/outfit_{look_id}_{uuid}.{ext}` | `shooting_looks/look345/outfit_look345_d4e5f6.webp` |
| Compute Result | `compute/{task_type}s/{task_id}/result_{index}.webp` | `compute/generations/task789/result_0.webp` |
| Temp Upload | `temp/{file_id}_{timestamp}_{safe_filename}` | `temp/abc123_1706000000_product-001.jpg` |
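The non-legacy key patterns in the table can be expressed as small string builders. The sketch below is illustrative only: the function names and the `safe_filename` sanitization rule (collapse anything outside `[A-Za-z0-9._-]` into `-`) are assumptions, not the Backend's actual helpers.

```python
import re
import time
import uuid
from typing import Optional

# Hypothetical helpers reproducing the documented key patterns.
# The real Backend helpers may name things and sanitize filenames differently.


def media_key(resource_id: uuid.UUID, ext: str) -> str:
    """Canonical MediaResource key: media/{resource_id}/file.{ext}."""
    return f"media/{resource_id}/file.{ext.lstrip('.').lower()}"


def compute_key(task_type: str, task_id: str, index: int) -> str:
    """Compute result key: compute/{task_type}s/{task_id}/result_{index}.webp."""
    return f"compute/{task_type}s/{task_id}/result_{index}.webp"


def safe_filename(name: str) -> str:
    """Assumed rule: collapse runs of disallowed characters into '-'."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "-", name).strip("-.")
    return cleaned or "upload"


def temp_key(file_id: str, filename: str, timestamp: Optional[int] = None) -> str:
    """Temp upload key: temp/{file_id}_{timestamp}_{safe_filename}."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"temp/{file_id}_{ts}_{safe_filename(filename)}"


if __name__ == "__main__":
    rid = uuid.UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890")
    print(media_key(rid, "webp"))  # media/a1b2c3d4-e5f6-7890-abcd-ef1234567890/file.webp
    print(compute_key("generation", "task789", 0))  # compute/generations/task789/result_0.webp
    print(temp_key("abc123", "product-001.jpg", timestamp=1706000000))  # temp/abc123_1706000000_product-001.jpg
```

Because the canonical key depends only on the resource ID and extension, it never needs to change when product, SKU or image-type metadata changes, unlike the legacy patterns.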
Compute Server Isolation
The Compute Server stores all intermediate and final results under the compute/ key prefix. This provides clear separation between compute workspace and permanent content:
- Compute Server writes to `compute/{task_type}s/{task_id}/...`
- Backend ingests compute results via `MediaResourceService.ingest_from_storage_key(relocate=True)`, copying them to the canonical `media/{resource_id}/file.{ext}` path and marking the intermediate `compute/` key for deferred deletion
- The `compute/` prefix is automatically added by the `R2ImageStorage` class in the Compute Server
- See MediaResource Lifecycle for full ingestion details
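The isolation and relocation steps listed above can be sketched against an in-memory bucket so the flow runs offline. Everything here is hypothetical: `InMemoryBucket` stands in for R2, `PrefixedComputeStorage` only mimics the documented prefixing behaviour of `R2ImageStorage`, and `ingest_relocate` approximates what `ingest_from_storage_key(relocate=True)` is described as doing; its real signature, ID generation, and deletion mechanism are not shown in this page.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List

COMPUTE_PREFIX = "compute/"


@dataclass
class InMemoryBucket:
    """Hypothetical stand-in for an S3-compatible bucket."""
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def copy(self, src_key: str, dst_key: str) -> None:
        self.objects[dst_key] = self.objects[src_key]


class PrefixedComputeStorage:
    """Mimics the documented R2ImageStorage behaviour: keys land under compute/."""

    def __init__(self, bucket: InMemoryBucket) -> None:
        self.bucket = bucket

    def save(self, relative_key: str, data: bytes) -> str:
        # Add the prefix once; keys that already carry it are left unchanged.
        if relative_key.startswith(COMPUTE_PREFIX):
            key = relative_key
        else:
            key = COMPUTE_PREFIX + relative_key
        self.bucket.put(key, data)
        return key


def ingest_relocate(
    bucket: InMemoryBucket, storage_key: str, ext: str, deletion_queue: List[str]
) -> str:
    """Hypothetical analogue of ingest_from_storage_key(relocate=True).

    Copies a compute result to a canonical media/ key and queues the source
    key for deferred deletion instead of deleting it inline.
    """
    if not storage_key.startswith(COMPUTE_PREFIX):
        raise ValueError(f"expected a compute/ key, got {storage_key!r}")
    resource_id = uuid.uuid4()  # assumption: a fresh UUID per MediaResource
    canonical_key = f"media/{resource_id}/file.{ext}"
    bucket.copy(storage_key, canonical_key)
    deletion_queue.append(storage_key)
    return canonical_key


if __name__ == "__main__":
    bucket = InMemoryBucket()
    storage = PrefixedComputeStorage(bucket)
    key = storage.save("generations/task789/result_0.webp", b"image-bytes")
    pending: List[str] = []
    print(ingest_relocate(bucket, key, "webp", pending), pending)
```

Deferring the delete keeps the `compute/` object available until ingestion has committed, which is one plausible reason for the "deferred deletion" wording above.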
Lifecycle Policies
Object lifecycle policies are configured per bucket in the Cloudflare R2 dashboard. The staging and production buckets each have their own retention periods.
| Path | Retention | Notes |
|---|---|---|
| `temp/*` | Depends on environment | Presigned upload staging area; auto-deleted |
| `compute/*` | Depends on environment | Compute results area; auto-deleted after a longer period to preserve observability |
| `media/*` | Permanent | Customer content (canonical MediaResource paths); the orphan scanner may delete unattached resources after an age threshold |
| `images/*` | Permanent | Customer content (legacy paths for pre-existing data) |
| `shooting_looks/*` | Permanent | Customer content (legacy paths for pre-existing data) |
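A tool that walks the bucket (for example a cleanup or audit script; hypothetical here) needs to know which retention class a key falls into. A minimal sketch, assuming the lifecycle rules are plain key-prefix filters as the `temp/*` / `compute/*` notation suggests:

```python
from typing import Optional, Tuple

# (prefix, retention class) pairs taken from the lifecycle table.
# "temp" and "compute" expire via R2 lifecycle rules with per-environment periods.
RETENTION_RULES: Tuple[Tuple[str, str], ...] = (
    ("temp/", "temp"),
    ("compute/", "compute"),
    ("media/", "permanent"),
    ("images/", "permanent"),
    ("shooting_looks/", "permanent"),
)


def retention_class(key: str) -> Optional[str]:
    """Return the retention class for an object key, or None for unknown prefixes."""
    for prefix, retention in RETENTION_RULES:
        if key.startswith(prefix):
            return retention
    return None
```

Because matching starts at the beginning of the key, the legacy `images/temp/...` objects fall under `images/*` (permanent) and are not caught by a `temp/` expiry rule.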
Content Delivery
CDN Configuration
Content is served through Cloudflare CDN with aggressive caching:
| Content Type | Cache-Control | Notes |
|---|---|---|
| Generated images | `public, max-age=31536000, immutable` | 1 year, content-addressed URLs |
| Product images | `public, max-age=31536000, immutable` | 1 year, content-addressed URLs |
| All stored images | `public, max-age=31536000, immutable` | 1 year |
| API responses | No cache | Dynamic content |
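One way to get the one-year immutable header onto stored images is to set it as object metadata at upload time; this page does not say whether Sartiq does that or applies it through a Cloudflare cache rule, so treat the following as an assumption. The sketch builds keyword arguments for an S3-compatible `put_object` call (`Bucket`, `Key`, `Body`, `ContentType`, `CacheControl` are standard boto3 parameters); the extension-to-MIME map is hypothetical.

```python
import posixpath
from typing import Any, Dict

# One year in seconds: 365 * 24 * 60 * 60 = 31,536,000.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60
IMMUTABLE_CACHE_CONTROL = f"public, max-age={ONE_YEAR_SECONDS}, immutable"

# Assumed extension-to-MIME mapping for the image formats used on this page.
CONTENT_TYPES = {
    "webp": "image/webp",
    "png": "image/png",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "avif": "image/avif",
    "gif": "image/gif",
}


def image_put_kwargs(bucket: str, key: str, body: bytes) -> Dict[str, Any]:
    """Build keyword arguments for an S3-compatible put_object call on R2."""
    ext = posixpath.splitext(key)[1].lstrip(".").lower()
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ContentType": CONTENT_TYPES.get(ext, "application/octet-stream"),
        "CacheControl": IMMUTABLE_CACHE_CONTROL,
    }
```

With a boto3 S3 client pointed at the R2 endpoint, this would be used as `s3.put_object(**image_put_kwargs(bucket, key, data))`. The `immutable` directive is only safe because canonical keys are never overwritten with different content.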
Cloudflare Image Transformations
In production, the Backend uses Cloudflare Image Transformations for on-the-fly format conversion via the ?type= query parameter on the file serving endpoint:
| Parameter | Values | Description |
|---|---|---|
| `type` | `webp`, `jpeg`, `png`, `avif`, `gif` | Target format for conversion |
The Backend constructs transformation URLs in the format:
This is only active when CLOUDFLARE_IMAGE_TRANSFORMATIONS_ENABLED=true. In local development (MinIO), the file is served directly without transformation.
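Cloudflare documents a URL form for transformations, `/cdn-cgi/image/<options>/<source-image>`, where `format=<fmt>` selects the output format. The sketch below assumes the Backend maps `?type=` straight onto that `format` option and gates it on `CLOUDFLARE_IMAGE_TRANSFORMATIONS_ENABLED`; `build_file_url` is a hypothetical name, and you should confirm against Cloudflare's documentation which `format` values it actually accepts.

```python
import os
from typing import Optional

# Values accepted by the ?type= parameter, per the table above.
ALLOWED_TYPES = {"webp", "jpeg", "png", "avif", "gif"}


def transformations_enabled() -> bool:
    """Read the feature flag; only the literal 'true' (any case) enables it."""
    return os.environ.get("CLOUDFLARE_IMAGE_TRANSFORMATIONS_ENABLED", "").lower() == "true"


def build_file_url(
    cdn_base: str,
    file_path: str,
    image_type: Optional[str] = None,
    enabled: Optional[bool] = None,
) -> str:
    """Return a direct URL, or a Cloudflare transformation URL when enabled."""
    if enabled is None:
        enabled = transformations_enabled()
    base = cdn_base.rstrip("/")
    path = file_path.lstrip("/")
    if image_type is None or not enabled:
        # No conversion requested, or local MinIO: serve the stored object as-is.
        return f"{base}/{path}"
    if image_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {image_type!r}")
    # Assumed mapping of ?type= onto Cloudflare's format option.
    return f"{base}/cdn-cgi/image/format={image_type}/{path}"
```

For example, `build_file_url("https://media.sartiq.com", "images/x.png", "webp", enabled=True)` yields `https://media.sartiq.com/cdn-cgi/image/format=webp/images/x.png`.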
URL Resolution
Files are stored with relative paths in the database (e.g., images/products/abc123/product_abc123_SKU001_d4e5f6_cover.webp). These are resolved to full CDN URLs at serving time:
| Environment | URL Pattern |
|---|---|
| Production | https://media.sartiq.com/{file_path} |
| Staging | https://staging-media.sartiq.com/{file_path} |
| Development | http://localhost:9002/shootify-media-dev/{file_path} |
The Backend's file serving endpoint (GET /files/{file_path}) returns a 302 redirect to the appropriate CDN/MinIO URL.
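The resolution step is a prefix join between the stored relative path and a per-environment base URL, followed by a 302 redirect. A minimal sketch using the base URLs from the table; the function names, the environment keys, and the `..` guard are assumptions added for illustration.

```python
from http import HTTPStatus
from typing import Dict, Tuple

# Base URLs from the URL Resolution table.
BASE_URLS: Dict[str, str] = {
    "production": "https://media.sartiq.com",
    "staging": "https://staging-media.sartiq.com",
    "development": "http://localhost:9002/shootify-media-dev",
}


def resolve_file_url(environment: str, file_path: str) -> str:
    """Turn a relative path stored in the database into a full CDN/MinIO URL."""
    if environment not in BASE_URLS:
        raise ValueError(f"unknown environment: {environment!r}")
    path = file_path.lstrip("/")
    if ".." in path.split("/"):
        # Assumed safeguard: reject path-traversal segments.
        raise ValueError("path traversal is not allowed")
    return f"{BASE_URLS[environment]}/{path}"


def redirect_for(environment: str, file_path: str) -> Tuple[int, Dict[str, str]]:
    """Status code and headers a GET /files/{file_path} handler could return."""
    return HTTPStatus.FOUND.value, {"Location": resolve_file_url(environment, file_path)}
```

Storing only relative paths means moving to a different CDN hostname is a configuration change rather than a data migration.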
AWS S3 + LakeFS (Dataset Storage)
S3 is used exclusively for dataset storage and data versioning with LakeFS. This remains on AWS (eu-central-1) with DynamoDB as the LakeFS metadata store.
S3 Buckets (eu-central-1)
| Bucket | Purpose |
|---|---|
| `garment-accuracy` | LakeFS backing store for the garment accuracy project |
| `shootify-datasets` | Training and evaluation datasets |
LakeFS
LakeFS provides Git-like versioning for data assets, backed by S3 for object storage and DynamoDB for metadata.
- Version control for datasets
- Branching for experiments
- Rollback capabilities
| Resource | Details |
|---|---|
| UI/API | https://lakefs.sartiq.com |
| Backing Store | s3://garment-accuracy (eu-central-1) |
| Metadata Store | AWS DynamoDB (eu-central-1) |
| Host | AWS-LakeFS (18.199.125.185) |
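lakeFS addresses objects as `lakefs://<repository>/<ref>/<path>`, where the ref is a branch, tag, or commit ID. Its S3-compatible gateway exposes the same object as bucket `<repository>` with key `<ref>/<path>`, which is how branching for experiments stays transparent to S3 tooling. The sketch below parses and rebuilds such addresses; the repository and branch names in the example are hypothetical, since this page does not list the actual lakeFS repositories.

```python
from typing import NamedTuple, Tuple

LAKEFS_SCHEME = "lakefs://"


class LakeFSObject(NamedTuple):
    repository: str
    ref: str  # branch name, tag, or commit ID
    path: str

    def uri(self) -> str:
        return f"{LAKEFS_SCHEME}{self.repository}/{self.ref}/{self.path}"

    def s3_gateway_location(self) -> Tuple[str, str]:
        """(bucket, key) as seen by S3 clients talking to the lakeFS gateway."""
        return self.repository, f"{self.ref}/{self.path}"


def parse_lakefs_uri(uri: str) -> LakeFSObject:
    """Split lakefs://<repository>/<ref>/<path> into its components."""
    if not uri.startswith(LAKEFS_SCHEME):
        raise ValueError(f"not a lakefs:// URI: {uri!r}")
    parts = uri[len(LAKEFS_SCHEME):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected lakefs://<repo>/<ref>/<path>, got {uri!r}")
    return LakeFSObject(*parts)


if __name__ == "__main__":
    # Hypothetical repository and branch names, for illustration only.
    obj = parse_lakefs_uri("lakefs://example-repo/experiment-1/datasets/train/part-0.parquet")
    print(obj.s3_gateway_location())
```

Switching an experiment to a different branch only changes the `ref` component, so the same dataset path can be read from `main` or from an experiment branch without copying data.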
Related Documentation
- Infrastructure Overview — Cloud providers and architecture
- Data Flows — How data moves through the system
- Product Ingestion — Upload flow details