MediaResource Lifecycle¶
This document describes the MediaResource system — the centralized, content-addressable media management layer introduced to replace per-entity storage path conventions with a unified, deduplicated approach.
Overview¶
Why MediaResource Exists¶
Before MediaResource, each entity type (Product, Subject, Style, etc.) managed its own storage paths with bespoke naming conventions (images/products/{id}/..., images/subjects/...). This led to:
- No deduplication — identical files uploaded twice consumed double storage
- Fragile path coupling — entity-specific path formats baked into multiple services
- No centralized cleanup — orphaned files accumulated when entities were deleted
- Inconsistent metadata — dimensions, hashes, and file info tracked differently per entity
MediaResource solves these problems with:
- Content-addressable storage — files identified by MD5 content hash, automatic dedup
- Entity-agnostic canonical paths — all files stored at `media/{resource_id}/file.{ext}`
- Polymorphic attachments — one join table links any entity to any media resource
- Resilient deletion — deferred post-commit deletions with dead-letter retry and orphan scanning
Data Model¶
Entity-Relationship Diagram¶
```mermaid
erDiagram
    MediaResource ||--o{ MediaResourceAttachment : "has many"
    MediaResource {
        uuid id PK
        enum resource_type "IMAGE | VIDEO"
        string storage_key "R2 path"
        string content_hash "MD5 for dedup"
        string extension
        int file_size
        int width
        int height
        string dominant_color
        bool protected
        string alt_text
        string caption
        string color_profile "images only"
        float duration "videos only"
        datetime created_at
    }
    MediaResourceAttachment {
        uuid id PK
        uuid media_resource_id FK
        enum entity_type "PRODUCT | SUBJECT | ..."
        uuid entity_id
        string slot "e.g. cover_image"
        int position "NULL for single slots"
        datetime created_at
    }
    MediaDeletionDeadLetter {
        uuid id PK
        string storage_key
        string error_message
        int attempts
        datetime created_at
        datetime last_attempted_at
    }
```
MediaResource¶
The central entity representing a unique media file in storage.
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Primary key |
| `resource_type` | enum (`IMAGE`, `VIDEO`) | Single-table inheritance discriminator |
| `storage_key` | string | Relative R2 path (e.g., `media/{id}/file.webp`) |
| `content_hash` | string | MD5 hash for deduplication |
| `extension` | string | File extension (e.g., `webp`, `png`) |
| `file_size` | int | Size in bytes |
| `width` / `height` | int? | Pixel dimensions |
| `dominant_color` | string? | Hex color code |
| `protected` | bool | Prevents deletion by orphan scanner |
| `alt_text` / `caption` | string? | Accessibility and display text |
| `color_profile` | string? | ICC profile (images only) |
| `duration` | float? | Length in seconds (videos only) |
Computed properties:
- aspect_ratio — width / height (if both present)
- orientation — "portrait", "landscape", or "square"
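These two properties can be sketched as plain Python derived from the stored `width`/`height`; the function names below are illustrative, not the model's actual API:

```python
from typing import Optional

def aspect_ratio(width: Optional[int], height: Optional[int]) -> Optional[float]:
    """width / height, or None when either dimension is missing."""
    if width and height:
        return width / height
    return None

def orientation(width: int, height: int) -> str:
    """'portrait', 'landscape', or 'square' from pixel dimensions."""
    if width == height:
        return "square"
    return "portrait" if height > width else "landscape"
```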
MediaResourceAttachment¶
Polymorphic join table connecting any entity to any MediaResource.
| Field | Type | Description |
|---|---|---|
| `media_resource_id` | UUID (FK) | Points to `media_resource.id` |
| `entity_type` | MediaEntityType | Which entity table (e.g., PRODUCT, PREDICTION) |
| `entity_id` | UUID | The owning entity's primary key |
| `slot` | string | Field name on entity (e.g., `cover_image`, `reference_images`) |
| `position` | int? | Order within list slots; NULL for single-value slots |
Uniqueness constraints:
- Single slots: `UNIQUE(entity_type, entity_id, slot) WHERE position IS NULL`
- List slots: `UNIQUE(entity_type, entity_id, slot, position) WHERE position IS NOT NULL`
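Both constraints amount to "one key tuple per attachment row". A toy in-memory check, for illustration only (the real enforcement is the database's partial unique indexes):

```python
from typing import List, Optional, Tuple

# (entity_type, entity_id, slot, position) — position is None for single slots
Key = Tuple[str, str, str, Optional[int]]

def violates_uniqueness(existing: List[Key], new: Key) -> bool:
    """True if inserting `new` would break either partial-unique constraint."""
    etype, eid, slot, pos = new
    for row in existing:
        if row[:3] != (etype, eid, slot):
            continue
        if pos is None and row[3] is None:      # single slot: at most one row
            return True
        if pos is not None and row[3] == pos:   # list slot: positions unique
            return True
    return False
```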
MediaDeletionDeadLetter¶
Records failed R2 storage deletions for retry by the scheduler.
| Field | Type | Description |
|---|---|---|
| `storage_key` | string | R2 path that failed to delete |
| `error_message` | string | Failure reason |
| `attempts` | int | Retry counter (starts at 1) |
| `last_attempted_at` | datetime | Most recent retry timestamp |
Canonical Path¶
All new media files are stored at:

```
media/{resource_id}/file.{ext}
```

Regex: `^media/[0-9a-f\-]{36}/file\.\w+$`
Benefits:
- Entity-agnostic — renaming or re-typing an entity requires no file move
- Predictable — any resource_id maps to exactly one path
- No naming collisions — UUID guarantees uniqueness
Legacy paths (images/products/..., images/subjects/..., shooting_looks/...) remain valid for pre-existing data. Both legacy and canonical paths resolve correctly via CDN.
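The regex above can be applied directly to tell canonical keys apart from legacy ones; a small helper (hypothetical name, not part of the service) might look like:

```python
import re

# Canonical pattern from the Canonical Path section: media/{uuid}/file.{ext}
CANONICAL_KEY_RE = re.compile(r"^media/[0-9a-f\-]{36}/file\.\w+$")

def is_canonical_key(storage_key: str) -> bool:
    """True for canonical media/{uuid}/file.{ext} keys, False for legacy paths."""
    return bool(CANONICAL_KEY_RE.match(storage_key))
```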
Ingestion Flows¶
Path A: User Uploads (via ingest_all)¶
Used when creating or updating entities from API requests with file URLs.
```mermaid
sequenceDiagram
    participant FE as Frontend
    participant R2 as R2 Storage
    participant BE as Backend
    participant DB as Database
    FE->>R2: PUT file to presigned URL
    Note over R2: File at temp/{file_id}_{ts}_{name}
    FE->>BE: POST /products (with temp URLs)
    BE->>R2: HEAD temp file (get ETag, size)
    BE->>BE: Resolve content_hash from ETag
    alt Dedup hit (hash exists)
        BE->>R2: DELETE temp file
        BE->>DB: Reuse existing MediaResource
    else Dedup miss (new file)
        BE->>DB: CREATE MediaResource
        BE->>R2: COPY temp → media/{resource_id}/file.{ext}
        BE->>DB: UPDATE storage_key to canonical path
    end
    BE->>DB: CREATE MediaResourceAttachment
    BE->>DB: COMMIT
    BE->>R2: Delete temp file (deferred)
```
Entities using Path A: Product, Subject, Style, Organization, ShotType, GuidelinesShotType, ShootingLookOutfit, BackgroundPreset
Key method: `media_service.ingest_all(entity, input_schema)`
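The dedup branch in the diagram boils down to a small decision. Here is a runnable toy version with R2 and the database stubbed as dicts; every name here is illustrative, not the real service API:

```python
def ingest_upload(temp_key: str, content_hash: str,
                  resources: dict, storage: dict) -> str:
    """Return the resource id the new attachment should point to.

    resources: content_hash -> resource_id (stands in for the DB)
    storage:   storage_key -> bytes        (stands in for R2)
    """
    if content_hash in resources:              # dedup hit: reuse, discard temp
        storage.pop(temp_key, None)
        return resources[content_hash]
    resource_id = f"res-{len(resources)}"      # stand-in for a new UUID
    canonical = f"media/{resource_id}/file.webp"
    storage[canonical] = storage.pop(temp_key) # COPY temp -> canonical, drop temp
    resources[content_hash] = resource_id
    return resource_id
```

A second upload with the same hash reuses the first resource and never touches the canonical key.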
Path B: Compute Results (via ingest_from_storage_key, relocate=True)¶
Used when the Backend ingests results from the Compute Server.
```mermaid
sequenceDiagram
    participant CS as Compute Server
    participant R2 as R2 Storage
    participant BE as Backend
    participant DB as Database
    CS->>R2: Save result at compute/{type}s/{task_id}/result_0.webp
    CS-->>BE: task.completed event
    BE->>R2: HEAD permanent key (get ETag, size)
    BE->>BE: Resolve content_hash
    alt Dedup hit
        BE->>DB: Reuse existing MediaResource, attach
        Note over BE: Source key queued for deferred deletion
    else Dedup miss
        BE->>DB: CREATE MediaResource
        BE->>R2: COPY source → media/{resource_id}/file.{ext}
        BE->>DB: UPDATE storage_key to canonical
        BE->>DB: CREATE attachment
        Note over BE: Source key queued for deferred deletion
    end
    BE->>DB: COMMIT
    BE->>R2: Delete source key (deferred, dead-letter on failure)
```
Entities using Path B: Prediction (result_image), Refine
Path B.2: Shared References (via ingest_from_storage_key, relocate=False)¶
Used for files that should remain at their source path because multiple entities reference them.
- File stays at the existing R2 path (no copy/move)
- MediaResource created with `storage_key` pointing to the source path
- No deferred deletion of source
Entities using Path B.2: Generation snapshot fields (e.g., product_snapshot_url)
Schema System¶
Input Schemas¶
The MediaInput schema standardizes how media fields are received in API requests:
```python
class MediaInput(SQLModel):
    url: Optional[str] = None                      # Presigned temp URL (new upload)
    media_resource_id: Optional[uuid.UUID] = None  # Reuse existing resource
    width: Optional[int] = None
    height: Optional[int] = None
    # ... other metadata fields
```
Validation: Exactly one of url or media_resource_id must be provided.
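That "exactly one" rule can be sketched as a plain function; in the real schema it presumably lives in a Pydantic validator, so treat the name and shape below as illustrative:

```python
import uuid
from typing import Optional

def check_media_input(url: Optional[str],
                      media_resource_id: Optional[uuid.UUID]) -> None:
    """Raise unless exactly one of the two fields is provided."""
    if (url is None) == (media_resource_id is None):
        raise ValueError("Provide exactly one of url or media_resource_id")
```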
Slot Annotations¶
MediaSlot is a PEP 593 Annotated metadata tag that decouples Python field names from persisted slot identifiers:
```python
# Schema field definition
cover_img: Annotated[Optional[MediaInput], MediaSlot("cover_image")] = None
```
Renaming cover_img to main_image on the Python side requires no data migration — the persisted slot name remains "cover_image".
Type Factories¶
Convenience factories reduce boilerplate:
| Factory | Produces | Example |
|---|---|---|
| `MediaField("slot")` | `Annotated[Optional[MediaInput], MediaSlot("slot")]` | Single image input |
| `MediaListField("slot")` | `Annotated[Optional[List[MediaInput]], MediaSlot("slot")]` | Multiple images input |
| `MediaOutput("slot")` | `Annotated[Optional[MediaResourcePublic], MediaSlot("slot")]` | Single image response |
| `MediaListOutput("slot")` | `Annotated[Optional[List[MediaResourcePublic]], MediaSlot("slot")]` | Multiple images response |
Auto-Discovery¶
discover_media_fields(schema_class) introspects a Pydantic schema to find all MediaInput (or MediaResourcePublic) fields, unwrapping Annotated, Optional, Union, and List layers. Returns a dict of field_name → MediaFieldInfo(slot, is_list, field_name, config).
extract_media_inputs(input_data) reads values from a schema instance, using model_fields_set to distinguish between "not provided" (skip) and "explicitly set to None" (detach).
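A heavily simplified re-implementation of the discovery step, for illustration; the real `discover_media_fields` also unwraps `Optional`, `Union`, and `List` layers and returns richer `MediaFieldInfo` objects:

```python
from typing import Annotated, Dict, Optional, get_args, get_origin, get_type_hints

class MediaSlot:
    """Metadata tag carrying the persisted slot name."""
    def __init__(self, slot: str):
        self.slot = slot

def discover_slots(schema_class) -> Dict[str, str]:
    """Map Python field names to slot names by inspecting Annotated metadata."""
    slots: Dict[str, str] = {}
    for name, hint in get_type_hints(schema_class, include_extras=True).items():
        if get_origin(hint) is Annotated:
            for meta in get_args(hint)[1:]:       # skip the underlying type
                if isinstance(meta, MediaSlot):
                    slots[name] = meta.slot
    return slots

class ProductInput:
    cover_img: Annotated[Optional[str], MediaSlot("cover_image")] = None
    name: Optional[str] = None                    # non-media field, ignored
```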
Response Population¶
populate_media_response(entity, response) auto-populates MediaResourcePublic output fields on a response schema by matching slot names to fetched media data. A batch variant populate_media_responses_batch() handles list endpoints efficiently.
Response Format¶
The MediaResourcePublic schema is returned in API responses:
```json
{
  "id": "a1b2c3d4-...",
  "resource_type": "IMAGE",
  "url": "https://media.sartiq.com/media/a1b2c3d4-.../file.webp",
  "content_hash": "d41d8cd98f00b204e9800998ecf8427e",
  "extension": "webp",
  "file_size": 245760,
  "width": 1024,
  "height": 1536,
  "aspect_ratio": 0.667,
  "orientation": "portrait",
  "dominant_color": "#2a3f5f",
  "protected": false,
  "alt_text": null,
  "caption": null
}
```
The url field contains the full CDN URL resolved from the internal storage_key.
Deduplication¶
Content Hash¶
Every file is identified by its MD5 content hash, derived from the S3/R2 ETag:
- Standard upload — ETag is the MD5 hash; use directly
- Multipart upload — ETag is a composite hash (contains `-`); download the file and compute the true MD5
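A sketch of that resolution rule, assuming the caller can fetch the object's bytes when a recompute is needed (names illustrative):

```python
import hashlib
from typing import Callable

def resolve_content_hash(etag: str, fetch_bytes: Callable[[], bytes]) -> str:
    """Use the ETag directly for standard uploads; recompute MD5 for multipart."""
    etag = etag.strip('"')          # S3/R2 return ETags wrapped in quotes
    if "-" in etag:                 # composite multipart ETag, not a true MD5
        return hashlib.md5(fetch_bytes()).hexdigest()
    return etag
```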
Race Condition Handling¶
When two concurrent requests upload the same file:
- Both pass the dedup check (no existing hash yet)
- First `INSERT` succeeds; the second hits `IntegrityError` (unique constraint on `content_hash`)
- On `IntegrityError`: roll back, `SELECT` the existing resource, reuse it, delete the duplicate file
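The recovery path can be sketched with a toy unique index standing in for the database; everything here is illustrative, not the real CRUD layer:

```python
class IntegrityError(Exception):
    """Stand-in for the database driver's unique-constraint error."""

def insert_resource(index: dict, content_hash: str, resource_id: str) -> str:
    if content_hash in index:            # toy unique constraint on content_hash
        raise IntegrityError(content_hash)
    index[content_hash] = resource_id
    return resource_id

def create_or_reuse(index: dict, content_hash: str, resource_id: str) -> str:
    try:
        return insert_resource(index, content_hash, resource_id)
    except IntegrityError:
        # a real implementation rolls back here, then SELECTs the winner's row
        return index[content_hash]
```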
Resilience¶
Deferred Deletions¶
Storage deletions are never performed inside a database transaction. Instead:
- During the transaction, storage keys are appended to `_pending_deletions`
- After `session.commit()`, `execute_deferred_deletions()` is called
- Each key is deleted from R2; failures are recorded in the dead-letter table
This prevents inconsistency where a transaction rolls back but files were already deleted.
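The steps above can be sketched as a small helper; the class and callback names are illustrative, not the service's real interface:

```python
from typing import Callable, List

class DeferredDeleter:
    """Collects storage keys during a transaction; deletes them post-commit."""

    def __init__(self, delete_fn: Callable[[str], None],
                 record_dead_letter: Callable[[str, str], None]):
        self._pending: List[str] = []
        self._delete = delete_fn
        self._dead_letter = record_dead_letter

    def defer(self, storage_key: str) -> None:
        self._pending.append(storage_key)   # never delete mid-transaction

    def execute(self) -> None:              # call only after session.commit()
        for key in self._pending:
            try:
                self._delete(key)
            except Exception as exc:        # failure goes to the dead-letter table
                self._dead_letter(key, str(exc))
        self._pending.clear()
```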
Dead-Letter Table¶
Failed deletions are recorded in media_deletion_dead_letter with:
- Incrementing attempts counter
- Updated last_attempted_at timestamp
- Capped at max_attempts (default 10) before being abandoned
A scheduler job calls retry_media_deletions() periodically to retry pending entries.
Orphan Scanner¶
find_orphans_batch() identifies MediaResource records with zero attachments:
- Age-gated — only considers resources older than a threshold (default 60 minutes) to avoid deleting resources mid-transaction
- FOR UPDATE locking — prevents TOCTOU races where an attachment is created between the orphan check and deletion
- Orphaned resources are deleted from both the database and R2 storage
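The eligibility check, minus the `FOR UPDATE` locking, reduces to an age-gated predicate; the function name and defaults below are illustrative:

```python
from datetime import datetime, timedelta, timezone

def eligible_orphan(attachment_count: int, created_at: datetime, protected: bool,
                    min_age: timedelta = timedelta(minutes=60)) -> bool:
    """Orphan = no attachments, not protected, and old enough to delete safely."""
    if protected or attachment_count > 0:
        return False
    return datetime.now(timezone.utc) - created_at >= min_age
```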
Transaction Safety¶
- All CRUD operations `flush()` without committing — the caller controls the transaction boundary
- `SELECT ... FOR UPDATE` is used when checking orphan status or reusing existing resources
- Relocated files are tracked in `_pending_relocations` for rollback cleanup on transaction failure
Entity Slot Map¶
All entities that participate in the MediaResource system and their registered slots:
| Entity | `media_entity_type` | Slots |
|---|---|---|
| Product | PRODUCT | `cover_image`, `back_image`, `reference_images` (list) |
| Subject | SUBJECT | `cover_image` |
| Style | STYLE | `cover_image`, `reference_images` (list) |
| Organization | ORGANIZATION | `logo` |
| ShotType | SHOT_TYPE | `reference_image` |
| GuidelinesShotType | GUIDELINES_SHOT_TYPE | `reference_image` |
| BackgroundPreset | BACKGROUND_PRESET | `reference_image` |
| ShootingLookOutfit | SHOOTING_LOOK_OUTFIT | `reference_images` (list) |
| Prediction | PREDICTION | `result_image` |
| Refine | REFINE | `result_image` |
| Generation | GENERATION | (snapshot fields — not yet fully integrated) |
| ShotRevisionImage | SHOT_REVISION_IMAGE | (not yet integrated) |
Migration Status¶
The MediaResource system is in a dual-format transition period:
What's Complete¶
- All entities listed above (except noted exceptions) use MediaResource for new uploads
- Canonical `media/{resource_id}/file.{ext}` paths for all new files
- Deduplication, resilient deletion, and orphan scanning
- `MediaResourcePublic` in API responses alongside legacy URL strings
Dual-Format Writeback¶
During the transition, services write to both:
- The new MediaResourceAttachment records
- The old *_url string fields on entities (e.g., product.cover_image_url)
This ensures backward compatibility while frontends migrate to the new response format.
Backfill Script¶
A backfill script at scripts/backfill_media_resources.py creates MediaResource records for pre-existing data that only has legacy *_url fields.
Not Yet Integrated¶
| Entity / Field | Status |
|---|---|
| ShotRevisionImage | Manual uploads not yet routed through MediaResource |
| `Subject.base_images` | JSON field with image URLs; requires schema migration |
| Generation snapshot fields | 5 snapshot URL fields (`product_snapshot_url`, etc.) use `relocate=False`, but full integration is pending |
Key Source Files¶
| File | Purpose |
|---|---|
| `app/services/media_resource_service.py` | Central orchestrator — ingestion, dedup, deletion, response population |
| `app/models/media_resource.py` | Database models — MediaResource, MediaResourceAttachment, MediaDeletionDeadLetter |
| `app/schemas/media_resource.py` | Schemas — MediaInput, MediaResourcePublic, MediaSlot, type factories |
| `app/crud/media_resource.py` | CRUD — queries, attachments, dead-letter, orphan scanner |
| `app/utils/media_introspection.py` | Schema auto-discovery — `discover_media_fields()`, `extract_media_inputs()` |
Related Documentation¶
- Storage Infrastructure — R2 bucket structure, CDN, lifecycle policies
- Product Ingestion — Upload flow with MediaResource integration
- Generation Pipeline — Compute result ingestion via MediaResource
- Backend Architecture — Service layer overview
- Backend Schemas — API schema reference