Rate Limits¶
The Compute API implements rate limiting to ensure fair resource allocation and system stability.
Rate Limit Tiers¶
Rate limits are applied per organization based on subscription tier.
| Tier | Requests/min | Concurrent Tasks | Daily Tasks |
|---|---|---|---|
| Starter | 60 | 5 | 500 |
| Professional | 300 | 25 | 5,000 |
| Enterprise | 1,000 | 100 | Unlimited |
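These per-tier budgets can also be enforced client-side so requests never hit the server limit in the first place. A minimal token-bucket sketch (illustrative only; the API enforces limits server-side regardless):

```python
import time

class TokenBucket:
    """Client-side token bucket for staying under a requests/min budget.

    Illustrative sketch only; not part of the Compute API client.
    """

    def __init__(self, rate_per_min, now=time.monotonic):
        self.capacity = rate_per_min          # e.g. 60 for the Starter tier
        self.tokens = float(rate_per_min)
        self.refill_per_sec = rate_per_min / 60.0
        self.now = now                        # injectable clock, useful in tests
        self.last = now()

    def try_acquire(self):
        """Take one token if available; return False when the budget is spent."""
        current = self.now()
        elapsed = current - self.last
        self.last = current
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Call try_acquire() before each request and back off locally when it returns False.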
Rate Limit Headers¶
All responses include rate limit information:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1705320000
X-RateLimit-Retry-After: 30
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per window |
| X-RateLimit-Remaining | Requests remaining in window |
| X-RateLimit-Reset | Unix timestamp when window resets |
| X-RateLimit-Retry-After | Seconds until retry (when limited) |
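Clients can read these headers on every response to track their remaining budget. A small helper (a sketch; header names are as documented above):

```python
def parse_rate_limit_headers(headers):
    """Extract the X-RateLimit-* values from a response's headers, as ints."""
    fields = {
        "limit": "X-RateLimit-Limit",
        "remaining": "X-RateLimit-Remaining",
        "reset": "X-RateLimit-Reset",
        "retry_after": "X-RateLimit-Retry-After",
    }
    # Retry-After is only present when the request was rate limited,
    # so missing headers are simply skipped.
    return {key: int(headers[name]) for key, name in fields.items() if name in headers}
```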
Rate Limit Response¶
When rate limited, the API returns:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 30
{
  "detail": "Rate limit exceeded",
  "retry_after": 30
}
Endpoint-Specific Limits¶
Some endpoints have additional limits:
| Endpoint | Limit | Window |
|---|---|---|
| POST /tasks/generation | 30/min | Per organization |
| POST /training/ | 5/hour | Per organization |
| POST /workflows/ | 20/min | Per organization |
| GET /monitoring/* | 120/min | Per user |
Concurrent Task Limits¶
Beyond request rate limits, there are limits on concurrent running tasks:
flowchart LR
    subgraph Limits["Task Concurrency"]
        GEN[Generation: 10 concurrent]
        TRAIN[Training: 2 concurrent]
        WF[Workflows: 5 concurrent]
    end
When at capacity, new task submissions are queued (not rejected).
Queue Behavior¶
| Scenario | Behavior |
|---|---|
| Under limit | Task starts immediately |
| At limit | Task queued, starts when slot available |
| Queue full | 503 Service Unavailable |
Best Practices¶
Handling Rate Limits¶
import time
import httpx

def make_request_with_retry(url, headers, max_retries=3):
    """GET with automatic retry when rate limited (HTTP 429)."""
    for attempt in range(max_retries):
        response = httpx.get(url, headers=headers)
        if response.status_code == 429:
            # Honor the server-provided Retry-After header (seconds).
            retry_after = int(response.headers.get("Retry-After", 30))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
            continue
        return response
    raise Exception("Max retries exceeded")
async function makeRequestWithRetry(
  url: string,
  headers: HeadersInit,
  maxRetries = 3
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers });
    if (response.status === 429) {
      // Honor the server-provided Retry-After header (seconds).
      const retryAfter = parseInt(
        response.headers.get("Retry-After") || "30",
        10
      );
      console.log(`Rate limited. Waiting ${retryAfter}s...`);
      await new Promise((r) => setTimeout(r, retryAfter * 1000));
      continue;
    }
    return response;
  }
  throw new Error("Max retries exceeded");
}
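Reactive retries can be complemented by proactive throttling: inspect X-RateLimit-Remaining after each response and pause until X-RateLimit-Reset once the budget is exhausted, instead of provoking a 429. A sketch of the delay calculation (the helper name is illustrative):

```python
import time

def throttle_delay(headers, now=None):
    """Seconds to wait before the next request, based on rate-limit headers.

    Returns 0 while budget remains; otherwise the time until the window resets.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset = int(headers.get("X-RateLimit-Reset", 0))
    if remaining > 0:
        return 0.0
    # Budget exhausted: wait until the reset timestamp (never negative).
    return max(0.0, reset - now)
```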
Batch Operations¶
Instead of many individual requests, use batch endpoints where available:
# Instead of multiple single requests
curl -X POST .../tasks/generation -d '{"image": "img1.jpg"}'
curl -X POST .../tasks/generation -d '{"image": "img2.jpg"}'
curl -X POST .../tasks/generation -d '{"image": "img3.jpg"}'
# Use batch endpoint
curl -X POST .../tasks/generation/batch -d '{
  "tasks": [
    {"image": "img1.jpg"},
    {"image": "img2.jpg"},
    {"image": "img3.jpg"}
  ]
}'
Monitor Usage¶
Check your current usage and remaining quota through the monitoring endpoints, described under Quota Management below.
Quota Management¶
View Current Quota¶
The quota response reports your tier, daily and concurrent usage, and the next reset time:
{
  "tier": "professional",
  "daily_tasks": {
    "limit": 5000,
    "used": 1234,
    "remaining": 3766
  },
  "concurrent_tasks": {
    "limit": 25,
    "active": 8
  },
  "resets_at": "2024-01-16T00:00:00Z"
}
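A client can inspect this response before submitting new work, so it queues locally rather than burning requests that will be rejected. A small guard over the fields shown above (the function name is illustrative):

```python
def can_submit_task(quota):
    """Decide whether a new task fits within daily and concurrent quotas.

    `quota` is the parsed quota response shown above.
    """
    daily_ok = quota["daily_tasks"]["remaining"] > 0
    concurrent_ok = quota["concurrent_tasks"]["active"] < quota["concurrent_tasks"]["limit"]
    return daily_ok and concurrent_ok
```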
Quota Exceeded¶
When the daily task quota is exceeded, the API returns:
HTTP/1.1 403 Forbidden
Requesting Limit Increases¶
For limit increases, contact support with:
- Current organization ID
- Requested limits
- Use case justification
- Expected usage patterns
Enterprise customers can negotiate custom limits.