# API Rate Limiting and Best Practices
This guide covers BuyWhere API rate limits, handling 429 responses, and best practices for building reliable agent-native commerce applications.
## Rate Limits by Tier
| Tier | Requests/Hour | Requests/Minute (burst) |
|---|---|---|
| Free | 100 | 20 |
| Basic | 1,000 | 100 |
| Pro | 10,000 | 500 |
Enterprise tiers have custom limits — contact support for details.
## Rate Limit Headers
Every API response includes rate limit status headers:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp (seconds) when the window resets |
Example response headers:

```
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1712236800
```
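These headers can be read straight off any response. A small stdlib-only helper (the function name is illustrative, not part of a BuyWhere SDK):

```python
from datetime import datetime, timezone

def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract rate limit status from a response's headers."""
    reset_ts = int(headers.get("X-RateLimit-Reset", 0))
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        # Convert the Unix timestamp to an aware UTC datetime
        "resets_at": datetime.fromtimestamp(reset_ts, tz=timezone.utc),
    }

info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "847",
    "X-RateLimit-Reset": "1712236800",
})
print(info["remaining"])  # 847
```

With `httpx`, pass `response.headers` directly; it supports the same `.get()` lookups as a plain dict.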
## Handling 429 Too Many Requests
When you exceed your rate limit, the API returns HTTP 429:

```json
{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Retry after 60 seconds."
  }
}
```
### The Retry-After Header
The `Retry-After` header tells you how many seconds to wait:

```
Retry-After: 47
```
Always respect this header rather than guessing.
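Note that per RFC 9110, `Retry-After` may carry either delta-seconds or an HTTP-date, so robust clients handle both forms. A stdlib-only sketch (the helper name is ours, not part of any SDK):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_after_seconds(value: str) -> float:
    """Convert a Retry-After header value to seconds to wait.

    Accepts delta-seconds ("47") or an HTTP-date
    ("Wed, 04 Apr 2024 12:00:00 GMT") per RFC 9110.
    """
    if value.isdigit():
        return float(value)
    target = parsedate_to_datetime(value)
    # Clamp to zero in case the date is already in the past
    return max(0.0, (target - datetime.now(timezone.utc)).total_seconds())

print(retry_after_seconds("47"))  # 47.0
```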
## Exponential Backoff
Never retry immediately after a 429. Use exponential backoff to avoid thundering herd problems:
### Python Example
```python
import random
import time

import httpx

def fetch_with_backoff(
    url: str,
    headers: dict,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> httpx.Response:
    """
    Fetch with exponential backoff and jitter.

    Args:
        url: API endpoint URL
        headers: Request headers (include Authorization)
        max_retries: Maximum number of retry attempts
        base_delay: Initial delay in seconds
        max_delay: Maximum delay cap in seconds

    Returns:
        httpx.Response on success

    Raises:
        httpx.HTTPStatusError: After max_retries exhausted
    """
    for attempt in range(max_retries):
        response = httpx.get(url, headers=headers, timeout=30.0)
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        if retry_after:
            wait_seconds = int(retry_after)
        else:
            # Exponential backoff: 1, 2, 4, 8, 16... capped at max_delay
            wait_seconds = min(base_delay * (2 ** attempt), max_delay)
        # Add jitter (±25%) to prevent synchronized retries
        jitter = wait_seconds * 0.25 * (random.random() * 2 - 1)
        actual_wait = wait_seconds + jitter
        print(f"Rate limited. Attempt {attempt + 1}/{max_retries}. "
              f"Waiting {actual_wait:.1f}s...")
        time.sleep(actual_wait)
    raise httpx.HTTPStatusError(
        f"Rate limit retry exhausted after {max_retries} attempts",
        request=response.request,
        response=response,
    )
```
### JavaScript/TypeScript Example
```typescript
async function fetchWithBackoff(
  url: string,
  headers: Record<string, string>,
  maxRetries = 5,
  baseDelayMs = 1000,
  maxDelayMs = 60000
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers, signal: AbortSignal.timeout(30000) });
    if (response.status !== 429) {
      return response;
    }
    const retryAfter = response.headers.get("Retry-After");
    const waitMs = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);
    // Add jitter (±25%) to prevent synchronized retries
    const jitter = waitMs * 0.25 * (Math.random() * 2 - 1);
    const actualWait = waitMs + jitter;
    console.log(`Rate limited. Attempt ${attempt + 1}/${maxRetries}. Waiting ${actualWait.toFixed(0)}ms...`);
    await new Promise(resolve => setTimeout(resolve, actualWait));
  }
  throw new Error(`Rate limit retry exhausted after ${maxRetries} attempts`);
}
```
## Caching Strategies
Caching dramatically reduces API calls and improves response times for your agents.
### Cache Invalidation
The BuyWhere API sets `Cache-Control` headers on stable endpoints:

```
Cache-Control: public, max-age=300   # 5-minute cache
```
For the search endpoint, cache aggressively — results for the same query don't change frequently.
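One way to honor these headers client-side is to derive the cache TTL from `max-age`. A minimal sketch (the helper and its 300-second default are illustrative assumptions):

```python
import re

def ttl_from_cache_control(header: str, default: int = 300) -> int:
    """Derive a cache TTL in seconds from a Cache-Control header value."""
    # Responses marked uncacheable should not be stored at all
    if "no-store" in header or "no-cache" in header:
        return 0
    match = re.search(r"max-age=(\d+)", header)
    return int(match.group(1)) if match else default

print(ttl_from_cache_control("public, max-age=300"))  # 300
```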
### Recommended Cache TTLs
| Endpoint Pattern | Recommended TTL | Notes |
|---|---|---|
| `GET /v1/search?q=...` | 5-15 minutes | Vary by query freshness needs |
| `GET /v1/products/{id}` | 1 hour | Product data changes slowly |
| `GET /v1/categories` | 1 hour | Taxonomy is stable |
| `GET /v1/brands` | 30 minutes | Brand counts fluctuate |
| Price comparison endpoints | 5 minutes | Prices change frequently |
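The TTLs above can be enforced with a small in-memory store. A minimal single-process sketch (production systems would typically use Redis or another shared cache instead):

```python
import time

class TTLCache:
    """Minimal in-memory TTL cache; not thread-safe or shared across processes."""

    def __init__(self):
        self._store: dict = {}

    def set(self, key, value, ttl_seconds: float):
        # Store the value alongside its absolute expiry time
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Lazily evict expired entries on read
            del self._store[key]
            return None
        return value

cache = TTLCache()
cache.set("GET /v1/categories", {"items": []}, ttl_seconds=3600)
print(cache.get("GET /v1/categories"))  # {'items': []}
```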
### Cache Key Design
Use query parameters as cache keys, but normalize them:

```python
import hashlib
import json

def build_cache_key(base: str, **params) -> str:
    """Build a normalized cache key."""
    # Sort params for consistent hashing
    normalized = json.dumps(params, sort_keys=True, default=str)
    hash_suffix = hashlib.md5(normalized.encode()).hexdigest()[:12]
    return f"{base}:{hash_suffix}"

# Always produces the same key regardless of param order
key1 = build_cache_key("search", q="nike", limit=20, offset=0)
key2 = build_cache_key("search", offset=0, limit=20, q="nike")
assert key1 == key2
```
## Batch Request Patterns
Instead of making many individual requests, combine operations:
### Product Lookup Batching
Use the bulk lookup endpoint instead of looping:

```python
# BAD: 100 individual API calls
product_ids = [1001, 1002, 1003, ...]
for pid in product_ids:
    response = httpx.get(f"{BASE_URL}/v1/products/{pid}", headers=headers)
    # ...

# GOOD: Single bulk lookup
response = httpx.post(
    f"{BASE_URL}/v1/products/bulk-lookup",
    headers=headers,
    json={"ids": product_ids},
)
products = response.json()["items"]
```
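Bulk endpoints commonly cap the number of ids per call. Assuming such a cap exists (the 100-id batch size here is an assumption, not a documented BuyWhere limit), a simple chunking helper keeps each request within it:

```python
def chunked(ids: list, size: int = 100):
    """Yield successive batches of at most `size` ids."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Hypothetical usage: one bulk-lookup call per batch
# for batch in chunked(product_ids):
#     httpx.post(f"{BASE_URL}/v1/products/bulk-lookup",
#                headers=headers, json={"ids": batch})

batches = list(chunked(list(range(250))))
print(len(batches))  # 3
```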
### Search with Facets
Request facets in a single call rather than making separate calls:

```python
# BAD: Two separate API calls
search_results = httpx.get(f"{BASE_URL}/v1/search", params={"q": "nike shoes"})
filters = httpx.get(f"{BASE_URL}/v1/search/filters", params={"q": "nike shoes"})

# GOOD: Include facets in the search response
search_results = httpx.get(
    f"{BASE_URL}/v1/search",
    params={"q": "nike shoes"},
    headers={"X-Include-Facets": "true"},  # if supported
)
```
## Monitoring Your Usage
Track these metrics to stay within limits:
- `X-RateLimit-Remaining` — Watch for low values
- 429 response count — Alert when retry attempts spike
- Request latency — Slow responses often precede rate limiting
```python
import httpx

class RateLimitMonitor:
    def __init__(self, warn_threshold: float = 0.8):
        self.warn_threshold = warn_threshold
        self.limit = 1000
        self.remaining = 1000

    def update_from_response(self, response: httpx.Response):
        self.limit = int(response.headers.get("X-RateLimit-Limit", self.limit))
        self.remaining = int(response.headers.get("X-RateLimit-Remaining", self.remaining))
        usage_pct = 1 - (self.remaining / self.limit) if self.limit > 0 else 0
        if usage_pct >= self.warn_threshold:
            print(f"WARNING: {usage_pct:.0%} of rate limit used. "
                  f"Consider caching or reducing request frequency.")

    def should_throttle(self) -> bool:
        return self.remaining < 10  # Throttle when <10 requests left
```
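A hedged sketch of wiring the monitor into a request path: pause before a call when `should_throttle()` fires, then feed the response back into the monitor. The stub client and monitor below are stand-ins so the example runs without a network; in real code you would pass an `httpx.Client` and the `RateLimitMonitor` above.

```python
import time
from types import SimpleNamespace

def throttled_get(client, monitor, url, headers=None, pause: float = 1.0):
    """Issue a GET, update the monitor, and back off near the limit.

    The one-second pause is an arbitrary choice; waiting until
    X-RateLimit-Reset would be more precise.
    """
    if monitor.should_throttle():
        time.sleep(pause)
    response = client.get(url, headers=headers)
    monitor.update_from_response(response)
    return response

# Stand-ins: any .get() returning an object with .headers works
class StubMonitor:
    def __init__(self):
        self.remaining = 1000
    def should_throttle(self):
        return self.remaining < 10
    def update_from_response(self, response):
        self.remaining = int(response.headers["X-RateLimit-Remaining"])

class StubClient:
    def get(self, url, headers=None):
        return SimpleNamespace(headers={"X-RateLimit-Remaining": "5"})

monitor = StubMonitor()
throttled_get(StubClient(), monitor, "/v1/search", pause=0.0)
print(monitor.should_throttle())  # True
```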