API Rate Limiting and Best Practices

This guide covers BuyWhere API rate limits, handling 429 responses, and best practices for building reliable agent-native commerce applications.

Rate Limits by Tier

| Tier  | Requests/Hour | Requests/Minute (burst) |
|-------|---------------|-------------------------|
| Free  | 100           | 20                      |
| Basic | 1,000         | 100                     |
| Pro   | 10,000        | 500                     |

Enterprise tiers have custom limits — contact support for details.
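Within a window, the burst limit can be respected client-side with a token bucket. A minimal sketch (the numbers come from the table above; the class itself is illustrative, not part of any SDK):

```python
import time

class TokenBucket:
    """Client-side limiter: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # sustained tokens per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def acquire(self) -> bool:
        """Take one token if available; False means back off before sending."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Basic tier: 1,000 requests/hour sustained, 100/minute burst
bucket = TokenBucket(rate=1000 / 3600, capacity=100)
```

Calling acquire() before each request keeps a long-running agent under the sustained rate while still allowing short bursts.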

Rate Limit Headers

Every API response includes rate limit status headers:

| Header                | Description                                    |
|-----------------------|------------------------------------------------|
| X-RateLimit-Limit     | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Requests remaining in the current window       |
| X-RateLimit-Reset     | Unix timestamp when the window resets          |

Example response headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1712236800
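These values can be read straight off any response. A small helper (illustrative, not part of an SDK) that also reports seconds until the window resets:

```python
import time

def rate_limit_status(headers: dict) -> dict:
    """Summarize the X-RateLimit-* headers from a response."""
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        "resets_in": max(0, reset_at - int(time.time())),  # seconds until reset
    }

status = rate_limit_status({
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "847",
    "X-RateLimit-Reset": "1712236800",
})
```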

Handling 429 Too Many Requests

When you exceed your rate limit, the API returns HTTP 429:

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Retry after 60 seconds."
  }
}

The Retry-After Header

The Retry-After header tells you how many seconds to wait:

Retry-After: 47

Always respect this header rather than guessing.
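Note that per RFC 9110 the header may carry either delta-seconds (as above) or an HTTP-date, so a robust client parses both. A sketch:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value: str) -> float:
    """Seconds to wait, from either Retry-After form."""
    try:
        return float(value)  # delta-seconds form, e.g. "47"
    except ValueError:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
        return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
```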

Exponential Backoff

Never retry immediately after a 429. Use exponential backoff to avoid thundering herd problems:

Python Example

import time
import httpx
import random

def fetch_with_backoff(
    url: str,
    headers: dict,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> httpx.Response:
    """
    Fetch with exponential backoff and jitter.

    Args:
        url: API endpoint URL
        headers: Request headers (include Authorization)
        max_retries: Maximum number of retry attempts
        base_delay: Initial delay in seconds
        max_delay: Maximum delay cap in seconds

    Returns:
        httpx.Response on success

    Raises:
        httpx.HTTPStatusError: After max_retries exhausted
    """
    for attempt in range(max_retries):
        response = httpx.get(url, headers=headers, timeout=30.0)

        if response.status_code != 429:
            return response

        if attempt == max_retries - 1:
            break  # out of retries; skip the final sleep before raising

        retry_after = response.headers.get("Retry-After")
        if retry_after:
            wait_seconds = int(retry_after)
        else:
            # Exponential backoff: 1, 2, 4, 8, 16... capped at max_delay
            wait_seconds = min(base_delay * (2 ** attempt), max_delay)

        # Add jitter (±25%) to prevent synchronized retries
        jitter = wait_seconds * 0.25 * (random.random() * 2 - 1)
        actual_wait = wait_seconds + jitter

        print(f"Rate limited. Attempt {attempt + 1}/{max_retries}. "
              f"Waiting {actual_wait:.1f}s...")
        time.sleep(actual_wait)

    raise httpx.HTTPStatusError(
        f"Rate limit retry exhausted after {max_retries} attempts",
        request=response.request,
        response=response,
    )

JavaScript/TypeScript Example

async function fetchWithBackoff(
  url: string,
  headers: Record<string, string>,
  maxRetries = 5,
  baseDelayMs = 1000,
  maxDelayMs = 60000
): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, { headers, signal: AbortSignal.timeout(30000) });

    if (response.status !== 429) {
      return response;
    }

    if (attempt === maxRetries - 1) {
      break; // out of retries; skip the final sleep before throwing
    }

    const retryAfter = response.headers.get("Retry-After");
    let waitMs = retryAfter ? parseInt(retryAfter, 10) * 1000
      : Math.min(baseDelayMs * Math.pow(2, attempt), maxDelayMs);

    // Add jitter (±25%)
    const jitter = waitMs * 0.25 * (Math.random() * 2 - 1);
    const actualWait = waitMs + jitter;

    console.log(`Rate limited. Attempt ${attempt + 1}/${maxRetries}. Waiting ${actualWait.toFixed(0)}ms...`);
    await new Promise(resolve => setTimeout(resolve, actualWait));
  }

  throw new Error(`Rate limit retry exhausted after ${maxRetries} attempts`);
}

Caching Strategies

Caching dramatically reduces API calls and improves response times for your agents.

Cache-Control and Invalidation

The BuyWhere API sets Cache-Control headers on stable endpoints:

Cache-Control: public, max-age=300

A max-age of 300 means responses may be reused for five minutes before refetching.

For the search endpoint, cache aggressively — results for the same query don't change frequently.

Recommended Cache TTLs

| Endpoint Pattern           | Recommended TTL | Notes                         |
|----------------------------|-----------------|-------------------------------|
| GET /v1/search?q=...       | 5-15 minutes    | Vary by query freshness needs |
| GET /v1/products/{id}      | 1 hour          | Product data changes slowly   |
| GET /v1/categories         | 1 hour          | Taxonomy is stable            |
| GET /v1/brands             | 30 minutes      | Brand counts fluctuate        |
| Price comparison endpoints | 5 minutes       | Prices change frequently      |
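These TTLs can be applied with even a minimal in-memory cache; a sketch (a production deployment would more likely use Redis or similar):

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry."""

    def __init__(self):
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds: float):
        self._store[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

cache = TTLCache()
cache.set("products:1001", {"id": 1001}, ttl_seconds=3600)  # 1 hour, per the table
```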

Cache Key Design

Use query parameters as cache keys, but normalize them:

import hashlib
import json

def build_cache_key(base: str, **params) -> str:
    """Build a normalized cache key."""
    # Sort params for consistent hashing
    normalized = json.dumps(params, sort_keys=True, default=str)
    hash_suffix = hashlib.md5(normalized.encode()).hexdigest()[:12]
    return f"{base}:{hash_suffix}"

# Always produces same key regardless of param order
key1 = build_cache_key("search", q="nike", limit=20, offset=0)
key2 = build_cache_key("search", offset=0, limit=20, q="nike")
assert key1 == key2

Batch Request Patterns

Instead of making many individual requests, combine operations:

Product Lookup Batching

Use the bulk lookup endpoint instead of looping:

# BAD: 100 individual API calls
product_ids = [1001, 1002, 1003, ...]
for pid in product_ids:
    response = httpx.get(f"{BASE_URL}/v1/products/{pid}", headers=headers)
    # ...

# GOOD: Single bulk lookup
response = httpx.post(
    f"{BASE_URL}/v1/products/bulk-lookup",
    headers=headers,
    json={"ids": product_ids}
)
products = response.json()["items"]
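If the bulk endpoint caps how many IDs one call may carry (an assumption; check the endpoint reference for the actual limit), chunk the list rather than reverting to per-item requests:

```python
def chunked(items: list, size: int):
    """Yield successive fixed-size slices of items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 250 IDs in batches of 100 -> 3 bulk calls instead of 250 single lookups
product_ids = list(range(1001, 1251))
batches = list(chunked(product_ids, 100))
```

Each batch is then posted to /v1/products/bulk-lookup exactly as above.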

Search with Facets

Request facets in a single call rather than making separate calls:

# BAD: Two separate API calls
search_results = httpx.get(f"{BASE_URL}/v1/search", params={"q": "nike shoes"})
filters = httpx.get(f"{BASE_URL}/v1/search/filters", params={"q": "nike shoes"})

# GOOD: Include facets in search response
search_results = httpx.get(
    f"{BASE_URL}/v1/search",
    params={"q": "nike shoes"},
    headers={"X-Include-Facets": "true"}  # if supported
)

Monitoring Your Usage

Track these metrics to stay within limits:

  1. X-RateLimit-Remaining — Watch for low values
  2. 429 response count — Alert when retry attempts spike
  3. Request latency — Slow responses often precede rate limiting

A simple monitor that reads these headers from each response:

class RateLimitMonitor:
    def __init__(self, warn_threshold: float = 0.8):
        self.warn_threshold = warn_threshold
        self.limit = 1000
        self.remaining = 1000

    def update_from_response(self, response: httpx.Response):
        self.limit = int(response.headers.get("X-RateLimit-Limit", self.limit))
        self.remaining = int(response.headers.get("X-RateLimit-Remaining", self.remaining))

        usage_pct = 1 - (self.remaining / self.limit) if self.limit > 0 else 0
        if usage_pct >= self.warn_threshold:
            print(f"WARNING: {usage_pct:.0%} of rate limit used. "
                  f"Consider caching or reducing request frequency.")

    def should_throttle(self) -> bool:
        return self.remaining < 10  # Throttle when <10 requests left
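should_throttle can also drive a proactive pause instead of a hard stop. One hedged sketch that paces the last few requests across the remainder of the window (the floor of 10 matches the monitor above; the 5-second cap is arbitrary):

```python
import time

def maybe_throttle(remaining: int, resets_in: float, floor: int = 10):
    """Pause briefly when the budget is nearly spent, spreading the
    remaining requests over what is left of the window."""
    if remaining < floor and resets_in > 0:
        time.sleep(min(resets_in / max(remaining, 1), 5.0))
```

Call it after each update_from_response, passing the monitor's remaining count and the seconds until X-RateLimit-Reset.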