
What Makes Product Data Actually Usable by AI Agents

The Gap Between Product Data and Agent-Ready Data

There's a significant gap between "we have product data" and "AI agents can actually use this data effectively." Most product APIs were designed for human consumption: rich, detailed, beautifully formatted responses that look great in a browser but cost agents too many tokens and add parsing complexity.

For example, a typical e-commerce API response for a pair of wireless headphones might include:

  • 15 fields of marketing copy
  • 3 image URLs (hero, thumbnail, alt views)
  • HTML-formatted descriptions
  • Nested arrays of specifications
  • Related product suggestions
  • User-generated reviews with avatars

For a human, this is great. For an AI agent paying per token, this is expensive noise.

What Agents Actually Need

AI agents working with product data typically need to:

  1. Find products matching a query
  2. Compare prices across sources
  3. Get structured details for decision-making
  4. Generate buy links for purchases

That's it. The rest is nice-to-have, not must-have.

Token Efficiency

Agents pay per token. A 10-result search response from a traditional e-commerce API might consume 600-1000 tokens in metadata alone. An agent-native API optimizes for this:

// Traditional API (estimated 800 tokens)
{
  "product": {
    "id": "PRD-12345",
    "name": "Sony WH-1000XM5 Wireless Noise Cancelling Headphones - Midnight Black",
    "tagline": "Industry-leading noise cancellation with premium sound quality",
    "description": "<p>The Sony WH-1000XM5 represents the pinnacle...",
    "images": {
      "primary": "https://cdn.example.com/hero.jpg",
      "thumbnail": "https://cdn.example.com/thumb.jpg",
      "gallery": ["https://cdn.example.com/img1.jpg", "..."]
    },
    "specifications": [...],
    "reviews": {...},
    ...
  }
}

// Agent-native API (estimated 150 tokens)
{
  "id": "bw_prod_8823",
  "name": "Sony WH-1000XM5 Wireless Headphones",
  "price_sgd": 398.00,
  "source": "shopee_sg",
  "buy_url": "https://shopee.sg/sony-wh-1000xm5",
  "in_stock": true,
  "rating": 4.8
}

The fields an agent needs for a purchase decision are all still there, at roughly one fifth of the estimated token cost.
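To make the comparison concrete, here is a minimal sketch that trims a verbose record down to the agent-native fields and compares rough token estimates. The 4-characters-per-token figure is a common rule of thumb, not a real tokenizer, and JSON often tokenizes less efficiently than prose. The helper names `estimate_tokens` and `to_agent_record` are hypothetical, and the verbose record below is a shortened stand-in for the traditional response.

```python
import json

# Fields kept in the compact record, taken from the agent-native example above.
AGENT_FIELDS = ("id", "name", "price_sgd", "source", "buy_url", "in_stock", "rating")


def estimate_tokens(payload: dict) -> int:
    """Rough estimate at about 4 characters per token (a heuristic, not a tokenizer)."""
    text = json.dumps(payload, separators=(",", ":"))
    return max(1, len(text) // 4)


def to_agent_record(verbose: dict) -> dict:
    """Keep only the fields an agent needs and drop everything else."""
    return {key: verbose[key] for key in AGENT_FIELDS if key in verbose}


verbose_product = {
    "id": "bw_prod_8823",
    "name": "Sony WH-1000XM5 Wireless Headphones",
    "price_sgd": 398.00,
    "source": "shopee_sg",
    "buy_url": "https://shopee.sg/sony-wh-1000xm5",
    "in_stock": True,
    "rating": 4.8,
    # Extras standing in for marketing copy and media in a traditional response.
    "tagline": "Industry-leading noise cancellation with premium sound quality",
    "description": "<p>The Sony WH-1000XM5 represents the pinnacle...",
    "images": {
        "primary": "https://cdn.example.com/hero.jpg",
        "thumbnail": "https://cdn.example.com/thumb.jpg",
    },
}

compact = to_agent_record(verbose_product)
print("verbose:", estimate_tokens(verbose_product), "compact:", estimate_tokens(compact))
```

For real budgeting, swap the heuristic for the tokenizer of the model the agent actually runs on.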

Core Principles of Agent-Native Product Data

1. Fixed Schema, Always

Agents need to know what fields exist. A response that sometimes includes price and sometimes includes price_sgd (or price_cents, or min_price) creates fragile parsing logic.

# Agents need consistent field names
def parse_product(response):
    return {
        "id": response["id"],
        "name": response["name"],
        "price_sgd": response["price_sgd"],  # Always SGD
        "buy_url": response["buy_url"],
        "in_stock": response["in_stock"]  # Always boolean
    }

No surprises. Same schema every time.
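One way to enforce this on the consuming side is to validate each record against the expected shape before using it. The sketch below is illustrative only: `PRODUCT_SCHEMA` and `validate_product` are hypothetical names, and the field list simply mirrors `parse_product` above.

```python
# Expected field names and types, mirroring parse_product above.
PRODUCT_SCHEMA = {
    "id": str,
    "name": str,
    "price_sgd": (int, float),
    "buy_url": str,
    "in_stock": bool,
}


def validate_product(response: dict) -> list:
    """Return a list of schema problems; an empty list means the record is valid."""
    problems = []
    for field, expected in PRODUCT_SCHEMA.items():
        if field not in response:
            problems.append(f"missing field: {field}")
            continue
        value = response[field]
        # bool is a subclass of int in Python, so reject it explicitly
        # wherever a non-boolean type is expected.
        if isinstance(value, bool) and expected is not bool:
            problems.append(f"wrong type for {field}: bool")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems
```

A record that uses `price` instead of `price_sgd` would come back with a "missing field" entry, so the drift is caught before it reaches the agent's reasoning.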

2. Commerce Signals, Not Just Data

Product data for agents needs to include commerce-ready signals:

{
  "confidence_score": 0.94,
  "availability_prediction": "likely_in_stock",
  "price_trend": "stable",
  "affiliate_url": "https://affiliate.buywhere.ai/track/8823"
}

  • confidence_score tells the agent how reliable this data is
  • availability_prediction goes beyond current stock to forecast availability
  • price_trend helps agents advise on timing
  • affiliate_url enables commission-bearing purchases

3. Normalized Across Sources

When you search for "iPhone 15 Pro" across Shopee, Lazada, and Amazon.sg, you should get the same product matched across platforms—not different iPhone listings that happen to have similar names.

{
  "matched_products": [
    {
      "id": "bw_prod_12345",
      "sources": ["shopee_sg", "lazada_sg", "amazon_sg"],
      "price_range_sgd": { "min": 1249, "max": 1399 },
      "lowest_price": {
        "source": "shopee_sg",
        "price_sgd": 1249.00
      }
    }
  ]
}

The agent doesn't need to know how to match products—that's the catalog's job.
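The matching itself is the hard part and is not shown here. Once offers have been matched to one product, though, collapsing them into the record above is straightforward. The sketch below assumes a hypothetical input shape (a list of per-source offers) and a hypothetical helper name, `summarize_offers`; the Lazada price is made up for the example.

```python
def summarize_offers(product_id: str, offers: list) -> dict:
    """Collapse per-source offers for one matched product into a single record.

    `offers` is a hypothetical input shape: [{"source": str, "price_sgd": float}, ...].
    """
    if not offers:
        raise ValueError("at least one offer is required")
    cheapest = min(offers, key=lambda offer: offer["price_sgd"])
    priciest = max(offers, key=lambda offer: offer["price_sgd"])
    return {
        "id": product_id,
        "sources": [offer["source"] for offer in offers],
        "price_range_sgd": {"min": cheapest["price_sgd"], "max": priciest["price_sgd"]},
        "lowest_price": {"source": cheapest["source"], "price_sgd": cheapest["price_sgd"]},
    }


offers = [
    {"source": "shopee_sg", "price_sgd": 1249.00},
    {"source": "lazada_sg", "price_sgd": 1299.00},  # illustrative price
    {"source": "amazon_sg", "price_sgd": 1399.00},
]
record = summarize_offers("bw_prod_12345", offers)
```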

4. Error Recovery Signals

When something goes wrong, agents need actionable errors:

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests",
    "retry_after_seconds": 60,
    "suggestion": "Reduce request frequency or upgrade to Pro tier"
  }
}

Not just "error occurred" but "here's what happened and what to do next."
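A structured error like this lets the client recover without guessing. The sketch below retries only on `RATE_LIMITED` and waits for the interval the API suggests. `call_with_retry` and `request_fn` are hypothetical names; `request_fn` is assumed to return the parsed JSON body as a dict.

```python
import time


def call_with_retry(request_fn, max_attempts: int = 3, sleep=time.sleep):
    """Call request_fn() and retry while it reports RATE_LIMITED.

    request_fn is a hypothetical callable returning the parsed JSON body as a dict.
    """
    for attempt in range(1, max_attempts + 1):
        body = request_fn()
        error = body.get("error")
        if not error:
            return body
        # Give up on non-retryable errors, or once the attempts are used up.
        if error.get("code") != "RATE_LIMITED" or attempt == max_attempts:
            raise RuntimeError(f"{error.get('code', 'UNKNOWN')}: {error.get('message', '')}")
        # Wait for as long as the API asks before the next attempt.
        sleep(error.get("retry_after_seconds", 60))
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.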

Implementation: Building an Agent-Native Response Handler

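import requests
from dataclasses import dataclass
from typing import Optional, List


class APIError(Exception):
    """Base class for errors returned by the API."""


class AuthError(APIError):
    """Raised when the API key is rejected."""


class RateLimitError(APIError):
    """Raised when the API asks the client to slow down."""

    def __init__(self, retry_after_seconds: int):
        super().__init__(f"Rate limited; retry after {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds
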

@dataclass
class Product:
    id: str
    name: str
    price_sgd: float
    source: str
    buy_url: str
    in_stock: bool
    confidence_score: float
    affiliate_url: Optional[str] = None
    
    @classmethod
    def from_api_response(cls, data: dict) -> "Product":
        return cls(
            id=data["id"],
            name=data["name"],
            price_sgd=data["price_sgd"],
            source=data["source"],
            buy_url=data["buy_url"],
            in_stock=data["in_stock"],
            confidence_score=data.get("confidence_score", 0.5),
            affiliate_url=data.get("affiliate_url")
        )

class BuyWhereClient:
    def __init__(self, api_key: str):
        self.api_key = api_key
        self.base_url = "https://api.buywhere.ai/v2"
    
    def search(self, query: str, limit: int = 10) -> List[Product]:
        response = requests.get(
            f"{self.base_url}/products",
            headers={"Authorization": f"Bearer {self.api_key}"},
            params={"q": query, "region": "sg", "limit": limit},
            timeout=10,  # fail fast instead of hanging the agent
        )
        
        if response.status_code != 200:
            self._handle_error(response)
        
        items = response.json().get("items", [])
        return [Product.from_api_response(item) for item in items]
    
    def find_cheapest(self, query: str) -> Optional[Product]:
        products = self.search(query, limit=20)
        
        in_stock = [p for p in products if p.in_stock]
        if not in_stock:
            return None
        
        return min(in_stock, key=lambda p: p.price_sgd)
    
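    def _handle_error(self, response):
        # Error bodies are not guaranteed to be JSON (e.g. an HTML page from a proxy).
        try:
            error = response.json().get("error", {})
        except ValueError:
            error = {"message": f"HTTP {response.status_code}"}
        code = error.get("code", "UNKNOWN")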
        
        if code == "RATE_LIMITED":
            raise RateLimitError(error.get("retry_after_seconds", 60))
        elif code == "INVALID_API_KEY":
            raise AuthError("Check your API key")
        else:
            raise APIError(f"{code}: {error.get('message', 'Unknown error')}")

# Usage
client = BuyWhereClient("bw_live_xxxxx")
cheapest = client.find_cheapest("sony wh-1000xm5")

if cheapest:
    print(f"Best price: {cheapest.name} at S${cheapest.price_sgd}")
    print(f"Buy link: {cheapest.buy_url}")

The Trade-offs

Agent-native design isn't free. It involves explicit choices:

What you lose:

  • Rich marketing copy and imagery
  • Deep product specifications
  • User reviews and ratings breakdown
  • Related products and recommendations

What you gain:

  • Predictable parsing
  • Lower token costs
  • Faster response times
  • Reliable agent behavior

For shopping agents focused on price comparison and purchase routing, the trade-off makes sense. For content generation or marketing applications, you'd want richer data.

Schema.org Compatibility

Agent-native doesn't mean non-standard. BuyWhere responses follow Schema.org Product vocabulary:

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Sony WH-1000XM5 Wireless Headphones",
  "brand": {"@type": "Brand", "name": "Sony"},
  "offers": {
    "@type": "Offer",
    "priceCurrency": "SGD",
    "price": "398.00",
    "availability": "https://schema.org/InStock"
  }
}

This means agents can use standard Schema.org parsing logic and still get agent-native efficiency.
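As an illustration, the sketch below flattens the Schema.org document above into plain fields. It assumes `offers` is a single Offer object, as in the example; Schema.org also allows a list of offers, which this sketch does not handle. The function name `flatten_schema_org_product` is hypothetical.

```python
def flatten_schema_org_product(doc: dict) -> dict:
    """Convert a Schema.org Product with a single Offer into flat fields."""
    if doc.get("@type") != "Product":
        raise ValueError("expected a Schema.org Product")
    offer = doc.get("offers", {})
    brand = doc.get("brand", {})
    return {
        "name": doc["name"],
        "brand": brand.get("name"),
        # Schema.org prices are commonly serialized as strings.
        "price": float(offer["price"]),
        "currency": offer["priceCurrency"],
        "in_stock": offer.get("availability") == "https://schema.org/InStock",
    }


schema_doc = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Sony WH-1000XM5 Wireless Headphones",
    "brand": {"@type": "Brand", "name": "Sony"},
    "offers": {
        "@type": "Offer",
        "priceCurrency": "SGD",
        "price": "398.00",
        "availability": "https://schema.org/InStock",
    },
}
flat = flatten_schema_org_product(schema_doc)
```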

Getting Started

If you're building an AI agent that works with products, the data layer matters as much as the model layer. Clean, consistent, token-efficient product data enables reliable agent behavior.

Start with a clean API, not a rich one. You can always add complexity later if needed.

Get an API key at api.buywhere.ai and focus on what your agent should do with the data—not how to parse it.