catalog-schema-validation-audit-BUY-3425

Catalog Schema Validation Audit: image_url / price / url

Date: 2026-04-19
Issue: BUY-3425
Status: Complete
Scope: Catalog-wide validation of three critical schema fields: image_url, price, url

Summary

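| Source | Total | Missing Image | Zero/Missing Price | Missing URL | Flagged |
|---|---|---|---|---|---|
| amazon_jp | 2,397 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| amazon_sg | 126,290 | 0 (0.0%) | 3,789 (3.0%) | 0 | No |
| amazon_us | 289,153 | 0 (0.0%) | 108 (0.04%) | 0 | No |
| apple-authorised-sg | 2,408 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| beauty-lab-sg | 2,382 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| beauty-secrets-sg | 2,352 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| bestdenki_sg | 120 | 0 (0.0%) | 23 (19.2%) | 0 | YES |
| carousell | 1,856 | 10 (0.5%) | 0 (0.0%) | 0 | No |
| challenger | 15,759 | 5 (0.03%) | 0 (0.0%) | 0 | No |
| challenger.sg | 15,958 | 6 (0.04%) | 212 (1.3%) | 0 | No |
| challenger_sg | 16,016 | 6 (0.04%) | 219 (1.4%) | 0 | No |
| cold_storage_sg | 101 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| coldstorage | 1,765 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| courts | 4,257 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| courts_sg | 4,131 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| dailymart-sg | 2,432 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| decathlon | 9,402 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| decathlon_sg | 9,875 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| digital-mall-sg | 2,402 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| fairprice | 73,675 | 2 (0.0%) | 0 (0.0%) | 0 | No |
| fairprice_sg | 72,696 | 2 (0.0%) | 0 (0.0%) | 0 | No |
| fashion-villa-sg | 2,437 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| flipkart_in | 26,299 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| fortytwo | 3,861 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| fortytwo_sg | 3,861 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| gaincity | 2,160 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| guardian | 7,997 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| guardian_sg | 8,024 | 1 (0.01%) | 1 (0.01%) | 0 | No |
| harvey_norman | 9,709 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| home-essentials-sg | 2,430 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| homecraft-sg | 2,364 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| laz-global-sg | 2,421 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| lazada | 110,100 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| lazada_sg | 1,567 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| lifestyle-hub-sg | 2,395 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| medimart-sg | 2,350 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| nexus-tech-sg | 2,239 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| nike | 24 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| nike_sg | 27 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| nike_us | 40 | 0 (0.0%) | 10 (25.0%) | 0 | YES |
| petloverscentre | 7,464 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| popular_sg | 12,501 | 25 (0.2%) | 3 (0.02%) | 0 | No |
| samsung-official-sg | 2,358 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| sasa | 100 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| sg-electronics-mall | 2,378 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| sg-mart-official | 2,355 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| shengsiong_sg | 4,362 | 4 (0.09%) | 0 (0.0%) | 0 | No |
| sports-zone-sg | 2,377 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| tangs | 7,096 | 1 (0.01%) | 0 (0.0%) | 0 | No |
| tangs_sg | 13,222 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| techhub-sg | 2,377 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| tiki_vn | 3,366 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| toysrus | 3,385 | 3,385 (100.0%) | 0 (0.0%) | 0 | YES |
| trendy-closet-sg | 2,290 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| ulta_us | 700 | 0 (0.0%) | 700 (100.0%) | 0 | YES |
| uniqlo-official-sg | 2,299 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| xiaomi-official-sg | 2,384 | 0 (0.0%) | 0 (0.0%) | 0 | No |
| zappos_us | 712 | 644 (90.5%) | 0 (0.0%) | 0 | YES |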

Total products: 917,458

Aggregate Findings

| Metric | Count | Rate |
|---|---|---|
| Missing image_url | 4,090 | 0.45% |
| Zero/missing price | 5,065 | 0.55% |
| Missing url | 0 | 0.0% |
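
The aggregate rates are each count divided by the catalog total of 917,458 products. The short sketch below reproduces that arithmetic from the reported counts. The names `rate_percent`, `aggregate_counts`, and `aggregate_rates` are illustrative and not taken from the audit tooling.

```python
# Counts copied from the Aggregate Findings figures in this report.
TOTAL_PRODUCTS = 917_458
aggregate_counts = {
    "missing_image_url": 4_090,
    "zero_or_missing_price": 5_065,
    "missing_url": 0,
}


def rate_percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(count / total * 100, 2)


# Hypothetical helper output: metric name -> percentage of the catalog.
aggregate_rates = {
    name: rate_percent(count, TOTAL_PRODUCTS)
    for name, count in aggregate_counts.items()
}

for name, rate in aggregate_rates.items():
    print(f"{name}: {rate:.2f}%")
```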

Flagged Sources (>10% data quality issues in any field)

1. toysrus — 100% missing image_url

  • Issue: All 3,385 products have no image URL
  • Action required: Re-scrape the Toys R Us feed to populate image URLs, or flag the source for immediate remediation before it is used in any product listing

2. zappos_us — 90.5% missing image_url

  • Issue: 644 of 712 products have no image URL
  • Action required: Investigate zappos_us scraper — image extraction may have regressed. Block from agent-facing product display until remediated

3. ulta_us — 100% zero/missing price

  • Issue: All 700 products have zero or missing price
  • Action required: Investigate the ulta_us price extraction pipeline. Because every product is affected, the cause is likely a systematic scraping failure rather than isolated bad records

4. nike_us — 25% zero/missing price

  • Issue: 10 of 40 products have zero or missing price
  • Action required: Re-scrape Nike US feed or fix price extraction logic

5. bestdenki_sg — 19.2% zero/missing price

  • Issue: 23 of 120 products have zero or missing price
  • Action required: Investigate bestdenki_sg scraper price extraction

Clean Sources (Within Threshold)

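The remaining 53 sources stay below the 10% threshold on all three schema fields. The highest rate among them is amazon_sg, with 3.0% zero/missing price (3,789 of 126,290 products).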
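
The flagging rule used in this audit (a source is flagged when more than 10% of its products fail any one field) can be sketched as follows. This is a minimal illustration, not the audit's actual implementation: the `SourceCounts` class and `is_flagged` function are hypothetical names, the counts are copied from the Summary table, and the handling of an empty source is an assumption.

```python
from dataclasses import dataclass

THRESHOLD = 0.10  # flag when any field's failure rate exceeds 10%


@dataclass
class SourceCounts:
    """Per-source counts, one row of the Summary table (hypothetical type)."""
    source: str
    total: int
    missing_image: int
    bad_price: int  # zero or missing price
    missing_url: int


def is_flagged(row: SourceCounts, threshold: float = THRESHOLD) -> bool:
    """Return True if the worst field's failure rate is above the threshold."""
    if row.total == 0:
        return False  # assumption: a source with no products is not flagged
    worst = max(row.missing_image, row.bad_price, row.missing_url)
    return worst / row.total > threshold


rows = [
    SourceCounts("toysrus", 3385, 3385, 0, 0),
    SourceCounts("zappos_us", 712, 644, 0, 0),
    SourceCounts("ulta_us", 700, 0, 700, 0),
    SourceCounts("nike_us", 40, 0, 10, 0),
    SourceCounts("bestdenki_sg", 120, 0, 23, 0),
    SourceCounts("amazon_sg", 126290, 0, 3789, 0),
    SourceCounts("challenger_sg", 16016, 6, 219, 0),
]

flagged = [row.source for row in rows if is_flagged(row)]
print(flagged)
# ['toysrus', 'zappos_us', 'ulta_us', 'nike_us', 'bestdenki_sg']
```

The comparison is strict (`>`), matching the ">10%" wording of the flagging criterion, so a source at exactly 10% would pass under this sketch.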

URL Field Status

The url field (product page URL on the source site) shows 0 missing values across all 917,458 products. This field is fully populated.
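
The report does not spell out how "missing" and "zero/missing" are defined per record. As a sketch under stated assumptions, the three checks might look like the code below: a field counts as missing when it is absent, `None`, or a blank string, and a price counts as zero/missing when it is missing, unparseable, or numerically zero. The function names and the dictionary-shaped record are hypothetical.

```python
from typing import Any, Mapping


def is_blank(value: Any) -> bool:
    """True for None or a string that is empty after stripping whitespace."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def price_is_zero_or_missing(price: Any) -> bool:
    """True if the price is blank, cannot be parsed, or equals zero."""
    if is_blank(price):
        return True
    try:
        return float(price) == 0.0
    except (TypeError, ValueError):
        return True  # assumption: an unparseable price is treated as missing


def check_record(record: Mapping[str, Any]) -> dict[str, bool]:
    """Run the three schema checks on one product record (assumed dict-like)."""
    return {
        "missing_image_url": is_blank(record.get("image_url")),
        "zero_or_missing_price": price_is_zero_or_missing(record.get("price")),
        "missing_url": is_blank(record.get("url")),
    }


# Example record with a blank image URL and a zero price.
result = check_record(
    {"image_url": "", "price": 0, "url": "https://example.com/p/1"}
)
print(result)
```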

Recommendation

  1. Block toysrus, zappos_us, and ulta_us from any agent-facing product display until schema completeness is remediated
  2. Investigate nike_us and bestdenki_sg price extraction: zero/missing price rates of 25.0% and 19.2% point to likely scraper regressions
  3. Monitor zappos_us image_url recovery: 90.5% missing is severe but may be recoverable if the scraper fix is straightforward
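
One way to apply the first recommendation is a source-level blocklist checked at display time. The sketch below is illustrative only: it assumes each product is a dictionary carrying a `source` key, and `BLOCKED_SOURCES` and `displayable` are hypothetical names rather than part of any existing service.

```python
from typing import Any, Iterable, Mapping

# Sources recommended for blocking until schema completeness is remediated.
BLOCKED_SOURCES = frozenset({"toysrus", "zappos_us", "ulta_us"})


def displayable(products: Iterable[Mapping[str, Any]]) -> list[Mapping[str, Any]]:
    """Keep only products whose source is not on the blocklist.

    Assumption: each product exposes its origin under the "source" key.
    """
    return [p for p in products if p.get("source") not in BLOCKED_SOURCES]


sample = [
    {"source": "toysrus", "title": "Building blocks"},
    {"source": "fairprice", "title": "Jasmine rice 5kg"},
    {"source": "ulta_us", "title": "Lip balm"},
]
visible = displayable(sample)
print([p["title"] for p in visible])
# ['Jasmine rice 5kg']
```

Keeping the blocklist as data makes it straightforward to remove a source once a re-run of the audit shows it back within the threshold.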

Next Steps

  • Assign toysrus re-scrape to scraping team (priority: critical)
  • Investigate zappos_us image extraction regression (priority: critical)
  • Audit ulta_us price pipeline (priority: critical)
  • Fix nike_us and bestdenki_sg price scrapers (priority: high)
  • Re-run audit after remediation