Anti-Bot Evasion Scraper Inventory
Issue: BUY-3400 | Status: Active | Last Updated: 2026-04-19 | Owner: Parse (Scraping Engineer)
Overview
This document tracks all scrapers that employ anti-bot evasion techniques. Each one depends on an external proxy service (ScraperAPI) or on a browser-automation or bypass library (Playwright, cloudscraper, undetected-chromedriver) to get past the anti-bot protection deployed by e-commerce platforms.
Evasion Methods
| Method | Description | External Dependency |
|---|---|---|
| ScraperAPI | Proxy service with built-in anti-bot bypass and JavaScript rendering | ScraperAPI account + API key |
| Playwright | Browser automation for JavaScript-rendered content and stealth browsing | Playwright + browser binaries |
| cloudscraper | Cloudflare bypass library | cloudscraper library |
Scrapers with Anti-Bot Evasion
ScraperAPI-based Scrapers
| Platform | Module | Evasion Method | Notes |
|---|---|---|---|
| Amazon SG | scrapers.amazon_sg_scraperapi | ScraperAPI render=true | Bypasses Amazon anti-bot |
| Shopee SG | scrapers.shopee_sg_scraperapi | ScraperAPI render=true | Bypasses Shopee anti-bot |
| Lazada SG | scrapers.lazada_sg_scraperapi | ScraperAPI | Bypasses Lazada anti-bot |
| RedMart SG | scrapers.redmart_sg_scraperapi | ScraperAPI | Bypasses Lazada anti-bot (RedMart is served through Lazada) |
| Watsons SG | scrapers.watsons_sg_scraperapi | ScraperAPI | Bypasses Watsons anti-bot |
| Carousell SG | scrapers.carousell_sg_scraperapi | ScraperAPI | Bypasses Carousell anti-bot |
| Tokopedia ID | scrapers.tokopedia_id | ScraperAPI | Bypasses Tokopedia anti-bot |
| Lazada TH | scrapers.lazada_th_scraperapi | ScraperAPI | Bypasses Lazada TH anti-bot |
| Lazada VN | scrapers.lazada_vn_scraperapi | ScraperAPI | Bypasses Lazada VN anti-bot |
| Shopee TH | scrapers.shopee_th | ScraperAPI | Bypasses Shopee TH anti-bot |
| Giant SG | scrapers.giant_sg_proxy | ScraperAPI | Bypasses Giant anti-bot |
| Nordstrom US | scrapers.nordstrom_us_scraperapi | ScraperAPI | Bypasses Nordstrom anti-bot |
| Macy's US | scrapers.macys_us_scraperapi | ScraperAPI | Bypasses Macy's anti-bot |
| Sephora US | scrapers.sephora_us_scraperapi | ScraperAPI + premium proxies | Bypasses Sephora anti-bot |
| Gap US | scrapers.gap_us_scraperapi | ScraperAPI | Bypasses Gap anti-bot |
| CVS US | scrapers.cvs_us_scraperapi | ScraperAPI | Bypasses CVS anti-bot |
| Kohl's US | scrapers.kohls_us | ScraperAPI | Bypasses Kohl's anti-bot |
| Target US | scrapers.target_us_scraperapi | ScraperAPI | Bypasses Target anti-bot |
| Yahoo Shopping JP | scrapers.yahoo_shopping_jp | ScraperAPI | Bypasses Yahoo Shopping JP anti-bot |
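The ScraperAPI-based modules above all route a target URL through the proxy endpoint. The sketch below shows one way such a request URL could be assembled. The helper name `build_scraperapi_url` is hypothetical and is not taken from the scraper modules. The `render` flag mirrors the `render=true` setting noted in the table, while the endpoint and the `premium` parameter follow ScraperAPI's public documentation as best understood here and should be verified against the account's current plan.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the ScraperAPI dashboard or docs.
SCRAPERAPI_ENDPOINT = "https://api.scraperapi.com/"


def build_scraperapi_url(api_key: str, target_url: str,
                         render: bool = False, premium: bool = False) -> str:
    """Return a ScraperAPI request URL that proxies `target_url`.

    Hypothetical helper: the real modules may build requests differently.
    """
    params = {"api_key": api_key, "url": target_url}
    if render:
        # Ask the service to execute JavaScript before returning HTML.
        params["render"] = "true"
    if premium:
        # Assumed flag for the premium proxy pool (see Sephora US row).
        params["premium"] = "true"
    # urlencode percent-encodes the target URL so its own query string
    # is not confused with the proxy's parameters.
    return f"{SCRAPERAPI_ENDPOINT}?{urlencode(params)}"
```

A caller would pass the resulting URL to any HTTP client, for example `requests.get(build_scraperapi_url(key, product_url, render=True), timeout=70)`. Rendered and premium requests typically cost more credits, which is one reason the cost monitoring mentioned under Configuration matters.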
Playwright-based Scrapers
| Platform | Module | Notes |
|---|---|---|
| Lazada SG | scrapers.lazada_sg | Browser automation for JavaScript rendering |
| Watsons SG Playwright | scrapers.watsons_sg_playwright | Handles Watsons JS-rendered content |
| Watsons SG Hybrid | scrapers.watsons_sg_hybrid | Playwright + API hybrid approach |
| Castlery SG | scrapers.castlery_sg_playwright | Bypasses anti-bot via browser |
| Carousell SG | scrapers.carousell_sg_home_appliances | Enhanced anti-bot handling |
| Harvey Norman SG | scrapers.harvey_norman_sg_v2 | Optional ScraperAPI fallback |
| Harvey Norman SG Full | scrapers.harvey_norman_sg_full | Optional ScraperAPI fallback |
| iPrice SG | scrapers.iprice_sg | JavaScript rendering for anti-bot |
| Temu SG | scrapers.temu_sg | Playwright-based bypass |
| Naiise SG | scrapers.naiise_sg | Browser automation |
| Wayfair US | scrapers.wayfair_us | PerimeterX anti-bot bypass |
| Ulta US | scrapers.ulta_us | JavaScript rendering + anti-bot |
| Walgreens US | scrapers.walgreens_us | JavaScript rendering + anti-bot |
| Walgreens US Playwright | scrapers.walgreens_us_playwright | Enhanced stealth browsing |
| Home Depot US | scrapers.homedepot_us_playwright | Anti-bot with stealth config |
| Home Depot US Undetected | scrapers.homedepot_us_undetected | Uses undetected-chromedriver (Selenium-based) rather than Playwright |
| Best Buy US | scrapers.bestbuy_us_playwright | Browser automation |
| Best Buy US Sitemap | scrapers.bestbuy_us_sitemap | Playwright for sitemap |
| B&H Photo US | scrapers.bhphoto_us_playwright | Browser automation |
| REI US | scrapers.rei_us | JavaScript rendering |
| Costco US V2 | scrapers.costco_us_v2 | Queue-it anti-bot handling |
| Ulta US Sitemap | scrapers.ulta_us_sitemap | Bypasses waiting room anti-bot |
| Ulta US Sitemap Fixed | scrapers.ulta_us_sitemap_fixed | Enhanced anti-bot bypass |
| Ulta US Undetected | scrapers.ulta_us_undetected | Full anti-bot bypass |
| Chewy US Playwright | scrapers.chewy_us_playwright | Kasada (KPSDK) anti-bot |
| Chewy US Undetected | scrapers.chewy_us_undetected | Kasada anti-bot bypass |
| eBay US Playwright | scrapers.ebay_us_playwright | JavaScript + anti-bot |
| Tokopedia ID Playwright | scrapers.tokopedia_id_playwright | Bypasses anti-bot |
| Bukalapak ID Playwright | scrapers.bukalapak_id_playwright | Browser automation |
| Target US Playwright V2 | scrapers.target_us_playwright_v2 | Enhanced anti-bot bypass |
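For orientation, the basic pattern behind the Playwright-based modules above is to load the page in a real browser and read the HTML after JavaScript has run. The following is a minimal sketch only: the function name `fetch_rendered_html` and its defaults are assumptions, and it covers plain JavaScript rendering, not the stealth configuration or vendor-specific handling that some of the listed modules add.

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Return the page HTML after the browser has rendered it.

    Hypothetical helper; requires `pip install playwright` and
    `playwright install chromium` (the browser binary dependency).
    """
    # Imported lazily so that importing this module does not require
    # Playwright to be installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until network activity settles, which
            # gives client-side rendering time to finish.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            # Always release the browser process; leaked browsers are a
            # common source of the memory risk noted in this document.
            browser.close()
```

Each call starts and stops a full browser, which is simple but expensive. A long-running scraper would more likely reuse one browser across many pages.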
cloudscraper-based Scrapers
| Platform | Module | Notes |
|---|---|---|
| eBay US | scrapers.ebay_us | Cloudflare bypass |
| Amazon US Sports | scrapers.amazon_us_sports | Cloudflare + anti-bot |
| Amazon US Books | scrapers.amazon_us_books | Cloudflare + anti-bot |
| Amazon US Toys | scrapers.amazon_us_toys | Cloudflare + anti-bot |
| Amazon US Health | scrapers.amazon_us_health | Cloudflare + anti-bot |
| Amazon SG Fashion | scrapers.amazon_sg_fashion | cloudscraper for Cloudflare |
| Amazon SG Grocery | scrapers.amazon_sg_grocery | cloudscraper for Cloudflare |
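The cloudscraper-based modules above use a drop-in replacement for a `requests` session. A minimal sketch of that usage follows. The function `fetch_html` and its optional `session` parameter are hypothetical and are included so the logic can be exercised without network access; `cloudscraper.create_scraper()` is the library's documented entry point.

```python
from typing import Any, Optional


def fetch_html(url: str, session: Optional[Any] = None,
               timeout: float = 30.0) -> str:
    """Fetch `url` and return the response body as text.

    Hypothetical helper. When no session is supplied, a cloudscraper
    session is created; it behaves like requests.Session.
    """
    if session is None:
        # Imported lazily so the module loads without cloudscraper.
        import cloudscraper

        session = cloudscraper.create_scraper()
    response = session.get(url, timeout=timeout)
    # Surface HTTP errors (for example 403 or 503 from a challenge page)
    # instead of silently returning the error body.
    response.raise_for_status()
    return response.text
```

Passing a session explicitly also lets a scraper reuse one session across requests, so that any clearance cookies obtained on the first request carry over to later ones.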
Blocked Upstream (Requires API Credentials)
The following scrapers are blocked upstream because the platform requires official API access:
| Platform | Module | Blocker Issue |
|---|---|---|
| Shopee SG | scrapers.shopee_sg | BUY-480 - Requires Shopee Open Platform API |
| Lazada SG | scrapers.lazada_sg | BUY-480 - Requires Lazada Open Platform API |
| Lazada SG eng08 | scrapers.lazada_sg_eng08 | BUY-480 - Same Lazada geo-block issue |
Evasion Dependency Status
Critical (Required for scraping)
These platforms cannot be scraped without anti-bot evasion:
- Amazon (all regions)
- Lazada (all regions)
- Shopee (all regions)
- Carousell
- Watsons
- Tokopedia
- Bukalapak
- Nordstrom
- Macy's
- Target
- CVS
- Walgreens
- Costco
- Chewy
- Ulta
High Risk (heavy dependency on an external service or browser runtime)
- All ScraperAPI-dependent scrapers (service availability risk)
- All Playwright-dependent scrapers (browser binary / memory risk)
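One way to reduce the availability risk above is to try evasion methods in order and fall back when one fails, similar in spirit to the optional ScraperAPI fallback noted for the Harvey Norman scrapers. The sketch below is illustrative only: `fetch_with_fallback` and `AllFetchersFailed` are hypothetical names, and the fetchers are assumed to be simple callables that take a URL and return HTML.

```python
from typing import Callable, List, Sequence, Tuple

# A fetcher takes a URL and returns HTML, raising on failure.
Fetcher = Callable[[str], str]


class AllFetchersFailed(RuntimeError):
    """Raised when every configured fetcher has failed."""


def fetch_with_fallback(
    url: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (name, fetcher) in order; return (name, html) of the first success."""
    errors: List[str] = []
    for name, fetch in fetchers:
        try:
            return name, fetch(url)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    # Keep every error so operators can see why each method failed.
    raise AllFetchersFailed("; ".join(errors))
```

Returning the name of the method that succeeded makes it possible to log how often the fallback path is taken, which is an early signal that the primary service is degrading.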
Configuration
In scripts/scraper_scheduler.py, each scraper that uses evasion carries the field:
"uses_evasion": True # Indicates anti-bot evasion is used
This allows operations to:
- Track which scrapers depend on external evasion services
- Plan for redundancy if evasion services fail
- Monitor costs associated with ScraperAPI usage
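The tracking described above could be done with a small query over the scheduler configuration. The sketch below assumes the configuration is a list of dicts with a `module` key, and it introduces a hypothetical `evasion_method` field for grouping; only `uses_evasion` is confirmed by this document, and `scrapers.example_plain` is a made-up entry standing in for a scraper without evasion.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed shape of the scheduler config; the real structure in
# scripts/scraper_scheduler.py may differ.
SCRAPER_CONFIG = [
    {"module": "scrapers.amazon_sg_scraperapi", "uses_evasion": True,
     "evasion_method": "scraperapi"},   # evasion_method is hypothetical
    {"module": "scrapers.castlery_sg_playwright", "uses_evasion": True,
     "evasion_method": "playwright"},
    {"module": "scrapers.ebay_us", "uses_evasion": True,
     "evasion_method": "cloudscraper"},
    {"module": "scrapers.example_plain"},  # hypothetical non-evasion scraper
]


def evasion_modules(config: List[dict]) -> List[str]:
    """Return modules whose entry sets uses_evasion to a truthy value."""
    return [e["module"] for e in config if e.get("uses_evasion", False)]


def group_by_method(config: List[dict]) -> Dict[str, List[str]]:
    """Group evasion-dependent modules by their (hypothetical) method field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in config:
        if entry.get("uses_evasion", False):
            groups[entry.get("evasion_method", "unknown")].append(entry["module"])
    return dict(groups)
```

With a grouping like this, the count of modules under `scraperapi` gives a rough lower bound on how many scrapers stop working, and how many contribute to cost, if that one service is unavailable.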
Related Issues
- BUY-480: Platform API credentials required for Shopee/Lazada
- BUY-3400: Anti-bot evasion inventory (this document)
Last Review
- 2026-04-19: Initial inventory created by Parse