Security researchers have documented how smart TVs and other consumer devices are being used as exit nodes in a massive web-scraping operation that supplies data to AI companies. The system, operated by Bright Data (formerly Luminati), turns always-on devices into residential proxies that bypass anti-bot protections to gather training data for AI models.
400 Million Residential IPs Power AI Data Collection
Bright Data operates what it advertises as the largest residential proxy network in the world, with more than 400 million residential IPs. A security researcher reverse-engineered the iOS SDK that Bright Data embeds in consumer apps and found that it turns devices into exit nodes that relay web-scraping traffic for the AI industry. The system routes scraping jobs through residential connections on Comcast, T-Mobile, and other consumer internet providers, so requests reach target sites from IP addresses assigned to ordinary paying subscribers rather than from datacenter addresses that are commonly blocked.
Smart TV Apps Include Hidden Proxy Functionality
Bright Data publishes a list of app partners that includes makers of smart-TV apps such as PlayWorks Digital, CloudTV, and Longvision. With Bright Data's SDK embedded, a viewer's smart TV becomes part of a global proxy network that crawls and scrapes the web. Bright Data claims that its consent-sourced pool holds more than 150 million IPs. Users encounter opt-in screens in free apps that often obscure what they are actually consenting to, so household devices become direct participants in the AI data collection economy without transparent disclosure.
AI Companies Rely on Residential Proxies to Bypass Anti-Bot Defenses
AI companies depend on web-scraped content for pre-training, retrieval, agent grounding, and search. However, services like Cloudflare, DataDome, and HUMAN throttle or block requests from known cloud IPs. Residential proxies get around these defenses by making scraping traffic appear to come from legitimate home connections. Together, devices in the Bright Data network gather petabytes of public web data from different locations and IP addresses, and that data is then resold to companies to train AI models.
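To illustrate why residential exit nodes slip past IP-based filtering, here is a minimal sketch of a blocklist check keyed on known datacenter address ranges. It is not how Cloudflare, DataDome, or HUMAN actually work; real anti-bot products combine many more signals. The names `DATACENTER_RANGES`, `is_known_datacenter`, and `decide` are hypothetical, and the ranges are reserved documentation addresses (RFC 5737) standing in for real cloud provider blocks.

```python
import ipaddress

# Hypothetical blocklist. These are RFC 5737 documentation ranges used as
# stand-ins for real cloud provider CIDR blocks, which are not listed here.
DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_datacenter(ip: str) -> bool:
    """Return True if the address falls inside any listed datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)


def decide(ip: str) -> str:
    """Naive policy: block listed datacenter addresses, allow everything else."""
    return "block" if is_known_datacenter(ip) else "allow"


if __name__ == "__main__":
    # A request sent directly from a listed datacenter range.
    print(decide("192.0.2.10"))
    # The same request relayed through an address outside the blocklist,
    # standing in for a home connection acting as an exit node.
    print(decide("203.0.113.7"))
```

Under this assumed policy, the filter only sees the address of the last hop, so a request relayed through a home connection is indistinguishable from that household's own browsing.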
Privacy Risks Extend Beyond Data Theft
The immediate risk is not hacked accounts or stolen data, but rather that home connections and bandwidth get used as someone else's scraping infrastructure. Users may unknowingly bear the consequences of scraping activity they didn't initiate, such as rate limiting or IP bans imposed on their home address. The research, published on Include Security's blog on June 6, 2026, reached the Hacker News front page, highlighting growing concern about how consumer devices are being monetized in the AI training data economy.
Key Takeaways
- Bright Data operates a residential proxy network that it says has more than 400 million IPs, some of them belonging to smart TVs, and that is used to bypass anti-bot protections for AI web scraping
- Smart TV app makers including PlayWorks Digital, CloudTV, and Longvision embed Bright Data's SDK, turning devices into scraping exit nodes
- AI companies use residential proxies to gather training data because requests from known datacenter IPs are throttled or blocked by services such as Cloudflare, DataDome, and HUMAN
- Users' home connections may be hit with rate limiting or IP bans that result from scraping activity they didn't initiate
- The system operates through opt-in screens in free apps that often obscure what users are actually consenting to