Reddit Blocks Internet Archive to End Sneaky AI Scraping
Reddit is blocking the Internet Archive from indexing most of its content to prevent AI data scraping, and is seeking to monetize data access amid ongoing disputes with AI firms, company statements show.
- Reddit has announced that it will start blocking bots from the Internet Archive's Wayback Machine over concerns that AI projects are accessing Reddit content through the archive.
- The Internet Archive, which maintains data on 866 billion web pages, plays a valuable role in preserving digital history, but Reddit's move will significantly limit its ability to preserve Reddit content.
- Reddit's decision to restrict AI firms' access to its data appears financially motivated: the company hopes to spur more lucrative licensing deals like those struck with OpenAI and Google, which are expected to generate over $200 million in revenue over the next three years.
42 Articles
Reddit bars Internet Archive from its website, sparking access concerns
Reddit is limiting the Internet Archive’s Wayback Machine to only its home page, accusing AI companies of scraping user data in violation of its policies. The move, which follows Reddit’s deals with some AI firms and lawsuits against others, has sparked debate over user privacy, open-web ideals, and financial motives.
AI data wars push Reddit to block the Wayback Machine
As the battle to train artificial intelligence models becomes more intense and Reddit’s rich content library becomes more valuable, the social media giant has taken steps to block the Internet Archive from indexing its pages. While the Wayback Machine has historically recorded all Reddit pages, comments and user profiles, the company has put limits on what the system can scrape. Moving forward, it will only be permitted to archive the site’s hom…
Coverage Details
Bias Distribution
- 50% of the sources lean Left