Published 1 day ago • Updated 10 hours ago

Reddit Blocks Internet Archive to End Sneaky AI Scraping

Reddit has announced that it will start blocking bots from The Internet Archive's Wayback Machine due to concerns about AI projects accessing Reddit content from this resource.
The Internet Archive, which maintains data on 866 billion web pages, plays a valuable role in preserving digital history, but Reddit's move will significantly limit its ability to archive Reddit content.
Reddit's decision to restrict AI firms' access to its data appears financially motivated: the company hopes to spur more lucrative licensing deals like those struck with OpenAI and Google, which are expected to generate over $200 million in revenue over the next three years.

Insights by Ground AI


1 Podcast or Opinion

Livestream

Daily Tech News Show

Daily Technology and News livestream featuring Tom Merritt, Sarah Lane and Robb Dunewood

AOL Might Let You Cancel Your Account Now - DTNS Live 5079

Daily Tech News Show discusses Reddit blocking the Wayback Machine to protect its data from AI scraping.

1 day ago


Podcasts & Opinions

42 Articles

ZDNet

Reposted by

IT Security News - cybersecurity, infosecurity news

Center

Reddit blocks the Internet Archive from crawling its data - here's why

The social media platform is cracking down on backdoor data harvesting.

23 hours ago·United States


The Blaze

Reposted by

Conservative Review

Right

Reddit bars Internet Archive from its website, sparking access concerns

Reddit is limiting the Internet Archive’s Wayback Machine to only its home page, accusing AI companies of scraping user data in violation of its policies. The move, which follows Reddit’s deals with some AI firms and lawsuits against others, has sparked debate over user privacy, open-web ideals, and financial motives.

1 day ago·United States


Fast Company

Lean Left

AI data wars push Reddit to block the Wayback Machine

As the battle to train artificial intelligence models becomes more intense and Reddit’s rich content library becomes more valuable, the social media giant has taken steps to block the Internet Archive from indexing its pages. While the Wayback Machine has historically recorded all Reddit pages, comments and user profiles, the company has put limits on what the system can scrape. Moving forward, it will only be permitted to archive the site’s hom…

1 day ago


9to5Mac

Center

Reddit blocks non-profit Wayback Machine from archiving the site

The Internet Archive’s Wayback Machine is one of the most valuable free services available on the web, ensuring that important...

1 day ago


The News

Lean Right

Reddit blocks Internet Archive to thwart AI data scrapers

Reddit has announced that it will block the Internet Archive from indexing popular Reddit threads to prevent artificial intelligence firms from scraping the content for training purposes. According to...

1 day ago·Pakistan
