To stop AI scrapers, Reddit blocks the Wayback Machine

If you have ever used the Wayback Machine to view a thread that has since disappeared or to retrieve an old Reddit post, that window is about to close.

Reddit has announced that it is blocking the Internet Archive’s Wayback Machine from most of its website, in response to allegations that certain artificial intelligence (AI) firms have been quietly scraping Reddit data from the Wayback Machine to get around Reddit’s data restrictions.

What is the Internet Archive?

A non-profit organization called the Internet Archive is committed to conserving as much of the history of the internet as possible, including books, cultural artifacts, and outdated websites. Anyone can view how a webpage appeared at a particular moment in time using its Wayback Machine, even if it has since been altered or removed. But according to Reddit, the archive has also been retaining posts that users have deleted, which it claims is a privacy concern.

In a statement to The Verge, Reddit spokesperson Tim Rathschmidt said, “Internet Archive offers a service to the open web, but we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine. We’re restricting some of their access to Reddit data to protect Redditors until they can defend their site and adhere to platform policies (such as protecting user privacy and removing removed content).”

Reddit claims to have informed the Internet Archive beforehand, and the new limitations have been in place since yesterday.

As a result of the change, the Wayback Machine will no longer be able to save Reddit posts, comments, or profiles; it can only save the Reddit homepage. The archive’s snapshots of Reddit’s extensive discussions have long been a favorite resource among reporters, scholars, and curious users. For Reddit, the Wayback Machine will no longer serve as a complete historical record, only as a snapshot of the day’s top stories.

This action fits into a broader pattern: as AI companies race to find content to train their models, Reddit has been tightening control over its data for years. Deals with Google and OpenAI have reportedly brought in millions of dollars, and Reddit has made it clear that AI companies must pay for access to the platform. The company even filed a lawsuit against AI start-up Anthropic earlier this year, alleging that it had scraped the website without authorization.

Wayback Machine Director Mark Graham said, “We have a longstanding relationship with Reddit and continue to have ongoing discussions about this matter.”

Reddit claims the action is about protecting user privacy and following its guidelines, but some are concerned it could erase parts of the internet’s past. A piece of online culture that might have been preserved is lost forever when a post disappears from Reddit and cannot be archived.
