Who blocked Internet Archive crawlers?

Question

Hans Steiner · Accepted Answer

Internet Archive crawler blocked by major publishers

Twenty-three major news organizations and Reddit have blocked the Internet Archive’s crawler, cutting off automated access to web pages that many journalists and researchers rely on.

The move matters because the Internet Archive’s Wayback Machine functions as a fallback copy of the live web. When publishers block crawling, they reduce the Archive’s ability to capture new versions of stories, preserve context, and support investigations after pages are edited, removed, or taken down. Journalists and advocacy groups have responded by organizing efforts to protect the Archive and push back against access restrictions.

What’s happening, and why it matters

In practice, blocking a crawler doesn’t delete content that’s already been archived, but it can slow or halt future preservation. That affects:

Accountability: older versions of reporting and disclosures can become harder to retrieve.
Research: citations and historical verification can be disrupted when new material can’t be captured.
Access: users who depend on archived pages for practical or educational use lose another route around takedowns and paywalls.
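
The reports don't say how each publisher implements its block. One common mechanism is a robots.txt rule aimed at a specific crawler, and the sketch below shows how such a rule would be evaluated. It is illustrative only: the robots.txt content, the user-agent token archive.org_bot, and the example.com URL are assumptions, not details taken from any publisher's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The "archive.org_bot" token is an assumption
# used for illustration, not a confirmed publisher configuration.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is refused, while other crawlers fall through to "*".
archive_ok = is_allowed(ROBOTS_TXT, "archive.org_bot", "https://example.com/story")
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/story")
```

A rule like this only governs future fetches by a crawler that honors it, which is consistent with the point above: existing snapshots stay in the archive, but new versions of the page stop being captured.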

Support is building

According to the reports, journalists and advocacy groups have signed a letter urging support for the Internet Archive amid growing restrictions from large sites. The broader theme is the tension between publishers' copyright and anti-scraping positions and the role of archiving in keeping information retrievable.

For publishers, the decision signals a tightening stance on automated web capture. For the rest of the internet ecosystem, it raises the risk that the “live web” becomes less recoverable over time, especially as more sites rely on dynamic content and aggressive access controls.