UsedBy.ai
Trend Analysis | 3 min read
Published: May 20, 2026

The FiveThirtyEight Index and the Recovery of Data Journalism Archives

Marcus Webb
Senior Backend Analyst

The FiveThirtyEight Index uses a Python-based crawler to surface 21,350 unique URLs from the Wayback Machine's CDX API, bypassing the site-wide redirect implemented by ABC News in early 2025 (Source: GitHub, Editor & Publisher).

The Pitch

Ben Welsh, a News Applications Editor at Reuters, has built a searchable directory for the 16-year history of FiveThirtyEight after Disney effectively erased the publication's legacy. It provides a clean Svelte-based interface for content that corporate owners attempted to bury behind an ABC News Politics redirect. Data journalists and backend engineers are currently using it to recover historical datasets that were previously considered lost to the "link rot" of 2025.

Under the Hood

The technical architecture is straightforward: a Python crawler interacts with the Internet Archive’s CDX API to map the publication's history from 2008 to 2024 (Source: GitHub). This index acts as a specialized pointer, mapping 21,350 unique articles to their most stable archived snapshots (Source: Reddit r/fivethirtyeight). By using a Svelte frontend, Welsh avoids the overhead of the standard Wayback Machine UI, making the archive searchable for the first time since the original site’s demise.
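As a rough illustration of the crawl described above, here is a minimal sketch of how a script might build a CDX query and reduce the results to one snapshot per URL. It is not taken from Welsh's repository: the helper names (`build_cdx_query`, `latest_snapshots`, `wayback_url`) are hypothetical, and choosing the latest successful capture as the "most stable" snapshot is an assumption made for this example. The query parameters (`matchType`, `from`, `to`, `output`, `fl`, `filter`) are standard CDX server options.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, start_year: int, end_year: int) -> str:
    """Build a CDX query URL listing successful captures under a domain."""
    params = {
        "url": domain,
        "matchType": "domain",       # include the host and its subdomains
        "from": str(start_year),
        "to": str(end_year),
        "output": "json",            # first row of the response is a header
        "fl": "urlkey,timestamp,original",
        "filter": "statuscode:200",  # skip redirects and error captures
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def wayback_url(timestamp: str, original: str) -> str:
    """Build the replay URL for one capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def latest_snapshots(rows: list) -> dict:
    """Map each urlkey to the replay URL of its latest capture.

    `rows` is the decoded JSON response: a header row followed by records.
    Picking the latest capture is an assumption of this sketch, not
    necessarily how the index chooses its snapshots.
    """
    if not rows:
        return {}
    header, *records = rows
    latest = {}
    for record in records:
        capture = dict(zip(header, record))
        key = capture["urlkey"]
        # Timestamps are fixed-width digit strings, so string order is time order.
        if key not in latest or capture["timestamp"] > latest[key]["timestamp"]:
            latest[key] = capture
    return {k: wayback_url(c["timestamp"], c["original"]) for k, c in latest.items()}
```

A real crawler would fetch `build_cdx_query("fivethirtyeight.com", 2008, 2024)` with an HTTP client, decode the JSON body, and pass it to `latest_snapshots`. It would also likely need to page through results, which this sketch leaves out.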

However, the tool is a portal, not a mirror. Complex, JavaScript-heavy interactives, such as the famous 2016 election models or the "P-hacking" interactive, remain partially or fully broken (Source: HN). These legacy assets frequently fail because they call backend scripts or JSON data files that were not captured during the original crawls.
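One way to see why a given interactive breaks is to ask the Wayback Machine whether each file it depends on was ever captured. The sketch below is an illustration under stated assumptions, not part of the index itself: the helper names are hypothetical, and the example asset URLs in the usage note are made up. The endpoint and the `archived_snapshots` / `closest` response shape are those of the Internet Archive's public availability API.

```python
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(asset_url: str, timestamp: str) -> str:
    """Build a query asking for the capture of asset_url closest to timestamp."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': asset_url, 'timestamp': timestamp})}"


def closest_capture(response: dict):
    """Return the replay URL of the closest usable capture, or None.

    `response` is the decoded JSON body of an availability query. An asset
    with no capture comes back with an empty "archived_snapshots" object.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"]
    return None


def missing_assets(responses: dict) -> list:
    """Given {asset_url: decoded response}, list assets with no usable capture."""
    return sorted(url for url, body in responses.items() if closest_capture(body) is None)
```

For example, if a page loads a hypothetical `https://projects.fivethirtyeight.com/example/app.js` and `https://projects.fivethirtyeight.com/example/data.json`, and only the script returns a `closest` capture, `missing_assets` would report the JSON file, which is consistent with the failure mode described above.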

There are significant gaps in the current documentation. We do not yet know whether the index fully covers the "projects.fivethirtyeight.com" subdomains, which housed the most computationally heavy election models (Source: UsedBy Dossier). The legal status of this "content rehydration" is also unconfirmed: it is unclear whether Disney has issued updated robots.txt instructions or brought legal challenges against the Internet Archive over these specific assets.

The tool’s survival is entirely dependent on the Internet Archive’s infrastructure. If the Wayback Machine faces further legal pressure or technical outages, this index becomes a directory of dead links. It is a fragile layer of discovery built on top of a volatile storage medium.

Marcus's Take

The FiveThirtyEight Index is a necessary piece of digital archaeology, but it highlights the precarious nature of modern web engineering. As a backend analyst, I find the dependency on the Wayback Machine’s CDX API both elegant and dangerously thin. Use this for retrieving text-based historical data or verifying past reporting, but do not expect it to serve as a reliable environment for running legacy data visualizations. It is a library catalog, not the library itself.


Ship clean code,
Marcus.

Marcus Webb - Senior Backend Analyst at UsedBy.ai
