MercyNews

Reddit History Preserved: New Tool Archives 2.38B Posts Offline

Hacker News · 13h ago

Key Facts

  • The tool processes the 3.28TB Pushshift torrent containing 2.38 billion Reddit posts.
  • It generates static HTML, requiring no JavaScript or external internet connection to browse.
  • Includes a full REST API with 30+ endpoints and an MCP server for AI integration.
  • Deployment options range from a simple USB drive to a Tor hidden service.
  • The project is built using Python, PostgreSQL, Jinja2, and Docker.
  • It is released into the Public Domain on GitHub.

In This Article

  1. The Digital Time Capsule
  2. How It Works
  3. Total Ownership
  4. Advanced Capabilities
  5. Deployment Options
  6. Looking Ahead

The Digital Time Capsule

Reddit's ecosystem has undergone a seismic shift in recent years. With the effective death of the public API and the disappearance of third-party applications, access to the platform's vast repository of discussions has become increasingly restricted. The Pushshift dataset, a critical resource for researchers and archivists, has faced repeated threats of being cut off, leaving the future of Reddit's collective knowledge in jeopardy.

Now, a new open-source project offers a way around these restrictions. A developer has built a tool that turns the entire 3.28TB torrent of Reddit history into a fully functional archive that can be browsed offline. Once the data is downloaded, it stays with the user, unaffected by corporate decisions, API keys, or internet connectivity.

How It Works

The tool's core function is simple: it ingests compressed data dumps from Reddit (in .zst format), as well as archives from Voat and Ruqqus, and generates static HTML files. This approach removes the need for server infrastructure or internet access. Users open the generated index.html file in any browser to navigate through posts and comments.
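
As a rough illustration of the ingest-and-render step, the sketch below turns newline-delimited JSON records into a single static HTML page using only Python's standard library. It is not the project's code. The field names (`title`, `author`, `selftext`) are assumptions about the dump format, `render_index` is a hypothetical helper, plain f-strings stand in for the Jinja2 templates the project uses, and decompressing the .zst file is left out.

```python
import html
import json


def render_index(ndjson_lines):
    """Render an iterable of JSON-encoded post records as one static HTML page."""
    items = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        post = json.loads(line)
        # Escape every user-supplied field so the page contains no live markup.
        title = html.escape(post.get("title", "(untitled)"))
        author = html.escape(post.get("author", "[unknown]"))
        body = html.escape(post.get("selftext", ""))
        items.append(
            f"<article><h2>{title}</h2><p>by {author}</p><p>{body}</p></article>"
        )
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(items) + "\n</body></html>\n"


# Two hypothetical records standing in for lines of a decompressed dump.
sample = [
    '{"title": "Hello <world>", "author": "alice", "selftext": "First post"}',
    '{"title": "Second", "author": "bob", "selftext": ""}',
]
page = render_index(sample)
```

Writing `page` to a file named index.html would give a document that opens in any browser without JavaScript or a network connection.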

For those requiring advanced functionality, an optional Docker stack with PostgreSQL can be deployed. This remains entirely on the user's machine, providing full-text search capabilities without external requests. The system is designed for maximum flexibility and privacy:

  • No JavaScript or external tracking
  • Works on air-gapped machines
  • Serves content over a LAN (e.g., from a Raspberry Pi)
  • Can be distributed via USB drive

"Once you have the data, you own it. No API keys, no rate limits, no ToS changes can take it away."

— Project Developer

Total Ownership

The primary value proposition is data sovereignty. Once the Pushshift torrent is downloaded and processed, the user owns the data. There are no API keys to manage, no rate limits to navigate, and no Terms of Service changes that can revoke access. This is a critical development for anyone relying on Reddit data for long-term projects or research.

The tool is also built to scale. The PostgreSQL backend keeps memory usage constant regardless of dataset size. A single instance can handle tens of millions of posts, and the full 2.38 billion post dataset can be managed by running multiple instances segmented by topic. This architecture puts preserving the entirety of Reddit's history within reach of individuals and small organizations.
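
To give a sense of what "multiple instances" means in practice, the arithmetic can be sketched as follows. The per-instance capacity of 50 million posts is an assumption chosen as one reading of "tens of millions"; the article does not state an exact figure, and `instances_needed` is a hypothetical helper.

```python
import math

TOTAL_POSTS = 2_380_000_000      # figure reported in the article
POSTS_PER_INSTANCE = 50_000_000  # assumption: one reading of "tens of millions"


def instances_needed(total_posts, posts_per_instance):
    """Return the minimum number of instances needed to hold total_posts."""
    return math.ceil(total_posts / posts_per_instance)


count = instances_needed(TOTAL_POSTS, POSTS_PER_INSTANCE)
print(count)  # 48
```

Under that assumption the full dataset would need roughly four dozen instances. Topic-based segmentation would not split posts evenly, so the real number could be higher.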

Advanced Capabilities

Beyond simple browsing, the archive is built for integration and automation. It ships with a full REST API featuring over 30 endpoints. Users can query posts, comments, users, subreddits, and perform aggregations directly against their local database.

Perhaps most notably, the project includes a Model Context Protocol (MCP) server with 29 tools. This allows AI applications to query the local Reddit archive directly, enabling AI-driven analysis and data mining without relying on cloud services. The developer built the tool with Python, PostgreSQL, Jinja2 templates, and Docker, using Claude Code as an experiment in AI-assisted development.

Deployment Options

The tool is designed to be accessible to users with varying levels of technical expertise. It supports a wide range of hosting scenarios, from the simplest to the most secure. The available self-hosting options include:

  • USB Drive / Local Folder: The most basic setup; just open the HTML files.
  • Home Server (LAN): Serve the archive to devices on the local network from a Raspberry Pi or similar hardware.
  • Tor Hidden Service: Two commands enable access via Tor without port forwarding.
  • VPS with HTTPS: Standard web hosting for public or private access.
  • GitHub Pages: Suitable for hosting smaller archives.
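
Because the output is plain static HTML, the home-server option does not need anything project-specific. The sketch below uses Python's built-in `http.server` module to serve a folder over the local network. It is a generic illustration, not the project's own tooling; `make_server`, `serve_in_background`, and the default port 8000 are choices made for this example.

```python
import functools
import http.server
import threading


def make_server(directory, host="0.0.0.0", port=8000):
    """Create an HTTP server that serves static files from `directory`."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve_in_background(server):
    """Run the server in a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling `make_server("archive").serve_forever()` would serve a folder named `archive` until interrupted. Binding to `0.0.0.0` makes the archive reachable from other devices on the network, while `127.0.0.1` keeps it local to the machine.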

A live demo of the archiver is available online, showcasing the static browsing experience. The project code is released into the Public Domain via GitHub, encouraging widespread adoption and contribution.

Looking Ahead

The release of this archiver tool represents a significant step in the preservation of digital culture. As platforms evolve and restrict access, the ability for individuals to maintain their own archives becomes increasingly valuable. This project provides a robust, scalable, and private method for ensuring that the 2.38 billion posts that constitute Reddit's history remain accessible for future generations.

By democratizing access to massive datasets, the tool empowers researchers, developers, and enthusiasts to continue their work without fear of platform instability. It stands as a testament to the open-source community's ability to respond to centralized control with decentralized solutions.
