Open-Source Tool Has Archived Over 150,000 Pieces of Digital Evidence from the Web
The Auto Archiver, an open-source tool developed by Bellingcat, was publicly launched in 2022. It is designed to help users archive web pages and social media posts, making it a valuable resource for researchers, journalists, and investigators.
The Auto Archiver can be tailored to a team's specific archiving needs. It is configured either through a YAML file or through a new interface called the configuration editor.
One of the key features of the Auto Archiver is its modular architecture, which is built around six types of module: Feeder, Extractor, Enricher, Formatter, Storage, and Database. Each type serves a specific purpose. For example, Feeders specify where to read URLs from, Extractors download media, Enrichers increase the value of archived content, Storage modules store files, and Databases save a record of the archive.
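The division of labour between these modules can be illustrated with a minimal Python sketch. It is not the Auto Archiver's actual code or API: every class and function name below (Item, ListFeeder, PlaceholderExtractor, LengthEnricher, MemoryStorage, MemoryDatabase, run_pipeline) is hypothetical, no network access takes place, and the Formatter step is left out because the description above does not say what it does.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Item:
    """One URL moving through the pipeline, plus whatever each step attaches."""
    url: str
    media: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


class Step(Protocol):
    """Anything with a process() method can be plugged into the pipeline."""
    def process(self, item: Item) -> None: ...


class ListFeeder:
    """Feeder: specifies where URLs are read from (here, an in-memory list)."""
    def __init__(self, urls: list[str]) -> None:
        self.urls = urls

    def __iter__(self) -> Iterator[Item]:
        return (Item(url=u) for u in self.urls)


class PlaceholderExtractor:
    """Extractor: a real one would download media; this only records a file name."""
    def process(self, item: Item) -> None:
        last_segment = item.url.rstrip("/").split("/")[-1]
        item.media.append(f"{last_segment}.html")


class LengthEnricher:
    """Enricher: attaches extra information (a trivial example: the URL length)."""
    def process(self, item: Item) -> None:
        item.metadata["url_length"] = str(len(item.url))


class MemoryStorage:
    """Storage: keeps the 'files' (here, a dict mapping file name to source URL)."""
    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def process(self, item: Item) -> None:
        for name in item.media:
            self.files[name] = item.url


class MemoryDatabase:
    """Database: saves a record of what was archived."""
    def __init__(self) -> None:
        self.records: list[dict] = []

    def process(self, item: Item) -> None:
        self.records.append(
            {"url": item.url, "media": list(item.media), **item.metadata}
        )


def run_pipeline(feeder: ListFeeder, steps: list[Step]) -> None:
    """Pass every item from the feeder through each step, in order."""
    for item in feeder:
        for step in steps:
            step.process(item)
```

Because each step only needs a process() method, swapping the in-memory feeder for one that reads from a spreadsheet, or the in-memory storage for cloud storage, would not require changes to the other steps. That is the kind of flexibility a modular design is meant to give.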
The Auto Archiver is not limited to Google Sheets; it can be integrated into other systems, including ATLOS, a collaborative investigations platform that has been used by Bellingcat and the Centre for Information Resilience. It can also use proxies to bypass rate-limiting or blocking by certain platforms.
For simpler archiving tasks, other tools like the Wayback Machine, Archive.today, WebRecorder's browser extension ArchiveWebPage, or paid options like Hunchly may be more suitable. However, the Auto Archiver's modular architecture allows for easier integration with other systems, making it a more versatile option in many cases.
The Auto Archiver has been adopted by a wide range of users, from large newsrooms and NGOs to individual researchers, journalists, activists, archivists, academics, and developers. It has been used by Bellingcat's journalists to preserve information on various fast-moving events, including the Jan. 6 riots.
Several investigative journalism organisations and individual researchers have also adopted the Auto Archiver for open-source investigations, including various fact-checking groups and digital forensic analysts worldwide.
The Auto Archiver now includes detailed documentation for installation, configuration, and debugging, making it more accessible to a wider audience. An updated version has been announced, featuring a new modular structure, a user-friendly tool for configuring the Auto Archiver, and new features such as chain of custody, perceptual hashing for deduplication, and techniques to avoid anti-bot measures and captchas on websites.
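To give a sense of how perceptual hashing can support deduplication, here is a minimal Python sketch of one simple technique, the "average hash". The article does not say which algorithm the Auto Archiver uses, so this is an illustration under assumptions, not the tool's method. The function names and the max_distance threshold are hypothetical, and the input is assumed to be a small grid of grayscale values (a real implementation would first shrink and desaturate the image).

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit string: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(
    first: list[list[int]],
    second: list[list[int]],
    max_distance: int = 2,  # hypothetical threshold, chosen for illustration
) -> bool:
    """Treat two images as duplicates if their hashes differ in few bits."""
    distance = hamming_distance(average_hash(first), average_hash(second))
    return distance <= max_distance
```

Unlike a cryptographic hash, which changes completely when a single byte changes, a perceptual hash is intended to stay similar when an image is slightly brightened, resized, or recompressed, so that reposted copies of the same picture can be recognised.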
To date, the Auto Archiver has preserved over 150,000 web pages and social media posts. It is designed for teams using a shared instance and can archive a large number of URLs in a collaborative environment. Moreover, it provides a way to generate metadata that ensures others can trust that archived content has not been tampered with.
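One common way to produce this kind of tamper-evidence metadata is to store a cryptographic hash of the archived content alongside a timestamp. The Python sketch below assumes SHA-256 and uses hypothetical function and field names. The article does not specify the Auto Archiver's own mechanism, so this only shows the general idea.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def make_record(url: str, content: bytes) -> dict:
    """Create metadata describing the content at the moment of archiving."""
    return {
        "url": url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "archived_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(record: dict, content: bytes) -> bool:
    """Return True only if the content still matches the recorded hash."""
    current = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(current, record["sha256"])
```

Anyone who later receives both the file and its record can recompute the hash. If even one byte of the file has changed, verification fails. The check is only as trustworthy as the record itself, which is why the record is usually kept somewhere separate from the file it describes.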
Two new companion projects have also been released under open-source licenses: the Auto Archiver API and the Auto Archiver Web Interface. These additions expand the tool's capabilities and accessibility, making it an even more powerful resource for preserving digital evidence.