ARCH
ARCH (Archives Research Compute Hub) is a platform for building research collections, analyzing them computationally, and generating datasets from terabytes and even petabytes of data. ARCH supports the open publication and preservation of user-generated datasets created from thousands of libraries, archives, and memory organizations worldwide, giving researchers, students, and information professionals the power to study and understand digital collections in new ways.
Interested? Start now!
For pricing and service offering inquiries, please fill out our interest form. For other questions or to talk to ARCH staff, contact us at [email protected].
ARCH Features
Build: Curate a research collection for analysis using primary source web, text, and image digital collections.
Access: Generate more than a dozen different datasets (full text, images, PDFs, graph data, etc.) from primary source digital collections with the click of a button. Download generated datasets in-browser or via API.
Analyze: Easily work with research-ready datasets, both through in-browser visualizations in ARCH and in interactive computational tools such as Jupyter Notebooks, Google Colab, Gephi, Voyant, and others.
Publish and Preserve: Publish datasets on archive.org with one click, where they can be openly accessed and shared. All published datasets are preserved in perpetuity.
Support: Technical support, online training, and extensive help center documentation are all available.
Streamline Data-Driven Research
ARCH leverages the Internet Archive’s non-profit infrastructure and open-source tools to streamline the computational use of digital collections. Librarians, collection managers, and educators can offer ARCH to their researchers and students to facilitate sophisticated research processes that would otherwise require coding/scripting skills and significant computing resources.
Publications
Recent research using ARCH software and datasets:
- Networks of Trust: The Growth of Anti-vaccine Activism, 1982-2014 (2026)
- Documenting Traces of Latin American Feminist Movements: Digital Counterarchives as Sites of Memory and Resistance (2025)
- Digital pioneers: Mormon mommy bloggers and building the “Bloggernacle” (2025)
- Follow the updates! Reconstructing past practices with web archive data (2024)
- Capturing and documenting the wider health impacts of the COVID-19 pandemic through the Remember Rebuild Saskatchewan Initiative: Protocol for a mixed methods interdisciplinary project (2023)
- Sliding data: Feminist methodologies pathways (2023)
- From healthy communities to toxic debates: Disqus’ changing ideas about comment moderation (2022)
- Protection and distortion: the space-time of born-digital heritage (2022)
Explore our service offerings

Web Archiving
Archive-It is our user-controlled, web-based service for creating curated, publicly accessible web archives and born-digital collections.
Learn about Archive-It
Text & Data Mining
ARCH is our research and data processing service that helps users easily build, analyze, and study large digital collections computationally and create datasets and data visualizations.
Learn about ARCH
Digital Preservation
Vault is our low-cost, easy-to-use digital repository and preservation service to store, manage, and preserve digital files and collections.
Learn about Vault