Archive-It Research Services
Archive-It Research Services (ARS) expands the ways that Archive-It partners can enable access to their archives by providing datasets extracted from partner collections. The add-on service allows any Archive-It partner to give users, researchers, scholars, developers, and other patrons easily analyzed datasets that contain key metadata elements, link graphs, named entities, and other data derived from the resources within their collections. By supporting access in aggregate to partner archives, ARS aims to facilitate new types of use, research, and analysis of the significant historical records from the web that Archive-It partners are working to collect, preserve, and make accessible.
The ARS supporting documentation describes the types of datasets currently available, provides guides to requesting and acquiring these datasets, and describes some example use cases and the types of analysis these datasets enable.
Why Archive-It Research Services?
This page describes the goals, objectives, and origins of the program.
Types of Datasets Currently Available
WAT: Web Archive Transformation files feature key metadata elements that represent every crawled resource in a collection and are derived from a collection’s WARC files.
LGA: Longitudinal Graph Analysis files feature a complete list of which URIs link to which other URIs within an entire collection, along with a timestamp for each link.
WANE: Web Archive Named Entities files use named-entity recognition tools to generate a list of all the people, places, and organizations mentioned in each URI in a collection, along with a timestamp of the URI capture.
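As a rough illustration of the aggregate analysis that link data of the LGA kind makes possible, the sketch below counts host-to-host links. The tab-separated line layout (timestamp, source URI, target URI), the sample URIs, and the function name host_link_counts are assumptions made for this example only; they are not the documented LGA file format, so consult the dataset documentation before adapting it.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_link_counts(lines):
    """Count links between hosts.

    Assumes each line is 'timestamp<TAB>source_uri<TAB>target_uri'.
    This layout is hypothetical and is NOT the documented LGA format.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        timestamp, source, target = line.split("\t")
        # Reduce each URI to its host so links are aggregated per site.
        pair = (urlsplit(source).hostname, urlsplit(target).hostname)
        counts[pair] += 1
    return counts


# Made-up sample records in the assumed layout.
sample = [
    "20150101000000\thttp://example.org/a\thttp://example.com/x",
    "20150102000000\thttp://example.org/b\thttp://example.com/y",
    "20150103000000\thttp://example.org/a\thttp://example.net/",
]

counts = host_link_counts(sample)
for (src_host, dst_host), n in counts.most_common():
    print(src_host, "->", dst_host, n)
```

Collapsing URIs to hosts keeps the resulting graph small enough to inspect by hand; keeping the timestamp as a grouping key instead would give a per-period view of the same links.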
Service Details
This page offers information on ARS service details for current Archive-It subscribers as well as for independent researchers, patrons, and users.
Guides to Downloading ARS Datasets
This page offers guides to downloading the different ARS datasets, with both command-line and browser-based instructions.
ARS Talks and Publications
This page provides links to papers, presentations, tutorials, and other materials related to ARS, research using web archives, and research data methodologies.
Questions and inquiries can be sent to aitresearchservices@archive.org.