Computational Research Use of Web Archive Data

The ALEXANDRIA project

Researcher: L3S Research Center, University of Hannover, Germany (Computer Science)
Description: Advances semantic and time-based indexing for Web archives using human-compiled knowledge available on the Web, to efficiently index, retrieve and explore information about entities and events from the past.
Funding: European Research Council Advanced Grants

Web Archives for Longitudinal Knowledge

Graph of Canadian government domain

Researcher: University of Waterloo, the University of Alberta, and York University (Canadian Digital
Description: Data mining and building exploratory interfaces for the study of historical Canadian web data.
Funding: Social Sciences and Humanities Research Council, Technology Companies

Global Event and Trend Archive Research & Integrated Digital Event Archiving

News & Tweet Processing and Archiving Architecture

Researcher: Digital Library Research Laboratory, Virginia Tech
Description: This project devises interactive, integrated, digital library/archive systems coupled
with linked and expert-curated web collections supporting research on urgent global challenge
events and initiatives.
Funding: National Science Foundation

PoliInformatics & The .gov Web Domain Analysis

Researcher: PoliInformatics group (University of Washington & George Mason University, Political Science
Description: Data mining archival federal websites for statistical analysis, network approaches and
computerized text scraping and analytics of political trends such as legislation and policy on
topics such as terrorism, peacekeeping, and
Funding: National Science Foundation

Attention to Financial Crisis across .gov domain

Tempas: Historical Web Archive Search and Information Retrieval

Researcher: Alexandria Project, University of Hannover
Description: Research and development for information retrieval and text and URL search for
web archive collections.
Funding: European Commission

Storytelling for Summarizing Collections in Web Archives

Collection = thematic sample from the Web & Story = arranged sample from the collection

Researcher: Web Science and Digital Libraries Research Group, Old Dominion University
Description: Develop tools and techniques for integrating “storytelling” social media and web
archiving to systematically link curation, preservation, and narrative web tools.
Funding: Institute of Museum and Library Services

NetLab & Historical Internet Data for the Arts and Humanities

Researcher: Research Infrastructure for the Study of Archived Web Materials Network
Description: Conducting a number of research-driven projects that contribute to the establishment,
testing, and development of a research infrastructure for the study of online as well as archived
internet materials. Also working in concert with a number of national libraries, as well as the
Internet Archive, to support the study of national web domains.
Funding: European Commission

An Archival Reconstruction of the Former Yugoslav Top Level Domain

Researcher: Anat Ben-David, Professor of Sociology, The Open University of Israel, others
Description: Analyze the deleted .yu domain to problematize the ties between the live and
archived Web, and to both question and demonstrate the utility of Web archives as a primary source for historiography.

The colors of the national Web: visual data analysis of the historical Yugoslav Web domain

Researcher: Open University of Israel, University of Haifa, others.
Description: Using visual data analysis as a method for studying national web domains by
analyzing color profiles of images extracted from a specific ccTLD in the web archive.


Description: Interactive visualization of the history of the web based on longitudinal link structure between archived websites to enable study of the underlying structures of the web.

World Wide Web of Humanities

Researcher: Oxford Internet Institute
Description: Establishing a framework for e-Humanities research using available open source
tools and technologies and archived web content to create novel research interfaces to the first of
many scholarly e-Humanities web collections.
Funding: Joint Information Systems Committee & National Endowment for the Humanities

Using Archival Resources to Conduct Data-Intensive Internet Research & Newspapers and the long-term implications of hyperlinking

Researcher: Matt Weber, School of Information & Communication, Rutgers University
Description: The BCC-SBE Collaborative Research Project, "Using Archival Resources to
Conduct Data-Intensive Internet Research," has three goals: (1) to build a community of scholars
focused on tackling next-generation questions of Internet research through the use of archival
digital data; (2) to create sample databases and develop a prototype research tool,
HistoryTracker, using data from the Internet Archive, a library of Web pages from the World
Wide Web; and (3) to maintain an active community of scholars using the cutting-edge
community platform HUBzero.
Funding: National Science Foundation


Researcher: Greg Niemeyer, Director, Berkeley Center for New Media, UC Berkeley
Description: GIF Collider is an endlessly looping computer program exploring the trove of 4.5
million GIF (Graphics Interchange Format) image animations created between 1996 and 2007 by users of GeoCities, a free web hosting service.

Efficient and Effective Search Services Over Archival Webs

Researcher: Brian D. Davison, Professor of Computer Science & Engineering, Lehigh University
Description: Investigating efficient and effective approaches to store, index, and retrieve web
content from large-scale historical archives. The temporal content and structure of the archives
are mined to exploit temporal characteristics that can improve search result ranking.
Funding: National Science Foundation