Computational Research Use of Web Archive Data

The ALEXANDRIA project

Researcher: L3S Research Center, University of Hannover, Germany (Computer Science)
Description: Advances semantic and time-based indexing for Web archives using human-compiled knowledge available on the Web, to efficiently index, retrieve and explore information about entities and events from the past.
Funding: European Research Council Advanced Grants
References: http://alexandria-project.eu/ & https://www.l3s.de/home

Web Archives for Longitudinal Knowledge

Graph of Canadian government domain

Researcher: University of Waterloo, the University of Alberta, and York University (Canadian Digital Humanities)
Description: Data mining and building exploratory interfaces for the study of historical Canadian web data.
Funding: Social Sciences and Humanities Research Council, technology companies
References: http://webarchives.ca

Global Event and Trend Archive Research & Integrated Digital Event Archiving


News & Tweet Processing and Archiving Architecture

Researcher: Digital Library Research Laboratory, Virginia Tech
Description: This project devises interactive, integrated, digital library/archive systems coupled
with linked and expert-curated web collections supporting research on urgent global challenge
events and initiatives.
Funding: National Science Foundation
References: http://www.eventsarchive.org/ & http://www.cs.vt.edu/

PoliInformatics & The .gov Web Domain Analysis

Researcher: PoliInformatics group (University of Washington & George Mason University, Political Science Departments)
Description: Data mining of archived federal websites, using statistical analysis, network
approaches, and computerized text scraping and analytics to study political trends in legislation
and policy on topics such as terrorism, peacekeeping, and jurisprudence.
Funding: National Science Foundation
References: https://thepoliticalmethodologist.com/2017/03/16/the-gov-internet-archive-a-bigdata-resource-for-political-science/ &
http://poliinformatics.org/

Attention to Financial Crisis across .gov domain

Tempas: Historical Web Archive Search and Information Retrieval

Researcher: Alexandria Project, University of Hannover
Description: Research and development on information retrieval, including text and URL search,
for web archive collections.
Funding: European Commission
References: http://tempas.l3s.de/v2/index

Storytelling for Summarizing Collections in Web Archives

Collection = thematic sample from the Web & Story = arranged sample from the collection

Researcher: Web Science and Digital Libraries Research Group, Old Dominion University
Description: Develop tools and techniques for integrating “storytelling” social media and web
archiving to systematically link curation, preservation, and narrative web tools.
Funding: Institute of Museum and Library Services
References: http://bit.ly/2irUwMh

NetLab & Historical Internet Data for the Arts and Humanities

Researcher: Research Infrastructure for the Study of Archived Web Materials Network
Description: Conducting a number of research-driven projects that contribute to establishing,
testing, and developing a research infrastructure for the study of online as well as archived
internet materials. Also working with a number of national libraries, as well as the
Internet Archive, to support the study of national web domains.
Funding: European Commission
References: http://www.netlab.dk/ & http://resaw.eu/

An Archival Reconstruction of the Former Yugoslav Top Level Domain

Researcher: Anat Ben-David, Professor of Sociology, The Open University of Israel, others
Description: Analyze the deleted .yu domain to problematize the ties between the live and
archived Web, and to both question and demonstrate the utility of Web archives as a primary
source for historiography.
References: http://bit.ly/2i3Bv3y

The colors of the national Web: visual data analysis of the historical Yugoslav Web domain

Researcher: Open University of Israel, University of Haifa, others.
Description: Using visual data analysis as a method for studying national web domains by
analyzing color profiles of images extracted from a specific ccTLD in the web archive.
References: http://bit.ly/2ipIWVU
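
As an illustration only, a color profile of the kind described above could be approximated by binning pixel colors coarsely and counting them. This is a minimal sketch, not the project's actual method: the function names, the four-level quantization, and the assumption that images have already been decoded into lists of (r, g, b) tuples are all hypothetical.

```python
from collections import Counter


def quantize(pixel, levels=4):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // levels  # width of each bin per channel (hypothetical choice)
    return tuple(channel // step for channel in pixel)


def color_profile(pixels, levels=4):
    """Return the share of pixels that falls into each coarse color bin."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = sum(counts.values())
    return {color_bin: n / total for color_bin, n in counts.items()}


def dominant_color(pixels, levels=4):
    """Return the most frequent coarse color bin, or None if there are no pixels."""
    counts = Counter(quantize(p, levels) for p in pixels)
    return counts.most_common(1)[0][0] if counts else None
```

Profiles computed per image could then be averaged per crawl year to compare how the palette of a domain shifts over time; that aggregation step is likewise an assumption about how such an analysis might proceed.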

Webverse

Description: Interactive visualization of the history of the web based on longitudinal link structure between archived websites to enable study of the underlying structures of the web.
References: http://webverse.archive.org/
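
To make "longitudinal link structure" concrete, the sketch below groups host-to-host links by capture year and compares the resulting graphs. It is a hedged illustration, not Webverse's implementation: the (year, source_host, target_host) record format and all function names are assumptions.

```python
from collections import defaultdict


def graphs_by_year(link_records):
    """Group (year, source_host, target_host) records into per-year edge sets."""
    graphs = defaultdict(set)
    for year, source, target in link_records:
        if source != target:  # ignore links a site makes to itself
            graphs[year].add((source, target))
    return dict(graphs)


def in_degree(edges):
    """Count how many distinct hosts link to each target host."""
    counts = defaultdict(int)
    for _source, target in edges:
        counts[target] += 1
    return dict(counts)


def edge_changes(graphs, earlier, later):
    """Return (edges added, edges removed) between two years."""
    old = graphs.get(earlier, set())
    new = graphs.get(later, set())
    return new - old, old - new
```

Comparing edge sets year over year is one simple way to see which sites gain or lose inbound links as the web evolves.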

World Wide Web of Humanities

Researcher: Oxford Internet Institute
Description: Establishing a framework for e-Humanities research that uses available open-source
tools and technologies and archived web content to create novel research interfaces to the first
of many scholarly e-Humanities web collections.
Funding: Joint Information Systems Committee & National Endowment for the Humanities
References: http://wwwoh-access.archive.org/wwwoh/about.htm

Using Archival Resources to Conduct Data-Intensive Internet Research & Newspapers and the long-term implications of hyperlinking

Researcher: Matt Weber, School of Information & Communication, Rutgers University
Description: The BCC-SBE Collaborative Research Project, "Using Archival Resources to
Conduct Data-Intensive Internet Research," has three goals: (1) to build a community of scholars
focused on tackling next-generation questions of Internet research through the use of archival
digital data; (2) to create sample databases and develop a prototype research tool,
HistoryTracker, using data from the Internet Archive, a library of Web pages from the World
Wide Web; and (3) to maintain an active community of scholars using the cutting-edge
community platform HUBzero.
Funding: National Science Foundation
References: http://matthewsweber.com/current-research/nsf-internet-archive/ &
http://wwwconference.org/proceedings/www2014/companion/p1031.pdf

GIF Collider

Researcher: Greg Niemeyer, Director, Berkeley Center for New Media, UC Berkeley
Description: GIF Collider is an endlessly looping computer program exploring the trove of 4.5
million GIF (Graphics Interchange Format) image animations created between 1996 and 2007 by
users of GeoCities, a free web hosting service.
References: http://www.bampfa.org/program/gif-collider-10

Efficient and Effective Search Services Over Archival Webs


Researcher: Brian D. Davison, Professor of Computer Science & Engineering, Lehigh University
Description: Investigating efficient and effective approaches to store, index, and retrieve web
content from large-scale historical archives. The temporal content and structure of the archives
are mined to exploit temporal characteristics that can improve search result ranking.
Funding: National Science Foundation
References: http://wume.cse.lehigh.edu/projects/archive/
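
One way temporal characteristics could feed into ranking is to discount a capture's text relevance by its distance in time from the period the query targets. The sketch below is a hypothetical illustration under that assumption, not the approach taken by the Lehigh project: the exponential decay, the `decay` parameter, and the (url, text_score, capture_year) tuple layout are all invented for the example.

```python
import math


def temporal_score(text_score, capture_year, query_year, decay=0.5):
    """Scale a text relevance score by exponential decay in the year gap."""
    gap = abs(capture_year - query_year)
    return text_score * math.exp(-decay * gap)


def rank(captures, query_year, decay=0.5):
    """Sort (url, text_score, capture_year) tuples by temporal score, best first."""
    return sorted(
        captures,
        key=lambda c: temporal_score(c[1], c[2], query_year, decay),
        reverse=True,
    )
```

With a query targeting 2005, a moderately relevant capture from 2005 can outrank a textually stronger capture from 1999, which is the kind of effect a time-aware ranker aims for.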