LGA files are web graph files that give researchers access to the linking behavior of all resources in a web archive over time. By studying the longitudinal details of which webpages link to which other webpages, one can trace networks of influence, follow the formation and decline of web communities, identify important hosts or domains within a collection, and gain other insights into how websites interact with one another through their linking activity. For more information on LGA files, see LGA Overview and Technical Details.

For additional resources on network analysis, especially from a humanities perspective, readers may want to consult the Introduction to Network Analysis and Representation by Elijah Meeks and Maya Krishnan, Demystifying Networks by Scott Weingart, and the related posts featured on Digital Humanities Now.

Visualizing a network graph of top-level websites in the Human Rights collection

Gephi Timeline visualization of top-level websites for Columbia University's Human Rights collection, generated using the LGA dataset. The visualization shows the dynamic linking behavior of over 17,000 websites over a period of 6 years. The edges between these websites are weighted by the number of unique links between them (only websites that share more than 100 unique links are represented). There are over 25,000 weighted edges representing an aggregate of over 4 billion unique links between these websites over time. Although the animated .gif is a bit hard to parse at this size, the Gephi visualization allows one to explore community formation and website associations over time in the collection.
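The edge weighting described above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the visualization: it assumes the link data has already been reduced to an iterable of (source URL, target URL) pairs, and the function name `host_edges`, the `min_weight` parameter, and the choice to skip links within a single host are all assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_edges(links, min_weight=100):
    """Collapse page-level links into weighted host-level edges.

    `links` is assumed to be an iterable of (source_url, target_url)
    pairs; this is a hypothetical input shape, not the LGA file format.
    """
    unique_links = defaultdict(set)
    for source_url, target_url in links:
        source_host = urlsplit(source_url).hostname
        target_host = urlsplit(target_url).hostname
        # Assumption: links inside a single host are not drawn as edges.
        if source_host and target_host and source_host != target_host:
            # Storing pairs in a set counts each unique link only once.
            unique_links[(source_host, target_host)].add((source_url, target_url))
    # Keep only edges with more than `min_weight` unique links.
    return {
        edge: len(pairs)
        for edge, pairs in unique_links.items()
        if len(pairs) > min_weight
    }
```

The resulting dictionary maps each (source host, target host) pair to its weight and could be written out as an edge list for import into a tool such as Gephi.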

Plotting top image links within the Fashion Blogs collection

In this example, we used the LGA file from the Fashion Blogs collection of LIM College to determine the 500 most-linked-to image URLs within the collection in each of four different time periods over a year, roughly corresponding to Spring, Summer, Fall, and Winter. We then plotted each set of 500 top images according to brightness (x-axis) and hue (y-axis) using the ImagePlot visualization tool, in the hope that each season's final plot would correspond to common assumptions about seasonal fashion choices. Though technically a failed research inquiry, as the seasonal plots showed only minor differences (possibly due to the nature of the archival collection), it still makes for a nice visualization! More to the point, it demonstrates how LGA files can support a variety of analytical methods beyond network graph analysis by revealing the most-linked resources within an overall collection.
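A minimal sketch of the "most-linked-to images per period" step is shown below. It is not the procedure we actually ran: it assumes the links have already been extracted as (period label, target URL) pairs, and it identifies images by file extension only. The name `top_images`, the `IMAGE_EXTENSIONS` tuple, and the input shape are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Assumption: a URL is treated as an image if its path ends in one of these.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def top_images(links, n=500):
    """Return the n most-linked-to image URLs for each time period.

    `links` is assumed to be an iterable of (period, target_url) pairs,
    one pair per link found in the collection.
    """
    counts = defaultdict(Counter)
    for period, target_url in links:
        path = urlsplit(target_url).path.lower()
        if path.endswith(IMAGE_EXTENSIONS):
            counts[period][target_url] += 1
    # most_common(n) orders URLs from most to least linked.
    return {
        period: [url for url, _ in counter.most_common(n)]
        for period, counter in counts.items()
    }
```

Each per-period list could then be used to fetch the images and pass them to a plotting tool such as ImagePlot.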

We will document other representative analytical use cases and corresponding visualizations, from our internal work and that of researchers, here as well as on the Archive-It blog and in other presentations and papers. More coming soon!