Other Resources

Archival Web Datasets

UK Web Archive Open Data

The UK Web Archive has released a number of datasets, tools, and APIs related to its crawls of the .uk web domain. The affiliated BUDDAH (Big UK Domain Data for the Arts and Humanities) project, a collaboration between the British Library, the Institute of Historical Research (University of London), the Oxford Internet Institute, and Aarhus University, is currently connecting researchers with this data.

Common Crawl

Common Crawl is a non-profit that builds and maintains "an open repository of web crawl data that can be accessed and analyzed by anyone."