Other Resources
Archival Web Datasets
UK Web Archive Open Data
The UK Web Archive has released a number of datasets, tools, and APIs related to the .uk web domain crawls. The affiliated BUDDAH (Big UK Domain Data for the Arts and Humanities) project, a collaboration between the British Library, the Institute of Historical Research (University of London), the Oxford Internet Institute, and Aarhus University, is currently connecting researchers with this data.
Common Crawl
Common Crawl is a non-profit that builds and maintains "an open repository of web crawl data that can be accessed and analyzed by anyone."