When dealing with WAT files generated from a mix of ARC and WARC files, the capture URL and capture date must be handled differently:
The date formats also differ. The Pig script has to deal with these differences, which is a PITA.
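To make the PITA concrete, here is a minimal Python sketch of the branching a script currently has to do. It assumes the source-record headers sit under `Envelope` either as `ARC-Header-Metadata` (keys `Target-URI` and `Date`, with a 14-digit `YYYYMMDDhhmmss` timestamp) or as `WARC-Header-Metadata` (keys `WARC-Target-URI` and `WARC-Date`, in ISO 8601 form). Those key names are our assumption and should be checked against real WAT output; the function name `capture_url_and_date` is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Tuple


def capture_url_and_date(wat_json: str) -> Tuple[str, str]:
    """Return (url, WARC-style date) from a WAT JSON payload.

    ASSUMPTION: the Envelope key names below are illustrative and may not
    match every WAT producer exactly.
    """
    envelope = json.loads(wat_json)["Envelope"]
    if "WARC-Header-Metadata" in envelope:
        meta = envelope["WARC-Header-Metadata"]
        # WARC source: the date is already ISO 8601, e.g. 2011-02-09T06:48:48Z
        return meta["WARC-Target-URI"], meta["WARC-Date"]
    meta = envelope["ARC-Header-Metadata"]
    # ARC source: a 14-digit UTC timestamp, converted here to the WARC form
    dt = datetime.strptime(meta["Date"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return meta["Target-URI"], dt.strftime("%Y-%m-%dT%H:%M:%SZ")


arc = '{"Envelope": {"ARC-Header-Metadata": {"Target-URI": "http://example.com/", "Date": "20110209064848"}}}'
print(capture_url_and_date(arc))  # ('http://example.com/', '2011-02-09T06:48:48Z')
```

Every script that needs the URL or date ends up carrying some version of this two-way branch plus the date conversion.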
However, the WARC records in the WAT file itself have header fields that contain the URL and date in the same form, for both ARC- and WARC-based WATs. For example:
In both cases, the WARC record headers carry the "target" URL and the capture date, and the date is in WARC form.
It would be nice if our ArchiveJSONViewLoader() could expose these header fields, so that Pig script writers don't have to deal with the unimportant differences between the ARC and WARC headers inside the JSON block.
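As a rough illustration of what the loader could hand to Pig, here is a minimal Python sketch that splits one WAT record into its WARC header block and JSON payload and returns the target URL and date straight from the headers. It assumes the standard WARC layout (a version line, `Name: value` header lines, a blank line, then `Content-Length` bytes of payload); the helper name `parse_wat_record` is hypothetical and is not part of ArchiveJSONViewLoader's actual API.

```python
import json
from typing import Dict, Tuple


def parse_wat_record(record: bytes) -> Tuple[str, str, dict]:
    """Split one WAT record into (target URL, capture date, JSON payload).

    Hypothetical helper sketching the proposed loader behaviour.
    """
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    # lines[0] is the version line, e.g. "WARC/1.0"; the rest are "Name: value"
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(":")
        # WARC field names are case-insensitive, so normalise to lower case
        headers[name.strip().lower()] = value.strip()
    # Use Content-Length so the trailing record separator is not parsed as JSON
    body = rest[: int(headers["content-length"])]
    return headers["warc-target-uri"], headers["warc-date"], json.loads(body)


payload = b'{"Envelope": {}}'
record = (
    b"WARC/1.0\r\nWARC-Type: metadata\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"WARC-Date: 2011-02-09T06:48:48Z\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n\r\n"
    + payload + b"\r\n\r\n"
)
url, date, body_json = parse_wat_record(record)
print(url, date)  # http://example.com/ 2011-02-09T06:48:48Z
```

Because both values come from the WAT record's own WARC headers, the same code path works whether the WAT was generated from an ARC or a WARC file, and the JSON payload is still available for everything else.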