ARCRecord.computeMetaData() fails due to space in mime-type in record header.

Description

The ARC reading code cannot (or at least does not) recover and the exception unwinds the stack, aborting the reading of the ARC file.

In the file FS-359228.arc.gz, there are records which have a header line with a mime-type containing a space, such as:

http://www.aki.ku.dk:80/zmuc/ento/Entohome.old 130.225.206.11 19971111190358 Mozilla/2.02Gold (Win95 2885

Notice the mime-type field is "Mozilla/2.02Gold (Win95".
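
To make the failure concrete, here is a minimal sketch that splits the example header line on single spaces. It assumes the five-field ARC version 1 layout (URL, IP address, archive date, mime-type, length). The class name, method name and key names are hypothetical and are not taken from the ARC reader source.

```java
import java.util.Arrays;
import java.util.List;

class HeaderSplitDemo {
    // Field names assumed for an ARC v1 URL-record header line (illustrative only).
    static final List<String> KEYS =
        Arrays.asList("url", "ip", "date", "mimetype", "length");

    // Split a header line on single spaces, as a naive tokenizer would.
    static List<String> split(String headerLine) {
        return Arrays.asList(headerLine.split(" "));
    }

    public static void main(String[] args) {
        String line = "http://www.aki.ku.dk:80/zmuc/ento/Entohome.old 130.225.206.11 "
            + "19971111190358 Mozilla/2.02Gold (Win95 2885";
        List<String> values = split(line);
        // The mime-type contributes two tokens, so there is one value too many.
        System.out.println(KEYS.size() + " keys, " + values.size() + " values");
        System.out.println("values.get(3) = " + values.get(3));
        System.out.println("values.get(4) = " + values.get(4));
    }
}
```

With this input the split yields six values for five keys, and the fifth value is "(Win95" rather than the length.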

The ARCRecord.computeMetaData() method is passed the list of header values, which it assumes adhere to a fixed set of fields. It checks whether the number of values matches the number of expected fields and, if not, applies a bit of guessing to see if it can fix things.

It appears that the code was tweaked to handle one particular case where a space appears in the mime-type, but it doesn't handle all cases. The code is:

    if (keys.size() != values.size()) {
        // Early ARCs had a space in mimetype.
        if (values.size() == (keys.size() + 1)
                && values.get(4).toLowerCase().startsWith("charset=")) {

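So it looks like someone added a specific check for the case where there is exactly one extra value and the fifth value, values.get(4), starts with "charset=", presumably a mime-type with a charset parameter after the space. In the example above the fifth value is "(Win95", so the check does not match.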
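
One possible way to generalize the check, given here only as a sketch and not as the project's actual fix, is to join any surplus tokens back into the mime-type field. It assumes that the mime-type is the only field that can contain spaces and that it sits at index 3 of the value list. The class name, method name and constant are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MimeTypeRepair {
    // Assumed position of the mime-type in an ARC v1 header line
    // (url, ip, date, mimetype, length).
    static final int MIMETYPE_INDEX = 3;

    // Returns a list with exactly expectedSize entries by joining the surplus
    // tokens at the mime-type position with single spaces.
    static List<String> repair(List<String> values, int expectedSize) {
        int surplus = values.size() - expectedSize;
        if (surplus == 0) {
            return new ArrayList<>(values);
        }
        if (surplus < 0 || expectedSize <= MIMETYPE_INDEX) {
            throw new IllegalArgumentException(
                "Cannot repair " + values.size() + " values for "
                + expectedSize + " fields");
        }
        int end = MIMETYPE_INDEX + surplus + 1; // exclusive end of mime-type tokens
        List<String> fixed = new ArrayList<>(values.subList(0, MIMETYPE_INDEX));
        fixed.add(String.join(" ", values.subList(MIMETYPE_INDEX, end)));
        fixed.addAll(values.subList(end, values.size()));
        return fixed;
    }
}
```

Applied to the six values from the example record with an expected size of five, this yields "Mozilla/2.02Gold (Win95" as the mime-type and "2885" as the length. Whether the joined string is then a usable mime-type is a separate question; the point is only that the field count is restored and reading can continue.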

Environment

None

Status

Assignee

Unassigned

Reporter

Aaron Binns

Labels

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Priority

Major