ARCRecord.computeMetaData() fails due to space in mime-type in record header.

Description

The ARC reading code cannot (or at least does not) recover from this failure: the exception unwinds the stack and aborts reading of the ARC file.

In the file FS-359228.arc.gz, there are records which have a header line with a mime-type containing a space, such as:

Notice the mime-type field is "Mozilla/2.02Gold (Win95".

The ARCRecord.computeMetaData() method is passed the list of header values and assumes they follow a fixed set of fields. It checks whether the number of values matches the number of expected fields and, if not, applies some heuristics to try to repair the list.

It appears that the code was tweaked to handle one particular case where a space appears in the mime-type, but it does not handle the general case. The code is:

So it looks like someone added a special-case check for a mime-type starting with "charset=" that contained a space.
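A more general repair could anchor on the fields whose positions are fixed, instead of matching one specific pattern. The following is a hypothetical sketch, not the actual ARCRecord code. It assumes the five-field ARC version 1 URL-record layout (URL, IP-address, Archive-date, Content-type, Archive-length). It also assumes that only the content-type can contain spaces and that the header was split on single spaces. The class name ArcHeaderRepair, the method repair(), and the sample URL, IP, date and length values are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not the Heritrix/webarchive-commons implementation.
public final class ArcHeaderRepair {
    // Assumed ARC v1 URL-record layout: URL IP-address Archive-date Content-type Archive-length
    static final int EXPECTED_FIELDS = 5;
    static final int MIME_INDEX = 3;

    // Merges surplus tokens back into the content-type field.
    // Assumes only the content-type can contain spaces.
    static List<String> repair(List<String> values) {
        if (values.size() <= EXPECTED_FIELDS) {
            // Nothing to merge; a short header is a different problem, left to the caller.
            return values;
        }
        List<String> fixed = new ArrayList<>(EXPECTED_FIELDS);
        // URL, IP address and archive date occupy the fixed leading positions.
        fixed.addAll(values.subList(0, MIME_INDEX));
        // Everything between the date and the final field belongs to the content-type.
        // Re-joining with one space assumes the header was split on single spaces.
        fixed.add(String.join(" ", values.subList(MIME_INDEX, values.size() - 1)));
        // The archive length is always the last field.
        fixed.add(values.get(values.size() - 1));
        return fixed;
    }

    public static void main(String[] args) {
        // Illustrative tokens shaped like the FS-359228.arc.gz case; values are made up.
        List<String> tokens = Arrays.asList(
            "http://example.com/", "192.0.2.1", "19970101000000",
            "Mozilla/2.02Gold", "(Win95", "1234");
        System.out.println(repair(tokens));
        // prints [http://example.com/, 192.0.2.1, 19970101000000, Mozilla/2.02Gold (Win95, 1234]
    }
}
```

Because the repair relies only on the leading and trailing positions, it copes with any number of spaces in the content-type. The "charset=" check is one special case of this. The approach breaks down only if some other field (for example an unescaped URL) can also contain spaces.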

Environment

None

Assignee

Unassigned

Reporter

Aaron Binns

Labels

None

Issue Category

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Priority

Major