Heritrix / HER-1865

JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Heritrix 3.0.0
    • Fix Version/s: Heritrix 3.1.0-beta
    • Component/s: Release Notes
    • Labels: None

      Description

      JDK6u23 fixed a longstanding bug in Snoracle's GZIPInputStream, which would stop reading a concatenation of multiple GZIP members after the first member. Our workaround, GzippedInputStream, awkwardly gave us the ability to continue reading past each member boundary, and also to find the (compressed) boundary offsets for the benefit of W/ARC record range indexing, but it depended on the old buggy behavior.

      We need a way to get the compressed offsets with the new GZIPInputStream behavior; the new approach is likely to be different but easier. Ideally, we need an approach/codebase that works on both pre- and post-JDK6u23 systems without operator intervention, and the other classes (W/ARC reading and random access) need to work in either era. (Our systems mostly haven't moved past JDK6u22 yet, but partners have started to, and we may soon.)
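
      One possible direction, offered only as a rough sketch and not necessarily how this issue was resolved: avoid GZIPInputStream entirely, parse each member's header by hand (per RFC 1952), and drive java.util.zip.Inflater directly. Inflater.getRemaining() then reports how much input lies beyond the end of each deflate stream, which yields the compressed member boundaries regardless of JDK update level. The class and method names below (GzipMemberOffsets, memberOffsets, skipHeader, gzip) are hypothetical, and the example assumes the whole file fits in a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Heritrix: finds the compressed start
// offset of every GZIP member in a concatenated-GZIP byte array.
final class GzipMemberOffsets {

    static List<Long> memberOffsets(byte[] data) throws IOException {
        List<Long> offsets = new ArrayList<Long>();
        byte[] sink = new byte[8192]; // decompressed bytes are discarded
        int pos = 0;
        while (pos < data.length) {
            int bodyStart = skipHeader(data, pos);
            Inflater inflater = new Inflater(true); // true = raw deflate, no zlib wrapper
            try {
                inflater.setInput(data, bodyStart, data.length - bodyStart);
                while (!inflater.finished()) {
                    int n = inflater.inflate(sink);
                    if (n == 0 && !inflater.finished()
                            && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IOException("truncated GZIP member at offset " + pos);
                    }
                }
                offsets.add((long) pos);
                // getRemaining() counts input bytes past the deflate stream;
                // the 8-byte trailer (CRC32 + ISIZE) follows before the next member.
                pos = data.length - inflater.getRemaining() + 8;
            } catch (DataFormatException e) {
                throw new IOException("corrupt GZIP member at offset " + pos, e);
            } finally {
                inflater.end();
            }
        }
        if (pos > data.length) {
            throw new IOException("truncated GZIP trailer");
        }
        return offsets;
    }

    // Skips the fixed 10-byte header plus optional fields (RFC 1952).
    // Bounds checks on the optional fields are omitted for brevity.
    private static int skipHeader(byte[] d, int pos) throws IOException {
        if (pos + 10 > d.length || (d[pos] & 0xff) != 0x1f
                || (d[pos + 1] & 0xff) != 0x8b || d[pos + 2] != 8) {
            throw new IOException("no GZIP header at offset " + pos);
        }
        int flags = d[pos + 3] & 0xff;
        int p = pos + 10;
        if ((flags & 4) != 0) {  // FEXTRA: 2-byte little-endian length, then data
            p += 2 + ((d[p] & 0xff) | ((d[p + 1] & 0xff) << 8));
        }
        if ((flags & 8) != 0) {  // FNAME: zero-terminated
            while (d[p++] != 0) { }
        }
        if ((flags & 16) != 0) { // FCOMMENT: zero-terminated
            while (d[p++] != 0) { }
        }
        if ((flags & 2) != 0) {  // FHCRC: 2-byte header CRC
            p += 2;
        }
        return p;
    }

    // Builds one standalone GZIP member, for demonstration only.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(out);
        gz.write(text.getBytes("UTF-8"));
        gz.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream cat = new ByteArrayOutputStream();
        cat.write(gzip("first record"));
        cat.write(gzip("second record"));
        cat.write(gzip("third record"));
        // Prints the three member start offsets; the first is always 0.
        System.out.println(memberOffsets(cat.toByteArray()));
    }
}
```

      Because this relies only on Inflater and not on how GZIPInputStream treats member boundaries, it should behave the same before and after JDK6u23. A real W/ARC reader would need a streaming variant, since whole files will not generally fit in memory.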


    People

    • Assignee: gojomo (Gordon Mohr)
    • Reporter: gojomo (Gordon Mohr)