Heritrix / HER-1865

JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Heritrix 3.0.0
    • Fix Version/s: Heritrix 3.1.0-beta
    • Component/s: Release Notes
    • Labels:
      None

      Description

      JDK6u23 fixed a longstanding bug in Snoracle's GZIPInputStream where it would stop reading the concatenation of many GZIP members after the first. Our workaround, GzippedInputStream, awkwardly gave us the ability to continue reading past each member boundary, and in fact find the (compressed) boundary offsets for the benefit of W/ARC record range indexing - but depended on the old buggy behavior.

      We need a way to get the compressed offsets with the new GZIPInputStream behavior – it's likely to be different but easier. And ideally we need an approach/codebase that works in both pre- and post-JDK6u23 systems without operator intervention, and the other classes (W/ARC reading and random access) to work in either era. (Our systems mostly haven't moved past JDK6u22 yet, but partners have started to, and we may soon.)
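
      To illustrate the behavior at issue, here is a minimal sketch (not Heritrix code; the class and method names are made up for this example) that concatenates two GZIP members and reads them back through a plain java.util.zip.GZIPInputStream. On JDKs with the post-6u23 behavior the read continues across the member boundary; on the older JDKs described above it would stop after the first member.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Heritrix.
class ConcatenatedGzipDemo {

    /** Compress one string as a single, complete GZIP member. */
    static byte[] gzipMember(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Join two byte arrays, as a W/ARC writer effectively does with records. */
    static byte[] concat(byte[] a, byte[] b) {
        byte[] joined = new byte[a.length + b.length];
        System.arraycopy(a, 0, joined, 0, a.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    /** Return everything a plain GZIPInputStream yields before reporting EOF. */
    static String readAll(byte[] gzipped) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] two = concat(gzipMember("record one\n"), gzipMember("record two\n"));
        // Shows whether this JDK reads past the first member boundary.
        System.out.print(readAll(two));
    }
}
```

      Note that reading straight through like this gives no indication of where each member started, which is why a custom stream is still needed for record indexing.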

        Activity

        Erik Hetzner added a comment -

        I should have checked here first, I've spent hours on this one.
        In any case, I will attach a test case.

        Gordon Mohr added a comment -

        I'm working on a replacement for GzippedInputStream that works on both sides of the JDK6u23 behavior change, and gives control over whether you want 'EOF each member' or read-straight-through. It should appear in H3's SVN by the end of the week, though it also changes the interface with readers a bit, and that would benefit from more real-use testing.

        Erik Hetzner added a comment -

        Hi Gordon. Thanks, I will be glad to test (I am pulling in heritrix commons 3.1.1-SNAPSHOT). In the meantime I am using u22.

        Erik Hetzner added a comment -

        By the way, can you confirm that this should have no effect on W/ARC writing in crawling? I can't see why it would, but we are set to do a system-wide upgrade to u23 in the next few weeks, and I want to be sure that crawling will not be affected. Thanks!

        Noah Levitt added a comment -

        > Hi Gordon. Thanks, I will be glad to test (I am pulling in heritrix commons 3.1.1-SNAPSHOT). In the meantime I am using u22.

        FYI, it's a little confusing, but trunk is now called 3.0.1-SNAPSHOT, so you should use that to stay up to date.

        Erik Hetzner added a comment -

        Hi Noah, OK, thanks. This is built off: https://archive-crawler.svn.sourceforge.net/svnroot/archive-crawler/trunk/heritrix3/ right?

        Gordon Mohr added a comment -

        As far as I know (and can imagine), this only affects gzip-reading via GZIPInputStream, not writing via GZIPOutputStream... and I don't know of any other problems with JDK6u23, though we haven't exhaustively tested it yet.

        Erik Hetzner added a comment -

        Thanks, Gordon. We will test our setup with u23.

        Noah, I found the 3.0.1-SNAPSHOT on builds.archive.org. Thanks.

        Gordon Mohr added a comment -

        Initial commit of new more useful workaround class. Commit comment:

        HER-1865 JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling

        • GZIPMembersInputStream
          new workaround class that offers choice of pre/post JDK6u23 behavior and new accessors for finding member boundary offsets
        • GZIPMembersInputStreamTest
          test for above
        • ArchiveUtils, ARCWriter
          move static compress utilities to ArchiveUtils
        • ArchiveReader, (W)ARCReaderFactory, (W)ARCWriterTest
          adapt to use GZIPMembersInputStream
        • GzippedInputStream(Test)
          deleted

        Needs more 6u23-and-after testing.

        Erik Hetzner added a comment -

        I am seeing the following error using a build of heritrix-commons 3.0.1-SNAPSHOT (svn ):

        java.lang.RuntimeException: After retry (Offset 200295).

        Using a build against (before the gzip fix) does not seem to have this problem.

        java version "1.6.0_14"
        Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
        Java HotSpot(TM) Server VM (build 14.0-b16, mixed mode)

        I can try to provide more information if it is not clear what the problem is.

        Thanks!

        Gordon Mohr added a comment -

        A full stack would be helpful – is it triggered by your test code or some other action? On an ARC or a WARC? Are all ARCs/WARCs affected, or just one in particular?

        Erik Hetzner added a comment -

        I see this with ARC files, but not WARCs.

        Exception in thread "main" java.io.IOException: Resetting to invalid mark
        	at java.io.BufferedInputStream.reset(BufferedInputStream.java:416)
        	at com.google.common.io.CountingInputStream.reset(CountingInputStream.java:87)
        	at org.archive.io.GZIPMembersInputStream.compressedSeek(GZIPMembersInputStream.java:142)
        	at org.archive.io.arc.ARCReaderFactory$CompressedARCReader.<init>(ARCReaderFactory.java:352)
        	at org.archive.io.arc.ARCReaderFactory.getArchiveReader(ARCReaderFactory.java:120)
        	at org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:105)
        	at org.archive.io.ArchiveReaderFactory.getArchiveReader(ArchiveReaderFactory.java:128)
        	at org.archive.io.ArchiveReaderFactory.getArchiveReader(ArchiveReaderFactory.java:110)
        	at org.archive.io.ArchiveReaderFactory.get(ArchiveReaderFactory.java:105)
        	at org.cdlib.was.ngIndexer.Utility$.eachArc(Utility.scala:53)
        	at org.cdlib.was.ngIndexer.Test$.main(Test.scala:22)
        	at org.cdlib.was.ngIndexer.Test.main(Test.scala)
        

        I can try to track down more if you like. I don't think I am doing anything exotic here, but it is possible that I am doing something wrong.

        This is with heritrix-commons-3.0.1-20110219.030027-84.jar,

        java version "1.6.0_24"
        Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
        Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

        Linux XXX.cdlib.org 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux

        Thanks, Gordon!

        Gordon Mohr added a comment -

        It seems the path through the generic ArchiveReaderFactory had a problem – an obsolete, nonsensical -1 offset that once had another meaning. There wasn't a test to cover that case. There is now, and a more sensible default offset of 0 is used. Please let me know if this resolves it for you.

        Commit comments:

        HER-1865 JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling

        • ArchiveReaderFactory
          discard obsolete -1 flag value creating problems with usual uses

        HER-1865 JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling

        • ArchiveReaderFactoryTest
          test cases for generic reader-get methods – one of which had obsolete STREAM_ALL=-1 problem
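
        For readers unfamiliar with the "Resetting to invalid mark" message in the trace above: it is what java.io.BufferedInputStream.reset() throws when the stream has no valid mark. The sketch below (a standalone illustration with made-up class and method names, not the GZIPMembersInputStream.compressedSeek code path) reproduces the message and contrasts it with a mark()/reset() pair that succeeds.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

// Hypothetical demo class; not part of Heritrix.
class InvalidMarkDemo {

    /** Call reset() with no prior mark(); return the exception message, or null if none. */
    static String resetWithoutMark() {
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        try {
            in.read();
            in.reset(); // no mark() was ever set on this stream
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    /** mark(), read one byte, reset(), and check that the same byte is read again. */
    static boolean resetWithMark() {
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        try {
            in.mark(16); // remember this position for up to 16 bytes of reading
            int first = in.read();
            in.reset();  // rewind to the marked position
            return first == in.read();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("reset without mark: " + resetWithoutMark());
        System.out.println("reset with mark ok: " + resetWithMark());
    }
}
```

        A seek implemented on top of mark/reset therefore depends on a mark having been set, and still being valid, before the seek is attempted.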
        Erik Hetzner added a comment -

        Hi Gordon,

        A first test with a new heritrix-commons build seems to have the same error.

        I can look into this more tomorrow. I can also create a more minimal test case for you.

        Thanks for looking at this!

        Erik Hetzner added a comment -

        Confirmed with the test class I have posted above. Here is the stack trace. The only changes are the line numbers of ArchiveReader.

        java.io.IOException: Resetting to invalid mark
        at java.io.BufferedInputStream.reset(BufferedInputStream.java:416)
        at com.google.common.io.CountingInputStream.reset(CountingInputStream.java:87)
        at org.archive.io.GZIPMembersInputStream.compressedSeek(GZIPMembersInputStream.java:142)
        at org.archive.io.arc.ARCReaderFactory$CompressedARCReader.<init>(ARCReaderFactory.java:352)
        at org.archive.io.arc.ARCReaderFactory.getArchiveReader(ARCReaderFactory.java:120)
        at org.archive.io.arc.ARCReaderFactory.get(ARCReaderFactory.java:105)
        at org.archive.io.ArchiveReaderFactory.getArchiveReader(ArchiveReaderFactory.java:123)
        at org.archive.io.ArchiveReaderFactory.getArchiveReader(ArchiveReaderFactory.java:105)
        at org.archive.io.ArchiveReaderFactory.get(ArchiveReaderFactory.java:100)
        at TestJava6.main(TestJava6.java:11)

        Gordon Mohr added a comment -

        OK, I'm able to reproduce a similar but not identical exception here on a large, diverse ARC.

        There was another similar stack that could be triggered by the (new) ArchiveReaderFactoryTest.testGetPath test, even on tiny synthetic ARCs, but that was fixed by yesterday's commits.

        If you can share your test ARC (via email or a URL to download – feel free to send me directly) I can be sure to cover all cases.

        Gordon Mohr added a comment -

        Here's a nutshell version of the whole sordid story:

        One, I was overconfident in our preexisting and new unit tests for this work; there were failures even in pre-JDK6u23 cases of reading real ARCs/WARCs. I believe those are now all fixed, but will test more on a wider variety of real ARCs/WARCs before sounding the all-clear.

        Two, I had thought the tests were passing in JDK6u24 – but in fact an older JDK was being used. Testing the right JDK revealed...

        Three, there's a deeper problem with JDK6u23-JDK6u24 - the GZIPInputStream no longer handles GZIP members with optional 'extra fields' correctly. (Traditionally Alexa used one particular extra field to mark their ARCs, and we've continued that practice. More recently some are using an extra field to hint how to do a long-skip over the current member.) The JDK6u23-24 bug looks like a bit of sloppy editing by someone fixing the prior bug; as a result I expect very little 'natural' GZIP data with extra fields can be read with the JDK6u23-24 GZIPInputStream (and maliciously-crafted GZIP data could decompress to totally different data in Java compared to standard GUNZIP!). I've reported the issue to Oracle; a bug record may appear here soon: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7022417 . Notably, this bug does not appear in the OpenJDK7/JDK7 preview release.

        Four, the packaging-limitations/private-modifiers/package-protected modifiers in GZIPInputStream and InflaterInputStream make many otherwise-easy tactics for patching around the bug in the affected JDKs (6u23-24-??) difficult. I think that pulling both those classes from OpenJDK7 into a local package will serve as a workaround, and if those classes (changed only in their name, home package, and imports) remain GPL-with-classpath-exception, bundling them in our distribution should be OK. Thus we'd have consistent, as-designed-for-6u23-and-later behavior no matter what the underlying JRE/JDK.

        Five, this JDK7-behavior avoids all of the so-far reproduced exceptions, but seems to sometimes skip a member-boundary, and thus a full record, when iterating through with the old ArchiveReader code. So that's still an issue I need to investigate and address.
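
        As an illustration of the 'extra fields' mentioned in point Three, the sketch below hand-builds one GZIP member whose header sets the FEXTRA flag and carries a single subfield, then reads it back with the running JDK's GZIPInputStream. The layout follows the GZIP format (RFC 1952); the class name, method names, and the subfield identifier 'E','X' are invented for this example and are not the values Alexa or Heritrix use. On an affected 6u23/24 JDK a read like this would be expected to fail or misbehave, per the report above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical demo class; not part of Heritrix.
class GzipExtraFieldDemo {

    /** Write a 16-bit value little-endian, as the GZIP format requires. */
    static void writeShortLE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
    }

    /** Write a 32-bit value little-endian. */
    static void writeIntLE(ByteArrayOutputStream out, long v) {
        writeShortLE(out, (int) (v & 0xffff));
        writeShortLE(out, (int) ((v >>> 16) & 0xffff));
    }

    /** Build one GZIP member whose header carries a single FEXTRA subfield. */
    static byte[] memberWithExtraField(byte[] payload, char si1, char si2, byte[] extra) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(0x1f);                     // ID1
            out.write(0x8b);                     // ID2
            out.write(8);                        // CM = deflate
            out.write(4);                        // FLG = FEXTRA only
            writeIntLE(out, 0);                  // MTIME (unset)
            out.write(0);                        // XFL
            out.write(255);                      // OS = unknown
            writeShortLE(out, extra.length + 4); // XLEN: subfield header + data
            out.write(si1);                      // subfield ID, first byte
            out.write(si2);                      // subfield ID, second byte
            writeShortLE(out, extra.length);     // subfield data length
            out.write(extra);

            // Raw deflate body (nowrap = true: no zlib header or checksum).
            Deflater raw = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
            DeflaterOutputStream body = new DeflaterOutputStream(out, raw);
            body.write(payload);
            body.finish();
            raw.end();

            // Trailer: CRC32 and length of the uncompressed payload.
            CRC32 crc = new CRC32();
            crc.update(payload);
            writeIntLE(out, crc.getValue());
            writeIntLE(out, payload.length);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompress with the running JDK's own GZIPInputStream. */
    static String gunzip(byte[] gzipped) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "record body".getBytes(StandardCharsets.UTF_8);
        byte[] member = memberWithExtraField(payload, 'E', 'X', new byte[] {0, 0, 0, 0});
        System.out.println(gunzip(member));
    }
}
```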

        Erik Hetzner added a comment -

        Thanks for all of this, Gordon. We will stick to 6u22 for the time being, but I am happy to test any fixes.

        Gordon Mohr added a comment -

        Pulling in the OpenJDK7 classes provided a consistent base that matches the Java future; the custom GZIPMembersInputStream can simulate the old EOF-behavior and offer reliable pointers to member offsets. But the structure of the readTrailer/readHeader around member boundaries means you only achieve total certainty that one member has finished (and passed its gzip checksum) after committing to reading at least a byte from the next member. Accepting that limitation prevented the skips I'd seen. Commit comment:

        HER-1865 JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling

        • OpenJDK7GZIPInputStream.java, OpenJDK7InflaterInputStream.java
          backport the working GZIP implementation from OpenJDK7, with minimal
          renames/imports/private-to-protected changes to enable member-at-a-time reading
        • GZIPMembersInputStream
          base on OpenJDK7 implementation to avoid pre- & post-6u23 codepaths, and 6u23/24 readHeader bug
          override readTrailer to set member-end; update comment to reflect end-uncertainty without EOF-per-member-mode
        • GZIPMembersInputStreamTest, ARCWriterTest
          test tweaks

        This works on the CDL example ARC, and my other tests, but could still benefit from testing on a wider range of W/ARCs, for both lack-of-error-output and agreement with previous indexing-scans.
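
        For readers who want a feel for what finding member offsets involves, here is a simplified sketch using java.util.zip.Inflater directly. It is not how GZIPMembersInputStream is implemented; the class and method names are hypothetical, it assumes every member has the plain 10-byte header with no optional fields (FLG == 0, as java.util.zip.GZIPOutputStream writes), and it skips the trailer without verifying the CRC.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical demo class; not part of Heritrix.
class GzipMemberOffsets {
    static final int HEADER_LEN = 10; // fixed header size when FLG == 0
    static final int TRAILER_LEN = 8; // CRC32 + ISIZE

    /** Return the compressed offset at which each GZIP member starts. */
    static List<Integer> memberOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<Integer>();
        byte[] sink = new byte[4096]; // decompressed bytes are discarded
        int pos = 0;
        while (pos < data.length) {
            boolean plainHeader = data.length - pos >= HEADER_LEN + TRAILER_LEN
                    && (data[pos] & 0xff) == 0x1f
                    && (data[pos + 1] & 0xff) == 0x8b
                    && data[pos + 3] == 0; // FLG: no optional header fields
            if (!plainHeader) {
                throw new IllegalArgumentException("unsupported or truncated member at " + pos);
            }
            offsets.add(pos);
            Inflater inflater = new Inflater(true); // raw deflate, no zlib wrapper
            try {
                inflater.setInput(data, pos + HEADER_LEN, data.length - pos - HEADER_LEN);
                while (!inflater.finished()) {
                    int n = inflater.inflate(sink);
                    if (n == 0 && inflater.needsInput()) {
                        throw new IllegalArgumentException("truncated member at " + pos);
                    }
                }
                // getBytesRead() is the number of compressed bytes this member consumed.
                pos += HEADER_LEN + (int) inflater.getBytesRead() + TRAILER_LEN;
            } catch (DataFormatException e) {
                throw new IllegalArgumentException("corrupt member at " + pos, e);
            } finally {
                inflater.end();
            }
        }
        return offsets;
    }

    /** Compress one string as a single GZIP member. */
    static byte[] gzipMember(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] first = gzipMember("first record");
        byte[] second = gzipMember("second record");
        byte[] both = new byte[first.length + second.length];
        System.arraycopy(first, 0, both, 0, first.length);
        System.arraycopy(second, 0, both, first.length, second.length);
        System.out.println(memberOffsets(both));
    }
}
```

        Because this sketch has the whole buffer in memory and drives the Inflater itself, it knows where a member ends as soon as inflation finishes; a wrapper around a streaming GZIPInputStream does not have that luxury, which is the end-uncertainty described above.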

        Erik Hetzner added a comment -

        Thanks, Gordon. I am testing with your fixes, but am encountering other issues with my code. I will let you know if I encounter any more problems.

        Erik Hetzner added a comment -

        See HER-1878.

        Gordon Mohr added a comment -

        With HER-1878 (hopefully) fixed, as well as HER-1881 (which showed up with the new code), and this code having received some more use in Wayback indexing, I'm going to consider this fixed, and let future problems (if any) get new issue numbers.

        Søren Vejrup Carlsen added a comment -

        Would it be difficult to implement this fix in the H1 branch as well?

        Will Johnson added a comment -

        Is there any fix (or suggestions for an approach) possible for 2.0.2? Looking at the code it seems like the design of the ArchiveReaderFactory relies on the broken up streams that are no longer provided per the spec. Also, it seems like copying the OpenJDK code will cause Heritrix (and therefore my code) to be GPL since you're making a modified work and not linking.

        Gordon Mohr added a comment -

        Regarding use in prior versions: the bug only affects reading, so you could move to the H3 codebase for (W)ARC-reading while leaving crawling on whatever version is convenient. (We would also welcome a contributed backport.)

        Regarding 2.0.2 specifically: I'd highly recommend moving to H3. There have been many fixes and improvements, and no further H2.0.x releases are expected.

        Regarding OpenJDK reuse: No code reuse can cause any other code's license to change automatically. Only the author(s) make the choice of license. Improper reuse/relicensing could open a project to allegations that they do not have permission to reuse the GPL code in a particular fashion, which might then have to be cured (and relicensing to GPL is sometimes but not always a possible cure).

        I am not a lawyer, but I believe our reuse is in accordance with Oracle's and related affiliates' copyrights, and the GPL-with-classpath-exception licensing of OpenJDK code. The two changed classes, OpenJDK7InflaterInputStream (from InflaterInputStream) and OpenJDK7GZIPInputStream (from GZIPInputStream) remain code licensed under the GPL with the classpath exception. Other code only refers/links to that code, in the exact same manner as the OpenJDK versions would be linked (when running on OpenJDK). If Oracle's/OpenJDK's lawyers have an alternative interpretation, we would adjust our use. (For example, if necessary we could put those 2 classes into their own more-clearly-distinct GPL-with-classpath-licensed package/library.) But the whole issue will likely become moot when the JDK6 bugs are fixed or JDK7 use becomes the norm, and this backport-plus-hackery is no longer a necessary workaround.


          People

          • Assignee:
            Gordon Mohr
            Reporter:
            Gordon Mohr
          • Votes:
            0
            Watchers:
            3