Error connecting to HTTPS site

Description

Not exactly a Heritrix bug per se, but I wanted to track this issue.

We are unable to crawl the URL https://netfiles.uiuc.edu/akachi2/home
and get the following error:

SSLException@HTTP
javax.net.ssl.SSLException: Received fatal alert: bad_record_mac
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
    at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
    at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:1977)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1093)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1328)
    at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:702)
    at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:122)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at org.archive.io.RecordingOutputStream.flush(RecordingOutputStream.java:389)
    at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(HttpConnection.java:860)
    at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:1984)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1000)
    at org.archive.httpclient.HttpRecorderGetMethod.execute(HttpRecorderGetMethod.java:116)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:346)
    at org.archive.crawler.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:497)
    at org.archive.crawler.framework.Processor.process(Processor.java:112)
    at org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:302)
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java:151)

This seems to be more of a Java issue than a Heritrix issue; fetching the same URL with the tika-app jar fails as well:

$ java -jar tika-app-1.2.jar https://netfiles.uiuc.edu/akachi2/home
Exception in thread "main" javax.net.ssl.SSLException: Received fatal alert: bad_record_mac
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:234)
    at org.apache.tika.io.TikaInputStream.get(TikaInputStream.java:395)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:412)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:108)
Caused by: javax.net.ssl.SSLException: Received fatal alert: bad_record_mac
    at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
    at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:136)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:1720)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1165)
    at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1149)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:166)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
    at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2308)
    at java.net.URLConnection.getContentType(URLConnection.java:485)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getContentType(HttpsURLConnectionImpl.java:382)
    at org.apache.tika.io.TikaInputStream.get(TikaInputStream.java:380)
    ... 2 more

I don't know if there is anything to be done here, but I thought I'd report the issue.
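
Since both traces fail inside the JSSE handshake, one way to narrow this down might be to take Heritrix and Tika out of the picture and attempt the handshake with one protocol version enabled at a time. The following is only a minimal sketch under assumptions: the class name TlsProbe and its helper methods are made up for illustration, it needs Java 7 or later, and nothing in this report establishes that the failure actually depends on the protocol version.

```java
import java.net.URI;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical diagnostic helper, not part of Heritrix or Tika.
class TlsProbe {

    // Returns the explicit port of the URI, or 443 (the HTTPS default) if none is given.
    static int portOf(URI uri) {
        return uri.getPort() == -1 ? 443 : uri.getPort();
    }

    // Attempts one handshake with only `protocol` enabled and reports the outcome.
    // Any failure (unsupported protocol, fatal alert, I/O error) is returned as text
    // so that the remaining protocols are still tried.
    static String probe(URI uri, String protocol) {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(uri.getHost(), portOf(uri))) {
            socket.setSoTimeout(10000); // read timeout in milliseconds
            socket.setEnabledProtocols(new String[] { protocol });
            socket.startHandshake();
            return protocol + ": OK, cipher suite " + socket.getSession().getCipherSuite();
        } catch (Exception e) {
            return protocol + ": FAILED, " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to the URL from this report; pass another URL as the first argument.
        URI uri = URI.create(args.length > 0 ? args[0] : "https://netfiles.uiuc.edu/akachi2/home");
        for (String protocol : new String[] { "SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2" }) {
            System.out.println(probe(uri, protocol));
        }
    }
}
```

Running it with the standard JSSE debug property, for example java -Djavax.net.debug=ssl:handshake TlsProbe, should additionally log the handshake messages, which may show at which step the server sends the bad_record_mac alert. Protocols that the running JVM does not support or has disabled will simply show up as FAILED.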

Environment

Ubuntu Quantal, Java 6 and 7

Status

Assignee

Unassigned

Reporter

Erik Hetzner

Labels

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Affects versions

Heritrix 1.14.3
Heritrix 3.1.1

Priority

Major