NullPointerException when getting cookies

Description

The following NPE condition has been observed repeatedly in 2 separate crawls:

2014-07-16T14:56:53.257Z -5 1 http://byssan.is/benelli-ultra-light-28-review/ LLLEL http://byssan.is/author/admin/page/9/ unknown #029 - - http://byssan.is/ err=java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.http.client.protocol.RequestAddCookies.process(RequestAddCookies.java:159)
at org.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:131)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:193)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.archive.modules.fetcher.FetchHTTPRequest.execute(FetchHTTPRequest.java:632)
at org.archive.modules.fetcher.FetchHTTP.innerProcess(FetchHTTP.java:658)
at org.archive.modules.Processor.innerProcessResult(Processor.java:175)
at org.archive.modules.Processor.process(Processor.java:142)
at org.archive.modules.ProcessorChain.process(ProcessorChain.java:131)
at org.archive.crawler.framework.ToeThread.run(ToeThread.java:148)

Most of these stack traces are frequently omitted from the log unless the JVM is started with the -XX:-OmitStackTraceInFastThrow option.
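To confirm whether a running JVM was started with that option, its startup arguments can be inspected. The following is a minimal sketch using the standard java.lang.management API; the class name FastThrowCheck and the helper fullTracesEnabled are hypothetical and not part of Heritrix.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class, not part of Heritrix.
class FastThrowCheck {
    static final String FLAG = "-XX:-OmitStackTraceInFastThrow";

    /** Returns true if the given JVM arguments contain the flag that keeps full stack traces. */
    static boolean fullTracesEnabled(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to the main method).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (fullTracesEnabled(jvmArgs)) {
            System.out.println("Stack traces are kept for repeated exceptions.");
        } else {
            System.out.println("Repeated exceptions may be logged without a stack trace; add " + FLAG);
        }
    }
}
```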

Discussing this on the mailing list, Noah commented:

I found a few -5s in some of our currently active crawl jobs. Most of them also have no stack trace. I found one with the same stack trace as yours.
Excerpt from RequestAddCookies.java:

154 final List<Cookie> cookies = new ArrayList<Cookie>(cookieStore.getCookies());
155 // Find cookies matching the given origin
156 final List<Cookie> matchedCookies = new ArrayList<Cookie>();
157 final Date now = new Date();
158 for (final Cookie cookie : cookies) {
159 if (!cookie.isExpired(now)) {
160 if (cookieSpec.match(cookie, cookieOrigin)) {

So it would appear that one of the cookies in the list returned by cookieStore.getCookies() is null. It could be a threading issue, where another thread adds or removes a cookie while line 154 is in the middle of making its copy. If that is the problem, we could probably fix it by synchronizing BdbCookieStore.getCookies() and having it return a copy of the list. It seems wasteful to copy the list when it is about to be copied again, but I'm not sure there's a better way, if that's really the issue.
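The synchronize-and-copy idea suggested above can be sketched as follows. This is an illustration only: SnapshotCookieStore is a hypothetical stand-in, not the real BdbCookieStore, and it assumes that every writer goes through methods holding the same lock as getCookies().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a cookie store; not the real BdbCookieStore.
class SnapshotCookieStore<C> {
    private final List<C> cookies = new ArrayList<>();

    public synchronized void addCookie(C cookie) {
        if (cookie != null) {          // never store null entries
            cookies.add(cookie);
        }
    }

    public synchronized boolean removeCookie(C cookie) {
        return cookies.remove(cookie);
    }

    /** Copies the list while holding the writers' lock, so callers get a stable snapshot. */
    public synchronized List<C> getCookies() {
        return Collections.unmodifiableList(new ArrayList<>(cookies));
    }

    public static void main(String[] args) throws InterruptedException {
        SnapshotCookieStore<String> store = new SnapshotCookieStore<>();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                store.addCookie("c" + i);
                if (i % 2 == 0) {
                    store.removeCookie("c" + i);
                }
            }
        });
        writer.start();

        boolean sawNull = false;
        while (writer.isAlive()) {
            // Mimics the second copy made by the caller, as on line 154 of the excerpt.
            for (String c : new ArrayList<>(store.getCookies())) {
                if (c == null) {
                    sawNull = true;
                }
            }
        }
        writer.join();
        System.out.println("null seen: " + sawNull + ", final size: " + store.getCookies().size());
    }
}
```

The extra copy costs one list allocation per request, which is the overhead the comment above calls wasteful. The benefit is that the caller's own copy then runs against a list no other thread can modify.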

Environment

None

Status

Assignee

Unassigned

Reporter

Kristinn Sigurðsson

Labels

None

Issue Category

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Fix versions

Priority

Major