RuntimeException in AMQPUrlReceiver kills StarterRestarter?

Description

Hi,

We're using the AMQPUrlReceiver, and in three of our four crawlers H3 has stopped picking up messages. In at least two of the three 'hung' crawler jobs, there was an error like this:

and then, a little later, an error like this:

I believe an uncaught exception in a thread will at least take out that thread, and possibly a whole thread pool. If so, the error we are seeing presumably takes out the StarterRestarter and prevents any closed RabbitMQ connection from being re-opened.
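For reference, here is how standard java.util.concurrent behaves when a periodic task throws. This is not the actual StarterRestarter code. It assumes, hypothetically, a restarter check driven by a ScheduledExecutorService (the class and method names are made up for illustration). Per the ScheduledExecutorService Javadoc, if any run of a periodic task throws, subsequent runs are suppressed. The pool itself keeps working, but the task silently stops.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicFailureDemo {

    /**
     * Schedules a hypothetical "restarter" check every 10 ms that throws a
     * RuntimeException on its third run, then reports:
     * [0] whether the periodic task stopped after that run,
     * [1] whether its future holds the thrown exception,
     * [2] whether the executor itself still accepts new work.
     */
    static boolean[] simulate() {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<?> restarter = pool.scheduleWithFixedDelay(() -> {
                if (runs.incrementAndGet() == 3) {
                    // Stand-in for an unexpected error escaping the task body.
                    throw new IllegalStateException("simulated failure");
                }
            }, 0, 10, TimeUnit.MILLISECONDS);

            // Long enough for many more runs, if any were still scheduled.
            Thread.sleep(300);
            boolean stopped = restarter.isDone() && runs.get() == 3;

            boolean holdsException;
            try {
                restarter.get();
                holdsException = false;
            } catch (ExecutionException e) {
                holdsException = e.getCause() instanceof IllegalStateException;
            }

            // The exception was captured in the future, so the pool's worker
            // thread is unaffected and still runs newly submitted tasks.
            boolean poolUsable = pool.submit(() -> true).get(1, TimeUnit.SECONDS);
            return new boolean[] {stopped, holdsException, poolUsable};
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException("demo did not complete", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = simulate();
        System.out.println("stopped=" + r[0] + " holdsException=" + r[1] + " poolUsable=" + r[2]);
    }
}
```

If the restarter is instead a plain Thread with its own loop, an escaping RuntimeException ends that thread outright. Either way, nothing re-opens the connection afterwards.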

Firstly, does this sound about right? Secondly, should the BDB error we are seeing ever happen? Thirdly, should we switch to catching RuntimeExceptions in the UrlConsumer?
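On the third question, one option is to wrap per-message handling so that a RuntimeException from a single message is logged and swallowed instead of escaping the consumer thread. The sketch below is self-contained and uses a hypothetical `MessageHandler` interface in place of the real RabbitMQ callback. In a real consumer the same try/catch would presumably go around the body of `handleDelivery`. Only RuntimeException is caught; Errors still propagate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedConsumerDemo {
    private static final Logger LOG = Logger.getLogger(GuardedConsumerDemo.class.getName());

    /** Hypothetical per-message handler standing in for the UrlConsumer's delivery callback. */
    interface MessageHandler {
        void handle(String body);
    }

    /** Wraps a handler so one failing message is logged instead of propagating. */
    static MessageHandler guarded(MessageHandler delegate) {
        return body -> {
            try {
                delegate.handle(body);
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, "failed to process message: " + body, e);
            }
        };
    }

    /** Feeds messages through a guarded handler that rejects empty URLs. */
    static List<String> process(List<String> messages) {
        List<String> accepted = new ArrayList<>();
        MessageHandler raw = body -> {
            if (body.isEmpty()) {
                throw new IllegalArgumentException("empty URL");
            }
            accepted.add(body);
        };
        MessageHandler safe = guarded(raw);
        for (String m : messages) {
            safe.handle(m); // a bad message no longer stops the loop
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("http://example.com/a", "", "http://example.com/b")));
    }
}
```

Whether a failed message should then be acked, rejected, or requeued is a separate decision not shown here.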

Thanks,
Andy Jackson

Environment

Linux

Activity

Andrew Jackson
September 17, 2015, 2:52 PM

Sorry, I didn't mean to mark this as Major. Feel free to downgrade it.

Andrew Jackson
September 21, 2015, 11:10 AM

See here for our changes to AMQPUrlReceiver to make this more robust: https://github.com/internetarchive/heritrix3/pull/128

Noah Levitt
September 23, 2015, 7:40 PM

The first exception doesn't look like it would have killed any threads. Far from being uncaught, it isn't even thrown:

BdbMultipleWorkQueues.java:349

It looks like the second exception is the real issue here, and that seems to be what you've addressed in your pull request.

Assignee

Unassigned

Reporter

Andrew Jackson

Labels

None

Issue Category

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Affects versions

Priority

Major