Hi,
We're using the AMQPUrlReceiver, and in three of our four crawlers H3 has stopped picking up messages. In at least two of the three 'hung' crawler jobs, there was an error like this:
and then, a little later, an error like this:
I believe an uncaught exception in a thread will at least take out that thread, and possibly a whole thread pool. If so, the error we are seeing will presumably take out the StarterRestarter and prevent any closed RabbitMQ connection from being re-opened.
Firstly, does this sound about right? Secondly, should the BDB error we are seeing ever happen? Thirdly, should we switch to catching RuntimeExceptions in the UrlConsumer?
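To make the third question concrete, here is a minimal sketch of the general behaviour in plain Java, not the actual AMQPUrlReceiver or RabbitMQ client code. The `Handler` interface, `runWorker` method, and the "bad" message are all hypothetical stand-ins. It only illustrates that a RuntimeException escaping a thread's run() ends that thread, whereas catching it per message lets the loop carry on.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConsumerGuardSketch {

    /** Hypothetical stand-in for a message handler; not the Heritrix or RabbitMQ API. */
    interface Handler {
        void handle(String message);
    }

    /**
     * Feeds the messages to the handler on a worker thread and returns how
     * many were handled successfully before the thread finished or died.
     */
    static int runWorker(List<String> messages, Handler handler, boolean catchRuntime) {
        AtomicInteger handled = new AtomicInteger();
        Thread worker = new Thread(() -> {
            for (String m : messages) {
                if (catchRuntime) {
                    try {
                        handler.handle(m);
                        handled.incrementAndGet();
                    } catch (RuntimeException e) {
                        // Report the failure and keep consuming.
                        System.err.println("skipping '" + m + "': " + e);
                    }
                } else {
                    // Any RuntimeException here escapes run() and ends the thread.
                    handler.handle(m);
                    handled.incrementAndGet();
                }
            }
        });
        // Replace the default stack-trace dump with a one-line report.
        worker.setUncaughtExceptionHandler(
                (t, e) -> System.err.println(t.getName() + " died: " + e));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("a", "bad", "b", "c");
        // Simulated failure on one message (an assumption for illustration).
        Handler handler = m -> {
            if (m.equals("bad")) {
                throw new IllegalStateException("simulated failure");
            }
        };
        System.out.println("unguarded: " + runWorker(messages, handler, false)); // unguarded: 1
        System.out.println("guarded:   " + runWorker(messages, handler, true));  // guarded:   3
    }
}
```

Whether the real consumer callbacks run on a dedicated thread or a pool depends on the RabbitMQ client, so this says nothing about which threads are actually affected in our crawlers.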
Thanks,
Andy Jackson
Linux
Sorry - didn't mean to mark this as Major. Feel free to downgrade it.
See here for our changes to AMQPUrlReceiver to make this more robust: https://github.com/internetarchive/heritrix3/pull/128
The first exception doesn't look like it would have killed any threads. Far from being uncaught, it is never even thrown:
BdbMultipleWorkQueues.java:349
It looks like the second exception is the real issue here, and that seems to be what you've addressed in your pull request.