
RuntimeException in AMQPUrlReceiver kills StarterRestarter?

Description

Hi,

We're using the AMQPUrlReceiver, and in three of our four crawlers H3 has stopped picking up messages. In at least two of the three 'hung' crawler jobs, there was an error like this:

2015-09-14 17:13:54.154 SEVERE thread-57 org.archive.crawler.frontier.BdbMultipleWorkQueues.put() URI enqueueing failed; OperationStatus.KEYEXIST http://www.google-analytics.com/r/__utm.gif?utmwv=5.6.6&utms=1&utmn=1086570298&utmhn=sedoparking.com&utmcs=UTF-8&utmsr=1024x768&utmvp=1280x985&utmsc=32-bit&utmul=c&utmje=0&utmfl=-&utmdt=firstclassclaims.co.uk%C2%A0-%C2%A0firstclassclaims%20Resources%20and%20Information.&utmhid=1211709328&utmr=http%3A%2F%2Ffirstclassclaims.co.uk%2F&utmp=1021%2F1&utmht=1442234170949&utmac=UA-19309218-3&utmcc=__utma%3D1.549524138.1442234171.1442234171.1442234171.1%3B%2B__utmz%3D1.1442234171.1.1.utmcsr%3Dfirstclassclaims.co.uk%7Cutmccn%3D(referral)%7Cutmcmd%3Dreferral%7Cutmcct%3D%2F%3B&utmjid=2020018054&utmredir=1&utmu=qhCAAAAAAAAAAAAAAAAAAAAE~
java.lang.RuntimeException
    at org.archive.crawler.frontier.BdbMultipleWorkQueues.put(BdbMultipleWorkQueues.java:349)
    at org.archive.crawler.frontier.BdbWorkQueue.insertItem(BdbWorkQueue.java:144)
    at org.archive.crawler.frontier.WorkQueue.insert(WorkQueue.java:404)
    at org.archive.crawler.frontier.WorkQueue.enqueue(WorkQueue.java:150)
    at org.archive.crawler.frontier.WorkQueueFrontier.sendToQueue(WorkQueueFrontier.java:393)
    at org.archive.crawler.frontier.WorkQueueFrontier.processScheduleAlways(WorkQueueFrontier.java:333)
    at org.archive.crawler.frontier.AbstractFrontier.receive(AbstractFrontier.java:554)
    at org.archive.crawler.util.SetBasedUriUniqFilter.addForce(SetBasedUriUniqFilter.java:104)
    at org.archive.crawler.frontier.WorkQueueFrontier.processScheduleIfUnique(WorkQueueFrontier.java:376)
    at org.archive.crawler.frontier.WorkQueueFrontier.schedule(WorkQueueFrontier.java:356)
    at org.archive.crawler.frontier.AMQPUrlReceiver$UrlConsumer.handleDelivery(AMQPUrlReceiver.java:236)
    at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:140)
    at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:76)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

and then, a little later, an error like this:

DefaultExceptionHandler: Consumer org.archive.crawler.frontier.AMQPUrlReceiver$UrlConsumer@7c7c4c4a (amq.ctag-lTrPs85xlBDjbVldRAlCqA) method handleDelivery for channel AMQChannel(amqp://guest@192.168.45.26:5672/,1) threw an exception for channel AMQChannel(amqp://guest@192.168.45.26:5672/,1):
com.rabbitmq.client.AlreadyClosedException: clean connection shutdown; reason: Attempt to use closed channel
    at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:190)
    at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:291)
    at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:285)
    at com.rabbitmq.client.impl.ChannelN.basicAck(ChannelN.java:907)
    at org.archive.crawler.frontier.AMQPUrlReceiver$UrlConsumer.handleDelivery(AMQPUrlReceiver.java:253)
    at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:140)
    at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:76)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
2015-09-14 17:13:56.286 WARNING thread-31 org.archive.crawler.frontier.AMQPUrlReceiver.onApplicationEvent() failed to pause flow on amqp channel
java.io.IOException
    at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:106)
    at com.rabbitmq.client.impl.AMQChannel.wrap(AMQChannel.java:102)
    at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:124)
    at com.rabbitmq.client.impl.ChannelN.flow(ChannelN.java:1061)
    at com.rabbitmq.client.impl.ChannelN.flow(ChannelN.java:61)
    at org.archive.crawler.frontier.AMQPUrlReceiver.onApplicationEvent(AMQPUrlReceiver.java:337)
    at org.archive.crawler.frontier.AMQPUrlReceiver.onApplicationEvent(AMQPUrlReceiver.java:59)
    at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:97)
    at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:303)
    at org.archive.crawler.framework.CrawlController.sendCrawlStateChangeEvent(CrawlController.java:327)
    at org.archive.crawler.framework.CrawlController.completePause(CrawlController.java:413)
    at org.archive.crawler.framework.CrawlController.requestCrawlStart(CrawlController.java:364)
    at org.archive.crawler.framework.CrawlJob$1.run(CrawlJob.java:430)
Caused by: com.rabbitmq.client.ShutdownSignalException: connection error; reason: {#method<connection.close>(reply-code=540, reply-text=NOT_IMPLEMENTED - active=false, class-id=20, method-id=20), null, ""}
    at com.rabbitmq.utility.ValueOrException.getValue(ValueOrException.java:67)
    at com.rabbitmq.utility.BlockingValueOrException.uninterruptibleGetValue(BlockingValueOrException.java:33)
    at com.rabbitmq.client.impl.AMQChannel$BlockingRpcContinuation.getReply(AMQChannel.java:343)
    at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:216)
    at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:118)
    ... 10 more

I believe an uncaught exception in a thread will at least take out that thread, and possibly a whole thread pool. If so, the error we are seeing would presumably take out the StarterRestarter and prevent any closed RabbitMQ connection from being re-opened.
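To make the thread-versus-pool question concrete, here is a minimal standalone sketch of how a plain java.util.concurrent pool behaves when a task submitted with execute() throws. As far as I understand it, the worker thread that ran the task dies, but the pool starts a replacement and carries on with queued work. This is not Heritrix or RabbitMQ code: the class and method names (PoolSurvivalSketch, laterTaskStillRuns) are made up for illustration, and it says nothing about how the StarterRestarter itself is implemented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSurvivalSketch {

    /** Returns true if a task queued behind a failing task still gets run. */
    static boolean laterTaskStillRuns() {
        // Single-worker pool; the custom factory only silences the default
        // stack-trace printing for the simulated failure.
        ExecutorService pool = Executors.newFixedThreadPool(1, runnable -> {
            Thread t = new Thread(runnable);
            t.setUncaughtExceptionHandler((thread, error) -> { });
            return t;
        });
        try {
            CountDownLatch ran = new CountDownLatch(1);
            // First task: stands in for a callback that throws a RuntimeException.
            pool.execute(() -> {
                throw new RuntimeException("simulated enqueue failure");
            });
            // Second task: only runs if the pool survives the first one.
            pool.execute(ran::countDown);
            return ran.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("later task ran: " + laterTaskStillRuns());
    }
}
```

If that holds for the pool the RabbitMQ client uses, the pool itself should survive the RuntimeException, and the hang would more likely come from the channel being closed afterwards than from a dead thread pool. That is a guess on my part, not something the logs above prove.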

Firstly, does this sound about right? Secondly, should the BDB error we are seeing ever happen? Thirdly, should we switch to catching RuntimeExceptions in the UrlConsumer?
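On the third question, this is roughly the shape of the change I have in mind, as a standalone sketch rather than a patch. It does not use the RabbitMQ or Heritrix classes: GuardedConsumerSketch, guarded and the "schedule" stand-in are hypothetical names, and the simulated duplicate-key failure only imitates the KEYEXIST error above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class GuardedConsumerSketch {

    /** Wraps a handler so a RuntimeException is recorded instead of escaping. */
    static <T> Consumer<T> guarded(Consumer<T> delegate, List<String> failures) {
        return message -> {
            try {
                delegate.accept(message);
            } catch (RuntimeException e) {
                // Record the failure and keep consuming the next message.
                failures.add(message + " -> " + e.getMessage());
            }
        };
    }

    /** Feeds three URIs (one duplicate) through the guarded handler. */
    static String run() {
        List<String> scheduled = new ArrayList<>();
        List<String> failures = new ArrayList<>();

        // Hypothetical stand-in for frontier scheduling: rejects duplicates
        // with a RuntimeException, loosely imitating the KEYEXIST failure.
        Consumer<String> schedule = uri -> {
            if (scheduled.contains(uri)) {
                throw new RuntimeException("KEYEXIST");
            }
            scheduled.add(uri);
        };

        Consumer<String> safe = guarded(schedule, failures);
        safe.accept("http://example.com/a");
        safe.accept("http://example.com/a"); // duplicate: fails, but is contained
        safe.accept("http://example.com/b"); // still processed afterwards

        return "scheduled=" + scheduled.size() + " failures=" + failures.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In the real UrlConsumer the catch block would also have to decide whether to acknowledge or reject the message it failed on; I have left that out here because I am not sure which is the right behaviour.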

Thanks,
Andy Jackson

Environment

Linux

Status

Assignee

Unassigned

Reporter

Andrew Jackson

Affects versions

Heritrix 3.3.0

Priority

Major