Shut down engine on SEVERE errors

Description

If SEVERE errors are encountered, Heritrix should try to stop running jobs and then shut down the engine. An NFS mount that contained Heritrix's job directory went down last night. This led to a flood of exceptions of the form:

Oct 06, 2013 5:19:40 PM org.archive.crawler.frontier.BdbWorkQueue peekItem
SEVERE: peekItem failure; retrying (in thread 'ToeThread #23: ')
com.sleepycat.je.EnvironmentFailureException: (JE 4.1.6) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.1.6) /mnt/sammy-data/heritrix_jobs/two-hops-no-requests-html-only/state fetchTarget of 0x112/0x941961 parent IN=62673966 IN class=com.sleepycat.je.tree.BIN lastFullVersion=0x183/0x2a17d parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
    at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:196)
    at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1439)
    at com.sleepycat.je.Database.checkEnv(Database.java:1778)
    at com.sleepycat.je.Database.openCursor(Database.java:625)
    at org.archive.crawler.frontier.BdbMultipleWorkQueues.getNextNearestItem(BdbMultipleWorkQueues.java:297)
    at org.archive.crawler.frontier.BdbMultipleWorkQueues.get(BdbMultipleWorkQueues.java:258)
    at org.archive.crawler.frontier.BdbWorkQueue.peekItem(BdbWorkQueue.java:103)
    at org.archive.crawler.frontier.WorkQueue.peek(WorkQueue.java:173)
    at org.archive.crawler.frontier.WorkQueueFrontier.findEligibleURI(WorkQueueFrontier.java:651)
    at org.archive.crawler.frontier.AbstractFrontier.next(AbstractFrontier.java:452)
    at org.archive.crawler.framework.ToeThread.run(ToeThread.java:133)
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.1.6) /mnt/sammy-data/heritrix_jobs/two-hops-no-requests-html-only/state fetchTarget of 0x112/0x941961 parent IN=62673966 IN class=com.sleepycat.je.tree.BIN lastFullVersion=0x183/0x2a17d parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
    at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1332)
    at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1367)
    at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2499)
    at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1545)
    at com.sleepycat.je.dbi.CursorImpl.getNextWithKeyChangeStatus(CursorImpl.java:1692)
    at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1617)
    at com.sleepycat.je.Cursor.retrieveNextAllowPhantoms(Cursor.java:2485)
    at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:2304)
    at com.sleepycat.je.Cursor.getNext(Cursor.java:1013)
    at org.archive.crawler.frontier.BdbMultipleWorkQueues.getNextNearestItem(BdbMultipleWorkQueues.java:313)
    ... 6 more
Caused by: java.io.FileNotFoundException: /mnt/sammy-data/heritrix_jobs/two-hops-no-requests-html-only/state/00000112.jdb (Input/output error)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:118)
    at com.sleepycat.je.log.FileManager$1.<init>(FileManager.java:995)
    at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:994)
    at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:890)
    at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1074)

The local disk that contains heritrix_out.log filled up within minutes because of this flood of exceptions. In such non-recoverable cases, the engine should be shut down while trying to save as much state as possible.
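
One possible way to approach the requested behavior, sketched here purely as an illustration: since the log output above appears to come from java.util.logging, a custom Handler could count SEVERE records and run a shutdown callback once a threshold is reached. The class name SevereShutdownHandler, the threshold, and the callback are all hypothetical and not part of Heritrix. What the callback should actually do (pause jobs, checkpoint, stop the engine) is left open.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/**
 * Hypothetical handler, NOT part of Heritrix: counts SEVERE log records
 * and runs a caller-supplied shutdown action once a threshold is reached.
 */
final class SevereShutdownHandler extends Handler {
    private final int threshold;
    private final Runnable shutdownAction; // assumption: supplied by the caller
    private final AtomicInteger severeCount = new AtomicInteger();
    private final AtomicBoolean triggered = new AtomicBoolean();

    SevereShutdownHandler(int threshold, Runnable shutdownAction) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.shutdownAction = shutdownAction;
    }

    @Override
    public void publish(LogRecord record) {
        // Ignore everything below SEVERE.
        if (record == null
                || record.getLevel().intValue() < Level.SEVERE.intValue()) {
            return;
        }
        // compareAndSet guarantees the action runs at most once, even when
        // many worker threads log SEVERE records concurrently.
        if (severeCount.incrementAndGet() >= threshold
                && triggered.compareAndSet(false, true)) {
            shutdownAction.run();
        }
    }

    boolean isTriggered() {
        return triggered.get();
    }

    @Override
    public void flush() {
        // Nothing is buffered.
    }

    @Override
    public void close() {
        // No resources to release.
    }
}
```

Such a handler could be attached with Logger.getLogger(...).addHandler(...) on a logger that covers the crawler classes. Because publish runs on whichever thread logged the record, the callback should probably hand the real shutdown work to a separate thread rather than block a ToeThread. A plain count is the simplest trigger; a rate over a time window might reduce false positives from isolated errors.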

Environment

Debian 7.1

Status

Assignee

Unassigned

Reporter

Jean-Pierre Bergamin

Labels

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Affects versions

Heritrix 3.1.1

Priority

Critical