Occasional strange memory leak after crawl finishes

Description

Some Archive-It instances of Heritrix have been running into OutOfMemoryError. See

In each case, inspecting the heap dump with Eclipse Memory Analyzer reveals BDB environments left over from earlier crawl jobs that are still reachable and retain large amounts of heap (about 238 MB for a single environment below). For instance:

Class Name | Shallow Heap (bytes) | Retained Heap (bytes)
com.sleepycat.je.dbi.EnvironmentImpl @ 0xedcc6a98 | 392 | 4,285,176
com.sleepycat.je.dbi.EnvironmentImpl @ 0xe064c1d8 | 392 | 238,190,072
:: envHome java.io.File @ 0xe064c478 | 40 | 40
:::: java.lang.String @ 0xe064c878 | 32 | 144
:::::: value char[43] @ 0xe064c890 /1/ait-h3-jobs/2820-20130330030145721/state | 112 | 112
com.sleepycat.je.dbi.EnvironmentImpl @ 0xd9a1f798 | 392 | 121,974,272
com.sleepycat.je.dbi.EnvironmentImpl @ 0xd9414350 | 392 | 4,543,640
com.sleepycat.je.dbi.EnvironmentImpl @ 0xcb7c9308 | 392 | 238,433,608
:: envHome java.io.File @ 0xcb7c93f8 | 40 | 40
:::: path java.lang.String @ 0xcb7c9410 | 32 | 144
:::::: value char[43] @ 0xcb7c9428 /1/ait-h3-jobs/2203-20130311041352998/state | 112 | 112
Total: 5 entries

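In each case, what holds on to the EnvironmentImpl is the BdbFrontier (through its retiredQueues StoredQueue), which is reachable via CrawlController, via ToePool, via AlertThreadGroup. For example, the path to GC roots below, where each indented row names the field through which that object references the row above it: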

Class Name | Shallow Heap (bytes) | Retained Heap (bytes)
com.sleepycat.je.dbi.EnvironmentImpl @ 0xcb7c9308 | 392 | 238,433,608
:: envImpl com.sleepycat.je.txn.BasicLocker @ 0xcb7c9fa0 | 152 | 864
:::: handleLocker com.sleepycat.je.Database @ 0xcb7c9f20 | 96 | 1,056
:::::: queueDb org.archive.bdb.StoredQueue @ 0xcb85c788 | 48 | 1,704
:::::::: retiredQueues org.archive.crawler.frontier.BdbFrontier @ 0xcb7db328 | 352 | 50,896
:::::::::: frontier org.archive.crawler.framework.CrawlController @ 0xcb7dbbc0 | 176 | 1,493,304
:::::::::::: controller org.archive.crawler.framework.ToePool @ 0xcb7eac70 | 88 | 1,494,440
:::::::::::::: [0] java.lang.ThreadGroup[4] @ 0xcb860508 | 56 | 56
:::::::::::::::: groups org.archive.crawler.reporting.AlertThreadGroup @ 0xcb7db258 | 88 | 520
:::::::::::::::::: [25] java.lang.ThreadGroup[128] @ 0xd99c1778 | 1,048 | 13,344
:::::::::::::::::::: groups java.lang.ThreadGroup @ 0xcb035118 | 72 | 13,632

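None of these ToePools has any ToeThreads left. For example, the outgoing references of the pool above show a threads array that references no Thread objects: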

Class Name | Shallow Heap (bytes) | Retained Heap (bytes)
org.archive.crawler.framework.ToePool @ 0xcb7eac70 | 88 | 1,494,440
:: <class> class org.archive.crawler.framework.ToePool @ 0xcb2bdfe8 | 8 | 8
:: name java.lang.String @ 0xcb2c3ce0 | 32 | 72
:: parent org.archive.crawler.reporting.AlertThreadGroup @ 0xcb7db258 | 88 | 520
:: controller org.archive.crawler.framework.CrawlController @ 0xcb7dbbc0 | 176 | 1,493,304
:: threads java.lang.Thread[128] @ 0xcb7eacb0 | 1,048 | 1,048
:::: <class> class java.lang.Thread[] @ 0xcb0564f8 | 0 | 0
:: Total: 5 entries
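A heap dump only shows this after the fact. As a hedged diagnostic sketch (not Heritrix code), the Java below lists the direct subgroups of a parent ThreadGroup that no longer have any live threads, which is what the stray ToePools under AlertThreadGroup look like. The names EmptyGroupFinder and emptySubgroups are hypothetical; the sketch relies only on the standard ThreadGroup methods enumerate(ThreadGroup[], boolean), activeGroupCount() and activeCount().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class EmptyGroupFinder {

    /** Returns the direct subgroups of parent that currently report no live threads. */
    static List<ThreadGroup> emptySubgroups(ThreadGroup parent) {
        // activeGroupCount() is only an estimate and enumerate() silently drops
        // groups that do not fit, so leave some headroom in the array.
        ThreadGroup[] groups = new ThreadGroup[parent.activeGroupCount() + 16];
        int n = parent.enumerate(groups, false); // false: direct children only
        List<ThreadGroup> empty = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // activeCount() counts live threads in the group and all its subgroups
            if (groups[i].activeCount() == 0) {
                empty.add(groups[i]);
            }
        }
        return empty;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-ins: "alerts" plays the AlertThreadGroup, the children play ToePools.
        ThreadGroup alerts = new ThreadGroup("alerts");
        ThreadGroup busyPool = new ThreadGroup(alerts, "busy-pool");
        ThreadGroup idlePool = new ThreadGroup(alerts, "idle-pool");

        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(busyPool, () -> {
            try {
                release.await(); // keep this thread alive while main inspects the groups
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "toe-0");
        worker.start();

        for (ThreadGroup g : emptySubgroups(alerts)) {
            System.out.println("empty subgroup still attached: " + g.getName());
        }
        System.out.println("idle pool threads: " + idlePool.activeCount());

        release.countDown();
        worker.join();
    }
}
```

In a running crawler one would pass the real AlertThreadGroup instance; how to get hold of it from inside Heritrix is left open here.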

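When each ToeThread exits, it calls ThreadGroup.remove() on its parent ThreadGroup, the ToePool. When the last one of these happens and the ToePool discovers that it's empty, it destroys itself, which calls ThreadGroup.remove() on its own parent, the AlertThreadGroup. All of this happens automatically as part of normal Java thread processing, with one caveat: in the JDK's ThreadGroup implementation, that second step only happens if the emptied group is a daemon thread group (ThreadGroup.setDaemon(true)) with no threads or subgroups left. An empty non-daemon group stays attached to its parent until something calls destroy() on it.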

So I'm not sure how we end up in a situation where AlertThreadGroup holds on to a reference to an empty ToePool. Maybe a Java bug?
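One way to probe that question, sketched under stated assumptions: the Java program below (a hypothetical ToePoolLifecycleDemo, not Heritrix code) starts one short-lived thread in a child ThreadGroup, waits for it to exit, and then asks the parent how many subgroups it still has, once with the child marked as a daemon group and once without. On JDK versions before 19, an emptied group only detaches itself from its parent when it is a daemon group, so the non-daemon case should still report one subgroup while the daemon case should report none. From JDK 19 on, the daemon flag is ignored and parents reference subgroups weakly, so both cases may report the same count there.

```java
final class ToePoolLifecycleDemo {

    /** Runs one short-lived thread in a new subgroup of parent and waits for it to terminate. */
    static ThreadGroup runOneThreadIn(ThreadGroup parent, String name, boolean daemonGroup)
            throws InterruptedException {
        ThreadGroup pool = new ThreadGroup(parent, name);
        pool.setDaemon(daemonGroup); // deprecated since JDK 16, ignored from JDK 19 on
        Thread t = new Thread(pool, () -> { }, name + "-thread"); // body does nothing and exits
        t.start();
        t.join(); // wait for the thread to terminate
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        for (boolean daemon : new boolean[] {false, true}) {
            ThreadGroup parent = new ThreadGroup("parent-" + daemon);
            ThreadGroup pool = runOneThreadIn(parent, "pool-" + daemon, daemon);
            // Query the parent first so that pool is still in use while the parent is inspected.
            System.out.printf("daemon=%b  subgroups under parent=%d  threads left in pool=%d%n",
                    daemon, parent.activeGroupCount(), pool.activeCount());
        }
    }
}
```

Whether Heritrix's ToePool is created as a daemon group is not established here; the sketch only shows that the answer matters for whether an empty pool stays referenced.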

But an easy way to remedy the serious part of the problem is to have ToePool discard its reference to CrawlController when the crawl finishes. That should let the CrawlController, its BdbFrontier and the BDB environment be garbage collected. (A handful of stray empty ToePools and AlertThreadGroups is not ideal, but very unlikely to cause real problems like an OutOfMemoryError.)
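A minimal sketch of that remedy, assuming a simplified pool class. SketchToePool, FakeCrawlController, crawlEnded() and ReleaseOnFinishDemo are hypothetical names standing in for Heritrix's ToePool, CrawlController and whatever crawl-finish hook would trigger the cleanup; this is not the actual Heritrix change. The point is only that clearing one field breaks the AlertThreadGroup, ToePool, CrawlController path, so a stray empty pool no longer pins the frontier and its BDB environment.

```java
import java.lang.ref.WeakReference;

final class ReleaseOnFinishDemo {
    public static void main(String[] args) {
        ThreadGroup alerts = new ThreadGroup("alerts"); // stand-in for AlertThreadGroup
        FakeCrawlController controller = new FakeCrawlController();
        SketchToePool pool = new SketchToePool(alerts, controller);
        WeakReference<FakeCrawlController> probe = new WeakReference<>(controller);
        controller = null; // drop main's own strong reference

        pool.crawlEnded(); // even if alerts keeps pool forever, the controller is no longer pinned
        System.gc();       // only a request: collection is likely but not guaranteed
        System.out.println("pool's controller after crawlEnded(): " + pool.getController());
        System.out.println("controller collected: " + (probe.get() == null));
    }
}

/** Hypothetical stand-in for CrawlController: keeps a large array reachable, much as the
 *  real controller keeps the BdbFrontier and its BDB environment reachable. */
final class FakeCrawlController {
    final byte[] frontierState = new byte[16 * 1024 * 1024];
}

/** Hypothetical, simplified ToePool: only the controller field matters here. */
final class SketchToePool extends ThreadGroup {
    // volatile so that other threads reading the field see the cleared reference
    private volatile FakeCrawlController controller;

    SketchToePool(ThreadGroup parent, FakeCrawlController controller) {
        super(parent, "ToeThreads");
        this.controller = controller;
    }

    FakeCrawlController getController() {
        return controller; // null once the crawl has ended
    }

    /** Hypothetical hook, called once when the crawl finishes. */
    void crawlEnded() {
        controller = null;
    }
}
```

Because the field is cleared, any code that reads the controller from the pool after the crawl ends has to tolerate null. Since System.gc() is only a hint, the "controller collected" line may print false on some runs.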

Environment: None
Status:
Assignee: Unassigned
Reporter: Noah Levitt
Labels: None
Group Assignee: None
ZendeskID: None
Estimated Difficulty: None
Actual Difficulty: None
Affects versions: Heritrix 3.1.1
Priority: Critical