Open issues

Shutdown engine on SEVERE errors
HER-2054
occasional strange memory leak after crawl finishes
HER-2036
Heritrix in Amazon Cloud
HER-1991
MirrorWriterProcess Example
HER-1968
Fix/rethink default case-flattening canonicalization (LowercaseRule)
HER-1945
Heritrix hitting nonexistent URLs in wix.com/app-market
HER-2096
appCtx.getBean() no longer works in scripting console
HER-2093
Improve feedback after specifying erroneous command line arguments
HER-2091
Heritrix is missing a facility to shut down from the console
HER-2090
RuntimeException in AMQPUrlReceiver kills StarterRestarter?
HER-2088
HTML extractor fails to extract CSS from a link tag
HER-2086
password
HER-2079
Link Analysis with Apache Giraph (Cluster mode)
HER-2077
Enable configuration of log4j in libraries
HER-2075
Limited Parallelism
HER-2060
Rotate heritrix_out.log
HER-2055
support crawling without any dns resolution (can be useful when crawling through proxy)
HER-2050
do something about w/arc reading code
HER-2049
ftp protocol robots.txt
HER-2047
FetchWhois mishandles certain tlds
HER-2046
Possible deadlock
HER-2045
Controlling download time from a URI
HER-2043
should follow redirects from /robots.txt and respect directives found
HER-2038
Upgrade Guava to latest version
HER-2033
The Heritrix crawler can proceed with active links only. In XPath we have to stop at //a; we can't make any changes to these links.
HER-2028
Error connecting to HTTPS site
HER-2026
ARCRecord.computeMetaData() fails due to space in mime-type in record header.
HER-2025
ARCRecord.readHttpHeader() fails if HTTP response lacks empty line between HTTP headers and HTTP body.
HER-2023
dns: scheme ignored when creating SURT
HER-2012
Possible race condition in org.archive.util.FileUtils.ensureWriteableDirectory() leading to data loss
HER-2009
Heritrix3: how to run an incremental crawl? If anyone knows, please mail me. Thank you. imnu06@126.com
HER-2008
option to let heritrix decide automatically on number of toe threads based on available heap
HER-2007
way to reenqueue failed url other than in original spot at the head of the queue
HER-2003
Website in .au: Heritrix gave SEVERE log message regarding public suffix list
HER-2000
Expand ExtractorHTML to extract html from conditional comments
HER-1998
ToeThread Fatal Exception: "kryo.SerializationException: Buffer limit exceeded" in BdbMultipleWorkQueues.get
HER-1996
CLONE - Force generation of report files
HER-1995
web ui/api unresponsive during PersistLoadProcessor preload
HER-1992
Link with '%' character does not get encoded, leading to 400 - bad request
HER-1988
REST upload crawler-bean.cxml to XML extension rather than CXML
HER-1986
Response codes 404 and 500 due to invalid URIs
HER-1982
H3: flawed failed-checkpoint recovery in CheckpointService.requestCrawlCheckpoint may prevent future checkpoints
HER-1980
Bug in shouldProcessRule in WriterPoolProcessor. It doesn't work
HER-1978
Bug in shouldProcessRule in WriterPoolProcessor. It doesn't work
HER-1977
Edge case in parsing WARC header - org.archive.io.warc.WARCRecord.java:parseHeaders()
HER-1975
deprecate shortReportLineTo method in Reporter interface
HER-1971
getDigest on ARCRecordMetadata has undocumented issues.
HER-1970
Job is Finished but cannot terminate - no reports generated
HER-1967
deleteSheet() deletes DecideRuledSheetAssociation but not the surt associations
HER-1966
Hosts visited in crawl report seems to be sum of hosts in hosts report - not sum of hosts with #urls > 0
HER-1961