Issues

ToeThread Fatal Exception: "kryo.SerializationException: Buffer limit exceeded" in BdbMultipleWorkQueues.get
HER-1996
Heritrix hitting nonexistent URLs in wix.com/app-market
HER-2096
Heritrix ignores robots.txt
HER-2092
appCtx.getBean() no longer works in scripting console
HER-2093
Heritrix is missing a facility to shut down from the console
HER-2090
Improve feedback after specifying erroneous command line arguments
HER-2091
RuntimeException in AMQPUrlReceiver kills StarterRestarter?
HER-2088
HTML extractor fails to extract CSS from a link tag
HER-2086
duplicate user agent records in robots.txt cause overwriting of rules
HER-2083
password
HER-2079
Link Analysis with Apache Giraph (Cluster mode)
HER-2077
URL alone is not sufficient to identify a unique unit of web content; should be something like canonicalize(url+headers)
HER-1665
Enable configuration of log4j in libraries
HER-2075
Using 'sun.security.tools.KeyTool' restricts to Oracle-based JVMs.
HER-2072
Add option to prefer the non-DNS resolves
HER-2069
[Optionally?] accelerated transition to terminated state after STOP issued
HER-2067
Identify programs with minimal Closed Captioning
HER-2061
H3: manifest of all files (esp. W/ARCs) from a job, access to W/ARCs, ability to delete/clear
HER-1778
WorkQueueFrontier - add log of queue lifecycle
HER-935
improved completion time estimates (queue/total)
HER-1013
H3: improve crawler capacity/state reporting for participation in pool of crawling machines
HER-1779
console rates off after checkpoint-resume
HER-1818
H3: offer web operations to delete job/dir/files (cleaning up local machine/crawler state)
HER-1777
crawl-manifest.txt not produced by H3; update and improve manifest functionality
HER-1735
evaluate H3 in context of IPv6
HER-1887