| | Files uploaded to action directory via HTTP have .bin extension added, causing Heritrix to ignore them | | | | | Fixed | Jun 24, 2011 | Aug 29, 2019 | | |
| | Ambiguity between srcset urls and data:image base64 encoded image | | | | | Fixed | Mar 8, 2019 | Apr 24, 2019 | | |
| | Add support for extracting URLs from img srcset attribute | | | | | Fixed | Oct 20, 2016 | Mar 8, 2019 | | |
| | ToeThread Fatal Exception: "kryo.SerializationException: Buffer limit exceeded" in BdbMultipleWorkQueues.get | | | | | Unresolved | Mar 7, 2012 | Jan 24, 2019 | | |
| | why do we write the header "WARC-Truncated: length" in warc revisit records? | | | | | Cool Story, Bro | Nov 4, 2009 | Sep 18, 2018 | | |
| | NPE in BdbMultipleWorkQueues.delete() -- queue stuck? | | | | | Incomplete | Feb 16, 2007 | Sep 17, 2018 | | |
| | Heritrix hitting non-existent URLs in wix.com/app-market | | | | | Unresolved | Aug 31, 2017 | Aug 31, 2017 | | |
| | Webserver response 307 to 302 causes infinite redirect | | | | | Obsolete | Oct 28, 2008 | Apr 5, 2017 | Nov 4, 2008 | |
| | Crawl M3U8 files and capture resources they describe | | | | | Not a Bug | Oct 28, 2016 | Oct 28, 2016 | | |
| | Heritrix ignores robots.txt | | | | | Unresolved | Jun 2, 2016 | Jun 7, 2016 | | |
| | appCtx.getBean() no longer works in scripting console | | | | | Unresolved | Jun 2, 2016 | Jun 7, 2016 | | |
| | Are URLs including 'Japanese Full Space' supported? | | | | | Fixed | Oct 28, 2015 | May 26, 2016 | | |
| | java 8 keytool issue | | | | | Fixed | Aug 13, 2015 | May 6, 2016 | | |
| | CoderMalfunctionError: java.nio.BufferOverflowException | | | | | Obsolete | Feb 16, 2007 | Mar 29, 2016 | | |
| | Allow submission of non-login GET forms | | | | | Obsolete | Dec 9, 2014 | Mar 29, 2016 | Dec 25, 2014 | |
| | Heritrix is missing a facility to shut down from the console | | | | | Unresolved | Feb 4, 2016 | Feb 8, 2016 | | |
| | checkpointing gives error on Windows | | | | | Fixed | Jun 23, 2011 | Feb 5, 2016 | | |
| | Improve feedback after specifying erroneous command line arguments | | | | | Unresolved | Feb 4, 2016 | Feb 4, 2016 | | |
| | HostsReport issues | | | | | Fixed | Jul 13, 2015 | Jan 15, 2016 | | |
| | RuntimeException in AMQPUrlReceiver kills StarterRestarter? | | | | | Unresolved | Sep 17, 2015 | Sep 23, 2015 | | |
| | JVM terminated without running Heritrix. | | | | | Duplicate | Sep 16, 2015 | Sep 23, 2015 | | |
| | ServerNotModified WARC revisit records incorrectly record WARC-Payload-Digest | | | | | Fixed | Mar 11, 2015 | Aug 20, 2015 | | |
| | HTML extractor fails to extract CSS from a link tag | | | | | Unresolved | Aug 20, 2015 | Aug 20, 2015 | | |
| | Expand hosts-report.txt with novel-bytes and novel-URLs counts | | | | | Duplicate | May 19, 2008 | Jul 13, 2015 | | |
| | duplicate user agent records in robots.txt cause overwriting of rules | | | | | Unresolved | Jun 25, 2015 | Jun 27, 2015 | | |
| | WARCWriterProcessor no longer prints hop path and link context for outlinks in metadata records | | | | | Fixed | Mar 26, 2015 | Mar 27, 2015 | | |
| | spam | | | | | Cool Story, Bro | Mar 23, 2015 | Mar 23, 2015 | | |
| | password | | | | | Unresolved | Mar 1, 2015 | Mar 1, 2015 | | |
| | Seeds Report missing redirect URLs for 301 / 302 responses | | | | | Fixed | Oct 22, 2014 | Nov 12, 2014 | | |
| | Link Analysis with Apache Giraph (Cluster mode) | | | | | Unresolved | Nov 7, 2014 | Nov 7, 2014 | | |
| | url alone not sufficient to identify unique unit of web content, should be something like canonicalize(url+headers) | | | | | Unresolved | Aug 21, 2009 | Oct 20, 2014 | | |
| | IllegalStateException "got suspicious value" in IpAddressSetDecideRule when | | | | | Fixed | Oct 3, 2014 | Oct 4, 2014 | | |
| | Enable configuration of log4j in libraries | | | | | Unresolved | Oct 3, 2014 | Oct 3, 2014 | | |
| | NullPointerException when getting cookies | | | | | Fixed | Jul 16, 2014 | Oct 3, 2014 | | |
| | Using 'sun.security.tools.KeyTool' restricts to Oracle-based JVMs. | | | | | Unresolved | Aug 6, 2014 | Aug 20, 2014 | | |
| | Support 'Crawl-Delay' and 'Allow' robots.txt directives | | | | | Fixed | Jan 25, 2007 | Aug 6, 2014 | | |
| | WarcWriterProcessor writes full body of revisited items | | | | | Fixed | Jul 25, 2014 | Jul 25, 2014 | | |
| | [junit] [ERROR] TEST org.archive.util.ms.DocTest FAILED (due to testdata folder missing in pre-release tarball) | | | | | Fixed | Mar 14, 2007 | Jul 23, 2014 | | |
| | ByteReplayCharSequence ignores specified/non-default character encodings | | | | | Fixed | Jun 5, 2008 | Jul 23, 2014 | | |
| | Investigate alt parsers (JTidy, HTMLParser, etc.) | | | | | Won't Fix | Feb 17, 2007 | Jul 23, 2014 | | |
| | NPE in FetchHTTP.saveCookies | | | | | Fixed | Feb 16, 2007 | Jul 23, 2014 | | |
| | Heritrix robot violating robots.txt and robots META tags at French National Library | | | | | Won't Fix | Aug 27, 2007 | Jul 23, 2014 | | |
| | StripSessonCFIDs missing from modules/BaseRule.options | | | | | Fixed | Apr 30, 2007 | Jul 23, 2014 | | |
| | heritrix fails to save accept-headers in an override | | | | | Fixed | Feb 16, 2007 | Jul 23, 2014 | | |
| | [maven] [build] maven2 build on Mac/Windows may not find necessary libraries and classes (esp. JSPC JSP precompiling step) | | | | | Obsolete | Jul 10, 2007 | Jul 23, 2014 | | |
| | move from Filters to DecideRules is done, but still no replacement for ContentTypeRegExpFilter exists | | | | | Fixed | Mar 14, 2007 | Jul 23, 2014 | | |
| | Adding decide rules in an override (via UI, or via an already-existing settings tree) does not work. | | | | | Fixed | Apr 27, 2007 | Jul 23, 2014 | | |
| | extend WARC metadata to contain source-seed | | | | | Fixed | Mar 14, 2007 | Jul 23, 2014 | | |
| | BdbModule triggers an IllegalStateException("Database already exists : uri_history") when used with PersistStoreProcessor/PersistLoadProcessor for recrawl | | | | | Won't Fix | Sep 6, 2009 | Jul 23, 2014 | | |
| | NoClassDefFoundError when starting a job | | | | | Incomplete | Feb 16, 2007 | Jul 23, 2014 | | |