Issues
Summary | Assignee | Reporter | Resolution | Created | Updated | ||||||
Ambiguity between srcset urls and data:image base64 encoded image | Unassigned | Adam Miller | Fixed | Mar 8, 2019 | Apr 24, 2019 | ||||||
Heritrix hitting non-existent URLs in wix.com/app-market | Unassigned | Vangelis Banos | Unresolved | Aug 31, 2017 | Aug 31, 2017 | ||||||
Crawl M3U8 files and capture resources they describe | Unassigned | Barbara Miller | Not a Bug | Oct 28, 2016 | Oct 28, 2016 | ||||||
Add support for extracting URLs from img srcset attribute | Unassigned | Adam Miller | Fixed | Oct 20, 2016 | Mar 8, 2019 | ||||||
appCtx.getBean() no longer works in scripting console | Unassigned | Robert Jäschke | Unresolved | Jun 2, 2016 | Jun 7, 2016 | ||||||
Heritrix ignores robots.txt | Unassigned | Robert Jäschke | Unresolved | Jun 2, 2016 | Jun 7, 2016 | ||||||
Improve feedback after specifying erroneous command line arguments | Unassigned | Karl-Philipp Richter | Unresolved | Feb 4, 2016 | Feb 4, 2016 | ||||||
Heritrix is missing facility to shut down from console | Unassigned | Karl-Philipp Richter | Unresolved | Feb 4, 2016 | Feb 8, 2016 | ||||||
Are URLs including 'Japanese Full Space' supported? | Unassigned | Masahiro Shimada | Fixed | Oct 28, 2015 | May 26, 2016 | ||||||
RuntimeException in AMQPUrlReceiver kills StarterRestarter? | Unassigned | Andrew Jackson | Unresolved | Sep 17, 2015 | Sep 23, 2015 | ||||||
JVM terminated without running Heritrix. | Unassigned | programmer | Duplicate | Sep 16, 2015 | Sep 23, 2015 | ||||||
HTML extractor fails to extract CSS from a link tag | Unassigned | Kristinn Sigurðsson | Unresolved | Aug 20, 2015 | Aug 20, 2015 | ||||||
java 8 keytool issue | Unassigned | Luck Colors | Fixed | Aug 13, 2015 | May 6, 2016 | ||||||
HostsReport issues | Unassigned | Kristinn Sigurðsson | Fixed | Jul 13, 2015 | Jan 15, 2016 | ||||||
duplicate user agent records in robots.txt cause overwriting of rules | Unassigned | Robert Jäschke | Unresolved | Jun 25, 2015 | Jun 27, 2015 | ||||||
WARCWriterProcessor no longer prints hop path and link context for outlinks in meta data records | Unassigned | Adam Miller | Fixed | Mar 26, 2015 | Mar 27, 2015 | ||||||
spam | Unassigned | c | Cool Story, Bro | Mar 23, 2015 | Mar 23, 2015 | ||||||
ServerNotModified WARC revisit records incorrectly record WARC-Payload-Digest | Unassigned | Kristinn Sigurðsson | Fixed | Mar 11, 2015 | Aug 20, 2015 | ||||||
password | Unassigned | connor taylor | Unresolved | Mar 1, 2015 | Mar 1, 2015 | ||||||
Allow submission of non-login GET forms | Unassigned | Hunter Stern | Obsolete | Dec 9, 2014 | Mar 29, 2016 | Dec 25, 2014 | |||||
Link Analysis with Apache Giraph (Cluster mode) | Unassigned | Zhang Xiang | Unresolved | Nov 7, 2014 | Nov 7, 2014 | ||||||
Seeds Report missing redirect URLs for 301 / 302 responses | Unassigned | Adam Miller | Fixed | Oct 22, 2014 | Nov 12, 2014 | ||||||
Enable configuration of log4j in libraries | Unassigned | Kristinn Sigurðsson | Unresolved | Oct 3, 2014 | Oct 3, 2014 | ||||||
IllegalStateException "got suspicious value" in IpAddressSetDecideRule when | Unassigned | Kristinn Sigurðsson | Fixed | Oct 3, 2014 | Oct 4, 2014 | ||||||
Using 'sun.security.tools.KeyTool' restricts to Oracle-based JVMs. | Unassigned | Thorbjørn Ravn Andersen | Unresolved | Aug 6, 2014 | Aug 20, 2014 | ||||||
WarcWriterProcessor writes full body of revisited items | Unassigned | Kristinn Sigurðsson | Fixed | Jul 25, 2014 | Jul 25, 2014 | ||||||
NullPointerException when getting cookies | Unassigned | Kristinn Sigurðsson | Fixed | Jul 16, 2014 | Oct 3, 2014 | ||||||
Add option to prefer the non-DNS resolves | Unassigned | Andres Aguilar | Unresolved | Jun 6, 2014 | Jun 6, 2014 | ||||||
ExtractorHTML shouldn't treat codebase contents as embeds | Unassigned | Kristinn Sigurðsson | Fixed | Jun 4, 2014 | Jun 4, 2014 | ||||||
[Optionally?] accelerated transition to terminated state after STOP issued | Unassigned | Aaron Ximm | Unresolved | May 21, 2014 | May 21, 2014 | ||||||
don't use dns search domains on name resolution | Unassigned | samuel stoller | Duplicate | Apr 22, 2014 | Apr 24, 2014 | ||||||
deadlock in frontier | Unassigned | Noah Levitt | Fixed | Apr 2, 2014 | Apr 25, 2014 | ||||||
Flash extractor not parsing initactions section of swf for possible links | Unassigned | Hunter Stern | Fixed | Feb 13, 2014 | Mar 1, 2014 | Feb 20, 2014 | |||||
Heritrix adding port to Host header | Unassigned | Hunter Stern | Fixed | Feb 3, 2014 | Jul 17, 2014 | Feb 13, 2014 | |||||
WorkQueueFrontier.deleteURIs mishandles deletions from retired queues | Unassigned | Kristinn Sigurðsson | Fixed | Jan 15, 2014 | Jan 16, 2014 | ||||||
Identify programs with minimal Closed Captioning | Unassigned | Roger G Macdonald | Unresolved | Jan 13, 2014 | Jan 13, 2014 | ||||||
Limited Parallelism | Unassigned | Shaofeng Liu | Unresolved | Jan 9, 2014 | Jan 9, 2014 | ||||||
support url with two consecutive question marks "??" | Unassigned | Noah Levitt | Fixed | Dec 7, 2013 | Dec 7, 2013 | ||||||
"Failed to start bean 'bdb'" when trying to build and launch a job which was stopped or to build and launch a job from a checkpoint. | Unassigned | Arkiver | Duplicate | Dec 3, 2013 | Dec 12, 2013 | ||||||
Heritrix install manual | Unassigned | Janis | Unresolved | Nov 22, 2013 | Nov 22, 2013 | ||||||
on checkpoint w/arcs are closed and new ones started; add option not to do that | Unassigned | Noah Levitt | Fixed | Oct 11, 2013 | Oct 11, 2013 | ||||||
Rotate heritrix_out.log | Unassigned | Jean-Pierre Bergamin | Unresolved | Oct 8, 2013 | Oct 8, 2013 | ||||||
Shutdown engine on SEVERE errors | Unassigned | Jean-Pierre Bergamin | Unresolved | Oct 8, 2013 | Oct 8, 2013 | ||||||
Bogus seed numbers in crawl-report | Unassigned | Jean-Pierre Bergamin | Unresolved | Oct 8, 2013 | Oct 8, 2013 | ||||||
option to forget all but latest checkpoint | Unassigned | Noah Levitt | Done | Sep 11, 2013 | Sep 11, 2013 | ||||||
support crawling without any dns resolution (can be useful when crawling through proxy) | Unassigned | Noah Levitt | Unresolved | Sep 10, 2013 | Sep 10, 2013 | ||||||
do something about w/arc reading code | Unassigned | Noah Levitt | Unresolved | Sep 10, 2013 | Sep 10, 2013 | ||||||
checkpoint-resumed crawl job stats are inconsistent-- some start from 0, some resume from checkpoint numbers | Unassigned | Noah Levitt | Fixed | Sep 7, 2013 | Sep 11, 2013 | ||||||
ftp protocol robots.txt | Unassigned | Noah Levitt | Unresolved | Sep 7, 2013 | Sep 7, 2013 | ||||||
FetchWhois mishandles certain tlds | Unassigned | Noah Levitt | Unresolved | Sep 6, 2013 | Sep 6, 2013 |