Issues

Task
Ambiguity between srcset urls and data:image base64 encoded image
Unassigned
Adam Miller
Major
Fixed
Mar 8, 2019
Apr 24, 2019
Bug
heritrix hitting non existent URLs in wix.com/app-market
Unassigned
Vangelis Banos
Major
Unresolved
Aug 31, 2017
Aug 31, 2017
Improvement
Crawl M3U8 files and capture resources they describe
Unassigned
Barbara Miller
Major
Not a Bug
Oct 28, 2016
Oct 28, 2016
Improvement
Add support for extracting URLs from img srcset attribute
Unassigned
Adam Miller
Major
Fixed
Oct 20, 2016
Mar 8, 2019
Bug
appCtx.getBean() no longer works in scripting console
Unassigned
Robert Jäschke
Major
Unresolved
Jun 2, 2016
Jun 7, 2016
Bug
Heritrix ignores robots.txt
Unassigned
Robert Jäschke
Minor
Unresolved
Jun 2, 2016
Jun 7, 2016
Improvement
Improve feedback after specifying erroneous command line arguments
Unassigned
Karl-Philipp Richter
Major
Unresolved
Feb 4, 2016
Feb 4, 2016
Bug
heritrix is missing a facility to shut down from the console
Unassigned
Karl-Philipp Richter
Major
Unresolved
Feb 4, 2016
Feb 8, 2016
Question
Are URLs including 'Japanese Full Space' supported?
Unassigned
Masahiro Shimada
Minor
Fixed
Oct 28, 2015
May 26, 2016
Bug
RuntimeException in AMQPUrlReceiver kills StarterRestarter?
Unassigned
Andrew Jackson
Major
Unresolved
Sep 17, 2015
Sep 23, 2015
Bug
JVM terminated without running Heritrix.
Unassigned
programmer
Critical
Duplicate
Sep 16, 2015
Sep 23, 2015
Bug
HTML extractor fails to extract CSS from a link tag
Unassigned
Kristinn Sigurðsson
Major
Unresolved
Aug 20, 2015
Aug 20, 2015
Bug
java 8 keytool issue
Unassigned
Luck Colors
Blocker
Fixed
Aug 13, 2015
May 6, 2016
Improvement
HostsReport issues
Unassigned
Kristinn Sigurðsson
Major
Fixed
Jul 13, 2015
Jan 15, 2016
Bug
duplicate user agent records in robots.txt cause overwriting of rules
Unassigned
Robert Jäschke
Minor
Unresolved
Jun 25, 2015
Jun 27, 2015
Bug
WARCWriterProcessor no longer prints hop path and link context for outlinks in meta data records
Unassigned
Adam Miller
Major
Fixed
Mar 26, 2015
Mar 27, 2015
Bug
spam
Unassigned
c
Major
Cool Story, Bro
Mar 23, 2015
Mar 23, 2015
Bug
ServerNotModified WARC revisit records incorrectly record WARC-Payload-Digest
Unassigned
Kristinn Sigurðsson
Major
Fixed
Mar 11, 2015
Aug 20, 2015
Question
password
Unassigned
connor taylor
Major
Unresolved
Mar 1, 2015
Mar 1, 2015
New Feature
Allow submission of non-login GET forms
Unassigned
Hunter Stern
Minor
Obsolete
Dec 9, 2014
Mar 29, 2016
Dec 25, 2014
Task
Link Analysis with Apache Giraph (Cluster mode)
Unassigned
Zhang Xiang
Major
Unresolved
Nov 7, 2014
Nov 7, 2014
Bug
Seeds Report missing redirect URLs for 301 / 302 responses
Unassigned
Adam Miller
Major
Fixed
Oct 22, 2014
Nov 12, 2014
Improvement
Enable configuration of log4j in libraries
Unassigned
Kristinn Sigurðsson
Major
Unresolved
Oct 3, 2014
Oct 3, 2014
Bug
IllegalStateException "got suspicious value" in IpAddressSetDecideRule when
Unassigned
Kristinn Sigurðsson
Major
Fixed
Oct 3, 2014
Oct 4, 2014
Bug
Using 'sun.security.tools.KeyTool' restricts to Oracle-based JVMs.
Unassigned
Thorbjørn Ravn Andersen
Minor
Unresolved
Aug 6, 2014
Aug 20, 2014
Bug
WarcWriterProcessor writes full body of revisited items
Unassigned
Kristinn Sigurðsson
Major
Fixed
Jul 25, 2014
Jul 25, 2014
Bug
NullPointerException when getting cookies
Unassigned
Kristinn Sigurðsson
Major
Fixed
Jul 16, 2014
Oct 3, 2014
Improvement
Add option to prefer the non-DNS resolves
Unassigned
Andres Aguilar
Minor
Unresolved
Jun 6, 2014
Jun 6, 2014
Bug
ExtractorHTML shouldn't treat codebase contents as embeds
Unassigned
Kristinn Sigurðsson
Major
Fixed
Jun 4, 2014
Jun 4, 2014
Improvement
[Optionally?] accelerated transition to terminated state after STOP issued
Unassigned
Aaron Ximm
Minor
Unresolved
May 21, 2014
May 21, 2014
Bug
don't use dns search domains on name resolution
Unassigned
samuel stoller
Major
Duplicate
Apr 22, 2014
Apr 24, 2014
Bug
deadlock in frontier
Unassigned
Noah Levitt
Critical
Fixed
Apr 2, 2014
Apr 25, 2014
Bug
Flash extractor not parsing initactions section of swf for possible links
Unassigned
Hunter Stern
Major
Fixed
Feb 13, 2014
Mar 1, 2014
Feb 20, 2014
Bug
Heritrix adding port to Host header
Unassigned
Hunter Stern
Major
Fixed
Feb 3, 2014
Jul 17, 2014
Feb 13, 2014
Bug
WorkQueueFrontier.deleteURIs mishandles deletions from retired queues
Unassigned
Kristinn Sigurðsson
Minor
Fixed
Jan 15, 2014
Jan 16, 2014
Task
Identify programs with minimal Closed Captioning
Unassigned
Roger G Macdonald
Minor
Unresolved
Jan 13, 2014
Jan 13, 2014
Improvement
Limited Parallelism
Unassigned
Shaofeng Liu
Major
Unresolved
Jan 9, 2014
Jan 9, 2014
Bug
support url with two consecutive question marks "??"
Unassigned
Noah Levitt
Major
Fixed
Dec 7, 2013
Dec 7, 2013
Bug
"Failed to start bean 'bdb'" when trying to build and launch a job which was stopped or to build and launch a job from a checkpoint.
Unassigned
Arkiver
Critical
Duplicate
Dec 3, 2013
Dec 12, 2013
New Feature
Heritrix install manual
Unassigned
Janis
Trivial
Unresolved
Nov 22, 2013
Nov 22, 2013
Improvement
on checkpoint w/arcs are closed and new ones started; add option not to do that
Unassigned
Noah Levitt
Major
Fixed
Oct 11, 2013
Oct 11, 2013
Improvement
Rotate heritrix_out.log
Unassigned
Jean-Pierre Bergamin
Major
Unresolved
Oct 8, 2013
Oct 8, 2013
Improvement
Shutdown engine on SEVERE errors
Unassigned
Jean-Pierre Bergamin
Critical
Unresolved
Oct 8, 2013
Oct 8, 2013
Bug
Bogus seed numbers in crawl-report
Unassigned
Jean-Pierre Bergamin
Minor
Unresolved
Oct 8, 2013
Oct 8, 2013
New Feature
option to forget all but latest checkpoint
Unassigned
Noah Levitt
Major
Done
Sep 11, 2013
Sep 11, 2013
Improvement
support crawling without any dns resolution (can be useful when crawling through proxy)
Unassigned
Noah Levitt
Major
Unresolved
Sep 10, 2013
Sep 10, 2013
Improvement
do something about w/arc reading code
Unassigned
Noah Levitt
Major
Unresolved
Sep 10, 2013
Sep 10, 2013
Bug
checkpoint-resumed crawl job stats are inconsistent-- some start from 0, some resume from checkpoint numbers
Unassigned
Noah Levitt
Major
Fixed
Sep 7, 2013
Sep 11, 2013
Improvement
ftp protocol robots.txt
Unassigned
Noah Levitt
Major
Unresolved
Sep 7, 2013
Sep 7, 2013
Bug
FetchWhois mishandles certain tlds
Unassigned
Noah Levitt
Major
Unresolved
Sep 6, 2013
Sep 6, 2013