Issues

| Type | Summary | Assignee | Reporter | Priority | Resolution | Created | Updated | Due |
|---|---|---|---|---|---|---|---|---|
| Bug | ToeThread Fatal Exception: "kryo.SerializationException: Buffer limit exceeded" in BdbMultipleWorkQueues.get | Unassigned | Gordon Mohr | Major | Unresolved | Mar 7, 2012 | Jan 24, 2019 | |
| Bug | heritrix hitting nonexistent URLs in wix.com/app-market | Unassigned | Vangelis Banos | Major | Unresolved | Aug 31, 2017 | Aug 31, 2017 | |
| Bug | Heritrix ignores robots.txt | Unassigned | Robert Jäschke | Minor | Unresolved | Jun 2, 2016 | Jun 7, 2016 | |
| Bug | appCtx.getBean() no longer works in scripting console | Unassigned | Robert Jäschke | Major | Unresolved | Jun 2, 2016 | Jun 7, 2016 | |
| Bug | heritrix is missing a facility to shut down from console | Unassigned | Karl-Philipp Richter | Major | Unresolved | Feb 4, 2016 | Feb 8, 2016 | |
| Improvement | Improve feedback after specifying erroneous command line arguments | Unassigned | Karl-Philipp Richter | Major | Unresolved | Feb 4, 2016 | Feb 4, 2016 | |
| Bug | RuntimeException in AMQPUrlReceiver kills StarterRestarter? | Unassigned | Andrew Jackson | Major | Unresolved | Sep 17, 2015 | Sep 23, 2015 | |
| Bug | HTML extractor fails to extract CSS from a link tag | Unassigned | Kristinn Sigurðsson | Major | Unresolved | Aug 20, 2015 | Aug 20, 2015 | |
| Bug | duplicate user agent records in robots.txt cause overwriting of rules | Unassigned | Robert Jäschke | Minor | Unresolved | Jun 25, 2015 | Jun 27, 2015 | |
| Question | password | Unassigned | connor taylor | Major | Unresolved | Mar 1, 2015 | Mar 1, 2015 | |
| Task | Link Analysis with Apache Giraph (Cluster mode) | Unassigned | Zhang Xiang | Major | Unresolved | Nov 7, 2014 | Nov 7, 2014 | |
| Improvement | url alone not sufficient to identify unique unit of web content, should be something like canonicalize(url+headers) | Unassigned | Noah Levitt | Major | Unresolved | Aug 21, 2009 | Oct 20, 2014 | |
| Improvement | Enable configuration of log4j in libraries | Unassigned | Kristinn Sigurðsson | Major | Unresolved | Oct 3, 2014 | Oct 3, 2014 | |
| Bug | Using 'sun.security.tools.KeyTool' restricts to Oracle-based JVMs. | Unassigned | Thorbjørn Ravn Andersen | Minor | Unresolved | Aug 6, 2014 | Aug 20, 2014 | |
| Improvement | Add option to prefer the non-DNS resolves | Unassigned | Andres Aguilar | Minor | Unresolved | Jun 6, 2014 | Jun 6, 2014 | |
| Improvement | [Optionally?] accelerated transition to terminated state after STOP issued | Unassigned | Aaron Ximm | Minor | Unresolved | May 21, 2014 | May 21, 2014 | |
| Task | Identify programs with minimal Closed Captioning | Unassigned | Roger G Macdonald | Minor | Unresolved | Jan 13, 2014 | Jan 13, 2014 | |
| Improvement | H3: manifest of all files (esp. W/ARCs) from a job, access to W/ARCs, ability to delete/clear | Unassigned | Gordon Mohr | Major | Unresolved | Jun 3, 2010 | Jan 10, 2014 | |
| Improvement | WorkQueueFrontier - add log of queue lifecycle | Unassigned | Gordon Mohr | Minor | Unresolved | Feb 17, 2007 | Jan 10, 2014 | |
| Improvement | improved completion time estimates (queue/total) | Unassigned | Gordon Mohr | Minor | Unresolved | Feb 17, 2007 | Jan 10, 2014 | |
| Improvement | H3: improve crawler capacity/state reporting for participation in pool of crawling machines | Unassigned | Gordon Mohr | Major | Unresolved | Jun 3, 2010 | Jan 10, 2014 | |
| Bug | console rates off after checkpoint-resume | Unassigned | Gordon Mohr | Major | Unresolved | Sep 3, 2010 | Jan 10, 2014 | |
| Improvement | H3: offer web operations to delete job/dir/files (cleaning up local machine/crawler state) | Unassigned | Gordon Mohr | Major | Unresolved | Jun 3, 2010 | Jan 10, 2014 | |
| Task | crawl-manifest.txt not produced by H3; update and improve manifest functionality | Unassigned | Hunter Stern | Major | Unresolved | Jan 14, 2010 | Jan 10, 2014 | Jan 19, 2010 |
| Improvement | evaluate H3 in context of IPv6 | Unassigned | Gordon Mohr | Major | Unresolved | May 11, 2011 | Jan 10, 2014 | |
| Bug | BASE HREF of enclosing HTML not used by SWFExtractor | Unassigned | Gordon Mohr | Major | Unresolved | Oct 25, 2010 | Jan 10, 2014 | |
| Bug | link named "tail alert log..." does not show all alerts | Unassigned | Travis Wellman | Minor | Unresolved | Sep 20, 2011 | Jan 10, 2014 | |
| Improvement | H3: "add some color" | Unassigned | Gordon Mohr | Major | Unresolved | Aug 31, 2010 | Jan 10, 2014 | |
| Improvement | bring back the progress-bar | Unassigned | Gordon Mohr | Major | Unresolved | Dec 7, 2009 | Jan 10, 2014 | |
| Improvement | deprecate shortReportLineTo method in Reporter interface | Unassigned | Travis Wellman | Major | Unresolved | Nov 16, 2011 | Jan 10, 2014 | |
| Bug | recovery-log scanning generates more error output than is reasonable | Unassigned | Gordon Mohr | Major | Unresolved | Nov 8, 2010 | Jan 10, 2014 | |
| Improvement | human-readable number formats in console | Unassigned | Steve Sisney | Minor | Unresolved | Jul 9, 2009 | Jan 10, 2014 | |
| Improvement | Write W/ARC per domain/host/seed/etc. | Unassigned | Gordon Mohr | Minor | Unresolved | Feb 17, 2007 | Jan 10, 2014 | |
| Improvement | Springify(?): Simple guided field-based configuration UI | Unassigned | Gordon Mohr | Minor | Unresolved | Aug 28, 2008 | Jan 10, 2014 | |
| Improvement | checkpoint directories for logs | Unassigned | Travis Wellman | Major | Unresolved | Aug 17, 2011 | Jan 10, 2014 | |
| Improvement | Limited Parallelism | Unassigned | Shaofeng Liu | Major | Unresolved | Jan 9, 2014 | Jan 9, 2014 | |
| Improvement | support Google's robots.txt wildcards ('*') and end-anchor ('$') | Unassigned | Gordon Mohr | Major | Unresolved | Apr 3, 2009 | Dec 13, 2013 | |
| New Feature | Heritrix install manual | Unassigned | Janis | Trivial | Unresolved | Nov 22, 2013 | Nov 22, 2013 | |
| Improvement | canonicalization losing docs: make content&result sensitive | Unassigned | Gordon Mohr | Minor | Unresolved | Feb 17, 2007 | Nov 14, 2013 | |
| Improvement | Rotate heritrix_out.log | Unassigned | Jean-Pierre Bergamin | Major | Unresolved | Oct 8, 2013 | Oct 8, 2013 | |
| Improvement | Shut down engine on SEVERE errors | Unassigned | Jean-Pierre Bergamin | Critical | Unresolved | Oct 8, 2013 | Oct 8, 2013 | |
| Bug | Bogus seed numbers in crawl-report | Unassigned | Jean-Pierre Bergamin | Minor | Unresolved | Oct 8, 2013 | Oct 8, 2013 | |
| Improvement | support crawling without any DNS resolution (can be useful when crawling through a proxy) | Unassigned | Noah Levitt | Major | Unresolved | Sep 10, 2013 | Sep 10, 2013 | |
| Improvement | do something about w/arc reading code | Unassigned | Noah Levitt | Major | Unresolved | Sep 10, 2013 | Sep 10, 2013 | |
| Bug | Possible deadlock | Unassigned | Kristinn Sigurðsson | Major | Unresolved | Jul 29, 2013 | Sep 9, 2013 | |
| Improvement | ftp protocol robots.txt | Unassigned | Noah Levitt | Major | Unresolved | Sep 7, 2013 | Sep 7, 2013 | |
| Bug | FetchWhois mishandles certain TLDs | Unassigned | Noah Levitt | Major | Unresolved | Sep 6, 2013 | Sep 6, 2013 | |
| Improvement | support 'nofollow' in links | Unassigned | Michael Stack | Minor | Unresolved | Feb 17, 2007 | Aug 13, 2013 | |
| Bug | Redirects of robots.txt treated as valid robots, null pointer exception | Unassigned | Niels van Hecke | Minor | Unresolved | Jul 3, 2013 | Jul 3, 2013 | |
| Bug | Controlling download time from a URI | Unassigned | Smriti Malhotra | Major | Unresolved | Jun 24, 2013 | Jun 24, 2013 | |
1-50 of 528
...