Issues

Bug
files uploaded to action directory with http have .bin extension added, causing heritrix to ignore them
Unassigned
Noah Levitt
Major
Fixed
Jun 24, 2011
Aug 29, 2019
Task
Ambiguity between srcset urls and data:image base64 encoded image
Unassigned
Adam Miller
Major
Fixed
Mar 8, 2019
Apr 24, 2019
Improvement
Add support for extracting URLs from img srcset attribute
Unassigned
Adam Miller
Major
Fixed
Oct 20, 2016
Mar 8, 2019
Bug
ToeThread Fatal Exception: "kryo.SerializationException: Buffer limit exceeded" in BdbMultipleWorkQueues.get
Unassigned
Gordon Mohr
Major
Unresolved
Mar 7, 2012
Jan 24, 2019
Task
why do we write the header "WARC-Truncated: length" in warc revisit records?
Unassigned
Noah Levitt
Minor
Cool Story, Bro
Nov 4, 2009
Sep 18, 2018
Bug
NPE in BdbMultipleWorkQueues.delete() -- queue stuck?
Unassigned
Gordon Mohr
Major
Incomplete
Feb 16, 2007
Sep 17, 2018
Bug
heritrix hitting non existent URLs in wix.com/app-market
Unassigned
Vangelis Banos
Major
Unresolved
Aug 31, 2017
Aug 31, 2017
Bug
Webserver response 307 to 302 causes infinite redirect
Unassigned
Dominic Dela Cruz
Minor
Obsolete
Oct 28, 2008
Apr 5, 2017
Nov 4, 2008
Improvement
Crawl M3U8 files and capture resources they describe
Unassigned
Barbara Miller
Major
Not a Bug
Oct 28, 2016
Oct 28, 2016
Bug
Heritrix ignores robots.txt
Unassigned
Robert Jäschke
Minor
Unresolved
Jun 2, 2016
Jun 7, 2016
Bug
appCtx.getBean() no longer works in scripting console
Unassigned
Robert Jäschke
Major
Unresolved
Jun 2, 2016
Jun 7, 2016
Question
Are URLs including 'Japanese Full Space' supported?
Unassigned
Masahiro Shimada
Minor
Fixed
Oct 28, 2015
May 26, 2016
Bug
java 8 keytool issue
Unassigned
Luck Colors
Blocker
Fixed
Aug 13, 2015
May 6, 2016
Bug
CoderMalfunctionError: java.nio.BufferOverflowException
Unassigned
Gordon Mohr
Minor
Obsolete
Feb 16, 2007
Mar 29, 2016
New Feature
Allow submission of non-login GET forms
Unassigned
Hunter Stern
Minor
Obsolete
Dec 9, 2014
Mar 29, 2016
Dec 25, 2014
Bug
heritrix is missing facility to shutdown from console
Unassigned
Karl-Philipp Richter
Major
Unresolved
Feb 4, 2016
Feb 8, 2016
Bug
checkpointing gives error on Windows
Unassigned
Hunter Stern
Minor
Fixed
Jun 23, 2011
Feb 5, 2016
Improvement
Improve feedback after specifying erroneous command line arguments
Unassigned
Karl-Philipp Richter
Major
Unresolved
Feb 4, 2016
Feb 4, 2016
Improvement
HostsReport issues
Unassigned
Kristinn Sigurðsson
Major
Fixed
Jul 13, 2015
Jan 15, 2016
Bug
RuntimeException in AMQPUrlReceiver kills StarterRestarter?
Unassigned
Andrew Jackson
Major
Unresolved
Sep 17, 2015
Sep 23, 2015
Bug
JVM terminated without running Heritrix.
Unassigned
programmer
Critical
Duplicate
Sep 16, 2015
Sep 23, 2015
Bug
ServerNotModified WARC revisit records incorrectly record WARC-Payload-Digest
Unassigned
Kristinn Sigurðsson
Major
Fixed
Mar 11, 2015
Aug 20, 2015
Bug
HTML extractor fails to extract CSS from a link tag
Unassigned
Kristinn Sigurðsson
Major
Unresolved
Aug 20, 2015
Aug 20, 2015
Improvement
Expand hosts-report.txt with novel bytes, novel urls counts
Unassigned
Michael Magin
Major
Duplicate
May 19, 2008
Jul 13, 2015
Bug
duplicate user agent records in robots.txt cause overwriting of rules
Unassigned
Robert Jäschke
Minor
Unresolved
Jun 25, 2015
Jun 27, 2015
Bug
WARCWriterProcessor no longer prints hop path and link context for outlinks in meta data records
Unassigned
Adam Miller
Major
Fixed
Mar 26, 2015
Mar 27, 2015
Bug
spam
Unassigned
c
Major
Cool Story, Bro
Mar 23, 2015
Mar 23, 2015
Question
password
Unassigned
connor taylor
Major
Unresolved
Mar 1, 2015
Mar 1, 2015
Bug
Seeds Report missing redirect URLs for 301 / 302 responses
Unassigned
Adam Miller
Major
Fixed
Oct 22, 2014
Nov 12, 2014
Task
Link Analysis with Apache Giraph (Cluster mode)
Unassigned
Zhang Xiang
Major
Unresolved
Nov 7, 2014
Nov 7, 2014
Improvement
url alone not sufficient to identify unique unit of web content, should be something like canonicalize(url+headers)
Unassigned
Noah Levitt
Major
Unresolved
Aug 21, 2009
Oct 20, 2014
Bug
IllegalStateException "got suspicious value" in IpAddressSetDecideRule when
Unassigned
Kristinn Sigurðsson
Major
Fixed
Oct 3, 2014
Oct 4, 2014
Improvement
Enable configuration of log4j in libraries
Unassigned
Kristinn Sigurðsson
Major
Unresolved
Oct 3, 2014
Oct 3, 2014
Bug
NullPointerException when getting cookies
Unassigned
Kristinn Sigurðsson
Major
Fixed
Jul 16, 2014
Oct 3, 2014
Bug
Using 'sun.security.tools.KeyTool' restricts to Oracle-based JVMs.
Unassigned
Thorbjørn Ravn Andersen
Minor
Unresolved
Aug 6, 2014
Aug 20, 2014
New Feature
Support 'Crawl-Delay' and 'Allow' robots.txt directives
Unassigned
Gordon Mohr
Major
Fixed
Jan 25, 2007
Aug 6, 2014
Bug
WarcWriterProcessor writes full body of revisited items
Unassigned
Kristinn Sigurðsson
Major
Fixed
Jul 25, 2014
Jul 25, 2014
Bug
[junit] [ERROR] TEST org.archive.util.ms.DocTest FAILED (due to testdata folder missing in pre-release tarball)
Unassigned
Olaf Freyer
Major
Fixed
Mar 14, 2007
Jul 23, 2014
Bug
ByteReplayCharSequence ignores specified/non-default character encodings
Unassigned
Gordon Mohr
Critical
Fixed
Jun 5, 2008
Jul 23, 2014
Improvement
Investigate alt parsers (JTidy, HTMLParser, etc.)
Unassigned
Gordon Mohr
Major
Won't Fix
Feb 17, 2007
Jul 23, 2014
Bug
NPE in FetchHTTP.saveCookies
Unassigned
Michael Stack
Minor
Fixed
Feb 16, 2007
Jul 23, 2014
Bug
Heritrix robot violating robots.txt and robots META tags at French National Library
Unassigned
Swami Petaramesh
Critical
Won't Fix
Aug 27, 2007
Jul 23, 2014
Bug
StripSessonCFIDs missing from modules/BaseRule.options
Unassigned
Michael Magin
Major
Fixed
Apr 30, 2007
Jul 23, 2014
Bug
heritrix fails to save accept-headers in an override
Unassigned
Michael Magin
Major
Fixed
Feb 16, 2007
Jul 23, 2014
Bug
[maven] [build] maven2 build on Mac/Windows may not find libraries, classes necessary (esp. JSPC JSP precompiling step)
Unassigned
Gordon Mohr
Minor
Obsolete
Jul 10, 2007
Jul 23, 2014
Bug
move from Filters to DecideRules is done, but still no replacement for ContentTypeRegExpFilter exists
Unassigned
Olaf Freyer
Major
Fixed
Mar 14, 2007
Jul 23, 2014
Bug
Adding decide rules in an override (via UI, or via an already-existing settings tree) does not work.
Unassigned
Michael Magin
Critical
Fixed
Apr 27, 2007
Jul 23, 2014
Improvement
extend WARC metadata to contain source-seed
Unassigned
Olaf Freyer
Minor
Fixed
Mar 14, 2007
Jul 23, 2014
Bug
BdbModule triggers an IllegalStateException("Database already exists : uri_history") when used with PersistStoreProcessor/PersistLoadProcessor for recrawl
Unassigned
Simon Huet
Major
Won't Fix
Sep 6, 2009
Jul 23, 2014
Bug
NoClassDefFoundError when starting a job
Unassigned
(sourceforge)
Minor
Incomplete
Feb 16, 2007
Jul 23, 2014