Issues

Improvement
Option to enable/disable the FIELDCACHE in the Nutch IndexSearcher.
Unassigned
Aaron Binns
Major
Fixed
May 5, 2009
Feb 19, 2018
Improvement
Change default value of searcher.fieldcache in nutch-site.xml to 'false'
Unassigned
Aaron Binns
Major
Fixed
Feb 20, 2010
Feb 19, 2018
Bug
Wrong log() function used in PageRankScoringFilter.
Unassigned
Aaron Binns
Major
Fixed
Jul 22, 2009
Feb 19, 2018
Bug
JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling
Unassigned
Aaron Binns
Major
Fixed
Apr 8, 2011
Feb 19, 2018
Improvement
Enhance index merging to combine parallel indexes.
Unassigned
Aaron Binns
Major
Fixed
Jul 7, 2009
Feb 18, 2018
Bug
NutchWaxBean's command-line searching should emit title along with other document metadata.
Unassigned
Aaron Binns
Major
Fixed
Jul 14, 2009
Feb 18, 2018
Improvement
Some additional diagnostics on connecting results to segments and snippets would be very helpful.
Unassigned
Aaron Binns
Major
Fixed
Mar 4, 2009
Feb 18, 2018
Bug
Nutch HTML parser infinite loop.
Unassigned
Aaron Binns
Major
Obsolete
Apr 21, 2011
Feb 18, 2018
Improvement
Index documents without crawldb or linkdb.
Unassigned
Aaron Binns
Major
Fixed
Oct 26, 2009
Feb 18, 2018
New Feature
Add ability to configure HTTP headers to support caching.
Unassigned
Aaron Binns
Major
Fixed
Aug 20, 2009
Feb 17, 2018
Improvement
Use NutchWAX configurable query filter for site and url fields.
Unassigned
Aaron Binns
Major
Fixed
Jun 23, 2009
Feb 17, 2018
Improvement
Add record to index for non-text documents
Unassigned
Aaron Binns
Major
Obsolete
May 20, 2009
Feb 17, 2018
New Feature
Integrate nutchwax with Access Control Oracle
Unassigned
Lewis Crawford
Major
Obsolete
Mar 25, 2009
Feb 17, 2018
Improvement
Add option to continue importing if an arcfile cannot be read.
Unassigned
Aaron Binns
Major
Fixed
May 5, 2009
Feb 2, 2018
Bug
Bug in Hurricane Katrina
Unassigned
Molly Bragg
Major
Fixed
May 19, 2009
Feb 2, 2018
May 19, 2009
Improvement
Add ability to store but not index a field via ConfigurableIndexingFilter
Unassigned
Aaron Binns
Minor
Fixed
Jun 2, 2009
Feb 2, 2018
Improvement
Add option to DumpParallelIndex to output only single field.
Unassigned
Aaron Binns
Major
Fixed
Jun 18, 2009
Feb 2, 2018
Improvement
Stop storing document key in "orig" field in index, synthesize it as needed from the "url" and "digest" fields.
Unassigned
Aaron Binns
Major
Fixed
Jun 23, 2009
Feb 2, 2018
Improvement
Add "hitsPerSite" option to NutchWaxBean
Unassigned
Aaron Binns
Major
Fixed
Jun 23, 2009
Feb 2, 2018
Improvement
Add "num hits to find" option to NutchWaxBean
Unassigned
Aaron Binns
Major
Fixed
Jun 24, 2009
Feb 2, 2018
Improvement
Add option to NutchWaxBean to specify directory where index+segments are to be found.
Unassigned
Aaron Binns
Major
Fixed
Jul 7, 2009
Feb 2, 2018
Bug
IndexMerging parallel indexes fails when index is empty.
Unassigned
Aaron Binns
Major
Fixed
Jul 7, 2009
Feb 2, 2018
Improvement
In IndexSearcher.translateHits(), when de-duping use a FieldSelector when loading the document to only load the site field.
Unassigned
Aaron Binns
Major
Fixed
Jul 13, 2009
Feb 2, 2018
Bug
Date-adder allows for duplicate dates to be added to a record.
Unassigned
Aaron Binns
Major
Fixed
Jul 14, 2009
Feb 2, 2018
Bug
nutchwax command-driver doesn't properly enclose arguments in quotes.
Unassigned
Aaron Binns
Major
Fixed
Jul 16, 2009
Feb 2, 2018
New Feature
Need tool to update an existing index's norms based on pagerank information.
Unassigned
Aaron Binns
Major
Fixed
Jul 22, 2009
Feb 2, 2018
Improvement
DateAdder should have an option to determine if norms should be used.
Unassigned
Aaron Binns
Major
Obsolete
Jul 22, 2009
Feb 2, 2018
Improvement
Change mime-type of OpenSearch XML response from text/xml to application/xml.
Unassigned
Aaron Binns
Minor
Fixed
Aug 20, 2009
Feb 2, 2018
Bug
LengthNormUpdater returning error code if no fields in index have norms is inconvenient.
Unassigned
Aaron Binns
Major
Fixed
Sep 19, 2009
Feb 1, 2018
Question
Research sorting feature for NutchWAX
Unassigned
Hunter Stern
Major
Fixed
Sep 21, 2009
Feb 1, 2018
Sep 22, 2009
Bug
Some odd-ball characters display as '?' in search results.
Unassigned
Aaron Binns
Major
Not a Bug
Oct 22, 2009
Feb 1, 2018
Bug
Nutch OpenOffice parser does not pass along metadata.
Unassigned
Aaron Binns
Major
Fixed
Oct 26, 2009
Feb 1, 2018
New Feature
Compatibility with {index+segment}s created by NutchWAX 0.10.
Unassigned
Aaron Binns
Major
Fixed
Oct 29, 2009
Feb 1, 2018
Bug
Class not found when importing within a Hadoop MR job.
Unassigned
Aaron Binns
Critical
Fixed
Jan 12, 2010
Feb 1, 2018
Bug
Cannot use rsync URLs, no handler for rsync protocol.
Unassigned
Aaron Binns
Major
Obsolete
Jan 12, 2010
Feb 1, 2018
Bug
NutchWAX-required libraries not included in nutch-1.0.job
Unassigned
Aaron Binns
Major
Obsolete
Feb 20, 2010
Feb 1, 2018
Improvement
Simplify build system to copy NW files into Nutch dirs and use Nutch build.xml
Unassigned
Aaron Binns
Major
Fixed
Feb 20, 2010
Feb 1, 2018
Improvement
Add support for storing fields in compressed form.
Unassigned
Aaron Binns
Major
Fixed
Mar 18, 2010
Feb 1, 2018
Task
Hacks to use with Hadoop-0.20 from Cloudera
Unassigned
Aaron Binns
Major
Obsolete
Jul 10, 2010
Feb 1, 2018
Bug
Slow parsing
Unassigned
Aaron Binns
Major
Unresolved
Sep 1, 2010
Feb 1, 2018
Bug
HTML noindex and nofollow enforced in HTMLParser?
Unassigned
Aaron Binns
Major
Obsolete
Apr 15, 2011
Feb 1, 2018
Improvement
Extract HTML meta tags for 'description' and 'keywords' and add to segment.
Unassigned
Aaron Binns
Major
Unresolved
Apr 15, 2011
Feb 1, 2018
Bug
Mime-type detection infinite loop due to control character in DOCTYPE declaration.
Unassigned
Aaron Binns
Major
Unresolved
Apr 18, 2011
Feb 1, 2018
Bug
Corrupt script tag at end of page causes HTML parser infinite loop.
Unassigned
Aaron Binns
Major
Unresolved
Apr 19, 2011
Feb 1, 2018
Question
nutchwax-0.13/src/java/org/archive/nutchwax/imagesearch/DocIndexer.java:309: error: method filter in class IndexingFilters cannot be applied to given types
Unassigned
Sam
Major
Won't Do
Mar 20, 2012
Feb 1, 2018
New Feature
Add pagerankdb similar to linkdb but which only keeps counts rather than actual inlinks.
Unassigned
Aaron Binns
Major
Fixed
Mar 4, 2009
Jan 19, 2018
Bug
Per-collection segments not supported in distributed/master-slave configuration.
Unassigned
Aaron Binns
Major
Fixed
Mar 4, 2009
Jan 19, 2018
Bug
Build omits necessary libraries from .job file.
Unassigned
Aaron Binns
Major
Fixed
Mar 4, 2009
Jan 19, 2018
New Feature
Write more efficient, specialized segment parse_text merging
Unassigned
Aaron Binns
Major
Fixed
Mar 8, 2009
Jan 19, 2018
Improvement
Option to skip an ARC record based on size or other filtering policy
Unassigned
Aaron Binns
Major
Unresolved
Jul 18, 2008
Mar 13, 2015
1-50 of 50