| | Option to enable/disable the FIELDCACHE in the Nutch IndexSearcher. | | | | | Fixed | May 5, 2009 | Feb 19, 2018 | | |
| | Change default value of searcher.fieldcache in nutch-site.xml to 'false' | | | | | Fixed | Feb 20, 2010 | Feb 19, 2018 | | |
| | Wrong log() function used in PageRankScoringFilter. | | | | | Fixed | Jul 22, 2009 | Feb 19, 2018 | | |
| | JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling | | | | | Fixed | Apr 8, 2011 | Feb 19, 2018 | | |
| | Enhance index merging to combine parallel indexes. | | | | | Fixed | Jul 7, 2009 | Feb 18, 2018 | | |
| | NutchWaxBean's command-line searching should emit title along with other document metadata. | | | | | Fixed | Jul 14, 2009 | Feb 18, 2018 | | |
| | Some additional diagnostics on connecting results to segments and snippets would be very helpful. | | | | | Fixed | Mar 4, 2009 | Feb 18, 2018 | | |
| | Nutch HTML parser infinite loop. | | | | | Obsolete | Apr 21, 2011 | Feb 18, 2018 | | |
| | Index documents without crawldb or linkdb. | | | | | Fixed | Oct 26, 2009 | Feb 18, 2018 | | |
| | Add ability to configure HTTP headers to support caching. | | | | | Fixed | Aug 20, 2009 | Feb 17, 2018 | | |
| | Use NutchWAX configurable query filter for site and url fields. | | | | | Fixed | Jun 23, 2009 | Feb 17, 2018 | | |
| | Add record to index for non-text documents | | | | | Obsolete | May 20, 2009 | Feb 17, 2018 | | |
| | Integrate nutchwax with Access Control Oracle | | | | | Obsolete | Mar 25, 2009 | Feb 17, 2018 | | |
| | Add option to continue importing if an arcfile cannot be read. | | | | | Fixed | May 5, 2009 | Feb 2, 2018 | | |
| | bug in Hurricane Katrina | | | | | Fixed | May 19, 2009 | Feb 2, 2018 | May 19, 2009 | |
| | Add ability to store but not index a field via ConfigurableIndexingFilter | | | | | Fixed | Jun 2, 2009 | Feb 2, 2018 | | |
| | Add option to DumpParallelIndex to output only single field. | | | | | Fixed | Jun 18, 2009 | Feb 2, 2018 | | |
| | Stop storing document key in "orig" field in index, synthesize it as needed from the "url" and "digest" fields. | | | | | Fixed | Jun 23, 2009 | Feb 2, 2018 | | |
| | Add "hitsPerSite" option to NutchWaxBean | | | | | Fixed | Jun 23, 2009 | Feb 2, 2018 | | |
| | Add "num hits to find" option to NutchWaxBean | | | | | Fixed | Jun 24, 2009 | Feb 2, 2018 | | |
| | Add option to NutchWaxBean to specify directory where index+segments are to be found. | | | | | Fixed | Jul 7, 2009 | Feb 2, 2018 | | |
| | IndexMerging parallel indexes fails when index is empty. | | | | | Fixed | Jul 7, 2009 | Feb 2, 2018 | | |
| | In IndexSearcher.translateHits(), when de-duping use a FieldSelector when loading the document to only load the site field. | | | | | Fixed | Jul 13, 2009 | Feb 2, 2018 | | |
| | Date-adder allows for duplicate dates to be added to a record. | | | | | Fixed | Jul 14, 2009 | Feb 2, 2018 | | |
| | nutchwax command-driver doesn't properly enclose arguments in quotes. | | | | | Fixed | Jul 16, 2009 | Feb 2, 2018 | | |
| | Need tool to update an existing index's norms based on pagerank information. | | | | | Fixed | Jul 22, 2009 | Feb 2, 2018 | | |
| | DateAdder should have an option to determine if norms should be used. | | | | | Obsolete | Jul 22, 2009 | Feb 2, 2018 | | |
| | Change mime-type of OpenSearch XML response from text/xml to application/xml. | | | | | Fixed | Aug 20, 2009 | Feb 2, 2018 | | |
| | LengthNormUpdater returning error code if no fields in index have norms is inconvenient. | | | | | Fixed | Sep 19, 2009 | Feb 1, 2018 | | |
| | research sorting feature for NutchWAX | | | | | Fixed | Sep 21, 2009 | Feb 1, 2018 | Sep 22, 2009 | |
| | Some odd-ball characters display as '?' in search results. | | | | | Not a Bug | Oct 22, 2009 | Feb 1, 2018 | | |
| | Nutch OpenOffice parser does not pass along metadata. | | | | | Fixed | Oct 26, 2009 | Feb 1, 2018 | | |
| | Compatibility with {index+segment}s created by NutchWAX 0.10. | | | | | Fixed | Oct 29, 2009 | Feb 1, 2018 | | |
| | Class not found when importing within a Hadoop MR job. | | | | | Fixed | Jan 12, 2010 | Feb 1, 2018 | | |
| | Cannot use rsync URLs, no handler for rsync protocol. | | | | | Obsolete | Jan 12, 2010 | Feb 1, 2018 | | |
| | NutchWAX-required libraries not included in nutch-1.0.job | | | | | Obsolete | Feb 20, 2010 | Feb 1, 2018 | | |
| | Simplify build system to copy NW files into Nutch dirs and use Nutch build.xml | | | | | Fixed | Feb 20, 2010 | Feb 1, 2018 | | |
| | Add support for storing fields in compressed form. | | | | | Fixed | Mar 18, 2010 | Feb 1, 2018 | | |
| | Hacks to use with Hadoop-0.20 from Cloudera | | | | | Obsolete | Jul 10, 2010 | Feb 1, 2018 | | |
| | Slow parsing | | | | | Unresolved | Sep 1, 2010 | Feb 1, 2018 | | |
| | HTML noindex and nofollow enforced in HTMLParser? | | | | | Obsolete | Apr 15, 2011 | Feb 1, 2018 | | |
| | Extract HTML meta tags for 'description' and 'keywords' and add to segment. | | | | | Unresolved | Apr 15, 2011 | Feb 1, 2018 | | |
| | Mime-type detection infinite loop due to control character in DOCTYPE declaration. | | | | | Unresolved | Apr 18, 2011 | Feb 1, 2018 | | |
| | Corrupt script tag at end of page causes HTML parser infinite loop. | | | | | Unresolved | Apr 19, 2011 | Feb 1, 2018 | | |
| | nutchwax-0.13/src/java/org/archive/nutchwax/imagesearch/DocIndexer.java:309: error: method filter in class IndexingFilters cannot be applied to given types | | | | | Won't Do | Mar 20, 2012 | Feb 1, 2018 | | |
| | Add pagerankdb similar to linkdb but which only keeps counts rather than actual inlinks. | | | | | Fixed | Mar 4, 2009 | Jan 19, 2018 | | |
| | Per-collection segments not supported in distributed/master-slave configuration. | | | | | Fixed | Mar 4, 2009 | Jan 19, 2018 | | |
| | Build omits necessary libraries from .job file. | | | | | Fixed | Mar 4, 2009 | Jan 19, 2018 | | |
| | Write more efficient, specialized segment parse_text merging | | | | | Fixed | Mar 8, 2009 | Jan 19, 2018 | | |
| | Sensible output for requesting page of results past the end. | | | | | Fixed | Oct 17, 2008 | Dec 23, 2017 | | |