All issues

Class not found when importing within a Hadoop MR job.
WAX-69
nutchwax-0.13/src/java/org/archive/nutchwax/imagesearch/DocIndexer.java:309: error: method filter in class IndexingFilters cannot be applied to given types
WAX-83
Nutch HTML parser infinite loop.
WAX-82
Corrupt script tag at end of page causes HTML parser infinite loop.
WAX-81
Mime-type detection infinite loop due to control character in DOCTYPE declaration.
WAX-80
Extract HTML meta tags for 'description' and 'keywords' and add to segment.
WAX-79
HTML noindex and nofollow enforced in HTMLParser?
WAX-78
JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling
WAX-77
Slow parsing
WAX-76
Hacks to use with Hadoop-0.20 from Cloudera
WAX-75
Add support for storing fields in compressed form.
WAX-74
Change default value of searcher.fieldcache in nutch-site.xml to 'false'
WAX-73
Simplify build system to copy NW files into Nutch dirs and use Nutch build.xml
WAX-72
NutchWAX-required libraries not included in nutch-1.0.job
WAX-71
Cannot use rsync URLs, no handler for rsync protocol.
WAX-70
Compatibility with {index+segment}s created by NutchWAX 0.10.
WAX-68
Nutch OpenOffice parser does not pass along metadata.
WAX-67
Index documents without crawldb or linkdb.
WAX-66
Some odd-ball characters display as '?' in search results.
WAX-65
Research sorting feature for NutchWAX
WAX-64
LengthNormUpdater returning error code if no fields in index have norms is inconvenient.
WAX-63
Add ability to configure HTTP headers to support caching.
WAX-62
DateAdder should have an option to determine if norms should be used.
WAX-60
Wrong log() function used in PageRankScoringFilter.
WAX-59
Need tool to update an existing index's norms based on pagerank information.
WAX-58
nutchwax command-driver doesn't properly enclose arguments in quotes.
WAX-57
Date-adder allows for duplicate dates to be added to a record.
WAX-56
NutchWaxBean's command-line searching should emit title along with other document metadata.
WAX-55
In IndexSearcher.translateHits(), when de-duping use a FieldSelector when loading the document to only load the site field.
WAX-54
IndexMerging parallel indexes fails when index is empty.
WAX-53
Add option to NutchWaxBean to specify directory where index+segments are to be found.
WAX-52
Enhance index merging to combine parallel indexes.
WAX-51
Add "num hits to find" option to NutchWaxBean
WAX-50
Add "hitsPerSite" option to NutchWaxBean
WAX-49
Use NutchWAX configurable query filter for site and url fields.
WAX-48
Stop storing document key in "orig" field in index, synthesize it as needed from the "url" and "digest" fields.
WAX-47
Add option to DumpParallelIndex to output only single field.
WAX-46
Add record to index for non-text documents
WAX-44
bug in Hurricane Katrina
WAX-43
Add option to continue importing if an arcfile cannot be read.
WAX-42
Option to enable/disable the FIELDCACHE in the Nutch IndexSearcher.
WAX-41
Integrate nutchwax with Access Control Oracle
WAX-40
Write more efficient, specialized segment parse_text merging
WAX-39
Build omits necessary libraries from .job file.
WAX-38
Per-collection segments not supported in distributed/master-slave configuration.
WAX-37
Some additional diagnostics on connecting results to segments and snippets would be very helpful.
WAX-36
Add pagerankdb similar to linkdb but which only keeps counts rather than actual inlinks.
WAX-35
Add option to omit storing of content in segment
WAX-34
Add URL canonicalization to pageranker
WAX-33
500 error - java.lang.NegativeArraySizeException
WAX-32