Sensible output when requesting a page of results past the end.
Add option to omit storing of content in segment
Add strict/loose option to DateAdder for revisit lines with extra data on end
Observe content size limit on importing.
Add DFS read/write support to DateAdder
Digest differs between ARCReader and Wayback index-arc.
Add URL canonicalization to pageranker
Option to skip ARC record import based on HTTP status code of content
Investigate why so many PDFs fail to parse.
Add a "field setter" filter to set a field to a static value in the Lucene document during indexing.
Add XML elements containing all search URL params for self-link generation
500 error - java.lang.NegativeArraySizeException
nutchwax home page issue tracker still points to sf.net
Allow for blank lines and comment lines in manifest file.
Entire file not imported
Various code clean-ups based on code review using PMD tool.
Nutchwax requires very long timeouts on remotely hosted arc files
contrib/archive/README.txt needs clarifications
Investigate malformed URL report during date-adder
Investigate why reading content from archive file uses such small chunks
Add reading of archive files from DFS
Implementor/user-provided XSLT for OpenSearch results
Change config so that URL filters are not applied during link inversion
testing, please ignore
Change metadata field name in search results from "arcname" to "filename"
Add metadata field "fileoffset"
Date queries cause TooManyClauses exceptions
Add "exacturl" metadata field to indexing so it can be searched as-is, not parsed/tokenized like the "url" field.
Add utility/tool to dump unique values of a field in an index.
More aggressive collapsing by site in search results
bug in exacturl query
DateAdder fails due to uncaught exception in URL canonicalization
Change DateAdder to allow the URLCanonicalizer implementation to be defined in a property.