Add option to omit storing of content in segment
Add URL canonicalization to pageranker
500 error - java.lang.NegativeArraySizeException
contrib/archive/README.txt needs clarifications
Nutchwax requires very long timeouts on remotely hosted arc files
nutchwax home page issue tracker still points to sf.net
Investigate malformed URL report during date-adder
Sensible output for requesting page of results past the end.
Add XML elements containing all search URL params for self-link generation
Add utility/tool to dump unique values of a field in an index.
DateAdder fails due to uncaught exception in URL canonicalization
Add a "field setter" filter to set a field to a static value in the Lucene document during indexing.
Various code clean-ups based on code review using PMD tool.
Allow for blank lines and comment lines in manifest file.
Bug in exacturl query
Add strict/loose option to DateAdder for revisit lines with extra data on end
Add reading of archive files from DFS
More aggressive collapsing by site in search results
Option to skip ARC record import based on HTTP status code of content
Investigate why reading content from archive file uses such small chunks
Add DFS read/write support to DateAdder
Add metadata field "fileoffset"
Change metadata field name in search results from "arcname" to "filename"
Add "exacturl" metadata field to indexing so it can be searched as-is, not parsed/tokenized like the "url" field.
Entire file not imported