Option to enable/disable the FIELDCACHE in the Nutch IndexSearcher.
Change default value of searcher.fieldcache in nutch-site.xml to 'false'
Wrong log() function used in PageRankScoringFilter.
JDK6u23 breaks GzippedInputStream & W/ARCReaders with different GZIP handling
Enhance index merging to combine parallel indexes.
NutchWaxBean's command-line searching should emit title along with other document metadata.
Some additional diagnostics on connecting results to segments and snippets would be very helpful.
Nutch HTML parser infinite loop.
Index documents without crawldb or linkdb.
Add ability to configure HTTP headers to support caching.
Use NutchWAX configurable query filter for site and url fields.
Add record to index for non-text documents
Integrate nutchwax with Access Control Oracle
Add option to continue importing if an arcfile cannot be read.
bug in Hurricane Katrina
Add ability to store but not index a field via ConfigurableIndexingFilter
Add option to DumpParallelIndex to output only single field.
Stop storing document key in "orig" field in index, synthesize it as needed from the "url" and "digest" fields.
Add "hitsPerSite" option to NutchWaxBean
Add "num hits to find" option to NutchWaxBean
Add option to NutchWaxBean to specify directory where index+segments are to be found.
IndexMerging parallel indexes fails when index is empty.
In IndexSearcher.translateHits(), when de-duping use a FieldSelector when loading the document to only load the site field.
Date-adder allows for duplicate dates to be added to a record.
nutchwax command-driver doesn't properly enclose arguments in quotes.