It would be really, really nice if we could dispense with the crawldb and linkdb altogether. The crawldb contains information about the URLs/documents used by Nutch's crawling features, which we don't use at all. The linkdb is always empty in practice. However, the Nutch 'index' command requires both of them to exist. It would be great to hack up Nutch's 'index' command to simply ignore them altogether.
Fixed in SVN 2833 and SVN 2834.
Added a NutchWAX version of "Indexer.java" and a command-line driver that run the same indexing process as the Nutch indexer, but without requiring the crawldb and linkdb. In fact, the NutchWAX Indexer does not take them on the command line at all.
I also added a command-line driver to the 'nutchwax' script, so one can run:
nutchwax index <indexes-dir> <segment>...