NutchWAX 0.10 and 0.12 use slightly different formats for the key field in the segment. A Nutch(WAX) segment contains Hadoop MapFiles, in which the key is based on the URL and the value holds document-specific information. We are concerned with the segment's 'parse_text' MapFile, which contains the parsed text of the documents in the index. This text is used to generate the search-result snippets.
In NutchWAX 0.10, the format of the key was
c=<collectionId>,u=<url>
and in NutchWAX 0.12, the format was changed to
<url> <digest>
in order to support (de-)duplication.
If a NutchWAX 0.12 searcher is pointed at a NutchWAX 0.10 {index+segment}, it will search the index successfully but will be unable to generate snippets, due to the change in the key format.
We need a way to tell NutchWAX which segments use the 0.10 format and which use the 0.12 format. NutchWAX can then generate the key accordingly, and thus search indexes created by both 0.10 and 0.12 simultaneously.
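To illustrate the two key formats, here is a minimal sketch of building the parse_text lookup key for a given segment version. The class and method names (SegmentKeys, keyFor, key010, key012) are hypothetical and not taken from the NutchWAX code base; plain Strings stand in for the Hadoop key type, and the collection, URL, and digest values in main() are made-up placeholders.

```java
/** Hypothetical helper: builds the parse_text MapFile key for a segment version. */
public final class SegmentKeys {

    private SegmentKeys() {}

    /** NutchWAX 0.10 format: c=<collectionId>,u=<url> */
    static String key010(String collectionId, String url) {
        return "c=" + collectionId + ",u=" + url;
    }

    /** NutchWAX 0.12 format: <url> <digest>, separated by a single space. */
    static String key012(String url, String digest) {
        return url + " " + digest;
    }

    /** Picks the key format from the segment version (10 or 12). */
    static String keyFor(int version, String collectionId, String url, String digest) {
        switch (version) {
            case 10:
                return key010(collectionId, url);
            case 12:
                return key012(url, digest);
            default:
                throw new IllegalArgumentException("Unsupported segment version: " + version);
        }
    }

    public static void main(String[] args) {
        // Placeholder values, for illustration only.
        System.out.println(keyFor(10, "mycoll", "http://example.org/", "ABC123"));
        System.out.println(keyFor(12, "mycoll", "http://example.org/", "ABC123"));
    }
}
```

The 0.10 key ignores the digest and the 0.12 key ignores the collection ID, so a searcher that knows each segment's version can pass the same document metadata to keyFor and get the right lookup key either way.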
SVN 2870 & 2946
In the segments directory, create a file named "versions". In it,
place lines of the form:
<segment-name> <version>
where version is either "10" or "12" (without quotes). For example:
foo-segment 10
bar-segment 12
If a segment is not listed in the "versions" file, it will be treated as version 12.
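A minimal sketch of reading the "versions" file follows, applying the rule that unlisted segments default to version 12. The class name SegmentVersions and its methods are hypothetical. It reads from a java.io.Reader rather than from a Hadoop FileSystem, and rejecting malformed lines or versions other than 10 and 12 is an assumption of this sketch, not documented NutchWAX behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reader for the segments-directory "versions" file. */
public final class SegmentVersions {

    /** Segments absent from the file are treated as version 12. */
    static final int DEFAULT_VERSION = 12;

    private final Map<String, Integer> versions = new HashMap<>();

    /** Parses lines of the form "<segment-name> <version>". */
    SegmentVersions(Reader in) throws IOException {
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split("\\s+");
                if (parts.length != 2) {
                    throw new IOException("Malformed line in versions file: " + line);
                }
                int version;
                try {
                    version = Integer.parseInt(parts[1]);
                } catch (NumberFormatException e) {
                    throw new IOException("Non-numeric version in line: " + line, e);
                }
                if (version != 10 && version != 12) {
                    throw new IOException("Unsupported version " + version + " for " + parts[0]);
                }
                versions.put(parts[0], version);
            }
        }
    }

    /** Returns the listed version, or 12 if the segment is not listed. */
    int versionOf(String segmentName) {
        return versions.getOrDefault(segmentName, DEFAULT_VERSION);
    }

    public static void main(String[] args) throws IOException {
        SegmentVersions sv = new SegmentVersions(
                new StringReader("foo-segment 10\nbar-segment 12\n"));
        System.out.println(sv.versionOf("foo-segment")); // listed as 10
        System.out.println(sv.versionOf("bar-segment")); // listed as 12
        System.out.println(sv.versionOf("baz-segment")); // not listed, defaults to 12
    }
}
```

Against a real segments directory, the StringReader would be replaced by a reader over the "versions" file. The lookup result can then select the key format per segment, so 0.10 and 0.12 segments can be served side by side.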