The Nutch OpenOffice parser does not pass along the metadata set by Nutch(WAX). As a result, the metadata is missing by the time we reach the indexing stage, so there is no document title, segment, digest, etc. to use for indexing. This also crashes the indexer code, which assumes that at least the segment and digest fields are set to non-null values. Without the metadata, those fields are null, which triggers a NullPointerException in Lucene.
Fixed in SVN 2832. One-line change to include the metadata object passed in from the caller.
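For illustration only, here is a minimal Java sketch of the failure pattern described above. It does not use the actual Nutch or Lucene classes: `ParseResult`, `parseDroppingMetadata`, `parseKeepingMetadata`, and `indexKey` are hypothetical names, and a plain `Map` stands in for the metadata object. The only point is the difference between building a parse result with fresh, empty metadata and building it with the metadata the caller passed in.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MetadataPassThroughSketch {

    /** Hypothetical stand-in for a parse result: extracted text plus metadata. */
    static final class ParseResult {
        final String text;
        final Map<String, String> metadata;

        ParseResult(String text, Map<String, String> metadata) {
            this.text = text;
            this.metadata = metadata;
        }
    }

    /** Buggy pattern: the caller's metadata is ignored and an empty map is used. */
    static ParseResult parseDroppingMetadata(String rawText, Map<String, String> callerMetadata) {
        return new ParseResult(rawText.trim(), new HashMap<>());
    }

    /** Fixed pattern: the metadata passed in by the caller is carried through. */
    static ParseResult parseKeepingMetadata(String rawText, Map<String, String> callerMetadata) {
        return new ParseResult(rawText.trim(), new HashMap<>(callerMetadata));
    }

    /** Hypothetical indexing step that assumes segment and digest are non-null. */
    static String indexKey(ParseResult result) {
        String segment = Objects.requireNonNull(result.metadata.get("segment"), "segment");
        String digest = Objects.requireNonNull(result.metadata.get("digest"), "digest");
        return segment + "/" + digest;
    }

    public static void main(String[] args) {
        // Made-up values, used only to exercise the two code paths.
        Map<String, String> meta = new HashMap<>();
        meta.put("segment", "20080101");
        meta.put("digest", "abc123");

        // With the metadata carried through, indexing has what it needs.
        System.out.println(indexKey(parseKeepingMetadata(" hello ", meta)));

        // With the metadata dropped, the null fields surface as an exception.
        try {
            indexKey(parseDroppingMetadata(" hello ", meta));
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

In this sketch the two parse methods differ by a single line, which matches the shape of the fix described above, though the real change in the parser may look different.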