There are two ways to run a NutchWAX MapReduce job. It can be invoked with the 'nutchwax' command-line driver, e.g.
nutchwax import <manifest> <segment>
or it can be submitted as a job to a Hadoop cluster, e.g.
hadoop jar $NUTCH_HOME/nutch-1.0.job org.archive.nutchwax.Importer <manifest> <segment>
When using the second method, the job fails during the reduce step of the import: when the key/value pairs for the segment's crawl_data are read, Hadoop cannot find the class.
Everything works fine when using the 'nutchwax' command-line driver with a full-on NutchWAX installation.
I'm guessing that there are some differences in the way the classloaders are configured in the two contexts.
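One thing I may try, purely as a guess (the variable and behavior here are an assumption, not something I've verified fixes this): making the job file's classes visible on the client-side classpath as well, in case the driver-side and task-side classloaders see different things.

```shell
# Hypothetical workaround (untested assumption): put the job jar on the
# client classpath too, so classes bundled inside nutch-1.0.job are
# visible to the local driver as well as to the cluster tasks.
export HADOOP_CLASSPATH="$NUTCH_HOME/nutch-1.0.job"
hadoop jar $NUTCH_HOME/nutch-1.0.job org.archive.nutchwax.Importer <manifest> <segment>
```

If that helps, it would at least confirm the classloader-configuration theory; if not, the problem is presumably on the task side, inside the reduce JVMs.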