Heritrix3 now uses the Spring container (and its XML-based configuration format) to assemble a runnable crawl, choosing among alternative compatible implementations and setting values.
Developers will find it helpful to review the relevant chapter of Spring's reference documentation to learn all the options that the container and configuration format provide.
Some key insights to understanding this model are:
- Applications are large groupings of collaborating components, and often components have alternate, swappable implementations. (In our case, one runnable crawl job, with chosen settings and options, is one application.)
- The configuration file(s) declare all participating components and, where necessary, their initial setting values.
- The 'container' uses the configuration file(s), plus other hints derived from the components themselves (such as compatible types and setting names), to assemble all components with their initial state and direct references to their collaborators. If a component is needed (as implied by other components) but is not sufficiently declared, the container reports an error.
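
As a rough illustration of the points above, a configuration file in Spring's XML bean format might look like the following sketch. The `org.example.*` class names, the bean ids, and the property names are all hypothetical and are not taken from Heritrix; only the `beans`, `bean`, and `property` elements and the `class`, `value`, `ref`, and `autowire` attributes belong to Spring's format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: every org.example.* class, bean id, and
     property name below is hypothetical, not part of Heritrix. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- Declares a participating component and gives one of its
       settings an initial value. -->
  <bean id="fetcher" class="org.example.SimpleFetcher">
    <property name="timeoutSeconds" value="20"/>
  </bean>

  <!-- A swappable component: pointing 'class' at another compatible
       implementation changes which one the application uses. -->
  <bean id="queue" class="org.example.InMemoryQueue"/>

  <!-- 'fetcher' is wired by an explicit reference. autowire="byType"
       asks the container to fill the remaining setter properties from
       declared beans of a compatible type. -->
  <bean id="controller" class="org.example.ExampleController"
        autowire="byType">
    <property name="fetcher" ref="fetcher"/>
  </bean>

</beans>
```

In a file like this, a `ref` that names a bean id declared nowhere would be expected to make the container fail while assembling the application, which corresponds to the 'needed but not sufficiently declared' case described above.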