In large-scale crawls, the current (3.3.0-SNAPSHOT) hosts report tends to include a large number of hosts that were never crawled. In my crawls, these can easily make up more than half the report.
This is clutter, and it also means the report takes much longer to generate, since all these extra hosts need to be sorted.
I've modified the HostsReport in two ways to help with this: sorting is now optional, and the inclusion of 'empty' hosts is now optional. In both cases the default values match current behavior, so this shouldn't impact anyone who doesn't actively seek it out. PR: 123
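
For readers who want a picture of the idea, here is a minimal, self-contained sketch of the two switches. It is not the code from the PR: the class name HostsReportSketch, the flags sortByCount and includeEmptyHosts, and the select method are all hypothetical names chosen for illustration, and the per-host URL counts are passed in as a plain map rather than read from crawl statistics. Both flags default to true to mirror the "defaults match current behavior" point above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not the actual HostsReport API.
public class HostsReportSketch {
    // Defaults mirror the existing behavior: sort, and list every host.
    private boolean sortByCount = true;
    private boolean includeEmptyHosts = true;

    public void setSortByCount(boolean sortByCount) {
        this.sortByCount = sortByCount;
    }

    public void setIncludeEmptyHosts(boolean includeEmptyHosts) {
        this.includeEmptyHosts = includeEmptyHosts;
    }

    // Returns the rows to write, given a map of host name to fetched URL count.
    public List<Map.Entry<String, Long>> select(Map<String, Long> urlsPerHost) {
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        for (Map.Entry<String, Long> entry : urlsPerHost.entrySet()) {
            // Skip hosts with no fetched URLs when empty hosts are excluded.
            if (!includeEmptyHosts && entry.getValue() == 0L) {
                continue;
            }
            rows.add(entry);
        }
        if (sortByCount) {
            // Highest count first; skipped entirely when sorting is disabled.
            rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("a.example", 0L);
        counts.put("b.example", 5L);
        counts.put("c.example", 2L);

        HostsReportSketch report = new HostsReportSketch();
        report.setIncludeEmptyHosts(false);
        report.setSortByCount(false);
        for (Map.Entry<String, Long> row : report.select(counts)) {
            System.out.println(row.getValue() + " " + row.getKey());
        }
    }
}
```

Filtering before sorting is the point of the sketch: when empty hosts are excluded, the sort only has to handle the hosts that remain, and when sorting is disabled the rows are emitted in whatever order the input provides.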
Merged pull request