I noticed while working on AIT that the pageranker can be "incorrect" because it doesn't canonicalize URLs. As a result, the pagerank.txt file it produces can contain entries like:
Notice the first and last lines. Those should probably be combined, since the URLs would (AFAIK) canonicalize to the same URL.
And if we do canonicalize the URLs when generating the pagerank.txt file, we'll have to apply the same canonicalization during indexing so the URLs match.
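
To make the idea concrete, here is a rough Python sketch of what a shared canonicalization step could look like. Everything in it is an assumption for illustration: the function names `canonicalize` and `merge_scores` are hypothetical, the rules (lowercase scheme and host, drop default ports and fragments, treat an empty path as `/`) are just one plausible choice, and summing the scores of duplicate URLs is a guess at what "combined" should mean. It doesn't reflect the actual pageranker code or the pagerank.txt format.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Ports that can be dropped because they are implied by the scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url):
    """Return a canonical form of `url` (hypothetical rules, see lead-in).

    Sketch only: userinfo is dropped and IPv6 literals are not handled.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    # Treat "http://example.com" and "http://example.com/" as the same page.
    path = parts.path or "/"
    # The fragment is dropped; the query string is kept unchanged.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def merge_scores(pairs):
    """Combine (url, score) pairs whose URLs canonicalize to the same string.

    Assumption: duplicate entries are merged by summing their scores.
    """
    merged = defaultdict(float)
    for url, score in pairs:
        merged[canonicalize(url)] += score
    return dict(merged)
```

The important part is that the pagerank step and the indexer both call the same function, so a URL looked up at indexing time lands on the same key that was written out earlier.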