redirect loop bug in OS Wayback

Description

This issue first came up in Archive-It as issue ARI-1104.

original issue:

In Wayback, two dates are listed for each crawl. The second one (the one listed under the first for a given date) produces an error message. In Firefox, it says "Redirect Loop"; in IE, it says "Internet Explorer Cannot Display the Webpage".

example:
http://wayback.archive-it.org/928/20080516005604/http://www.pbskids.org/

Noah notes:

This looks like it's caused by site weirdness interacting with wayback canonicalization and anchor dates.

On the live web:
http://www.pbskids.org/
redirects to:
http://pbskids.org/domainredirs/www.pbskids.org:80/redir/
which redirects to:
http://pbskids.org/
which serves the content.

The problem is that the wayback considers the urls http://www.pbskids.org/ and http://pbskids.org/ to be equivalent. So in the wayback, because (I think) it's trying to anchor the date to the link you originally clicked on, the redirect to http://pbskids.org/ serves the same thing as the first url in the chain, and starts the redirect loop over again.
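A rough sketch of the loop described above, in Python. This is not Wayback's code: `canonical_key`, `CAPTURES`, and `replay` are hypothetical names, the canonicalization rule (lowercase the host, drop a leading "www.") is an assumption standing in for whatever Wayback actually does, and the model assumes that the capture served for each key near the anchor date is the redirect.

```python
from urllib.parse import urlsplit


def canonical_key(url):
    """Hypothetical canonicalization: lowercase host, strip a leading 'www.'."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + (parts.path or "/")


# Assumed archive index: canonical key -> redirect target recorded in the capture.
# Because both hostnames share one key, "pbskids.org/" resolves to the capture
# of http://www.pbskids.org/, which is itself a redirect.
CAPTURES = {
    canonical_key("http://www.pbskids.org/"):
        "http://pbskids.org/domainredirs/www.pbskids.org:80/redir/",
    canonical_key("http://pbskids.org/domainredirs/www.pbskids.org:80/redir/"):
        "http://pbskids.org/",
}


def replay(url, max_hops=10):
    """Follow archived redirects; return (trail of keys, loop_detected)."""
    seen = []
    for _ in range(max_hops):
        key = canonical_key(url)
        if key in seen:
            # Same canonical key reached again: the chain has looped.
            return seen + [key], True
        seen.append(key)
        target = CAPTURES.get(key)
        if target is None:
            # No redirect recorded for this key: content would be served here.
            return seen, False
        url = target
    return seen, False


trail, looped = replay("http://www.pbskids.org/")
print(trail, looped)
```

Note that no single hop redirects to its own key; the repeat only shows up when the whole chain is tracked, which a server handling each request independently does not see.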

Not sure what to do about this. Brad?

Brad says:

Non-trivial – this is a multi-step redirect loop, which is difficult to detect during each individual redirection. Classic Wayback has a partial method for keeping this from happening, which involves adding flags after the date. We haven't needed to use this mechanism to solve this problem yet, but may need to consider adding...

Want to clarify - the first date:

http://wayback.archive-it.org/928/20080109193504/http://pbskids.org/

works for me in Firefox, and the second date:

http://wayback.archive-it.org/928/20080109202911/http://www.pbskids.org/

behaves as you've described. Can you confirm the dates by clicking each from:

http://wayback.archive-it.org/928/*/http://www.pbskids.org/

Otherwise, there may be something more subtle happening.

Environment

None

Status

Assignee

Brad Tofel

Reporter

Molly Bragg

Labels

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Components

Sprint

None

Priority

Major