Remove ArchivalUrls from incoming requests

Description

Wayback in Archival URL mode should check incoming requests for embedded Archival URLs, remove them, and redirect to the modified form. When Wayback rewrites JavaScript, some URLs in the page need to be left as-is, but we don't have enough context to decide this at rewrite time. Only after the browser has made a request containing an embedded Archival URL can we remove it.

Example, from:

http://wayback.archive-it.org/1726/20091231154917/http://www.creightonmagazine.org/

Wayback server-side rewrite is changing the javascript function argument:

CurrentIssue/Source.Flash/sp_Buttons?title=Read Article&theLink=http://www.creightonmagazine.org/Issue.Fall_Winter_2009/President.asp

to:

CurrentIssue/Source.Flash/sp_Buttons?title=Read Article&theLink=http://wayback.archive-it.org/1726/20091231154917/http://www.creightonmagazine.org/Issue.Fall_Winter_2009/President.asp

The called function adds this argument as a CGI GET argument to a URL it constructs. The original form of that URL was crawled, but once server-side rewrite has added the Archival URL prefix, the altered form is not found.

The simplest solution seems to be to inspect incoming URLs and, if they contain an embedded Archival URL (detected with a regular expression), strip the Archival URL prefix and redirect the client to the unmodified form.
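
A minimal sketch of that check, in Python for illustration only (it is not Wayback code). The names ARCHIVAL_PREFIX, strip_embedded_archival_urls and redirect_target are hypothetical. The prefix shape (host, optional collection segment, 14-digit timestamp) is an assumption based on the example above, and percent-encoded embedded URLs are not handled.

```python
import re

# Assumed shape of an Archival URL prefix, based on the example in this
# ticket: scheme://host/[collection/]<14-digit timestamp>/ followed
# directly by the original URL. Real deployments may differ.
ARCHIVAL_PREFIX = re.compile(
    r"https?://[^/?&=\s]+/(?:[^/?&=\s]+/)?\d{14}/(?=https?://)"
)


def strip_embedded_archival_urls(request_url):
    """Remove Archival URL prefixes embedded inside request_url.

    The prefix at the very start of the request (the one that makes the
    request itself an Archival URL) is kept; only later occurrences,
    e.g. inside CGI arguments, are stripped.
    """
    lead = ARCHIVAL_PREFIX.match(request_url)
    start = lead.end() if lead else 0
    head, tail = request_url[:start], request_url[start:]
    return head + ARCHIVAL_PREFIX.sub("", tail)


def redirect_target(request_url):
    """Return the URL to redirect to, or None if no redirect is needed."""
    cleaned = strip_embedded_archival_urls(request_url)
    return cleaned if cleaned != request_url else None
```

For a request whose theLink argument carries the rewritten form shown above, redirect_target would return the same request with theLink restored to the http://www.creightonmagazine.org/... form; for a request with no embedded prefix it returns None and the request is served normally.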

Environment

None

Status

Assignee

Kenji Nagahashi

Reporter

Brad Tofel

Labels

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Components

Sprint

None

Fix versions

Affects versions

Wayback-1.4.2

Due date

2010/03/08

Priority

Major