The function being called adds the argument as a CGI GET argument to a constructed URL. The original form of that URL was crawled, but once the server-side rewrite has added the archival URL prefix, the altered form is not found.
The simplest solution seems to be to inspect incoming URLs and, if they contain an archival URL (detected using a RegEx), strip the archival URL prefix and redirect the client to the unmodified form.
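A minimal sketch of that stripping step, in Python. Everything specific here is an assumption for illustration: the ARCHIVE_BASE host and path are hypothetical, the prefix is assumed to be that base followed by a 14-digit timestamp and a slash, and the embedded prefix is assumed not to be percent-encoded. The real prefix shape would have to be taken from the actual Wayback configuration.

```python
import re

# Hypothetical replay base; the real value depends on the Wayback deployment.
ARCHIVE_BASE = "http://archive.example.org/wayback/"

# Assumed prefix shape: the replay base, a 14-digit timestamp, then a slash.
EMBEDDED_PREFIX = re.compile(re.escape(ARCHIVE_BASE) + r"\d{14}/")


def strip_archival_prefix(requested_url):
    """Remove any archival URL prefix embedded in the requested URL.

    Returns (cleaned_url, was_modified). When was_modified is True the
    caller would redirect the client to cleaned_url.
    """
    cleaned, count = EMBEDDED_PREFIX.subn("", requested_url)
    return cleaned, count > 0


altered = (
    "http://site.example/page.php?theLink="
    "http://archive.example.org/wayback/20080101000000/"
    "http://site.example/about.html"
)
cleaned_url, needs_redirect = strip_archival_prefix(altered)
```

If needs_redirect is true, the server would answer with a redirect to cleaned_url instead of reporting the altered form as not found.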
<snipped email conv from Kate>
Unless I'm totally out of it though, it looks like while it did get the url of the
button, it didn't actually follow the link through to collect that page. Any ideas,
or is that something that someone else should look into, since it's more of a
For example, the wayback url of the button is at:
The page that it links to should have the following wayback url, but gives a 'not in
You're totally right, Heritrix has not gotten all it needed to.
Here, the flash buttons were archived:
But that little flash app drives the browser to:
Actually, they've gotten fancy: the same sp_Buttons.swf is used for all the buttons in the navigation menu, and the flash script inspects the two GET arguments, "title" and "theLink", and creates the button text and the link on the fly:
So this is a little more complicated than I'd thought.
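To illustrate what the flash script appears to be doing with those two GET arguments, here is a rough Python sketch that pulls "title" and "theLink" out of a button URL. The example URL and its values are made up; only the file name sp_Buttons.swf and the two argument names come from the site.

```python
from urllib.parse import urlsplit, parse_qs


def button_parameters(swf_url):
    """Return (title, link) taken from the button URL's GET arguments.

    Either value is None when the corresponding argument is absent.
    """
    query = parse_qs(urlsplit(swf_url).query)
    title = query.get("title", [None])[0]
    link = query.get("theLink", [None])[0]
    return title, link


# Hypothetical button URL, for illustration only.
example = "http://site.example/sp_Buttons.swf?title=About%20Us&theLink=about.html"
title, link = button_parameters(example)
```

Since every menu button shares one SWF, the query string is the only place the link target appears, which is why an ordinary link extractor would not find it.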
There are 2 problems here:
1) Heritrix didn't capture the subsequent page.
2) Wayback will need to do some very site-specific acrobatics to make these buttons work, once Heritrix has captured them.
I'm tempted to say we have bigger fish to fry at the moment, and #2 will have to remain broken for the near term wrt Wayback playback. We should definitely make sure Heritrix is capturing the subsequent links; I think this could be done by adding the menu button targets as additional seeds.
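One possible way to build that extra seed list, sketched in Python. This is not an existing Heritrix feature. It assumes the "theLink" values are resolved relative to the page that embeds the buttons, and that the seed list is a plain-text file with one URL per line. The page and button URLs below are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs, urljoin


def seeds_from_buttons(page_url, button_urls):
    """Resolve each button's theLink against page_url; return unique seeds in order."""
    seeds = []
    for button_url in button_urls:
        links = parse_qs(urlsplit(button_url).query).get("theLink", [])
        for link in links:
            seed = urljoin(page_url, link)
            if seed not in seeds:  # skip targets that several buttons share
                seeds.append(seed)
    return seeds


# Hypothetical inputs, for illustration only.
page = "http://site.example/index.html"
buttons = [
    "http://site.example/sp_Buttons.swf?title=About&theLink=about.html",
    "http://site.example/sp_Buttons.swf?title=Home&theLink=/index.html",
    "http://site.example/sp_Buttons.swf?title=About%20Again&theLink=about.html",
]
extra_seeds = seeds_from_buttons(page, buttons)
seed_lines = "\n".join(extra_seeds)  # one URL per line, ready to append to a seed list
```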