
Heritrix hitting non-existent URLs in wix.com/app-market

Description

-------- Forwarded Message --------
Subject: Archive.org bot hitting non-existent URLs
Date: Wed, 26 Jul 2017 15:35:00 +0000
From: Marius Vaitkus <mariusv@wix.com>
To: info@archive.org

Hey,

Your bot is constantly hitting non-existent URLs at https://www.wix.com/app-market

It looks like, instead of following actual URLs, it parses random keys inside the site and appends them to the current location. An example of such a URL:
https://www.wix.com/app-market/wix-restaurants-kit/welcomeScreenEditor.or

Referrer:
https://www.wix.com/app-market/wix-restaurants-kit/overview
Client IP:
207.241.229.83
User agent:
Mozilla/5.0 (compatible; heritrix/3.3.0-SNAPSHOT-20140702-2247 +http://archive.org/details/archive.org_bot)
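One plausible reading of the behavior described above, assumed here rather than confirmed by the report, is that the crawler took a bare string from the page (for example a key embedded in the page's JavaScript) for a relative link and resolved it against the page URL. The Python sketch below uses the standard library's `urllib.parse.urljoin` to show that resolving such a token against the reported referrer yields exactly the reported URL. The token value is read off that URL; where it originally appeared on the page is not known.

```python
from urllib.parse import urljoin

# Referrer page reported in the message.
referrer = "https://www.wix.com/app-market/wix-restaurants-kit/overview"

# Assumption: a bare string on that page, e.g. a key inside embedded
# JavaScript, that a speculative link extractor mistook for a relative path.
token = "welcomeScreenEditor.or"

# Relative-reference resolution (RFC 3986) replaces the last path segment
# of the referrer ("overview") with the token.
candidate = urljoin(referrer, token)
print(candidate)
# https://www.wix.com/app-market/wix-restaurants-kit/welcomeScreenEditor.or
```

If this is the mechanism, the problem lies in which strings the crawler treats as links, not in URL resolution itself, which behaves as specified.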

Also, hitting a section of a site with around 600 requests per minute, which results in about 3 times more traffic than this endpoint usually gets, is not very reasonable.
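For scale: 600 requests per minute is 10 requests per second, and "about 3 times" the usual traffic puts the normal load for this endpoint near 200 requests per minute. Crawlers usually avoid such spikes with a per-host politeness delay. The sketch below is a hypothetical illustration of that idea only: the class `HostThrottle` and the 200-per-minute cap are assumptions, not Heritrix code or settings. It takes explicit timestamps instead of sleeping so it can be run instantly.

```python
from collections import defaultdict


class HostThrottle:
    """Hypothetical per-host throttle: enforce a minimum gap between requests."""

    def __init__(self, max_requests_per_minute: float) -> None:
        # e.g. 200 requests/minute -> at most one request every 0.3 s per host
        self.min_gap = 60.0 / max_requests_per_minute
        # Earliest time (seconds) the next request to each host may start.
        self.next_allowed = defaultdict(float)

    def delay(self, host: str, now: float) -> float:
        """Seconds to wait before fetching from host at time `now`."""
        return max(0.0, self.next_allowed[host] - now)

    def record_fetch(self, host: str, now: float) -> None:
        """Call when a fetch starts; pushes the host's next slot forward."""
        self.next_allowed[host] = now + self.min_gap


# Figures from the report: ~600 requests/minute, about 3x the usual load.
observed_per_min = 600
baseline_per_min = observed_per_min / 3  # roughly 200 requests/minute

throttle = HostThrottle(baseline_per_min)
throttle.record_fetch("www.wix.com", now=0.0)
print(throttle.delay("www.wix.com", now=0.1))  # roughly 0.2 s still to wait
print(throttle.delay("www.wix.com", now=0.5))  # 0.0, free to fetch
```

Capping the crawler at the endpoint's usual rate, rather than at a fixed global rate, is one way to keep a focused crawl of a single site section from multiplying that section's load.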

It would be really helpful if your bot stopped doing that, so that we don't need to take additional measures to block it.

Thank you,
Marius Vaitkus

Environment

None

Status

Assignee

Unassigned

Reporter

Vangelis Banos

Priority

Major