General Search Information

To search the text of your archived web pages, enter the keyword(s) you would like to search for, just as you would in other search engines such as Google. If you enter multiple keywords, basic search by default returns only results that contain all of the keywords. Keep in mind that you are searching only the content that has been archived as part of your collection(s).

The search engine indexes all text on the archived web pages and returns results based on the concentration of the given search term.

By default, only one result for each site (host) is displayed. To see all hits from a site (host), click on the "more results from..." link. You can also use the Advanced Search options (below) to view more than one result for each site (host).
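The one-result-per-host behavior can be pictured with a short Python sketch. This is only an illustration of the idea, not how the Archive-It search engine is implemented; the function name collapse_by_host, its per_host parameter, and the example URLs are all hypothetical.

```python
from urllib.parse import urlsplit


def collapse_by_host(urls, per_host=1):
    """Keep at most `per_host` URLs for each host, preserving order.

    per_host=None stands for "unlimited". Hypothetical helper for illustration.
    """
    kept = []
    seen = {}  # host -> number of URLs encountered so far
    for url in urls:
        host = urlsplit(url).hostname or ""
        count = seen.get(host, 0)
        if per_host is None or count < per_host:
            kept.append(url)
        seen[host] = count + 1
    return kept


# Made-up hits: two from one host, one from another.
hits = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.com/c",
]
print(collapse_by_host(hits))                 # default: one hit per host
print(collapse_by_host(hits, per_host=None))  # unlimited: every hit
```

With the default, the second example.org hit is held back, much as it would sit behind the "more results from..." link.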

Searching Page Text on www.archive-it.org

You can perform a full text search from the public archive-it.org site by clicking on the "Search Page Text" tab, either from the "Explore" page (http://www.archive-it.org/explore), which will allow you to search the full text of all partners' collections, or from a specific collecting organization or collection page.

Entering a search term in the main search box will allow you to do a basic keyword search, and the Advanced Search Options on the left will allow you to further refine the scope of your search. For more information on the Advanced Search Options see below.

Advanced Search Options

In addition to basic full text search there are also advanced search options:

  • With all of the words: As with the basic search, the first advanced search option is to search for results with all keyword terms
  • In collection: Allows you to select multiple collections to search across
  • With the exact phrase: Displays only results that match an exact phrase
  • With none of the words: Displays only results that include none of the designated words
  • From the host: Displays only results from the designated host
  • Total documents per host: While the default is to display only one document (file) per host, with an option to view more results from a host, you can modify that to show 5, 10, 25 or unlimited documents from a host.
  • With the file format: Displays only results with a specific file type. File types that are not indexed for full text search (images, videos, etc.) are not on this list.
  • With a capture date from: Displays only those results captured in a specified date range. The first two drop down menus let you specify the month and year of the earliest results you would like displayed. The second two drop down menus let you specify the month and year of the latest results you would like displayed.
    • To view all results after a specific month/year, leave the second set of drop downs blank.
    • To view all results before a specific month/year, leave the first set of drop downs blank.
    • To view results from just one specific month/year, select the same values for both sets of drop downs.
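
The three capture date cases above amount to a range check in which either end may be left open. A minimal Python sketch of that logic follows, assuming dates are represented as (year, month) tuples and a blank set of drop downs as None. The function name in_capture_range is hypothetical and is not part of Archive-It.

```python
def in_capture_range(capture, earliest=None, latest=None):
    """Return True if `capture` falls within the selected range.

    capture, earliest and latest are (year, month) tuples.
    None stands for a set of drop downs that was left blank.
    """
    if earliest is not None and capture < earliest:
        return False
    if latest is not None and capture > latest:
        return False
    return True


# All results after a month/year: leave the second set blank.
print(in_capture_range((2012, 5), earliest=(2012, 1)))
# All results before a month/year: leave the first set blank.
print(in_capture_range((2012, 5), latest=(2012, 1)))
# Just one month/year: same values for both sets.
print(in_capture_range((2012, 5), earliest=(2012, 5), latest=(2012, 5)))
```

Tuples compare year first and then month, so no date parsing is needed for month-level ranges.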

You can also use advanced search options by adding the query formats below to your search terms. 

Boolean search terms

-Default is AND: Pope Rome will return only documents that contain both Pope and Rome. Boolean "not" works as well.

-Minus: Pope -Rome will return only documents that contain Pope but not the term Rome.

-Exact Phrase: put quotes around the exact phrase you want to search for. For example, "Pope Rome" will return only results with that exact text string in them.
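
If you assemble search strings programmatically, the three formats above can be combined as in the Python sketch below. It only builds the text you would type into the search box and does not call any Archive-It service; the function build_query and its parameter names are hypothetical.

```python
def build_query(all_words=(), exact_phrase=None, none_words=()):
    """Build a search string from the query formats described above.

    all_words    -> joined with spaces (the default is AND)
    exact_phrase -> wrapped in double quotes
    none_words   -> each prefixed with a minus sign
    """
    parts = list(all_words)
    if exact_phrase:
        parts.append('"' + exact_phrase + '"')
    parts.extend("-" + word for word in none_words)
    return " ".join(parts)


print(build_query(["Pope", "Rome"]))               # Pope Rome
print(build_query(["Pope"], none_words=["Rome"]))  # Pope -Rome
print(build_query(exact_phrase="Pope Rome"))       # "Pope Rome"
```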

NutchWAX is the open source tool used to provide full-text search of your archived content.
