Preserving Arizona’s Web, One Crawl at a Time

Did you know that the State Archive’s responsibilities include archiving the websites of the Arizona State Government? Electronic media is now the primary way people communicate, so preserving born-digital material is vital for future generations of researchers. Since we began web archiving in 2007, we’ve crawled 68.1 million internet-based documents and collected 4.5 terabytes of data!

Using Archive-It software, we “crawl” the State of Arizona’s websites and harvest born-digital content, including documents, videos, and images. These crawls are essentially snapshots of a page; once captured, future researchers may revisit and interact with them as though those sites were still running. The public has online access to these collections 24/7, and our entire corpus can be searched with the click of a button. If only paper-based research were so easy!

Like traditional archiving, we create collections of related sites for researchers. Our primary goal is to preserve the websites of Arizona’s state agencies and departments. Facebook, Twitter, and other social media accounts of our elected officials are also essential sources of information. We also preserve special interest sites that may be thematic or based around a specific event, such as Arizona’s centennial celebration in 2012. Websites are constantly changing, so it is important that we crawl our collections regularly.

The web archiving project is made possible by a grant provided by the Library Services and Technology Act. To access the Arizona Web Archive, follow this link to our partner page at Archive-It. Content on the page will be updated frequently over the coming months, so check it often! You may contact us with questions or comments about the project at archives@azlibrary.gov.

Web archivists comb through thousands of URLs to make sure only the necessary webpages are archived. They use regular expressions, a pattern-matching notation, to expand or narrow the scope of each web crawl.
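As a rough illustration of how such patterns scope a crawl (this is a sketch in Python, not Archive-It itself; the URLs and patterns below are invented):

```python
import re

# Hypothetical candidate URLs; a real crawl works from the State of
# Arizona's actual domains.
candidate_urls = [
    "https://azlibrary.gov/archives/report.pdf",
    "https://azlibrary.gov/calendar?month=1&year=1999",
    "https://example.com/unrelated",
]

# Include anything under azlibrary.gov, but exclude calendar query pages,
# which can spin off an endless series of month/year URLs.
include = re.compile(r"^https?://(www\.)?azlibrary\.gov/")
exclude = re.compile(r"/calendar\?")

in_scope = [u for u in candidate_urls
            if include.search(u) and not exclude.search(u)]
print(in_scope)  # only the report URL survives
```

Broadening the include pattern widens the crawl; adding exclude patterns trims away pages that would waste storage or trap the crawler.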

Some websites may block certain crawlers from capturing their pages or lead them into traps of endless URLs. Archivists must test and retest each website to make sure Archive-It captures a complete and efficient snapshot of the page.
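One common way a site tells crawlers what they may capture is its robots.txt file. A small sketch using Python’s standard-library parser (the robots.txt content and user-agent name here are made up for the example):

```python
from urllib import robotparser

# Hypothetical robots.txt content; a real check would fetch the site's
# live /robots.txt file.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The crawler is blocked from the /private/ path but free elsewhere.
print(rp.can_fetch("archive-crawler", "https://example.gov/private/report.pdf"))  # False
print(rp.can_fetch("archive-crawler", "https://example.gov/annual-report.pdf"))   # True
```

Checking rules like these before each crawl is one reason archivists must test and retest every site.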

Once captured, web data forms a complete picture of the website at the time it was crawled. Using Archive-It’s Wayback Machine, researchers may experience the Arizona Historical Society’s page as it appeared when it was crawled on September 16, 2014.
