What is web archiving?
******************************************************************************************
* Why is the web archived?
******************************************************************************************
The archived internet will serve as a basic source of information for future researchers.
A vast amount of scientific and cultural information is nowadays published only in digital
form. Web content is short-lived: it changes quickly, links rot, and information that was
online yesterday may be gone today. This is why various institutions interested in
preserving data also harvest and archive internet content.
******************************************************************************************
* Web archiving technology
******************************************************************************************
To harvest, or scrape, the content of internet pages, the Webarchiv of the National Library
of the Czech Republic, like many other institutions, uses the Heritrix
[ URL "https://webarchive.jira.c heritrix"] web crawler. Smooth and efficient harvesting,
however, requires further extensions.
The crawler browses the web, harvests content, and creates snapshots of pages at a
particular point in time. It also creates an index, which is then used to replay archived
pages in order to make them accessible.
Archived content is stored in ARC or WARC
[ URL "http://www.digitalpreservation.gov/format fdd000236.shtml"] container files, which
not only store web content but also supplement it with metadata, including administrative
metadata.
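
The relationship between a container file and its index can be illustrated with a small sketch. It is not how Heritrix or the Webarchiv tooling is implemented: the function names (build_record, append_record, read_record), the in-memory archive, and the dictionary index are all hypothetical and chosen only for illustration. The record layout follows the general shape of a WARC record (a version line, named header fields, a blank line, the content block, and two trailing line breaks), but it is simplified and should not be treated as a conforming writer.

```python
import io
import uuid
from datetime import datetime, timezone


def build_record(url: str, payload: bytes, when: datetime) -> bytes:
    """Serialize one simplified WARC-style 'resource' record (illustrative only)."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {url}",
        "Content-Type: text/html",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # The content block is followed by two CRLF pairs that close the record.
    return head + payload + b"\r\n\r\n"


def append_record(archive: io.BytesIO, index: dict, url: str,
                  payload: bytes, when: datetime) -> None:
    """Append a snapshot to the archive and remember where it was stored."""
    record = build_record(url, payload, when)
    archive.seek(0, io.SEEK_END)
    offset = archive.tell()
    archive.write(record)
    # Hypothetical index key: (URL, 14-digit capture timestamp).
    index[(url, when.strftime("%Y%m%d%H%M%S"))] = (offset, len(record))


def read_record(archive: io.BytesIO, index: dict, url: str, timestamp: str):
    """Look up a snapshot in the index and return (header fields, payload)."""
    offset, length = index[(url, timestamp)]
    archive.seek(offset)
    raw = archive.read(length)
    head, _, rest = raw.partition(b"\r\n\r\n")
    fields = {}
    for line in head.decode("utf-8").split("\r\n")[1:]:  # skip the version line
        name, _, value = line.partition(": ")
        fields[name] = value
    return fields, rest[: int(fields["Content-Length"])]


archive = io.BytesIO()
index = {}
t1 = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
append_record(archive, index, "http://example.org/", b"<html>v1</html>", t1)
append_record(archive, index, "http://example.org/", b"<html>v2</html>", t2)

fields, body = read_record(archive, index, "http://example.org/", "20200101120000")
print(fields["WARC-Date"], body)
```

Keying the index on both the URL and the capture time is what allows two snapshots of the same page to coexist and be retrieved separately. Real archives typically keep such an index in separate files rather than in memory.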