Future of Czech Web Archive
Jaroslav Kvasnica, National Library of the Czech Republic, Czech Republic
Rudolf Kreibich, National Library of the Czech Republic
In this presentation we will introduce the Czech web archive (webarchiv.cz), whose purpose is the long-term preservation of, and access to, Czech national digital online resources. We will describe its web acquisition and data preservation technology. However, the central topic of the presentation will be new ways to access our collection. We think that the current ways of accessing archived data are not satisfactory for research purposes, and we would like to introduce a new access paradigm for web archives.
We will present the idea of big open datasets resulting from different analyses of the hundreds of terabytes stored in our archive. In the technological part of our talk, we will introduce Hadoop-based technologies that enable analytics over a huge data collection. Our central strategy is to motivate researchers from different fields to explore and study a data collection that dates back to 2001.
Author's professional CV
Jaroslav Kvasnica is head of the Web Archiving department at the National Library of the Czech Republic. In the past he was a curator focused on metadata for the long-term preservation repository of the National Digital Library. Currently he is working on the digital preservation of web resources.