
Presentation of Cultural Heritage: Integration - Aggregation - Contextualisation
Session coordinator: Zdeněk Uhlíř, National Library of the Czech Republic
Where: 25. 5. 2004, 14:00 - 17:00, Auditorium D

ACT - Computer Processing of Written Cultural Heritage Sources

Author: Kiril Ribarov, Charles University - Institute of Formal and Applied Linguistics, Czech Republic

Co-authors: Jiří Bubník, Jiří Čelák, Vojtěch Janota, Alexandr Kára, Václav Novák, Tomáš Vondra, Faculty of Mathematics and Physics, Charles University




The aim of this work is to present the ACT (Annotated Corpora of Text) package: software tools for lexical and corpus processing of (European) written cultural sources.

ACT is suited to capturing and manipulating rich language variability at the word and sentence level. The central processing unit is not the word-form itself but each of its interpretations. An interpretation can be assigned morphological distinctions, headwords (including redactional ones), translation equivalents, and complexes (multi-word units), and it can be correlated with other sources. The whole annotation process is automated, and users can define their own sorting orders and morphological tag structures.
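As an illustration of the idea of interpretations as processing units and of user-defined sorting orders, here is a minimal Python sketch. It is not ACT's actual data model or code: the names `Reading`, `Token` and `make_sort_key`, and the fields they carry, are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    """One interpretation of a word-form (hypothetical structure)."""
    headword: str
    morph_tag: str
    translation: str = ""


@dataclass
class Token:
    """A word-form as it stands in the source, with its possible readings."""
    form: str
    readings: List[Reading] = field(default_factory=list)


def make_sort_key(alphabet: str):
    """Build a sort key that follows a user-supplied alphabet order.

    Characters missing from the alphabet sort after all listed ones.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}
    unknown = len(alphabet)

    def key(word: str):
        return [(rank.get(ch, unknown), ch) for ch in word]

    return key


# Usage: sort headwords by a custom order in which "c" precedes "b" precedes "a".
custom_key = make_sort_key("cba")
ordered = sorted(["ab", "ca", "ba"], key=custom_key)
```

A key function built from an explicit alphabet keeps the ordering rule in one place, so a different manuscript tradition would only need a different alphabet string.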

ACT incorporates modules for:

  • complex searches on one or more sources
  • creation of ready-to-use documents in various output formats, such as an index verborum, a retrograde index, an index of concordances, frequency lists and others, from one or more sources
  • on-line web-based queries for text and image access (ACT-Web)
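The kinds of documents listed above can be sketched in a few lines of Python. This is only an illustration of what a frequency list, a retrograde index and a concordance are, under the assumption that the text is already split into tokens; the function names are hypothetical and do not come from ACT.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def retrograde_index(tokens):
    """Return the distinct words sorted by their reversed spelling."""
    return sorted(set(tokens), key=lambda w: w[::-1])


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Usage on a toy sentence (not taken from any ACT corpus).
sample = "in the beginning was the word and the word was with god".split()
freq = frequency_list(sample)
retro = retrograde_index(["abc", "cba", "bca"])
kwic = concordance(sample, "word", width=2)
```

A real system would sort with the collation rules of the language concerned rather than by raw code points, and would index interpretations rather than bare strings.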

The ACT annotation module also exists in a light version (ACT-light), which is used for off-line document processing.

The last component presented is the ACT-Distiller tool, which incorporates lexical card-files into a corpus by means of an algorithm for binding card-file contexts and reconstructing text from the card-files.

We believe that ACT, as designed, contributes to a contextualized and intelligent information technology framework for heritage. ACT is used for processing mediaeval Slavonic manuscripts.

About the author

RNDr. Kiril Ribarov was born in Ohrid, Macedonia, in 1971. In 1989 he began his studies at the Electrotechnical Faculty in Skopje, Macedonia. In 1992 he moved to the Czech Republic, and in 1996 he completed his studies in Informatics at the Faculty of Mathematics and Physics of Charles University in Prague (CU). Since 1996 he has specialized in mathematical linguistics, working at the Institute of Formal and Applied Linguistics and later at the Center for Computational Linguistics at CU. He teaches at the Faculty of Mathematics and Physics at CU, at the Czech Technical University, and at the Anglo-American College in Prague. His publications cover automatic methods in natural language processing, computer processing of written cultural heritage, and non-linear phenomena in relation to natural languages. Kiril Ribarov is the author of the framework for computer processing of Old Church Slavonic manuscripts with which the first annotated corpus of Old Church Slavonic texts was created. Since 2003 he has co-chaired the Commission for Computer Processing of Medieval Slavonic Manuscripts and Early Printed Books at the International Congress of Slavists.

Other papers in this session:

On the Disappearance of the Library

Author: Torsten Schaßan, University of Cologne, Germany

Vision of Semantic Processing and Latest Trends

Author: Nerutė Kligienė, Institute of Mathematics and Informatics, Lithuania

Conceptual Framework of Virtual Research Environment

Author: Zdeněk Uhlíř, National Library of the Czech Republic

Manuscriptorium - New Research Environment for the Sphere of Historical Book Resources

Author: Stanislav Psohlavec, AiP Beroun, Czech Republic

Digitizing Centre of Academy of Sciences CR

Author: Martin Lhoták, Library of Academy of Sciences CR, Czech Republic
