ACT - Computer Processing of Written Cultural Heritage Sources
Author: Kiril Ribarov, Charles University - Institute of Formal and Applied Linguistics, Czech Republic
Co-authors: Jiří Bubník, Jiří Čelák, Vojtěch Janota, Alexandr Kára, Václav Novák, Tomáš Vondra, Matematicko-fyzikální fakulta UK
The aim of this work is to present the ACT (Annotated Corpora of Text) package: software tools for lexical and corpus processing of (European) written cultural heritage sources.
ACT is suited to capturing and manipulating rich language variability at the word and sentence level. The central processing units are not word-forms but their understandings. Each understanding can be assigned morphological distinctions, headwords (including redactional ones), translation equivalents and complexes (multi-word units), and can be correlated with other sources. The whole annotation process is automated, and users can define their own sorting orders and morphological tag structures.
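The idea that the understanding, not the word-form, carries the annotation can be illustrated with a minimal Python sketch. All class and field names (`Understanding`, `Token`, `headword`, `morph_tag`, `make_sort_key`) and the tag strings are hypothetical. They do not reflect ACT's actual internal format. The sketch also shows one simple way to realize a user-defined sorting order: a custom alphabet that maps each letter to a rank.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical data model (not ACT's actual format): one word-form
# occurrence (token) carries one or more understandings, and each
# understanding is the unit that receives annotation.
@dataclass
class Understanding:
    headword: str                    # headword (lemma)
    morph_tag: str                   # user-defined morphology tag string
    translations: List[str] = field(default_factory=list)
    complex_id: Optional[int] = None # id of a multi-word unit, if any

@dataclass
class Token:
    form: str       # word-form as written in the source
    source: str     # hypothetical source identifier
    position: int   # position of the word-form in the source
    understandings: List[Understanding] = field(default_factory=list)

def make_sort_key(alphabet: str):
    """Build a sort key from a user-defined alphabet (a private sorting order).
    Characters missing from the alphabet sort after all listed ones, by code point."""
    rank = {ch: i for i, ch in enumerate(alphabet)}
    def key(word: str):
        return [rank.get(ch, len(alphabet) + ord(ch)) for ch in word]
    return key

# Latin "rosae" is one word-form with three possible understandings.
tok = Token("rosae", source="src1", position=0, understandings=[
    Understanding("rosa", "N.gen.sg", ["of the rose"]),
    Understanding("rosa", "N.dat.sg", ["to the rose"]),
    Understanding("rosa", "N.nom.pl", ["roses"]),
])

# With the private alphabet "bac", "b" sorts before "a".
ordered = sorted(["abc", "bca", "cab"], key=make_sort_key("bac"))
print(ordered)  # ['bca', 'abc', 'cab']
```

A real system would also need collation rules for digraphs, diacritics and abbreviation marks, which a per-character rank table does not cover.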
ACT incorporates modules for:
- complex searches on one or more sources
- creation of ready-to-use documents in various output formats, such as an index verborum, a retrograde index, an index of concordances and frequency lists, from one or more sources
- on-line web-based queries for text and image access (ACT-Web)
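Several of the output documents listed above are standard corpus products. The Python sketch below shows how a few of them can be derived from a tokenized text. It is a minimal illustration, not ACT's implementation, and the function names are hypothetical. A frequency list counts word-forms. An index verborum lists each word-form with its positions. An index of concordances shows each occurrence in context (keyword in context). A retrograde index sorts word-forms by their reversed spelling, so words with the same ending are grouped together.

```python
from collections import Counter, defaultdict

def frequency_list(tokens):
    """(word-form, count) pairs, most frequent first; ties sorted alphabetically."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def index_verborum(tokens):
    """Each distinct word-form mapped to the positions where it occurs,
    ordered alphabetically by word-form."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return dict(sorted(positions.items()))

def concordance(tokens, word, width=2):
    """Keyword-in-context triples (left context, keyword, right context)
    for every occurrence of `word`, with up to `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def retrograde_index(tokens):
    """Distinct word-forms sorted by their reversed spelling."""
    return sorted(set(tokens), key=lambda w: w[::-1])

tokens = ["rosa", "rosae", "rosa", "aqua", "rosa", "aquae", "aqua"]
print(frequency_list(tokens))    # [('rosa', 3), ('aqua', 2), ('aquae', 1), ('rosae', 1)]
print(index_verborum(tokens))    # {'aqua': [3, 6], 'aquae': [5], 'rosa': [0, 2, 4], 'rosae': [1]}
print(concordance(tokens, "aqua", width=1))  # [('rosa', 'aqua', 'rosa'), ('aquae', 'aqua', '')]
print(retrograde_index(tokens))  # ['rosa', 'aqua', 'rosae', 'aquae']
```

These sketches use plain code-point order. In practice each list would be sorted with a user-defined sorting order, and the entries would be understandings rather than bare word-forms.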
The ACT annotation module also exists in a light version (ACT-light), which is used for off-line document processing.
Finally, we present the ACT-Distiller tool, which incorporates lexical card files into a corpus by means of an algorithm for card-file context binding and reconstruction of text from card files.
We believe that, as designed, ACT contributes towards a contextualized and intelligent information technology framework for cultural heritage. ACT is used for processing mediaeval Slavonic manuscripts.
About the author
RNDr. Kiril Ribarov was born in Ohrid, Macedonia, in 1971. In 1989 he began his studies at the Electrotechnical Faculty in Skopje, Macedonia. In 1992 he moved to the Czech Republic, and in 1996 he completed his studies in Informatics at the Faculty of Mathematics and Physics of Charles University in Prague (CU). Since 1996 he has specialized in mathematical linguistics, working first at the Institute of Formal and Applied Linguistics and later at the Center for Computational Linguistics at CU. He teaches at the Faculty of Mathematics and Physics at CU, at the Czech Technical University, and at the Anglo-American College in Prague. His publications cover automatic methods in natural language processing, computer processing of written cultural heritage, and non-linear phenomena in relation to natural languages. Kiril Ribarov is the author of the framework for computer processing of Old Church Slavonic manuscripts with which the first annotated corpus of Old Church Slavonic texts was created. Since 2003 he has co-chaired the Commission for Computer Processing of Medieval Slavonic Manuscripts and Early Printed Books at the International Congress of Slavists.
Other papers in this session:
Author: Torsten Schaßan, University of Cologne, Germany
Author: Nerutė Kligienė, Institute of Mathematics and Informatics, Lithuania
Author: Zdeněk Uhlíř, National Library of the Czech Republic
Author: Stanislav Psohlavec, AiP Beroun, Czech Republic
Author: Martin Lhoták, Library of Academy of Sciences CR, Czech Republic