Algorithms for the Automatic Tagging of Medieval Manuscripts


Project Management: Technische Universität Darmstadt – Institut für Sprach- und Literaturwissenschaft (linglit)

Project Participants: Karlsruher Institut für Technologie (KIT) · Wissenschaftliche Bibliothek der Stadt Trier

Sponsors: BMBF – Bundesministerium für Bildung und Forschung

Duration: -

Contact person (TCDH): Dr Thomas Burch

Research Area: Software Systems and Research Infrastructure

Keywords: Quantitative Analysis, Manuscripts


Website of the Project: eCodicology

The aim of the BMBF-funded joint project “eCodicology” was to develop, test and optimize new algorithms that automatically recognize macro- and microstructural elements of manuscript pages and embed them in the metadata of the page images. Such structural elements include page size, written space, marginalia, paratexts, the type and position of graphic elements, and the relationship between image and text. This information was evaluated statistically and qualitatively, and it supports the study of questions about scribal corpora, writing schools, relationships between manuscripts, provenance, connections between dispersed manuscripts and the like.

The three-year research project used the metadata and scans of around 500 medieval codices that had been produced in the St. Matthias' Virtual Scriptorium project and were already described in accordance with TEI standards. On this basis, a metadata scheme was designed that records the external descriptive features of a codex automatically as far as possible and documents them precisely for each page of a manuscript. The newly acquired metadata was stored as XML tags in the associated metadata records and can therefore be reused flexibly. By integrating both IT and philological approaches, the project was intended to make a significant contribution to the methodological development of eHumanities.
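
To illustrate the idea of recording per-page layout features as XML, the following minimal Python sketch derives one simple measure (the share of the page occupied by the written space) and serializes it together with the page dimensions. It is not the project's actual algorithm or schema: the class `PageLayout`, the function `layout_to_xml`, the element names `page`, `pageSize` and `textSpace`, and the sample measurements are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class PageLayout:
    """Hypothetical per-page measurements in millimetres (illustrative only)."""
    page_id: str
    page_width: float
    page_height: float
    text_width: float
    text_height: float

    def text_ratio(self) -> float:
        """Share of the page area covered by the written space."""
        page_area = self.page_width * self.page_height
        text_area = self.text_width * self.text_height
        return text_area / page_area


def layout_to_xml(layout: PageLayout) -> ET.Element:
    """Serialize the measurements as XML; element names are assumptions."""
    page = ET.Element("page", {"id": layout.page_id})
    ET.SubElement(page, "pageSize", {
        "width": f"{layout.page_width:g}",
        "height": f"{layout.page_height:g}",
        "unit": "mm",
    })
    ET.SubElement(page, "textSpace", {
        "width": f"{layout.text_width:g}",
        "height": f"{layout.text_height:g}",
        "unit": "mm",
        "ratio": f"{layout.text_ratio():.3f}",
    })
    return page


# Invented sample values, not taken from any real manuscript.
example = PageLayout("fol-1r", 200.0, 300.0, 120.0, 200.0)
element = layout_to_xml(example)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Because the features end up as ordinary XML elements, values such as the `ratio` attribute could later be collected across many pages and compared statistically, which is the kind of evaluation the project description refers to.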