Mining and Modeling Text. Linked Open Data for Literary Historiography

Guest lecture by Christof Schöch & Maria Hinzmann as part of the "Digital History" research colloquium


Date: 27.01.2021
Place: Online via Zoom
Categories: Event

The lecture presents the research project Mining and Modelling Text (MiMoText for short). Its aim is to further develop quantitative methods for extracting, modelling and analysing humanities-relevant information from large text collections, and to study these methods from an interdisciplinary perspective spanning the humanities, computer science and law. The initial application domain is French literary history of the second half of the 18th century; transfer to other domains and disciplines (other philologies, but also, for example, philosophy, history and art studies) is planned and will be taken into account from the start of the project.

A central starting point is the fact that the findings of literary-historical research accumulated over roughly two centuries are largely not directly usable: they are very extensive, scattered across different sources and locations, and not available in digital form. Thanks to digitisation efforts at libraries and archives, ever larger holdings of texts and data are now becoming available digitally, but their sheer volume means they can no longer be systematically covered by human reading. This is where MiMoText comes in: drawing on three types of information sources (metadata from reference systems, text properties from primary texts, and factual information from research literature), it intertwines methods of information extraction ('mining') with data modelling ('modelling') following the Linked Open Data paradigm.
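As a minimal sketch of how statements from the three source types could end up in one Linked Open Data graph, the Python code below represents each statement as a subject-predicate-object triple and merges the sources into a single set. All identifiers (`work:W1`, `author:A1`, `theme:love`, the predicate names) and values are hypothetical placeholders, not MiMoText's actual vocabulary or data; a real implementation would use full URIs and an RDF toolkit.

```python
# Hypothetical identifiers and placeholder values: illustrative only,
# not MiMoText's vocabulary or data.
Triple = tuple[str, str, str]

# One list of statements per source type named in the text.
from_metadata: list[Triple] = [
    ("work:W1", "author", "author:A1"),      # metadata from a reference system
    ("work:W1", "publication_year", "1761"),  # placeholder value
]
from_primary_text: list[Triple] = [
    ("work:W1", "about", "theme:love"),      # text property mined from the primary text
]
from_research_literature: list[Triple] = [
    ("work:W1", "about", "theme:virtue"),    # factual claim extracted from scholarship
    ("work:W1", "author", "author:A1"),      # same fact as in the metadata
]


def merge(*sources: list[Triple]) -> set[Triple]:
    """Union all statements; identical triples collapse into a single link."""
    graph: set[Triple] = set()
    for source in sources:
        graph.update(source)
    return graph


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """Return all objects linked to `subject` via `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


graph = merge(from_metadata, from_primary_text, from_research_literature)
print(len(graph))                          # 4
print(objects(graph, "work:W1", "about"))  # ['theme:love', 'theme:virtue']
```

Because the authorship statement from the research literature uses the same identifiers as the metadata, it merges into an existing link instead of creating a duplicate, while the new `about` statement adds a link. Shared identifiers are what allow such a network to grow denser as further sources are added.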

Combining the three types of information sources with the four methodological research areas (RAs) creates a unique literary-historical knowledge network that can grow gradually, become increasingly dense and be linked to external resources. The aim is to provide a kind of "Wikidata for literary history" with a SPARQL endpoint, offering added value to users from various disciplines (literary studies, cultural studies, history, media studies, information studies) in a range of usage scenarios in both research and teaching.
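Querying a SPARQL endpoint typically follows the W3C SPARQL 1.1 Protocol. The sketch below assumes a hypothetical endpoint URL (`https://example.org/sparql`) and a hypothetical `ex:about` property; the actual MiMoText endpoint and vocabulary may differ. It builds a "query via GET" request that asks for results in the SPARQL JSON results format, and flattens such a response into simple rows. The request is constructed but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and vocabulary: replace with the real ones.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX ex: <https://example.org/vocab/>
SELECT ?work ?theme WHERE {
  ?work ex:about ?theme .
}
LIMIT 10
"""


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request (not sent here)."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into {variable: value} dicts."""
    data = json.loads(results_json)
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in data["results"]["bindings"]
    ]


req = build_request(ENDPOINT, QUERY)
# To send it: urllib.request.urlopen(req).read().decode("utf-8"), then rows(...)
```

Keeping request construction separate from result parsing makes it easy to test the parsing step offline against a saved response before pointing the code at a live endpoint.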


Keywords: Text Mining, Legal Language