Project closure 'Mining and Modeling Text' (2019-2023)

31.01.2024 | General, Press Releases, Project News

On November 9, the successful completion of the project "Mining and Modeling Text" (MiMoText) was celebrated with drinks in the guest room of the Mensa Trier. Funded by the Rhineland-Palatinate Research Initiative, the project ran at the Trier Center for Digital Humanities (TCDH) from 2019 to 2023 under the leadership of Prof. Dr. Christof Schöch and Prof. Dr. Claudine Moulin. This Digital Humanities project developed an innovative Linked Open Data approach for the humanities in the form of a knowledge graph. Using the French novel of the Enlightenment as a case study, it gained national and international visibility for this data-linking paradigm through numerous lectures, workshops, and publications.

Networked knowledge of French literary history

The most important result of the four-year project is undoubtedly the MiMoTextBase, our freely available knowledge network on the history of the French Enlightenment novel. The project team used computer-assisted methods to extract information from a wide range of sources, including bibliographic resources, 18th-century primary texts, and current research literature. This information ranges from bibliographic data such as places of publication and book formats to themes, settings, and protagonists, as well as sentiment trends and stylistic similarities between texts. Following the Linked Open Data paradigm, these heterogeneous pieces of information are combined into a common knowledge base: its contents are formally modeled, interconnected in various ways, and linked to external knowledge resources, particularly Wikidata. The many ways of querying this network open up entirely new perspectives on both well-known and lesser-known aspects of literary history.
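To illustrate the linking principle, here is a toy sketch in plain Python (not the project's actual tooling). It stores a few facts from different kinds of sources as subject-predicate-object triples and answers a query by matching several triple patterns that share variables, which is roughly what a SPARQL basic graph pattern does. All identifiers and property names (`publicationPlace`, `aboutTheme`, `sameAs`, `wd:Q_PARIS`) are hypothetical placeholders; the real MiMoTextBase has its own data model and is queried with SPARQL rather than with code like this.

```python
from typing import Dict, List, Optional, Tuple

# A fact in the graph: (subject, predicate, object).
Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy graph mixing statements from different (hypothetical) sources.
# None of these identifiers are real MiMoTextBase or Wikidata IDs.
GRAPH: List[Triple] = [
    ("novel:A", "publicationPlace", "place:Paris"),      # bibliographic data
    ("novel:B", "publicationPlace", "place:Amsterdam"),  # bibliographic data
    ("novel:A", "aboutTheme", "theme:love"),             # extracted from full text
    ("novel:B", "aboutTheme", "theme:love"),             # extracted from full text
    ("place:Paris", "sameAs", "wd:Q_PARIS"),             # link to an external resource
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending an existing binding."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns with a nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Novels about love, their place of publication, and that place's external link.
results = query(
    [
        ("?novel", "aboutTheme", "theme:love"),
        ("?novel", "publicationPlace", "?place"),
        ("?place", "sameAs", "?external"),
    ],
    GRAPH,
)
for row in results:
    print(row["?novel"], row["?place"], row["?external"])
```

Only `novel:A` is returned: both novels share the theme, but only the place of publication of `novel:A` carries a link to an external identifier. The shared identifiers are what let statements from separate sources be combined in a single query.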

Beyond specialist knowledge in the narrower sense, the participants in Mining and Modeling Text acquired outstanding expertise over the years in key areas such as information extraction, data modeling, data publication, and SPARQL. They also explored and intensively tested the methodological paradigm of Linked Open Data, or the Semantic Web, for literary studies, establishing the Trier Center for Digital Humanities as a nationally and internationally visible hub for this branch of the Digital Humanities. This also provides a solid foundation for further, diverse research on Linked Open Data in the humanities.

Collaborations

The interdisciplinary project brought together expertise and participants from various fields, including Computer Science, Literary Studies, Digital Humanities, Computational Linguistics, and Law.

During the project, international fellows visited the TCDH for stays of several months. Alongside this interdisciplinary and international collaboration, the team actively cooperated with other partners. Joint events were organized with the Patterns network (Trier Center for Language and Communication), for instance the workshop "Computational Modeling of Language Phenomena" in June 2020, to which Trier researchers (Prof. Ralf Münnich, Prof. Achim Rettinger, Prof. Dr. Sabine Arndt-Lappe) as well as Prof. Dr. Melanie Bell (Cambridge) contributed.

The MiMoText team held a workshop on research data management in cooperation with its partner, the Fachinformationsdienst Romanistik. The Graduate Center Trier (GUT) was a partner in a series of events designed to help early-career researchers use digital tools such as Zotero to manage academic references.

In collaboration with Dr. Christian Reul of the Artificial Intelligence department at the University of Würzburg, the project team trained a model for the automatic text recognition of 18th-century historical prints. Using machine learning, this model makes it possible to recognize the full text of scans, for example from the French National Library, and thus to make these texts available in digital form.

The project team also actively sought academic cooperation in the greater Trier-Luxembourg region, for instance with the Centre for Contemporary and Digital History at the University of Luxembourg. This collaboration included events such as the lecture "The Use and Abuse of Word Embeddings in Digital Humanities" in the Digital History and Hermeneutics Lecture Series, held at the University of Luxembourg on December 4, 2019, and the Scholarly Writing and Publishing Today lecture on January 30, 2020. The MiMoText team also helped organize the binational 2023 annual conference of the DHd Association (Digital Humanities in the German-speaking region), hosted by the University of Trier and the University of Luxembourg and attended by more than 500 participants.

Lectures

Over the course of the project, team members gave more than 35 presentations on their work in Germany and abroad. Among others, Christof Schöch gave the lecture "How Could Digital Literary Historiography Work?" at the Department of Germanic Studies of the University of Texas at Austin, and MiMoText presented "Smart Modeling for Digital Literary History" at the 11th International Conference of Digital Archives and Digital Humanities in Taipei, Taiwan. Because many conferences were held online during the coronavirus pandemic, the team was able to give several international presentations (Austin, USA; Taipei, Taiwan; Tokyo, Japan; Zurich, Switzerland) without travel costs or travel-related CO2 emissions. Further presentations included "Current Challenges in Computational Literary Studies" in Stockholm (Digital Humanities Now, January 27, 2021); "The French Enlightenment Novel as a Graph? Potentials and Challenges in the Construction of a Knowledge Network" in Amsterdam (Graphs and Networks in the Humanities 2022, February 3, 2022); "Mining and Modeling Literary History" in Vilnius (Lithuanian Academy of Music and Theatre, September 30, 2020); "Informationsextraktion und Linked Open Data für die Literaturgeschichtsschreibung" ("Information Extraction and Linked Open Data for Literary Historiography") in Zurich (Zentralbibliothek Zurich, September 23, 2020); and "Pour une histoire littéraire ouverte et en réseau: le projet Mining and Modeling Text" ("Toward an Open and Networked Literary History: The Mining and Modeling Text Project") in Paris (Sorbonne Centre for Artificial Intelligence, April 4, 2023).

SPARQL workshops and tutorial

During the project, the researchers acquired expertise in the SPARQL query language and subsequently shared it with the Digital Humanities community in several workshops: at the national DHd conference (Luxembourg/Trier), at the international Digital Humanities conference (Graz), and on request at other institutions such as the University of Rostock.

In addition, the team created a comprehensive online tutorial that introduces the SPARQL query language with many illustrative examples (Hinzmann et al. 2022). It will remain available on GitHub after the end of the project and is already being used in teaching by international researchers such as Federico Pianzola (University of Groningen).

Publications

Team members have published both partial results of the project and overview contributions. Particularly noteworthy are our reference publication "Smart Modelling for Literary History", published in the International Journal of Humanities and Arts Computing, and the article "The French Enlightenment Novel as a Graph? Potentials and Challenges in the Construction of a Knowledge Network", which appeared in the proceedings of Graphs and Networks in the Humanities 2022.

In ongoing collaboration with the Institute for Law and Digitization Trier (IRDT), numerous legal guidelines on text and data mining have been written and published.

Legal guidelines on copyright and Digital Humanities (Rechtswissenschaftliche Handreichungen zu Urheberrecht und Digital Humanities)

The idea of making 20th- and 21st-century text corpora accessible through "derived text formats" emerged from a collaboration between Digital Humanities and law. These formats present texts in a form that is no longer subject to copyright restrictions (Schöch et al. 2020a, Schöch et al. 2020b, Raue/Schöch 2020, Kugler et al. 2022). An anthology edited by Benjamin Raue and Christof Schöch, documenting the results of the partnership between the TCDH and the IRDT, is currently in preparation.
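To make the idea more concrete, here is a minimal Python sketch of one simple kind of derived text format, assuming a plain word-frequency table (a "bag of words"): the text is reduced to counts of its words, so word order, and with it the readable text, is discarded. The tokenizer and the function names are hypothetical, and whether a particular format is legally unproblematic is a question for the cited guidelines, not for the code.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer:
    # lowercase runs of Unicode letters (accented letters included).
    return re.findall(r"[^\W\d_]+", text.lower())


def bag_of_words(text: str) -> dict[str, int]:
    """Reduce a text to word frequencies; word order is discarded."""
    counts = Counter(tokenize(text))
    # Sort alphabetically so the output order reveals nothing about the original sequence.
    return dict(sorted(counts.items()))


sample = "La nuit était calme. La lune éclairait la mer, et la mer était calme."
print(bag_of_words(sample))
```

For a full novel, such a table still supports quantitative analyses like comparing vocabulary across texts, while the original sentences cannot be read off it.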

Taking a broader view of academic publication formats, further publications by the project team deserve mention, above all the full-text corpus Collection de romans français du dix-huitième siècle (1751-1800) / Eighteenth-Century French Novels (1751-1800), edited by Julia Röttgermann, as well as numerous other research datasets of various kinds documenting different sub-tasks of the project.

Science Communication

The project team used video in two ways. At the virtual vDhd conference, themed 'Experiments', videos served to engage in interactive discussions with conference attendees; in addition, a video conveys an impression of the SPARQL tutorial.

MiMoText videos: tutorial insights and videos from pandemic-era virtual conferences

In addition to traditional articles and contributions, several blog posts (Röttgermann/Schöch 2020, Röttgermann 2023) were written to document the project work and reach a wider audience, including through the newsletter of the Voltaire Foundation, University of Oxford.

To communicate its results, the project used a variety of formats. Besides informing the university community (through the article "Im Netz der Daten: Informationen extrahieren und modellieren" ("In the Web of Data: Extracting and Modeling Information"), in: konzenTRiert, December 2020), the team also turned to podcasts ("In sechs Stationen rund um MiMoText: Einblicke in das Projekt 'Mining and Modeling Text'" ("In Six Stops around MiMoText: Insights into the 'Mining and Modeling Text' Project"), RaDiHum podcast, March 14, 2021, Wissenschaftspodcasts, Spotify).

The project was also prominently featured on Deutschlandfunk, in an interview with Christof Schöch conducted by Michael Köhler for the literary program Büchermarkt ("Literatur mit künstlicher Intelligenz lesen" ("Reading Literature with Artificial Intelligence"), April 27, 2021).

Outlook

Fortunately, work on the promising topic of Linked Open Data in the humanities will continue beyond the end of the project. In the course of the project, we developed a proposal for a new project together with numerous partners at Trier University; it was positively evaluated within the Rhineland-Palatinate Research Initiative and will be funded for at least three years. The broadly based new project, "LODing - Linked Open Data in the Humanities," coordinated by the Trier Center for Digital Humanities, will begin soon.