Text+ (NFDI-consortium)

Project Management: Leibniz-Institut für Deutsche Sprache, Mannheim · Berlin-Brandenburgische Akademie der Wissenschaften Berlin · Deutsche Nationalbibliothek · Niedersächsische Staats- und Universitätsbibliothek Göttingen · Nordrhein-Westfälische Akademie der Wissenschaften und der Künste

Sponsors: Deutsche Forschungsgemeinschaft (DFG)

Duration: 2021 - 2026

Contact person (TCDH): Prof Dr Christof Schöch; Dr Thomas Burch; Dr Joëlle Weis

Research Area: Software Systems and Research Infrastructure, Digital Literary and Cultural Studies, Digital Edition and Lexicography

Keywords: Dissemination and Community Building in the DH

Website of the Project: text+

The Text+ network will preserve text- and language-based research data in the long term and enable their broad use in research.

The Text+ infrastructure is focused on language and text data and initially concentrates on digital collections, lexical resources and editions. These are highly relevant to all language- and text-based disciplines, especially linguistics, literary studies, philosophy, classical philology, anthropology, non-European cultures and languages, and language- and text-based research in the social, economic, political, and historical sciences.

The Joint Science Conference has approved Text+ as a consortium of the nationwide initiative to build a National Research Data Infrastructure (NFDI). Text+ will officially launch in the fall of 2021 after several years of preparation and will initially be funded for five years by the German Research Foundation (DFG).

The project is divided into the task areas Collections, Lexical Resources, Editions, and Infrastructure/Operations.

The TCDH is involved as a project partner in two data domains:

Lexical Resources

The TCDH contributes its many years of experience in the retro-digitization, processing, and networking of dictionaries to Text+. The Trier Dictionary Network now provides access to 49 digital dictionaries, including both resources provided by the TCDH itself and dictionaries published by other institutions. Within the task area "Lexical Resources", the TCDH is involved in implementing the Federated Content Search and provides various dictionaries for integration into the cross-resource interface. To improve the interoperability of the lexicographic data, the dictionaries are gradually being aligned with the de facto standard TEI Lex-0.

Collections

Within the Text+ task area "Collections", the TCDH researches and evaluates the use of derived text formats. Digitized texts used as research data are often protected by copyright and therefore cannot be published. Derived text formats remove the copyright-protected information from the original texts, so that the data can be published and research results remain transparent and reproducible. In Text+, the work focuses on how usable derived text formats remain for different text and data mining tasks and to what extent the original texts can be reconstructed from them (e.g., by large language models). For a more detailed presentation of the work on derived text formats, see the blog post "Abgeleitete Textformate: (Nach-)nutzbarkeit, Wiedererkennbarkeit und Rekonstruierbarkeit".
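
As a rough illustration of the idea only, one simple kind of derived text format replaces running text with token counts per segment, which discards word order while keeping frequency information useful for many text mining tasks. The Python sketch below assumes this bag-of-words approach; the function name `derive_bag_of_words`, the tokenization rule, and the segment size are hypothetical choices for the example and do not describe the formats actually evaluated in Text+.

```python
import re
from collections import Counter


def derive_bag_of_words(text: str, segment_size: int = 1000) -> list[dict[str, int]]:
    """Return token counts per fixed-size segment (hypothetical derived format).

    Word order inside each segment is discarded; only frequencies remain.
    """
    # Naive tokenization: lowercase word characters only (an assumption).
    tokens = re.findall(r"\w+", text.lower())
    # Cut the token stream into consecutive segments of `segment_size` tokens.
    segments = [
        tokens[i:i + segment_size] for i in range(0, len(tokens), segment_size)
    ]
    # Keep one alphabetically sorted count table per segment.
    return [dict(sorted(Counter(segment).items())) for segment in segments]


sample = "The cat sat on the mat. The dog sat on the log."
derived = derive_bag_of_words(sample, segment_size=6)
for table in derived:
    print(table)
```

Smaller segments preserve more of the original structure and are therefore easier to reconstruct, while larger segments protect the text better but lose information; this trade-off between usability and reconstructability is the kind of question described above.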