Text+ (NFDI Consortium)

The Text+ network will preserve text- and language-based research data for the long term and make them broadly usable for research.

The Text+ infrastructure is focused on language and text data and initially concentrates on digital collections, lexical resources and editions. These are highly relevant to all language- and text-based disciplines, especially linguistics, literary studies, philosophy, classical philology, anthropology, non-European cultures and languages, and language- and text-based research in the social, economic, political, and historical sciences.

The Joint Science Conference has approved Text+ as a consortium of the nationwide initiative to build a National Research Data Infrastructure (NFDI). Text+ will officially launch in the fall of 2021 after several years of preparation and will initially be funded for five years by the German Research Foundation (DFG).

The project is divided into the working areas of Collections, Lexical Resources, Editions, and Infrastructure/Operations.

The TCDH is involved as a project partner in two data domains:

Lexical Resources

The TCDH contributes its many years of experience in the retro-digitization, processing, and interlinking of dictionaries to Text+. The Trier Dictionary Network now provides access to 49 digital dictionaries, including both resources provided by the TCDH itself and dictionaries published by other institutions. Within the task area "Lexical Resources", the TCDH is involved in implementing the Federated Content Search and provides various dictionaries for integration into the cross-resource interface. To improve the interoperability of the lexicographic data, these resources are gradually being aligned with the de facto standard TEI Lex-0.

Collections

In the Text+ task area "Collections", the TCDH is involved in researching and evaluating the use of derived text formats. When digitized texts are used as research data, the texts are often protected by copyright and therefore cannot be published. Publishing the data in a derived text format, in which the copyright-relevant information has been removed from the original texts, allows research results to remain transparent and reproducible despite this restriction. In the context of Text+, the focus is mainly on how well derived text formats remain usable for different text and data mining tasks and to what extent the original texts can be reconstructed from them (e.g., by Large Language Models). For a more detailed presentation of the work on derived text formats, see also the blog post "Abgeleitete Textformate: (Nach-)nutzbarkeit, Wiedererkennbarkeit und Rekonstruierbarkeit".
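To make the idea of a derived text format more concrete, the following is a minimal sketch of one possible variant: a segment-wise bag-of-words representation that keeps word frequencies but discards word order within each segment. It is an illustration only and not necessarily one of the formats evaluated in Text+; the function name derive_bag_of_words and the parameter segment_size are hypothetical.

```python
import re
from collections import Counter


def derive_bag_of_words(text, segment_size=5):
    """Return one word-frequency table per segment of `segment_size` tokens.

    Hypothetical illustration of a derived text format: the word order
    inside each segment is discarded, only the counts are kept.
    """
    # Simple tokenization: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())

    # Split the token stream into consecutive segments of fixed length.
    segments = [
        tokens[i:i + segment_size]
        for i in range(0, len(tokens), segment_size)
    ]

    # Count the tokens per segment and sort alphabetically, so the original
    # order of the words within a segment is no longer visible.
    return [dict(sorted(Counter(segment).items())) for segment in segments]


derived = derive_bag_of_words("The cat sat on the mat and the dog ran")
for table in derived:
    print(table)
```

Larger segments remove more of the original sequence information, which tends to make reconstruction harder but also limits the tasks for which the data remain usable. This is the kind of trade-off described above.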

Team TCDH

Anne Klee
E-mail: klee [at] uni-trier [dot] de
Phone: +49 651 201-3120

Dr Joëlle Weis
E-mail: weis [at] uni-trier [dot] de
Phone: +49 651 201-3017

Dr Matthias Bremm
E-mail: bremm [at] uni-trier [dot] de
Phone: +49 651 201-2679

Dr Thomas Burch
E-mail: burch [at] uni-trier [dot] de
Phone: +49 651 201-3364

Dr Keli Du
E-mail: duk [at] uni-trier [dot] de
Phone: +49 651 201-3377

Prof Dr Christof Schöch
E-mail: schoech [at] uni-trier [dot] de
Phone: +49 651 201-3264
