LREC Workshop

Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science

Text+

Date:

May 12, 2026

Location:

Palma de Mallorca

Category:

Workshop
The workshop "Leveraging Derived Text Formats to Unlock Copyrighted Collections for Open Science" will be held at the Language Resources and Evaluation Conference (LREC 2026). Derived Text Formats (DTF), also known as extracted features, offer a promising way to enable research on textual data that cannot be shared in its original form because of copyright or privacy restrictions. The workshop brings together researchers, legal experts, and infrastructure providers to explore the creation, standardization, legal framing, and scientific use of derived data in linguistics, digital humanities, and language technology.

Program

Tuesday, May 12, 2026

Session 1: Overview

14:00–15:30 · Room 9 · Chair: Philippe Genêt (Deutsche Nationalbibliothek)

  • 14:00–14:10: Welcome and Introduction
  • 14:10–14:30: Derived Text Formats as Strategic Transformations of In-Copyright Materials to Support Open Science: A Survey (Christof Schöch)
  • 14:30–14:50: A Multi-dimensional Constrained Framework for Derived Text Formats (Keli Du, Christof Schöch)
  • 14:50–15:10: Legal implications of Derived Text Formats – a copyright perspective (Gianna Iacino, Pawel Kamocki, Keli Du)
  • 15:10–15:30: Revisiting Masking After Fifteen Years: Early Approaches to Non-Reconstructable Linguistic Data in the current context (Georg Rehm, Thorsten Trippel, Andreas Witt)
  • 15:30–16:00: Break

Session 2: Applications

16:00–18:00 · Room 9 · Chair: Piroska Lendvai (Bavarian Academy of Sciences and Humanities)

  • 16:00–16:20: Multi-Label Text Classification of Derived Text Formats with DistilBERT (Jennifer Ecker, Roman Schneider)
  • 16:20–16:40: Training data generation for context-dependent rubric-based short answer grading (Pavel Šindelář, Filip Prášil, Dávid Slivka, Christopher Bouma, Ondrej Bojar)
  • 16:40–17:00: DUO_DE A1: An Annotated Corpus of Online Learning Material for Beginning Learners of German as a Foreign Language (Jammila Laâguidi, Vitaliia Ruban, Ronja Laarmann-Quante, Anastasia Drackert)
  • 17:00–17:20: Why Reconstructing Scrambled Texts Fails (Keli Du, Christof Schöch)
  • 17:20–17:40: DIN 19461: A National Standard for Derived Text Formats (Thorsten Trippel, Florian Barth, José Calvo Tello, Keli Du, Philippe Genêt, Daniel Kurzawe, Peter Leinen, Piroska Lendvai, Christof Schöch, Andreas Witt, Arden Zimmermann)
  • 17:40–18:00: Final discussion and closing

Workshop Organisers

  • Florian Barth, Göttingen State and University Library
  • Keli Du, University of Trier
  • José Calvo Tello, Göttingen State and University Library
  • Philippe Genêt, German National Library
  • Piroska Lendvai, Bavarian Academy of Sciences and Humanities
  • Christof Schöch, University of Trier
  • Thorsten Trippel, University of Tübingen and Leibniz-Institut für Deutsche Sprache