Researching with derivatives
Solutions for providing copyrighted materials online
Project Management: Prof. Dr. Bela Gipp (Niedersächsische Staats- und Universitätsbibliothek Göttingen) · Prof. Dr. Benjamin Raue (Universität Trier – Fachbereich V (Rechtswissenschaft)) · Prof. Dr. Thomas Stäcker (Universitäts- und Landesbibliothek Darmstadt) · Frank Scholze (Deutsche Nationalbibliothek) · Prof Dr Christof Schöch (Universität Trier – Computerlinguistik & Digital Humanities; Universität Trier – Trier Center for Digital Humanities (TCDH))
Sponsors: Deutsche Forschungsgemeinschaft (DFG)
Running time: -
Contact person (TCDH): Prof Dr Christof Schöch
Research Area: Digital Literary and Cultural Studies
Keywords: Digital Technologies and Tools, 20th century, 21st century, Quantitative Analysis, Text Collections, Text Mining
In the modern information society, it is commonly assumed that digitisation and networking give access to ever larger stocks of data, information and knowledge. However, technical, legal and social factors also limit this accessibility, which can leave certain stocks of data, information and knowledge significantly underrepresented. Among legal norms, it is the particular role of copyright law to balance the interests of copyright holders (such as authors or publishers) on the one hand and users (such as citizens or academia) on the other. For research, these restrictions weigh most heavily on extensive collections of recent texts, as shown by the limited availability or restricted usability of corpora of recent research literature, newspaper texts or literary texts.
To date, access to these documents has been restricted by copyright regulations. Despite the reform of copyright law, in particular the introduction of the text and data mining (TDM) exception for scientific research (Section 60d UrhG), complete documents could only be used on site or under appropriate licences for a limited group of users. Schöch et al. (2020) showed that working with such materials in the form of derived text formats (ATF, from the German "abgeleitete Textformate") is possible despite restrictive copyright conditions, and pointed out the largely untapped potential that providing derived text formats offers for a wide range of research purposes based on automatic text analysis.
The overall objective of the pilot project is to substantially expand the possibilities for text-based research on copyright-protected data by developing and providing suitable derived text formats (ATF).
The TCDH contributes to the project by leading the work package that evaluates the suitability of various ATFs, as well as through various contributions to the other work packages.
The overall project is divided into the following four main areas:
- Identification and documentation of scientific questions and analysis scenarios typical for the community that could be answered not only with full texts but also with derivatives (ATF),
- Legal classification of different ATFs with regard to copyright relevance and to the reconstructability and recognisability of the original text, including how these depend on ATF parameters,
- Evaluation of the validity and performance of relevant methods when applied to ATF of literary and scientific texts in comparison to the original text, as well as proposals for the documentation and persistent addressing of ATF and for its use and citation in a scientific context,
- Pilot implementation, documentation and publication of a suite of ATF for a collection of literary and scientific texts protected by copyright.
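To make the idea of a derived text format and its parameters more concrete, the following is a minimal sketch of one possible derivative: word counts per fixed-size segment, with word order inside each segment discarded. The function name `segment_word_counts` and the parameter `segment_size` are hypothetical and chosen only for illustration; the sketch is not the project's implementation and does not claim to match any ATF defined by the project.

```python
import re
from collections import Counter


def segment_word_counts(text, segment_size=4):
    """Split text into consecutive segments of `segment_size` tokens and
    return one Counter of token frequencies per segment.

    Illustrative assumption: word order inside a segment is discarded,
    only the frequencies are kept.
    """
    # Naive tokenisation: lowercase, keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    segments = []
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        segments.append(Counter(segment))
    return segments


if __name__ == "__main__":
    sample = "The cat sat on the mat and the dog sat too"
    for index, counts in enumerate(segment_word_counts(sample)):
        print(index, dict(counts))
```

In this sketch, a larger `segment_size` removes more information about word order, which is the kind of parameter that could matter when weighing reconstructability against usefulness for analysis.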
At the end of the project, a documented overview of derivatives and of the methods applicable to the use cases under consideration should be available. This overview will have been reviewed from a legal perspective and will offer, for each case, a way to publish the corresponding ATFs in a legally compliant manner. Two pilot projects are to demonstrate the concrete applicability and illustrate the different derivatives in the context of legally protected data and free reference corpora.
__
Schöch, Christof, Frédéric Döhl, Achim Rettinger, Evelyn Gius, Peer Trilcke, Peter Leinen, Fotis Jannidis, Maria Hinzmann, Jörg Röpke. 2020. „Abgeleitete Textformate: Text und Data Mining mit urheberrechtlich geschützten Textbeständen“. Zeitschrift für digitale Geisteswissenschaften 5. https://doi.org/10.17175/2020_006.
Team TCDH
Prof Dr Christof Schöch
E-mail: schoech [at] uni-trier [dot] de
Phone: +49 651 201-3264
Zita Baronnet
E-mail: baronnet [at] uni-trier [dot] de
Phone: +49 651 201-1302