Poster session at DHd 2022
17:00 – 19:00
Contrastive Text Analysis with pydistinto - A Python Package for the Use of Different Distinctiveness Measures | Keli Du*, Julia Dudar*, Cora Rok, Christof Schöch
In Computational Literary Studies (CLS), statistical distinctiveness measures are used to identify features that are characteristic of one group of texts compared to another. However, most existing tools prove inadequate when users want to customize their analyses, set their own parameters, or use specific data formats. To facilitate the use of relevant measures for contrastive text analysis and to raise awareness of the diversity of measures, we are developing a Python package called pydistinto. With pydistinto, even users with little programming or statistical knowledge can compare two text corpora using different measures and, in an advanced mode, empirically determine and contrast the properties and performance of those measures. Using tables and figures, the planned poster will mainly present the following aspects of our package: the options for preprocessing the text data, the implemented distinctiveness measures, and the visualization of the contrastive analysis results.
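To make the idea of a distinctiveness measure concrete, here is a minimal sketch of one simple measure, a smoothed log-ratio of relative word frequencies between a target group and a comparison group. It does not use pydistinto and does not reflect its API; the function name `log_ratio_scores`, the whitespace tokenization, the add-one smoothing, and the toy texts are all assumptions made for illustration, and the abstract does not say whether this particular measure is among those implemented.

```python
import math
from collections import Counter


def log_ratio_scores(target_texts, comparison_texts):
    """Score each word by the log2 ratio of its smoothed relative frequencies.

    Hypothetical helper for illustration only; not part of pydistinto.
    Positive scores mark words that are more typical of the target group,
    negative scores mark words more typical of the comparison group.
    """
    # Naive preprocessing (an assumption): lowercase and split on whitespace.
    target_counts = Counter(w for t in target_texts for w in t.lower().split())
    comparison_counts = Counter(w for t in comparison_texts for w in t.lower().split())

    vocabulary = set(target_counts) | set(comparison_counts)

    # Add-one smoothing so that words missing from one group do not
    # lead to a division by zero.
    target_total = sum(target_counts.values()) + len(vocabulary)
    comparison_total = sum(comparison_counts.values()) + len(vocabulary)

    scores = {}
    for word in vocabulary:
        p_target = (target_counts[word] + 1) / target_total
        p_comparison = (comparison_counts[word] + 1) / comparison_total
        scores[word] = math.log2(p_target / p_comparison)
    return scores


# Toy corpora, invented for this example.
target = ["the ship sailed the sea", "the sea was calm"]
comparison = ["the city was loud", "the city never sleeps"]

scores = log_ratio_scores(target, comparison)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:3])   # words most distinctive of the target group
print(ranked[-3:])  # words most distinctive of the comparison group
```

Different measures (frequency-based, dispersion-based, or statistical tests) can rank the same vocabulary quite differently, which is the kind of contrast the advanced mode described above is meant to expose.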
Linked Open Data for literary historiography: the project "Mining and Modeling Text" | Maria Hinzmann, Christof Schöch, Katharina Dietz, Anne Klee, Katharina Erler-Fridgen, Julia Röttgermann, Moritz Steffes
In dealing with the ever-growing 'digital cultural heritage', the further development of systematic data indexing and knowledge representation offers hitherto unexploited potential for literary historiography. Against this background, the project "Mining and Modeling Text" (MiMoText) interweaves quantitative methods of information extraction ('mining') and data modeling ('modeling') in order to build an information system for literary history. Transferability to other domains will be considered. The central concern is to further develop the field of quantitative methods for extracting, modeling, and analyzing information relevant to the humanities from extensive text collections and to explore it from an interdisciplinary perspective (humanities, computer science, and law).