Evaluating Hyperparameter Alpha of LDA Topic Modeling

Talk at DHd 2022, panel "Maschinelles Lernen in der Literaturwissenschaft" (Machine Learning in Literary Studies)

Date:

11 March 2022

Location:

via Zoom

11:15 – 12:45

Category:

Conference

V7_1: Maschinelles Lernen in der Literaturwissenschaft, Chair: Christof Schöch (Universität Trier)

  • Evaluating Hyperparameter Alpha of LDA Topic Modeling | Keli Du

As a quantitative method of text analysis, Latent Dirichlet Allocation (LDA) topic modeling has been widely used in Digital Humanities in recent years to explore large amounts of unstructured text data. When topic modeling is used, one has to deal with many parameters that can influence the result, such as the hyperparameters Alpha and Beta, the number of topics, the document length, and the number of model-update iterations. The present research evaluates the influence of the hyperparameter Alpha on topic modeling of a newspaper corpus and a literary text corpus from two perspectives: document classification and topic coherence. The results show that one should avoid training topic models with Alpha set higher than 1 for each topic if one wants better topic-model-based document classification and more coherent topics.
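To give a rough intuition for what Alpha controls, the following minimal sketch draws document-topic distributions from a symmetric Dirichlet prior and reports how much probability the single most prominent topic receives on average. It is not the evaluation carried out in the talk: it trains no topic model and uses no corpus, and the function names, the number of topics (20), and the sample size are assumptions chosen purely for illustration.

```python
import random


def sample_dirichlet(alpha, num_topics, rng):
    """Draw one topic distribution from a symmetric Dirichlet(alpha) prior.

    Uses the standard construction: normalize independent Gamma(alpha, 1) draws.
    """
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
        total = sum(draws)
        if total > 0.0:  # guard against numerical underflow for very small alpha
            return [d / total for d in draws]


def mean_dominant_share(alpha, num_topics=20, num_docs=2000, seed=0):
    """Average probability mass of the most prominent topic per sampled document.

    All default values are hypothetical and only serve the illustration.
    """
    rng = random.Random(seed)
    shares = [max(sample_dirichlet(alpha, num_topics, rng)) for _ in range(num_docs)]
    return sum(shares) / num_docs


if __name__ == "__main__":
    for alpha in (0.1, 1.0, 10.0):
        share = mean_dominant_share(alpha)
        print(f"alpha={alpha:>4}: mean share of dominant topic = {share:.3f}")
```

Under these assumptions, small Alpha values yield documents dominated by a few topics, whereas Alpha values above 1 push every document toward a near-uniform mixture of all topics. This is one plausible reading of why large Alpha values could hurt classification and coherence, but it is an interpretation of the prior only, not a result reported in the abstract.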

  • Adapting Coreference Algorithms to German Fairy Tales | David Schmidt*, Markus Krug, Frank Puppe
  • Verwendung von Wissensgraphen zur inhaltlichen Ergänzung kleinerer Textkorpora (Using knowledge graphs to enrich the content of smaller text corpora) | Thora Hagen*

Keywords: text mining, quantitative analysis