Evaluating Hyperparameter Alpha of LDA Topic Modeling

Lecture at DHd 2022, Panel "Machine Learning in Literary Studies".


Date:

11 March 2022, 11:15 – 12:45

Place:

via Zoom

Categories:

Conference
Talk given at DHd 2022 in the panel "Machine Learning in Literary Studies".

V7_1: Machine learning in literary studies, Chair: Christof Schöch (Universität Trier)

  • Evaluating Hyperparameter Alpha of LDA Topic Modeling | Keli Du

As a quantitative method of text analysis, Latent Dirichlet Allocation (LDA) topic modeling has been widely used in Digital Humanities in recent years to explore large amounts of unstructured text data. When topic modeling is used, one has to deal with many parameters that can influence the result of the modeling, such as the hyperparameters Alpha and Beta, the number of topics, the document length, and the number of model-update iterations. The present research evaluates the influence of the hyperparameter Alpha in topic modeling on a newspaper corpus and a literary text corpus from two perspectives: document classification and topic coherence. The results show that one should avoid training topic models with Alpha for each topic set higher than 1 if one wants to ensure better topic-model-based document classification and more coherent topics.

  • Adapting Coreference Algorithms to German Fairy Tales | David Schmidt*, Markus Krug, Frank Puppe
  • Verwendung von Wissensgraphen zur inhaltlichen Ergänzung kleinerer Textkorpora | Thora Hagen*
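
To give an intuition for why the hyperparameter Alpha discussed in the abstract above matters, the following minimal sketch simulates document-topic distributions drawn from a symmetric Dirichlet prior, which is the role Alpha plays in LDA. It is not the evaluation carried out in the talk: no corpus, classifier, or coherence measure is involved, and the function names, the number of topics (20), and the Alpha values are assumptions chosen purely for illustration. It only shows the general property that small Alpha values concentrate each simulated document on a few topics, while values above 1 spread it almost evenly across all topics.

```python
import random


def sample_symmetric_dirichlet(alpha, num_topics, rng):
    """Draw one topic distribution from a symmetric Dirichlet(alpha).

    Uses the standard construction: normalize independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_max_share(alpha, num_topics=20, num_docs=2000, seed=0):
    """Average share of the most prominent topic across simulated documents.

    num_topics, num_docs and seed are illustrative assumptions, not values
    taken from the talk.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_docs):
        total += max(sample_symmetric_dirichlet(alpha, num_topics, rng))
    return total / num_docs


if __name__ == "__main__":
    # Hypothetical Alpha values below, at, and above 1.
    for alpha in (0.1, 1.0, 10.0):
        share = mean_max_share(alpha)
        print(f"alpha={alpha:<5} mean share of dominant topic: {share:.2f}")
```

In this simulation, a higher mean share for the dominant topic indicates sparser, more sharply profiled documents. How this translates into classification accuracy or topic coherence on real corpora is the empirical question the talk addresses.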

Projects: Zeta and Company

Keywords: Text Mining, Quantitative Analysis