Towards Topic Modeling for Big Data

Provided by: Association for Computing Machinery
Topic: Big Data
Format: PDF
Latent Dirichlet Allocation (LDA) is a popular topic modeling technique in academia but less so in industry, especially in large-scale applications involving search engines and online advertising systems. A main reason is that the topic models used have been too small in scale to be useful; for example, some of the largest LDA models reported in the literature have up to 10^3 topics, which can hardly cover the long-tail semantic word sets. In this paper, the authors show that the number of topics is a key factor that can significantly boost the utility of a topic-modeling system.
