Arbit Scratchpad

Topic modeling - Machine Learning

Probabilistic topic modeling provides methods for organizing, understanding, searching, and summarizing large electronic archives.

Latent Dirichlet Allocation (LDA): The simple intuition behind LDA is that documents exhibit multiple topics. In reality, we only observe the documents; the rest of the structure (the topics, the per-document topic proportions, and the per-word topic assignments) consists of hidden variables. Our goal is to infer the hidden variables, i.e. compute their distribution conditioned on the documents: p(topics, proportions, assignments | documents).

Gibbs sampling for LDA: Here we sample the topic of a word in one of the documents, given the topics of all other words, the topic distributions, and the data. Repeating this for every word in every document, over many sweeps, yields samples from the conditional distribution above.

[Figure: a sample from recent coursework at Chalmers]

[Figure: another instance, for 10 topics]
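To make the Gibbs sampling step concrete, here is a minimal sketch in Python with NumPy. It is not the coursework code. It assumes the common collapsed variant, in which the topic distributions are integrated out and only the word-topic assignments are sampled, and it assumes documents are given as lists of integer word ids. The function name lda_collapsed_gibbs, its parameters, and the default values for alpha and beta (symmetric Dirichlet hyperparameters) are all illustrative choices.

```python
import numpy as np


def lda_collapsed_gibbs(docs, vocab_size, num_topics,
                        alpha=0.1, beta=0.01, num_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi, assignments):
      theta[d, k] ~ estimated proportion of topic k in document d
      phi[k, w]   ~ estimated probability of word w under topic k
    """
    rng = np.random.default_rng(seed)

    # Count tables that summarise the current topic assignments.
    doc_topic = np.zeros((len(docs), num_topics))     # words in doc d assigned to topic k
    topic_word = np.zeros((num_topics, vocab_size))   # times word w is assigned to topic k
    topic_total = np.zeros(num_topics)                # total words assigned to topic k

    # Start from a random topic for every word.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(num_topics, size=len(doc))
        assignments.append(z_d)
        for w, k in zip(doc, z_d):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1

    for _ in range(num_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this word's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1

                # Conditional over topics given all other assignments:
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                p = ((doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                     / (topic_total + vocab_size * beta))
                p /= p.sum()

                # Sample a new topic and add it back into the counts.
                k = rng.choice(num_topics, p=p)
                assignments[d][i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1

    # Point estimates of the hidden proportions and topics from the final counts.
    theta = (doc_topic + alpha) / (doc_topic.sum(axis=1, keepdims=True)
                                   + num_topics * alpha)
    phi = (topic_word + beta) / (topic_total[:, None] + vocab_size * beta)
    return theta, phi, assignments
```

With a toy corpus such as docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4]], calling lda_collapsed_gibbs(docs, vocab_size=6, num_topics=2) returns theta with one row of topic proportions per document and phi with one row of word probabilities per topic. In practice one would discard early sweeps as burn-in and average over several later samples rather than use only the final state.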
