Abstract:
We propose subject matter expert refined topic (SMERT) allocation, a generative probabilistic model applicable to clustering freestyle text. SMERT models are three-level hierarchical Bayesian models in which each item is modeled as a finite mixture over a set of topics. In addition to discrete data inputs, we introduce binomial inputs. These ‘high-level’ data inputs permit the ‘boosting’ or affirming of terms in the topic definitions and the ‘zapping’ of other terms. We also present a collapsed Gibbs sampler for efficient estimation. The methods are illustrated using real-world data from a call center. We also compare SMERT with three alternative approaches using two criteria.