Abstract of Efficient histogram dictionary learning for text/image modeling and classification

When dealing with text or image data, it is often effective to represent the data as histograms. Although recent Bayesian topic models such as latent Dirichlet allocation and its variants have been shown to model histograms successfully, they often suffer from computational overhead because a large number of hidden variables must be inferred. In this paper we consider a different modeling strategy: forming a dictionary of base histograms whose convex combination yields the histogram of an observed text/image document. The dictionary entries are learned from data, which establishes a direct or indirect association between specific topics/keywords and the base histograms. Given a learned dictionary, the coding of an observed histogram can provide succinct and salient information useful for classification. One of our main contributions is a very efficient dictionary learning algorithm based on Nesterov’s recent smooth optimization technique, used in conjunction with analytic solution methods for the quadratic minimization sub-problems. Not only does the algorithm have a faster theoretical convergence rate; in actual running time it is also 20–30 times faster than general-purpose optimizers such as interior-point methods. In classification/annotation tasks on several text/image datasets, our approach exhibits performance comparable and often superior to that of existing Bayesian models, while being significantly faster than their variational inference.
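The coding step, in which an observed histogram is expressed as a convex combination of base histograms, can be illustrated with a small sketch. This is not the authors' algorithm, whose sub-problem solvers are not given in the abstract. It assumes a least-squares fit and uses a generic accelerated projected-gradient loop (FISTA-style momentum with a Euclidean projection onto the probability simplex). The names `project_simplex` and `encode_histogram`, the iteration count, and the toy dictionary are all hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    n = v.size
    u = np.sort(v)[::-1]                 # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, n + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)         # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def encode_histogram(D, h, n_iter=500):
    """Assumed objective: minimize 0.5 * ||D @ a - h||^2 over the simplex.

    D: (V, K) array whose columns are base histograms.
    h: (V,) observed histogram.
    Returns the (K,) vector of convex-combination weights.
    """
    K = D.shape[1]
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.full(K, 1.0 / K)              # start from uniform weights
    y = a.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ y - h)
        a_next = project_simplex(y - grad / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = a_next + ((t - 1.0) / t_next) * (a_next - a)   # momentum step
        a, t = a_next, t_next
    return a


# Toy usage: three base histograms over four bins (made-up numbers).
D = np.array([[0.7, 0.1, 0.1],
              [0.1, 0.7, 0.1],
              [0.1, 0.1, 0.7],
              [0.1, 0.1, 0.1]])
h = D @ np.array([0.2, 0.5, 0.3])        # histogram built from known weights
weights = encode_histogram(D, h)
```

The returned weights are non-negative and sum to one, so they can serve directly as a compact feature vector for a downstream classifier, which is the role the abstract assigns to the coding.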
