A statistical significance testing approach to mining the most informative set of patterns

  • Authors: Jefrey Lijffijt, Panagiotis Papapetrou, Kai Puolamäki
  • Source: Data mining and knowledge discovery, ISSN 1384-5810, Vol. 28, No. 1, 2014, pp. 238-263
  • Language: English
  • Full text not available
  • Abstract
    • Hypothesis testing using constrained null models can be used to compute the significance of data mining results given what is already known about the data. We study the novel problem of finding the smallest set of patterns that explains most about the data in terms of a global p-value. The resulting set of patterns, such as frequent patterns or clusterings, is the smallest set that statistically explains the data. We show that the newly formulated problem is, in its general form, NP-hard and there exists no efficient algorithm with finite approximation ratio. However, we show that in a special case a solution can be computed efficiently with a provable approximation ratio. We find that a greedy algorithm gives good results on real data and that, using our approach, we can formulate and solve many known data mining tasks. We demonstrate our method on several data mining tasks. We conclude that our framework is able to identify in various settings a small set of patterns that statistically explains the data and to formulate data mining problems in terms of statistical significance.
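
The greedy idea mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or code. It assumes one possible reading of the abstract: a pattern "explains" the data when using it as a constraint on the null model raises the global p-value of the observed data. Everything in the example is a hypothetical choice made for illustration: the data is a list of 0/1 values, the null model draws every unconstrained entry as a fair coin, the test statistic is the number of ones, a "pattern" is the statement that one entry equals its observed value, and the names `global_p_value` and `greedy_explain` are made up here.

```python
from math import comb


def global_p_value(data, constrained):
    """Exact P(T >= t_obs) under the toy null model.

    T counts the ones in the data. Entries whose index is in `constrained`
    are fixed to their observed values; all other entries are fair coins.
    """
    free = [i for i in range(len(data)) if i not in constrained]
    needed = sum(data[i] for i in free)  # ones the free entries must still produce
    m = len(free)
    # Upper tail of Binomial(m, 0.5) starting at `needed`.
    return sum(comb(m, k) for k in range(needed, m + 1)) / 2 ** m


def greedy_explain(data, alpha=0.05):
    """Greedily add the constraint that raises the global p-value the most,
    stopping once the data is no longer significant at level `alpha`."""
    chosen = set()
    p = global_p_value(data, chosen)
    while p < alpha and len(chosen) < len(data):
        candidates = [i for i in range(len(data)) if i not in chosen]
        best = max(candidates, key=lambda i: global_p_value(data, chosen | {i}))
        chosen.add(best)
        p = global_p_value(data, chosen)
    return sorted(chosen), p


if __name__ == "__main__":
    observed = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
    print(greedy_explain(observed))  # ([0, 1, 2], 0.0625)
```

In this toy setting, constraining an entry that is 1 makes the surplus of ones less surprising and raises the p-value, while constraining the single 0 lowers it, so the greedy loop never selects index 8. The general problem described in the abstract is NP-hard, so a greedy loop of this kind should be read as a heuristic rather than an exact solver.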

