
Dialnet


Abstract of Bayesian nonparametric models for data exploration

Mélanie Natividad Fernández Pradier

  • Making sense of data is one of the biggest challenges of our time. With the emergence of new technologies such as the Internet, sensor networks and deep genome sequencing, a true data explosion has been unleashed that affects our everyday life and all fields of science. Recent breakthroughs, such as self-driving cars and champion-level Go programs, have demonstrated the potential benefits of exploiting data, mostly in well-defined supervised tasks. However, we have barely started to actually explore and truly understand data.

    In fact, data holds valuable information for answering some of the most important questions for humanity: How does aging impact our well-being? What are the underlying mechanisms of cancer? Which factors make some countries wealthier than others? Most of these questions cannot be stated as well-defined supervised problems, and they might benefit enormously from multidisciplinary research efforts involving easy-to-interpret models and rigorous exploratory data analyses. Efficient data exploration might lead to life-changing scientific discoveries, which can later feed a more impactful exploitation phase that puts forward better-informed policy recommendations, decision-making systems, medical protocols, or improved models for highly accurate predictions.

    This thesis proposes tailored Bayesian nonparametric (BNP) models to solve specific exploratory data tasks across different scientific areas, including sport sciences, cancer research, and economics. We resort to BNP approaches to facilitate the discovery of unexpected hidden patterns within data. BNP models place a prior distribution over an infinite-dimensional parameter space, which makes them particularly useful in probabilistic models where the number of hidden parameters is unknown a priori. Under this prior distribution, the posterior distribution of the hidden parameters given the data assigns high probability mass to those configurations that best explain the observations. Hence, inference over the hidden variables can be performed with standard Bayesian inference techniques, avoiding expensive model selection steps.

    This thesis is application-focused and highly multidisciplinary. We propose an automatic grading system to compare athletic performance regardless of age, gender and environmental aspects; we develop BNP models to perform genetic association and biomarker discovery in cancer research, using either genetic information combined with electronic health records or clinical trial data; finally, we present an infinite Poisson factorization model of international trade data to understand the underlying economic structure of countries.
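
As an illustration of what a prior over an unbounded number of hidden components can look like, the following is a minimal sketch of the Chinese restaurant process, a standard BNP prior over partitions. It is not taken from the thesis: the function name `sample_crp` and the parameters `n_points`, `alpha` and `seed` are hypothetical names chosen for this example, and the thesis models themselves may rely on different BNP priors.

```python
import random


def sample_crp(n_points, alpha, seed=0):
    """Draw cluster assignments from a Chinese restaurant process prior.

    Illustrative sketch only: names and signature are assumptions,
    not part of the thesis. alpha > 0 is the concentration parameter.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of points currently in cluster k
    assignments = []   # assignments[i] = cluster index of point i
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = sample_crp(n_points=200, alpha=1.5, seed=1)
    print("number of clusters drawn:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance: it is a random quantity that tends to grow as more points are drawn, which is the property that lets such models sidestep an explicit model selection step.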

