
Dialnet


Session variability compensation in automatic speaker and language recognition

  • Author: Javier González Domínguez
  • Thesis supervisor: Joaquín González Rodríguez
  • Defended: at the Universidad Autónoma de Madrid (Spain) in 2011
  • Language: English
  • Thesis examination committee: Luis Alfonso Hernández Gómez (chair), Doroteo Torre Toledano (secretary), Eduardo López Gonzalo (member), Daniel Ramos Castro (member), Pietro Laface (member), Ascensión Gallardo Antolín (member), Francisco Javier Hernando Pericas (member)
  • Abstract
    • Robust and accurate automatic speaker and language recognition from the voice signal remains a challenge for the scientific community, mainly because of an old and well-known "enemy": session variability, defined as the set of variations among recordings that belong to the same identity (a speaker or a language, respectively).

      Over the past decades, compensating for or removing undesired variability effects has been broadly accepted as one of the biggest challenges in the field, giving rise to many publications proposing new ways of avoiding or cleaning the distortions present in the speech signal. However, major advances were not achieved until the development of new schemes based on Factor Analysis (FA) modelling. This progress stems from the combination of several ideas within FA, which can be roughly summed up in two key points: first, exploiting prior knowledge to model session variability rather than directly removing it; and second, treating session variability as a continuous source rather than a discrete one.

      This Ph.D. thesis focuses on the study, analysis and development of new ways to mitigate the effects of session variability through recent compensation schemes based on classical FA. To this end, an extensive analysis of the use and mathematical background of FA-based techniques has been conducted, from the eigen-channels approach to more sophisticated schemes such as Joint Factor Analysis.

      Further, special attention has been paid to the use of FA techniques in challenging scenarios, such as those where the available background data is far from the target conditions or the amount of training and test speech is very limited. This is a common case in the increasingly relevant area of forensic speaker recognition. Regarding the experimental framework, well-defined and challenging recent automatic speaker and language recognition evaluations (SRE'08 and LRE'09, respectively) have been used to assess the proposed and studied methods.

