
Dialnet


Abstract of Data Augmentation for Pipeline-Based Speech Translation

Diego Alves, Askars Salimbajevs, Marcis Pinnis

  • Pipeline-based speech translation methods may suffer from errors in the output of the speech recognition system. Therefore, it is crucial that machine translation systems are trained to be robust against such noise. In this paper, we propose two parallel data augmentation methods for developing pipeline-based speech translation systems. The first method utilises a speech processing workflow to introduce errors, and the second generates commonly occurring suffix errors with a rule-based approach. We show that, in combination, the methods significantly improve speech translation quality, by 1.87 BLEU points over a baseline system.
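The abstract does not give the suffix rules or the language pair, so the following is only a minimal sketch of what rule-based suffix-error augmentation could look like, not the authors' method. The suffix table `SUFFIX_RULES` (English-like endings chosen purely for illustration), the corruption probability, and the helpers `corrupt_token`, `augment_pair` and `augment_corpus` are all hypothetical. The idea shown is the one the abstract describes: inject ASR-like suffix errors into the source side of parallel data while keeping the reference translation clean, so the MT system learns to be robust to such noise.

```python
import random
from typing import Dict, Iterable, List, Tuple

# Hypothetical suffix confusion rules (NOT the paper's rule set): each suffix
# maps to endings a speech recogniser might plausibly output instead.
SUFFIX_RULES: Dict[str, List[str]] = {
    "ing": ["in", "ed"],
    "ed": ["", "s"],
    "s": [""],
}


def corrupt_token(token: str, rng: random.Random, prob: float) -> str:
    """With probability `prob`, swap the token's longest matching suffix."""
    if rng.random() >= prob:
        return token
    # Check longer suffixes first so "ing" takes precedence over shorter ones.
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        # Require a non-empty stem so whole tokens are never deleted.
        if token.endswith(suffix) and len(token) > len(suffix):
            stem = token[: -len(suffix)]
            return stem + rng.choice(SUFFIX_RULES[suffix])
    return token


def augment_pair(
    source: str, target: str, rng: random.Random, prob: float = 0.1
) -> Tuple[str, str]:
    """Return a noisy copy of the source sentence; the target stays unchanged."""
    noisy = " ".join(corrupt_token(tok, rng, prob) for tok in source.split())
    return noisy, target


def augment_corpus(
    pairs: Iterable[Tuple[str, str]], prob: float = 0.1, seed: int = 0
) -> List[Tuple[str, str]]:
    """Keep the original pairs and append one noisy copy of each."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    originals = list(pairs)
    noisy = [augment_pair(src, tgt, rng, prob) for src, tgt in originals]
    return originals + noisy
```

For example, `augment_corpus([("walking dogs", "t1")], prob=1.0)` keeps the original pair and adds one whose source has every matching suffix corrupted (here "dogs" becomes "dog"), while the target "t1" is untouched. Appending noisy copies rather than replacing the clean data is one reasonable design choice; the paper's actual mixing strategy is not stated in the abstract.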

