Turigal: compilation of a parallel corpus for bilingual terminology extraction

    1. [1] Instituto Politécnico de Leiria, Leiria, Portugal

  • Published in: Las tecnologías de la información y las comunicaciones: presente y futuro en el análisis de corpus: Actas del III Congreso Internacional de Lingüística de Corpus / María Luisa Carrió Pastor (ed.), Miguel Ángel Candel Mora (ed.), 2011, ISBN 978-84-694-6225-6, pp. 33-42
  • Language: English
  • Abstract
    • Turigal, a parallel corpus of tourism advertising material, was devised to support the creation of a bilingual term bank on tourism. The corpus consists of texts (printed brochures, guidebooks and websites) in Portuguese and their translations into English, all of which were sourced from Portuguese Tourism Regions, Regional Tourism Boards and Regional Tourism Promotion Agencies, and stored as plain text. It currently contains 1,285,764 words and is included in the Linguistic Corpus of the University of Vigo (CLUVI). This paper describes the methodology used in the compilation of Turigal. First, we examine the process of text collection and storage. Then, we discuss Pearson’s (1998) set of criteria for corpus design and text selection, which we took into account when compiling the corpus. Finally, we present the alignment and tagging of Turigal.

