Leiria, Portugal
Turigal, a parallel corpus of tourism advertising material, has been devised to support the creation of a bilingual term bank on tourism. The corpus consists of Portuguese texts (printed brochures, guidebooks and websites) and their English translations, all sourced from Portuguese Tourism Regions, Regional Tourism Boards and Regional Tourism Promotion Agencies and stored as plain text. It currently contains 1,285,764 words and is included in the Linguistic Corpus of the University of Vigo (CLUVI). This paper describes the methodology used to compile Turigal. First, we examine the process of text collection and storage. Then, we discuss the set of corpus design and text selection criteria proposed by Pearson (1998), which we took into account when compiling our corpus. Finally, we present the alignment and tagging of Turigal.