On distributing the analysis process of a broad-coverage unification-based grammar of Spanish

Montserrat Marimon Felipe

Author: Montserrat Marimon Felipe
Thesis supervisors: Josep Andreu Martín Rioja, Núria Bel Rafecas, Axel Theofilidis
Defended: at the Universitat Politècnica de Catalunya (UPC) (Spain) in 2003
Language: English
ISBN: 84-688-3522-6
Legal deposit: B.44817-2003
Thesis examination committee: Ramón Cerdà Massó (chair), Horacio Rodríguez Hontoria (secretary), Fernando Sánchez León (member), Detlef Prescher (member), José Gabriel Amores Carredano (member)
Links
- Open-access thesis available at: TDX
Abstract
- This thesis describes research into the development and deployment of an engineered, large-scale unification-based grammar that provides more robust and efficient deep grammatical analysis of linguistic expressions in real-world applications, while maintaining the accuracy of the grammar (i.e. the percentage of input sentences that receive the correct analysis) and keeping its precision at a reasonable level (i.e. the percentage of input sentences that receive no superfluous analyses).
  
  In tackling the efficiency problem, our approach has been to prune the parser's search space by integrating shallow and deep processing. We propose and implement an NLP system that integrates a Part-of-Speech (PoS) tagger and a chunker as a pre-processing module of a broad-coverage unification-based grammar of Spanish. This releases the parser from certain tasks that can be handled efficiently and reliably by these computationally less expensive processing techniques. On the one hand, by integrating the morpho-syntactic information delivered by the PoS tagger, we reduce the number of morpho-syntactic ambiguities in the linguistic expression to be analyzed. On the other hand, by integrating the chunk mark-up delivered by the partial parser, we not only avoid generating irrelevant constituents that would not contribute to the final parse tree, but also provide part of the structure that the analysis component would otherwise have to compute, thus avoiding a duplication of effort.
  
  In addition, we want our system to maintain the accuracy of the high-level grammar. In the integrated architecture we propose, ambiguities that cannot be reliably resolved by the PoS tagger are kept and left to the linguistic components of the grammar that perform deep analysis.
  
  Besides improving the efficiency of the overall analysis process and maintaining the accuracy of the grammar, our system provides both structural and lexical robustness to the high-level processing. Structural robustness is obtained by integrating the structures already parsed by the chunker into the linguistic components of the high-level grammar, so that they do not need to be rebuilt by phrase structure rules. This allows us to extend the coverage of the grammar to very low-frequency constructions whose treatment would drastically increase the parsing search space and create spurious ambiguity. To provide lexical robustness, we have implemented default lexical entries: lexical entry templates that are activated when the system cannot find a particular lexical entry to apply. Here, integrating the tagger, which supplies PoS information to the linguistic processing modules of our system, allows us to increase robustness without increasing morphological ambiguity. Higher precision is achieved by extending the PoS tags of our external lexicon to include syntactic information, for instance subcategorization information.
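
The interaction between the tagger output and the default lexical entries can be illustrated with a minimal Python sketch. Everything in it is hypothetical and not taken from the thesis: the toy lexicon, the tag names and probabilities, the `confident` and `floor` thresholds, and the `select_entries` function. It assumes the tagger returns a probability per PoS tag. When the best tag is reliable, only that tag is kept. Otherwise, all plausible tags are passed on so the grammar can resolve the ambiguity. Default entries are used only when no lexical entry compatible with the kept tags exists, and this trigger condition is itself an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word form -> list of (PoS tag, lexical entry id).
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "la": [("DET", "la_det"), ("PRON", "la_clitic")],
    "casa": [("NOUN", "casa_n"), ("VERB", "casar_v_3sg")],
}

# Hypothetical default lexical entry templates, one per open-class PoS tag.
DEFAULT_ENTRIES: Dict[str, str] = {
    "ADJ": "default_adj",
    "NOUN": "default_noun",
    "VERB": "default_verb",
}


def select_entries(word: str, tag_probs: Dict[str, float],
                   confident: float = 0.9, floor: float = 0.05) -> List[str]:
    """Return the lexical entries handed to the deep parser for one token.

    tag_probs is assumed to map each candidate PoS tag to the tagger's
    probability for it (a hypothetical tagger interface).
    """
    best_tag, best_p = max(tag_probs.items(), key=lambda kv: kv[1])
    if best_p >= confident:
        # Reliable decision: keep only the tagger's best tag.
        kept = {best_tag}
    else:
        # Unreliable decision: keep every plausible tag (always including
        # the best one) and let the grammar resolve the ambiguity.
        kept = {tag for tag, p in tag_probs.items() if p >= floor} | {best_tag}

    # Keep only lexicon entries whose tag survived the filter.
    entries = [entry for tag, entry in LEXICON.get(word, []) if tag in kept]
    if entries:
        return entries

    # No applicable entry (e.g. an unknown word): fall back to default
    # entry templates, restricted to the tags the tagger proposed.
    return [DEFAULT_ENTRIES[tag] for tag in sorted(kept) if tag in DEFAULT_ENTRIES]


print(select_entries("casa", {"NOUN": 0.97, "VERB": 0.03}))       # ['casa_n']
print(select_entries("la", {"DET": 0.6, "PRON": 0.4}))            # ['la_det', 'la_clitic']
print(select_entries("portátil", {"ADJ": 0.55, "NOUN": 0.45}))    # ['default_adj', 'default_noun']
```

With a confident tag, `casa` keeps only its noun reading, so the verbal entry never reaches the parser. With an unresolved DET/PRON decision, both readings of `la` are kept for deep analysis. The unknown form `portátil` receives one default entry per proposed tag rather than one per open class, which is how tagger information can limit the ambiguity that default entries would otherwise introduce.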

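The chunk-based pruning described in the abstract can be pictured as a filter on the spans a chart parser is allowed to build. The Python sketch below is a hypothetical illustration, not the thesis's parser. The names `span_allowed` and `candidate_spans`, and the half-open span encoding, are assumptions. It also assumes that each chunk delivered by the chunker becomes a single pre-built constituent. Under that assumption, the parser skips spans that lie strictly inside a chunk (their structure is already supplied) or cut across a chunk boundary (they would be incompatible with the chunk mark-up).

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def span_allowed(span: Span, chunks: List[Span]) -> bool:
    """Return False if the span cuts across a chunk or lies strictly inside one."""
    start, end = span
    for c_start, c_end in chunks:
        if end <= c_start or c_end <= start:
            continue  # no overlap with this chunk
        if start <= c_start and c_end <= end:
            continue  # span covers the whole chunk (or is the chunk itself)
        # Remaining cases: the span is a proper part of the chunk, whose
        # structure is assumed to arrive pre-built, or it crosses the chunk boundary.
        return False
    return True


def candidate_spans(n_tokens: int, chunks: List[Span]) -> List[Span]:
    """Enumerate the spans a chart parser would still try to build."""
    return [
        (start, end)
        for start in range(n_tokens)
        for end in range(start + 1, n_tokens + 1)
        if span_allowed((start, end), chunks)
    ]


# "el perro come pan" with the chunk [el perro] covering tokens 0-1.
print(candidate_spans(4, [(0, 2)]))
# [(0, 2), (0, 3), (0, 4), (2, 3), (2, 4), (3, 4)]
```

Without any chunk, all 10 spans over four tokens would be candidates. With the single chunk, 6 remain. The chunk span itself is kept so that it can enter larger constituents, while spans such as `(1, 3)` that would split it are never proposed.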