Ayuda
Ir al contenido

Dialnet


Inference of concise regular expressions and DTDs

  • Autores: Geert Jan Bex, Frank Neven, Thomas Schwentick, Stijn Vansummeren
  • Localización: ACM transactions on database systems, ISSN 0362-5915, Vol. 35, Nº 2, 2010, pág. 11
  • Idioma: inglés
  • Texto completo no disponible (Saber más ...)
  • Resumen
    • We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XML-documents, a problem that basically reduces to learning concise regular expressions from positive examples strings. We identify two classes of concise regular expressions�the single occurrence regular expressions (SOREs) and the chain regular expressions (CHAREs)�that capture the far majority of expressions used in practical DTDs. For the inference of SOREs we present several algorithms that first infer an automaton for a given set of example strings and then translate that automaton to a corresponding SORE, possibly repairing the automaton when no equivalent SORE can be found. In the process, we introduce a novel automaton to regular expression rewrite technique which is of independent interest. When only a very small amount of XML data is available, however (for instance when the data is generated by Web service requests or by answers to queries), these algorithms produce regular expressions that are too specific. Therefore, we introduce a novel learning algorithm crx that directly infers CHAREs (which form a subclass of SOREs) without going through an automaton representation. We show that crx performs very well within its target class on very small datasets.


Fundación Dialnet

Dialnet Plus

  • Más información sobre Dialnet Plus

Opciones de compartir

Opciones de entorno