The ContentMine Scraping Stack: Literature-scale Content Mining with Community-maintained Collections of Declarative Scrapers

Authors: Richard Smith-Unna, Peter Murray-Rust
Published in: D-Lib Magazine, e-ISSN 1082-9873, Vol. 20, No. 11-12, 2014
Language: English
Links
- Full text (HTML)
Abstract
- Successfully mining scholarly literature at scale is inhibited by technical and political barriers that have been only partially addressed by publishers' application programming interfaces (APIs). Many of those APIs have restrictions that inhibit data mining at scale, and while only some publishers actually provide APIs, almost all publishers make their content available on the web. Current web technologies should make it possible to harvest and mine the scholarly literature regardless of the source of publication, and without using specialised programmatic interfaces controlled by each publisher. Here we describe the tools developed to address this challenge as part of the ContentMine project.
