Trade-off among timeliness, messages and accuracy for large-Ssale information management

René Brunner

Ayuda

Trade-off among timeliness, messages and accuracy for large-Ssale information management

Autores: René Brunner
Directores de la Tesis: Leandro Navarro Moldes (dir. tes.), Félix Freitag (dir. tes.)
Lectura: En la Universitat Politècnica de Catalunya (UPC) ( España ) en 2011
Idioma: inglés
Tribunal Calificador de la Tesis: Torsten Eymann (presid.), José María Barceló Ordinas (secret.), Luis Manuel Díaz de Cerio Ripalda (voc.), Oscar Ardaiz Villanueva (voc.), Dirk Neumann (voc.)
Materias:
- Matemáticas
  - Ciencia de los ordenadores
    - Diseño con ayuda de ordenador
    - Informática
- Ciencias tecnológicas
  - Tecnología de los ordenadores
    - Arquitectura de ordenadores
Enlaces
- Tesis en acceso abierto en: TDX
Resumen
- The increasing amount of data and the number of nodes in large-scale environments require new techniques for information management. Examples of such environments are the decentralized infrastructures of Computational Grid and Computational Cloud applications. These large-scale applications need different kinds of aggregated information such as resource monitoring, resource discovery or economic information. The challenge of providing timely and accurate information in large scale environments arise from the distribution of the information. Reasons for delays in distributed information system are a long information transmission time due to the distribution, churn and failures. A problem of large applications such as peer-to-peer (P2P) systems is the increasing retrieval time of the information due to the decentralization of the data and the failure proneness. However, many applications need a timely information provision. Another problem is an increasing network consumption when the application scales to millions of users and data. Using approximation techniques allows reducing the retrieval time and the network consumption. However, the usage of approximation techniques decreases the accuracy of the results. Thus, the remaining problem is to offer a trade-off in order to solve the conflicting requirements of fast information retrieval, accurate results and low messaging cost. Our goal is to reach a self-adaptive decision mechanism to offer a trade-off among the retrieval time, the network consumption and the accuracy of the result. Self-adaption enables distributed software to modify its behavior based on changes in the operating environment. In large-scale information systems that use hierarchical data aggregation, we apply self-adaptation to control the approximation used for the information retrieval and reduces the network consumption and the retrieval time. The hypothesis of the thesis is that approximation techniquescan reduce the retrieval time and the network consumption while guaranteeing an accuracy of the results, while considering user’s defined priorities. First, this presented research addresses the problem of a trade-off among a timely information retrieval, accurate results and low messaging cost by proposing a summarization algorithm for resource discovery in P2P-content networks. After identifying how summarization can improve the discovery process, we propose an algorithm which uses a precision-recall metric to compare the accuracy and to offer a user-driven trade-off. Second, we propose an algorithm that applies a self-adaptive decision making on each node. The decision is about the pruning of the query and returning the result instead of continuing the query. The pruning reduces the retrieval time and the network consumption at the cost of a lower accuracy in contrast to continuing the query. The algorithm uses an analytic hierarchy process to assess the user’s priorities and to propose a trade-off in order to satisfy the accuracy requirements with a low message cost and a short delay. A quantitative analysis evaluates our presented algorithms with a simulator, which is fed with real data of a network topology and the nodes’ attributes. The usage of a simulator instead of the prototype allows the evaluation in a large scale of several thousands of nodes. The algorithm for content summarization is evaluated with half a million of resources and with different query types. The selfadaptive algorithm is evaluated with a simulator of several thousands of nodes that are created from real data. A qualitative analysis addresses the integration of the simulator’s components in existing market frameworks for Computational Grid and Cloud applications. The proposed content summarization algorithm reduces the information retrieval time from a logarithmic increase to a constant factor. Furthermore, the message size is reduced significantly by applying the summarization technique. For the user, a precision-recall metric allows defining the relation between the retrieval time and the accuracy. The self-adaptive algorithm reduces the number of messages needed from an exponential increase to a constant factor. At the same time, the retrieval time is reduced to a constant factor under an increasing number of nodes. Finally, the algorithm delivers the data with the required accuracy adjusting the depth of the query according to the network conditions.

Acceso de usuarios registrados

¿Olvidó su contraseña?

¿Es nuevo? Regístrese

Ventajas de registrarse

Dialnet Plus

Opciones de compartir

Opciones de entorno

Sugerencia / Errata

Coordinado por: