Ayuda
Ir al contenido

Dialnet


Classy: fast clustering streams of call-graphs

  • Autores: Orestis Kostakis
  • Localización: Data mining and knowledge discovery, ISSN 1384-5810, Vol. 28, Nº 5-6, 2014, págs. 1554-1585
  • Idioma: inglés
  • Texto completo no disponible (Saber más ...)
  • Resumen
    • An abstraction resilient to common malware obfuscation techniques is the call-graph. A call-graph is the representation of an executable file as a directed graph with labeled vertices, where the vertices correspond to functions and the edges to function calls. Unfortunately, most of the interesting graph comparison problems, including full-graph comparison and computing the largest common subgraph, belong to the $$NP$$ NP -hard class. This makes the study and use of graphs in large scale systems difficult. Existing work has focused only on offline clustering and has not addressed the issue of clustering streams of graphs. In this paper we present Classy, a scalable distributed system that clusters streams of large call-graphs for purposes including automated malware classification and facilitating malware analysts. Since algorithms aimed at clustering sets are not suitable for clustering streams of objects, we propose the use of a clustering algorithm that relies on the notion of candidate clusters and reference samples therein. We demonstrate via thorough experimentation that this approach yields results very close to the offline optimal. Graph similarity is determined by computing a graph edit distance (GED) of pairs of graphs using an adapted version of simulated annealing. Furthermore, we present a novel lower bound for the GED. We also study the problem of approximating statistics of clusters of graphs when the distances of only a fraction of all possible pairs have been computed. Finally, we present results and statistics from a real production-side system that has clustered and contains more than 0.8 million graphs.


Fundación Dialnet

Dialnet Plus

  • Más información sobre Dialnet Plus

Opciones de compartir

Opciones de entorno