Lineage inference of packed malware using binary code similarity

  • Author: Irfan Ul Haq
  • Thesis advisor: Juan Caballero
  • Defended: Universidad Politécnica de Madrid (Spain), 2019
  • Language: Spanish
  • Thesis committee: Juan Manuel Estévez Tapiador (chair), Alessandra Gorla (secretary), Martina Lindorfer (member), Marco Guarnieri (member), Ricardo Julio Rodríguez Fernández (member)
  • Doctoral program: Programa de Doctorado en Software, Sistemas y Computación (Doctoral Program in Software, Systems and Computing), Universidad Politécnica de Madrid
  • Abstract
    • Software evolves to adapt to changing requirements by adding new functionality and improving stability through bug fixing. Software lineage studies the evolutionary relationships among software. In particular, program lineage studies the evolution of a program over time across its different versions. The source code of a program manifests its evolution. Unfortunately, many programs are distributed as executables without source code, e.g., malware and commercial off-the-shelf (COTS) programs. However, the binary code of a program also reflects its evolution across versions, e.g., through added, removed, and updated binary code. Thus, to compute lineage we can leverage binary code similarity approaches that compare two or more pieces of binary code and capture their differences.

      Like benign programs, malware families also evolve over time. However, malware lineage inference faces three additional challenges not addressed by existing lineage approaches. First, malware development typically includes an extra step not present in benign software development. Once a new version of a malware family is ready, the malware authors pack the resulting executable to hide its original binary code. The only directly visible binary code in a packed malware sample is the code for unpacking the original binary code. Second, the packing process is typically applied many times to the same input executable, creating polymorphic variants of exactly the same malware version, which look different to malware detectors but have the same functionality. Consequently, many packed malware samples may correspond to the same version. Thus, malware version identification is a fundamental challenge in inferring the lineage of malware collected in the wild. Third, the development model, e.g., straight-line or branching and merging, is unknown. A malware family may use any development model. Thus, we need a lineage inference algorithm that works independently of the development model used by the malware family.

      In this thesis, we develop an approach for inferring the lineage of packed malware using binary code similarity. In particular, this thesis provides four contributions. First, we perform a systematic study to analyze existing research on binary code similarity. We systematize four aspects of binary code similarity research: the applications enabled, the approaches employed, how the approaches have been implemented, and the benchmarks and evaluation methodologies.

      Second, we propose the first approach to identify the versions of a malware family. Our approach uses static and dynamic analysis to recover the original binary code of a packed malware sample and to disassemble the recovered binary code. Then, it performs binary code similarity using the semantic hashes of the disassembled code to identify the versions of the malware family.
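
To make the version-identification idea concrete, here is a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the semantic hashes are computed, so this sketch assumes a simple stand-in (a SHA-256 over instructions with numeric immediates masked) and assumes that samples whose unpacked code yields the same set of function hashes belong to the same version. The names `normalize`, `function_hash`, `sample_signature`, and `group_versions` are hypothetical, and the input is assumed to be already unpacked and disassembled.

```python
import hashlib
import re
from collections import defaultdict


def normalize(instr):
    """Mask numeric immediates so that relocated addresses hash identically.

    Assumption: this is only a crude stand-in for a real semantic hash.
    """
    return re.sub(r"\b(0x[0-9a-fA-F]+|\d+)\b", "IMM", instr.strip().lower())


def function_hash(instructions):
    """Hash one function, given as a list of instruction strings."""
    joined = "\n".join(normalize(i) for i in instructions)
    return hashlib.sha256(joined.encode()).hexdigest()


def sample_signature(functions):
    """Represent a sample by the set of its function hashes."""
    return frozenset(function_hash(f) for f in functions)


def group_versions(samples):
    """Group samples with identical signatures into candidate versions.

    samples: dict mapping sample name -> list of functions,
             each function being a list of instruction strings.
    """
    groups = defaultdict(list)
    for name, functions in samples.items():
        groups[sample_signature(functions)].append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical unpacked samples: a.exe and b.exe differ only in an address.
    samples = {
        "a.exe": [["push ebp", "mov eax, 0x401000", "ret"]],
        "b.exe": [["push ebp", "mov eax, 0x402000", "ret"]],
        "c.exe": [["push ebp", "xor eax, eax", "ret"]],
    }
    print(group_versions(samples))
```

In this toy input, `a.exe` and `b.exe` fall into one group because their only difference is a masked address, while `c.exe` forms its own group. Exact set equality is a deliberately strict criterion; a real system would likely need to tolerate unpacking and disassembly errors.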

      Third, we develop a malware lineage inference algorithm for packed malware that works independently of the development model of the malware family. It produces a lineage graph where nodes are versions of the family and edges describe the relationships between versions.
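
The shape of such a lineage graph can be illustrated with a short Python sketch. This is not the algorithm developed in the thesis, which the abstract does not describe. As a stand-in, it builds a maximum-similarity spanning tree (Prim's algorithm over Jaccard similarity), assuming each version is represented by a set of function hashes. The result is a chain for straight-line development and a tree for branching, but the edges are undirected and merges are not captured. The names `jaccard` and `lineage_edges` and the example version labels are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of function hashes."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def lineage_edges(versions):
    """Connect versions with a maximum-similarity spanning tree.

    versions: dict mapping version name -> set of function hashes.
    Returns a list of undirected (u, v) edges; ties are broken by name order.
    """
    names = sorted(versions)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the most similar pair linking the tree to a version outside it.
        _, u, v = max(
            (
                (jaccard(versions[u], versions[v]), u, v)
                for u in sorted(in_tree)
                for v in names
                if v not in in_tree
            ),
            key=lambda t: t[0],
        )
        edges.append((u, v))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    # Hypothetical versions: v2 extends v1, then v2b and v3 both branch from v2.
    versions = {
        "v1": {"a", "b", "c"},
        "v2": {"a", "b", "c", "d"},
        "v2b": {"a", "b", "c", "d", "x"},
        "v3": {"a", "b", "c", "d", "e"},
    }
    print(lineage_edges(versions))
```

A spanning tree is used here only because it makes no straight-line assumption; recovering edge direction and merge relationships would require additional information that this sketch does not model.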

      Finally, we contribute two metrics to evaluate the malware unpacking and disassembly approaches, and use them to evaluate our approach.

