As part of his presentation at the e-COPP conference, P. Kijewski (NASK) will introduce the WOMBAT project.
Hervé Debar: November 2008 Archives
The following paper has been accepted at the Network and Distributed Systems Security (NDSS) 2009 conference:
Title: Scalable, Behavior-Based Malware Clustering
Authors:
- Ulrich Bayer, TUV
- Paolo Milani Comparetti, TUV
- Clemens Hlauschek, TUV
- Christopher Kruegel, UCSB
- Engin Kirda, Eurecom
Anti-malware companies receive thousands of malware samples every day. To process this large quantity, a number of automated analysis tools were developed. These tools execute a malicious program in a controlled environment and produce reports that summarize the program's actions. Of course, the problem of analyzing the reports still remains. Recently, researchers have started to explore automated clustering techniques that help to identify samples that exhibit similar behavior. This allows an analyst to discard reports of samples that have been seen before, while focusing on novel, interesting threats. Unfortunately, previous techniques do not scale well and frequently fail to generalize the observed activity well enough to recognize related malware.
In this paper, we propose a scalable clustering approach to identify and group malware samples that exhibit similar behavior. For this, we first perform dynamic analysis to obtain the execution traces of malware programs. These execution traces are then generalized into behavioral profiles, which characterize the activity of a program in more abstract terms. The profiles serve as input to an efficient clustering algorithm that allows us to handle sample sets that are an order of magnitude larger than previous approaches. We have applied our system to real-world malware collections. The results demonstrate that our technique is able to recognize and group malware programs that behave similarly, achieving a better precision than previous approaches. To underline the scalability of the system, we clustered a set of more than 75 thousand samples in less than three hours.
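The idea of grouping samples by the similarity of their behavioral profiles can be illustrated with a small sketch. This is not the authors' algorithm: it assumes each profile is simply a set of abstract feature strings, compares profiles with Jaccard similarity, and links every pair above a threshold. The feature names, the threshold of 0.7, and the function names are all hypothetical, and the pairwise comparison is quadratic, so it does not address the scalability problem the paper targets.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_profiles(profiles, threshold=0.7):
    """Group samples whose profiles are linked by similarity >= threshold.

    profiles: dict mapping a sample name to a set of behavioral features.
    Returns a sorted list of sorted name lists, one list per cluster.
    """
    # Union-find forest: every sample starts in its own cluster.
    parent = {name: name for name in profiles}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Naive all-pairs comparison (quadratic in the number of samples).
    for a, b in combinations(profiles, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical behavioral profiles, invented for illustration only.
profiles = {
    "sample_a": {"file:write:system32", "registry:set:run_key", "net:connect:irc"},
    "sample_b": {"file:write:system32", "registry:set:run_key", "net:connect:irc",
                 "proc:create"},
    "sample_c": {"net:send:smtp", "file:read:address_book"},
}
clusters = cluster_profiles(profiles)
print(clusters)
```

Here sample_a and sample_b share three of four distinct features (similarity 0.75) and end up in one cluster, while sample_c shares none and stays alone.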
The WOMBAT project will be represented at the Future Internet Assembly conference in Madrid, December 2008, by the following people:
- Vincent Boutroux, France Télécom R&D/Orange Labs
- Sotiris Ioannidis, FORTH (also representing FORWARD)
- Philip Homburg, VU (also representing FORWARD)
- Paolo Milani Comparetti, TUV
The WOMBAT project will be represented by the following people at the ICT 2008 Conference:
- Vincent Boutroux, France Télécom R&D/Orange Labs
- Marc Dacier, Symantec
Hervé Debar participates in working group 1 of the Think-Trust project.
The WOMBAT project was represented by Hervé Debar at the SEC 2008 Conference in Paris, September 2008.
This document contains a description of the WOMBAT architecture and a high-level design of the new sensors. The WOMBAT architecture is covered by a comprehensive review of all its components. The architecture also includes the data sources, especially the new ones that will be implemented as part of the WOMBAT project. Each of them is described at the design level, focusing on how it will be integrated with the WOMBAT infrastructure.
FP7-ICT-216026-Wombat-WP3-D06_V02_Infrastructure_design.pdf
Mr. Corrado LEITA will publicly defend his UNS Doctoral Thesis
on Thursday, December 4th 2008 at 2:00 pm, in the Amphitheater MARCONI at EURECOM.
Topic of the Thesis:
"SGNET: automated protocol learning for the observation of malicious threats"
Jury members:
- Marc DACIER (Symantec)
- Vern PAXSON (ICSI)
- Hervé DEBAR (France Télécom R&D/Orange Labs)
- Engin KIRDA (Eurecom)
- Christopher KRUEGEL (UCSB)
- Mohamed KAANICHE (LAAS CNRS)
- Sotiris IOANNIDIS (FORTH)
One of the main prerequisites for the development of reliable defenses to protect a network resource is the collection of quantitative data on Internet threats. This attempt to "know your enemy" has led to an increasing interest in the collection and exploitation of datasets providing intelligence on network attacks. The creation of these datasets is a very challenging task. The challenge derives from the need to cope with the spatial and quantitative diversity of malicious activities. The observations need to be performed from a broad perspective, since the activities are not uniformly distributed over the IP space. At the same time, the data collectors need to be sophisticated enough to extract a sufficient amount of information on each activity and perform meaningful inferences. How can the need to deploy a vast number of data collectors be reconciled with the sophistication required to make meaningful observations? This work addresses this challenge by proposing a protocol learning technique based on bioinformatics algorithms. The proposed technique allows the automatic generation of low-cost protocol responders from a set of samples of network interaction. Its characteristics are exploited in a distributed honeypot deployment that collected information on Internet attacks for a period of 8 months in 23 different networks distributed all over the world (Europe, Australia, United States). This information is organized in a central dataset enriched with contextual information from a number of sources and analysis tools. Simple data mining techniques proposed in this work provide a valuable overview of the propagation techniques employed by today's malware.
The WOMBAT project has received numerous requests for interaction, either to provide data to the project for analysis or to use the information collected by the project.
Our current answer to these requests is to suggest that, if you are interested in participating, you join one of the project partners' initiatives. The current suggestion is to install an SGNet honeypot through the leurre.com project, https://www.leurrecom.org/. This will enable you to collect data and provide it to the project. It will also enable you to access some of the data collected by others through well-specified interfaces, and carry out your own data analysis research.
If you are a large data collector, we also have an interface for data exchange, run by FORTH in Greece. Please contact us if you feel that you fall into this category.
This document outlines the requirements for early warning systems built on technology provided by the WOMBAT project, setting out both functional and non-functional requirements. The collected requirements reflect the identified user needs and the key directions to be followed within the research and development work packages (WP3-Data Collection and Distribution, WP4-Data Enrichment and Characterization, WP5-Threat Intelligence).
The document starts from an assessment of user requirements gathered from potential users, including external participants in the Amsterdam Workshop and the WOMBAT development group. This part covers the expectations of distinct classes of data users such as security vendors, malware researchers, ISPs, CERT teams, government, financial institutions, and home users. It details the requirements for the system architecture, data, and system functions, and specifies the performance, availability, and security features needed to provide sufficient functionality. It also defines user interface, testing, and configuration management requirements.
FP7-ICT-216026-Wombat_WP2_D05_V01_Requirements.pdf
This document contains a detailed analysis of the state-of-the-art tools and research approaches for malware collection and analysis. We have reviewed high-, medium-, and low-interaction honeypots, malware collection tools, and worldwide initiatives. The analysis of the collected malware is covered by a comprehensive review of the most relevant research proposals, including techniques that have been used to analyze running programs in general and that can be adapted for WOMBAT's purposes.
FP7-ICT-216026-Wombat-WP2_D03_V01_State_art.pdf