Large scale machine learning for the detection and classification of malware

Kilgallon, Sean

Author(s)	Kilgallon, Sean
Date Accessioned	2018-12-19T12:17:58Z
Date Available	2018-12-19T12:17:58Z
Publication Date	2018
SWORD Update	2018-09-12T13:00:46Z
Abstract	Bad actors have embraced automation and current malware analysis systems cannot keep up with the ever-increasing load of malware being created daily. As a result, traditional malware detection and classification techniques using expert systems and brittle heuristics are outdated and ineffective. We introduce deep learning models based on inexpensive static features gathered from large scale malware datasets to generate robust and efficient malware detection and malware family classification predictions.
Static analysis is performed by dissecting or disassembling the malware's binary file and studying the components without executing it. Furthermore, static analysis is generally much faster than most malware analysis techniques. However, some static analysis of malware can be computationally expensive and not all static analysis should be considered for every sample in a large malware dataset. We introduce a meta-model trained using deep learning that finds the simplest classifiers to characterize and assign malware into their corresponding families. Using static analysis of malware, we generate descriptive features to be used in conjunction with deep learning, in order to predict malware families. Our meta-model can determine when simple and less expensive malware characterization will suffice to accurately classify malicious executables, or when more computationally expensive descriptions are required.
One of the most important components of training deep learning models, particularly deep neural networks, is finding the optimal model configuration and feature set combinations. Most applications of deep learning, specifically neural networks, use heuristics or trial-and-error to find the optimal model configurations. We implemented a large scale model configuration search using supercomputing resources to produce the most accurate deep learning model given a feature set. In addition, we construct a genetic algorithm used to find the optimal subset of static analysis features. This result provides us with the ability to construct extremely accurate deep learning models for malware detection and malware family classification.	en_US
Advisor	Cavazos, John
Degree	Ph.D.
Program	University of Delaware, Institute for Financial Services Analytics
DOI	https://doi.org/10.58088/09sb-5091
Unique Identifier	1079398639
URL	http://udspace.udel.edu/handle/19716/24010
Language	en
Publisher	University of Delaware	en_US
URI	https://search.proquest.com/docview/2117544640?accountid=10457
Keywords	Applied sciences	en_US
Keywords	Data science	en_US
Keywords	Deep learning	en_US
Keywords	Machine learning	en_US
Keywords	Malware detection	en_US
Keywords	Neural networks	en_US
Keywords	Static analysis	en_US
Title	Large scale machine learning for the detection and classification of malware	en_US
Type	Thesis	en_US

Files

Original bundle

Name: Kilgallon_udel_0060D_13499.pdf
Size: 3.7 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.22 KB
Format: Item-specific license agreed upon to submission

Collections

Doctoral Dissertations (Winter 2014 to Present)