Large scale machine learning for the detection and classification of malware

Author(s): Kilgallon, Sean
Date Accessioned: 2018-12-19T12:17:58Z
Date Available: 2018-12-19T12:17:58Z
Publication Date: 2018
SWORD Update: 2018-09-12T13:00:46Z
Abstract: Bad actors have embraced automation and current malware analysis systems cannot keep up with the ever-increasing load of malware being created daily. As a result, traditional malware detection and classification techniques using expert systems and brittle heuristics are outdated and ineffective. We introduce deep learning models based on inexpensive static features gathered from large scale malware datasets to generate robust and efficient malware detection and malware family classification predictions.

Static analysis is performed by dissecting or disassembling the malware's binary file and studying the components without executing it. Furthermore, static analysis is generally much faster than most malware analysis techniques. However, some static analysis of malware can be computationally expensive and not all static analysis should be considered for every sample in a large malware dataset. We introduce a meta-model trained using deep learning that finds the simplest classifiers to characterize and assign malware into their corresponding families. Using static analysis of malware, we generate descriptive features to be used in conjunction with deep learning, in order to predict malware families. Our meta-model can determine when simple and less expensive malware characterization will suffice to accurately classify malicious executables, or when more computationally expensive descriptions are required.

One of the most important components of training deep learning models, particularly deep neural networks, is finding the optimal model configuration and feature set combinations. Most applications of deep learning, specifically neural networks, use heuristics or trial-and-error to find the optimal model configurations. We implemented a large scale model configuration search using supercomputing resources to produce the most accurate deep learning model given a feature set. In addition, we construct a genetic algorithm used to find the optimal subset of static analysis features. This result provides us with the ability to construct extremely accurate deep learning models for malware detection and malware family classification.
Advisor: Cavazos, John
Degree: Ph.D.
Program: University of Delaware, Institute for Financial Services Analytics
DOI: https://doi.org/10.58088/09sb-5091
Unique Identifier: 1079398639
URL: http://udspace.udel.edu/handle/19716/24010
Language: en
Publisher: University of Delaware
URI: https://search.proquest.com/docview/2117544640?accountid=10457
Keywords: Applied sciences; Data science; Deep learning; Machine learning; Malware detection; Neural networks; Static analysis
Type: Thesis
Files: Kilgallon_udel_0060D_13499.pdf (3.7 MB, Adobe Portable Document Format)