Kattan, Ahmed and Galván-López, Edgar and Poli, Riccardo and O’Neill, Michael (2010) GP-Fileprints: File Types Detection Using Genetic Programming. Genetic Programming, 6021. pp. 134-145. ISSN 0302-9743
Abstract
We propose a novel application of Genetic Programming (GP): the identification of file types via the analysis of raw binary streams (i.e., without the use of meta data). GP evolves programs with multiple components. One component analyses statistical features extracted from the raw byte-series to divide the data into blocks. These blocks are then analysed via another component to obtain a signature for each file in a training set. These signatures are then projected onto a two-dimensional Euclidean space via two further (evolved) program components. K-means clustering is applied to group similar signatures. Each cluster is then labelled according to the dominant label for its members. Once a program that achieves good classification is evolved it can be used on unseen data without requiring any further evolution. Experimental results show that GP compares very well with established file classification algorithms (i.e., Neural Networks, Bayes Networks and J48 Decision Trees).
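The last stages described in the abstract (a two-dimensional signature per file, K-means clustering, and labelling each cluster by its dominant member label) can be illustrated with a minimal Python sketch. This is not the authors' system: in the paper the block-splitting, signature, and projection components are evolved by GP, whereas here a hand-written stand-in (byte entropy and scaled mean byte value) is assumed in their place. The function names `signature`, `kmeans`, `label_clusters`, `train`, and `classify`, and the farthest-first initialisation, are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def signature(data: bytes) -> Point:
    """Hand-written 2-D signature: (byte entropy in bits, scaled mean byte).

    Assumption: this replaces the evolved GP components described in the paper.
    """
    if not data:
        return (0.0, 0.0)
    n = len(data)
    entropy = -sum(c / n * math.log2(c / n) for c in Counter(data).values())
    return (entropy, sum(data) / n / 32.0)  # both axes lie in [0, 8]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain K-means with deterministic farthest-first initialisation (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its cluster; keep it if empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids


def label_clusters(points, labels, centroids):
    """Label each cluster with the most common training label among its members."""
    votes = [Counter() for _ in centroids]
    for p, lab in zip(points, labels):
        votes[nearest(p, centroids)][lab] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]


def train(files, labels, k):
    """Build signatures for the training files, cluster them, and label the clusters."""
    points = [signature(f) for f in files]
    centroids = kmeans(points, k)
    return centroids, label_clusters(points, labels, centroids)


def classify(data, centroids, cluster_labels):
    """Classify unseen bytes by the label of the nearest cluster centroid."""
    return cluster_labels[nearest(signature(data), centroids)]
```

Once `train` has returned the centroids and their labels, `classify` can be applied to unseen byte strings without re-clustering, which mirrors the abstract's point that an evolved program is reused on new data without further evolution.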
Item Type: | Article |
---|---|
Keywords: | Genetic Programming; Meta Data; Unseen Data; Data Fragment; Genetic Programming System |
Academic Unit: | Faculty of Science and Engineering > Computer Science; Faculty of Science and Engineering > Research Institutes > Hamilton Institute |
Item ID: | 15393 |
Identification Number: | https://doi.org/10.1007/978-3-642-12148-7_12 |
Depositing User: | Edgar Galvan |
Date Deposited: | 01 Feb 2022 17:10 |
Journal or Publication Title: | Genetic Programming |
Publisher: | Springer |
Refereed: | Yes |
URI: | |
Use Licence: | This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA). |