MURAL - Maynooth University Research Archive Library



    GP-Fileprints: File Types Detection Using Genetic Programming


    Kattan, Ahmed and Galván-López, Edgar and Poli, Riccardo and O’Neill, Michael (2010) GP-Fileprints: File Types Detection Using Genetic Programming. Genetic Programming, 6021. pp. 134-145. ISSN 0302-9743





    Abstract

    We propose a novel application of Genetic Programming (GP): the identification of file types via the analysis of raw binary streams (i.e., without the use of meta data). GP evolves programs with multiple components. One component analyses statistical features extracted from the raw byte-series to divide the data into blocks. These blocks are then analysed via another component to obtain a signature for each file in a training set. These signatures are then projected onto a two-dimensional Euclidean space via two further (evolved) program components. K-means clustering is applied to group similar signatures. Each cluster is then labelled according to the dominant label for its members. Once a program that achieves good classification is evolved it can be used on unseen data without requiring any further evolution. Experimental results show that GP compares very well with established file classification algorithms (i.e., Neural Networks, Bayes Networks and J48 Decision Trees).
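The final stage described in the abstract (K-means clustering of 2-D signatures, then labelling each cluster by the dominant label of its members) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the evolved GP components that produce the signatures are not reproduced, so the 2-D points below are hand-picked toy values. The function names (`kmeans`, `label_clusters`, `classify`), the file-type labels, and the nearest-centroid rule for unseen points are all assumptions made for this example.

```python
from collections import Counter
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct training points
    assign = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assign == assign:  # no point changed cluster: converged
            break
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assign


def label_clusters(assign, labels, k):
    """Name each cluster after the most common training label among its members."""
    names = {}
    for c in range(k):
        votes = Counter(lab for lab, a in zip(labels, assign) if a == c)
        if votes:
            names[c] = votes.most_common(1)[0][0]
    return names


def classify(point, centroids, names):
    """Assumed rule for unseen data: take the label of the nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return names.get(nearest)


# Toy 2-D signatures standing in for the output of the evolved projection.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
train_labels = ["jpg", "jpg", "jpg", "txt", "txt", "txt"]

centroids, assign = kmeans(train_points, k=2)
names = label_clusters(assign, train_labels, k=2)
print(classify((0.5, 0.5), centroids, names))
print(classify((10.5, 10.5), centroids, names))
```

Once the centroids and cluster names are fixed, classifying a new signature needs no further training, which mirrors the abstract's point that an evolved program can be applied to unseen data without further evolution.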

    Item Type: Article
    Keywords: Genetic Programming; Meta Data; Unseen Data; Data Fragment; Genetic Programming System
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Faculty of Science and Engineering > Research Institutes > Hamilton Institute
    Item ID: 15393
    Identification Number: https://doi.org/10.1007/978-3-642-12148-7_12
    Depositing User: Edgar Galvan
    Date Deposited: 01 Feb 2022 17:10
    Journal or Publication Title: Genetic Programming
    Publisher: Springer
    Refereed: Yes
    URI:
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
