MURAL - Maynooth University Research Archive Library



    Statistical and Machine Learning Models to Predict Programming Performance


    Bergin, Susan (2006) Statistical and Machine Learning Models to Predict Programming Performance. PhD thesis, National University of Ireland Maynooth.


    Abstract

    This thesis details a longitudinal study on factors that influence introductory programming success and on the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, these models have struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique, using a set of three significant factors, that can predict whether students will be ‘weak’ or ‘strong’ programmers with approximately 80% accuracy after only three weeks of programming experience. This thesis makes three fundamental contributions. The first contribution is a longitudinal study identifying factors that influence introductory programming success, investigating 25 factors at four different institutions. Evidence of the importance of mathematics, comfort level and computer game-playing as predictors of programming performance is provided. A number of new instruments were developed by the author, and a programming self-esteem measure was shown to outperform previous comparable comfort-level measures in predicting programming performance. The second contribution of the thesis is an analysis of the use of machine learning (ML) algorithms to predict performance; it is a first attempt to investigate the effectiveness of a variety of ML algorithms in predicting introductory programming performance. The ML models built as part of this research are the most effective models so far developed. The models are effective even when students have just commenced a programming module. Consequently, timely interventions can be put in place to prevent struggling students from failing. The third contribution of the thesis is the recommendation of an algorithm, based on detailed statistical analysis, that should be used by the computer science education community to predict the likely performance of incoming students. Optimisations were carried out to investigate whether prediction accuracy could be further increased, and an ensemble algorithm, StackingC, was shown to improve prediction performance. The factors identified in this thesis and the associated machine learning models provide a means to accurately predict programming performance when students have completed only preliminary programming concepts. This has not previously been possible.
    Item Type: Thesis (PhD)
    Keywords: Machine Learning Models; Programming Performance
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Item ID: 5314
    Depositing User: IR eTheses
    Date Deposited: 14 Aug 2014 11:55
    URI: https://mural.maynoothuniversity.ie/id/eprint/5314
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
