MURAL - Maynooth University Research Archive Library

    Statistical and Machine Learning Models to Predict Programming Performance

    Bergin, Susan (2006) Statistical and Machine Learning Models to Predict Programming Performance. PhD thesis, National University of Ireland Maynooth.


    This thesis details a longitudinal study on factors that influence introductory programming success and on the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, those models have struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique, using a set of three significant factors, that can predict whether students will be ‘weak’ or ‘strong’ programmers with approximately 80% accuracy after only three weeks of programming experience. This thesis makes three fundamental contributions. The first contribution is a longitudinal study identifying factors that influence introductory programming success, investigating 25 factors at four different institutions. Evidence of the importance of mathematics, comfort-level and computer game-playing as predictors of programming performance is provided. A number of new instruments were developed by the author, and a programming self-esteem measure was shown to outperform previous comparable comfort-level measures in predicting programming performance. The second contribution of the thesis is an analysis of the use of machine learning (ML) algorithms to predict performance; it is a first attempt to investigate the effectiveness of a variety of ML algorithms in predicting introductory programming performance. The ML models built as part of this research are the most effective models developed so far. The models are effective even when students have just commenced a programming module. Consequently, timely interventions can be put in place to prevent struggling students from failing. The third contribution of the thesis is the recommendation, based on detailed statistical analysis, of an algorithm that the computer science education community should use to predict the likely performance of incoming students. Optimisations were carried out to investigate whether prediction accuracy could be further increased, and an ensemble algorithm, StackingC, was shown to improve prediction performance. The factors identified in this thesis and the associated machine learning models provide a means to predict programming performance accurately when students have only completed preliminary programming concepts. This has not previously been possible.

    Item Type: Thesis (PhD)
    Keywords: Machine Learning Models; Programming Performance
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Item ID: 5314
    Depositing User: IR eTheses
    Date Deposited: 14 Aug 2014 11:55
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
