Bergin, Susan (2006)
Statistical and Machine Learning Models to Predict Programming Performance.
PhD thesis, National University of Ireland Maynooth.
Abstract
This thesis details a longitudinal study of factors that influence introductory programming success and the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, these models have struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique that uses a set of three significant factors to predict whether students will be ‘weak’ or ‘strong’ programmers with approximately 80% accuracy after only three weeks of programming experience.
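The abstract does not name the algorithm or the three factors behind the 80% figure, so the following is only an illustration of what "predict weak or strong from three factors" means computationally. It is a minimal Gaussian naive Bayes classifier in plain Python, one common choice for this kind of task and not necessarily the thesis's model. The function names, the toy feature values and the labels are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(rows, labels):
    """Estimate a prior, per-feature means and per-feature variances for each class."""
    model = {}
    for cls in sorted(set(labels)):
        members = [row for row, y in zip(rows, labels) if y == cls]
        columns = list(zip(*members))
        model[cls] = {
            "prior": len(members) / len(rows),
            "means": [mean(col) for col in columns],
            # Floor the variance so a constant feature cannot cause division by zero.
            "vars": [max(pvariance(col), 1e-9) for col in columns],
        }
    return model

def log_posterior(params, row):
    """Unnormalised log posterior: log prior plus independent Gaussian log-likelihoods."""
    total = math.log(params["prior"])
    for x, mu, var in zip(row, params["means"], params["vars"]):
        total += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return total

def predict(model, row):
    """Return the class ('weak' or 'strong') with the highest posterior for one student."""
    return max(model, key=lambda cls: log_posterior(model[cls], row))

def accuracy(model, rows, labels):
    """Fraction of students whose predicted class matches the recorded one."""
    hits = sum(predict(model, row) == y for row, y in zip(rows, labels))
    return hits / len(labels)

# Hypothetical toy data: each row holds three numeric factor scores for one student.
train_rows = [(80, 4.5, 2), (75, 4.0, 3), (85, 4.8, 1),
              (40, 2.0, 6), (35, 2.5, 7), (45, 1.8, 5)]
train_labels = ["strong"] * 3 + ["weak"] * 3
model = fit_gaussian_nb(train_rows, train_labels)
print(predict(model, (78, 4.2, 2)))  # strong
```

In practice an accuracy figure such as the one quoted above would be estimated on held-out students (for example with cross-validation), not on the training rows as in this toy example.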
This thesis makes three fundamental contributions. The first is a longitudinal study, investigating 25 factors at four different institutions, that identifies factors influencing introductory programming success. It provides evidence of the importance of mathematics, comfort level and computer game-playing as predictors of programming performance. The author developed a number of new instruments, and a programming self-esteem measure was shown to outperform previous comparable comfort-level measures in predicting programming performance.
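The abstract does not say how the instruments were compared. A standard first step, assumed here purely for illustration, is to correlate each instrument's scores with students' final programming marks and compare the strength of the association. The sketch below computes Pearson's r for each candidate measure and ranks them by its magnitude; the measure names and all numbers are hypothetical, and a real comparison would also consider sample size and statistical significance.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def rank_measures(measures, outcome):
    """Order candidate measures by the magnitude of their correlation with the outcome."""
    scored = {name: pearson_r(values, outcome) for name, values in measures.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical instrument scores and final marks for five students.
final_marks = [35, 50, 62, 70, 88]
measures = {
    "self_esteem": [2, 3, 3.5, 4, 5],
    "comfort_level": [3, 2, 4, 3, 4],
}
for name, r in rank_measures(measures, final_marks):
    print(f"{name}: r = {r:.2f}")
```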
The second contribution is an analysis of the use of machine learning (ML) algorithms to predict performance; it is a first attempt to investigate the effectiveness of a variety of ML algorithms in predicting introductory programming performance. The ML models built as part of this research are the most effective models developed so far, and they remain effective even when students have only just commenced a programming module. Consequently, timely interventions can be put in place to prevent struggling students from failing.
The third contribution is the recommendation, based on detailed statistical analysis, of an algorithm that the computer science education community should use to predict the likely performance of incoming students. Optimisations were carried out to investigate whether prediction accuracy could be increased further, and an ensemble algorithm, StackingC, was shown to improve prediction performance.
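StackingC is a variant of stacking in which the meta-level learner for each class is a linear regression that sees only the base classifiers' predicted probabilities for that class, rather than every probability from every base model. The following is a minimal sketch of that meta-level step only, assuming the base classifiers' out-of-fold class probabilities have already been computed (for example by cross-validation). It is not the thesis's configuration: the function names, the intercept term and the least-squares solver are illustrative choices.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(features, targets):
    """Fit target ~ w0 + w . features by solving the normal equations."""
    design = [[1.0] + list(f) for f in features]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def fit_stackingc(probs, labels, classes):
    """probs[i][m][c] is base model m's out-of-fold probability of class c for instance i.
    One regression per class, using only that class's probability from each base model."""
    weights = []
    for c, cls in enumerate(classes):
        features = [[model_probs[c] for model_probs in inst] for inst in probs]
        targets = [1.0 if y == cls else 0.0 for y in labels]
        weights.append(least_squares(features, targets))
    return weights

def predict_stackingc(weights, classes, inst_probs):
    """Score each class with its own regression and return the highest-scoring class."""
    scores = []
    for c, w in enumerate(weights):
        feats = [1.0] + [model_probs[c] for model_probs in inst_probs]
        scores.append(sum(wi * fi for wi, fi in zip(w, feats)))
    return classes[max(range(len(classes)), key=lambda c: scores[c])]
```

Restricting each regression to one class's probabilities keeps the meta-level models small, which is the main design difference from standard stacking with multi-response linear regression.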
The factors identified in this thesis and the associated machine learning models provide a means to predict programming performance accurately when students have completed only preliminary programming concepts. This has not previously been possible.
Item Type: Thesis (PhD)
Keywords: Machine Learning Models; Programming Performance
Academic Unit: Faculty of Science and Engineering > Computer Science
Item ID: 5314
Depositing User: IR eTheses
Date Deposited: 14 Aug 2014 11:55
Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).