Proposal: Learning Analytics Intervention

The Learning Analytics Intervention System has been developed to identify and support students whose academic performance is at a critical level. To improve the accuracy of student identification, the questioning process has been expanded, with the aim of obtaining more accurate results so that the system can identify these students more reliably. The system was designed separately, so we need to find a way to integrate it into an educational platform such as Canvas to present these results to users in an accessible and user-friendly way, ideally through an automated process for added convenience.
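As an illustration of what such an automated step could look like, the following is a minimal Python sketch that pushes a personalised message to a student through Canvas's REST conversations endpoint. The base URL, access token, user ID and message text are placeholders, and the exact endpoint and parameters should be verified against the institution's Canvas instance before use.

```python
# Minimal sketch of pushing intervention messages into Canvas via its REST API.
# BASE_URL, API_TOKEN and the user ID are placeholders; check the conversations
# endpoint against the institution's Canvas version before relying on it.
import requests

BASE_URL = "https://canvas.example.edu/api/v1"   # placeholder institution URL
API_TOKEN = "REPLACE_WITH_ACCESS_TOKEN"          # placeholder access token
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

def send_feedback_message(canvas_user_id: str, subject: str, body: str) -> None:
    """Send a personalised message to one student through Canvas conversations."""
    payload = {
        "recipients[]": [canvas_user_id],
        "subject": subject,
        "body": body,
    }
    response = requests.post(f"{BASE_URL}/conversations", headers=HEADERS, data=payload)
    response.raise_for_status()

# Example: notify a student flagged by the intervention system.
if __name__ == "__main__":
    send_feedback_message(
        canvas_user_id="12345",
        subject="Checking in on your progress",
        body="Your recent assessment results suggest you may benefit from extra support. "
             "Please see the suggested revision plan on the course site.",
    )
```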


I want to write a proposal of between 3 and 5 pages, including:

a literature review,

What are the problems?

What is the plan?

the main task approach,

and a timetable.

I will upload all the resources you need to go through.

2021 6th International Conference on Innovative Technology in Intelligent System and Industrial Applications (CITISIA) | 978-1-6654-1784-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/CITISIA53721.2021.9719985
An Innovative Framework to Improve Course and
Student Outcomes
Khalid Alalawi
School of Information and Physical
Sciences
The University of Newcastle
Callaghan, Australia
khalid.alalawi@uon.edu.au
Rukshan Athauda
School of Information and Physical
Sciences
The University of Newcastle
Callaghan, Australia
rukshan.athauda@newcastle.edu.au
Raymond Chiong
School of Information and Physical Sciences
The University of Newcastle
Callaghan, Australia
raymond.chiong@newcastle.edu.au
Abstract—This paper presents a novel framework aimed at
improving educational outcomes in tertiary-level courses. The
framework integrates concepts from educational data mining,
learning analytics and education research domains. The
framework considers the entire life cycle of courses and includes
processes and supporting technology artefacts. Well-established
pedagogy principles such as Constructive Alignment (CA) and
effective feedback principles are incorporated into the
framework. Mapping of learning outcomes, assessment tasks
and teaching/learning activities using CA enables generating
revision/study plans and determining the progress and
achievement of students, in addition to assisting with course
evaluation. Student performance prediction models are used to
identify students at risk of failure early on for interventions.
Tools are provided for academics to select student groups for
intervention and provide personalised feedback. Feedback
reports are generated based on effective feedback principles.
Learning analytics dashboards provide information on
students’ progress and course evaluation. An evaluation of the
framework based on a case study and quasi-experimental design
on real-world courses is outlined. This research and the
framework have the potential to significantly contribute to this
important field of study.
Keywords—Educational technology framework, student
performance prediction, machine learning, constructive
alignment, effective feedback, learning analytics
I. INTRODUCTION
In today’s tertiary education context, it is challenging for
educators to design and deliver high-quality courses and
learning experiences with many different competing demands.
It is common for academics to be expected to deliver courses
in online and blended modes, cater to large class sizes, provide
effective feedback, ensure high quality of learning, cater to
different students’ learning preferences, and accurately assess
student learning. Technological advancements such as
Learning Analytics (LA), Educational Data Mining (EDM),
Machine Learning (ML), Learning Management Systems and
sound educational principles can be integrated to address these
challenges. This paper presents a research-in-progress design
and development of a novel framework that takes into account
the entire lifecycle of courses incorporating sound
pedagogical principles and technology artefacts with the aim
to enable better student and course outcomes.
The remainder of this paper is organised as follows.
Section 2 provides a brief background and review of related
work. In Section 3, the proposed framework is presented.
Section 4 presents the research design aimed to evaluate the
framework. Finally, we conclude the paper highlighting the
expected outcomes and contributions in Section 5.
II. BACKGROUND AND RELATED WORK
In the literature, we observe many efforts using
technological advances and pedagogical principles to improve
student and course outcomes. Here we discuss some of these
studies that are related to and assist in developing the proposed
framework.
A. Educational Data Mining and Student Prediction
With the widespread use of online educational systems and
technology-enhanced learning environments [1], large data
sets have become available. Data mining has been used to
analyse these large data sets to gain knowledge and insights.
A particular interest that has gained attention in the EDM
community is to predict student performance [2, 3]. Typically,
ML algorithms have been used to develop prediction models
to accurately predict student performance in courses and
identify students at risk of failing or dropping out, among
others. It has been shown that actions taken based on the
prediction results can have significant impacts. In the work of
Burgos et al. [4], they identified potential drop-out students
early on in the course and implemented a tutoring plan
targeting potential drop-out students based on the prediction
results. Their results showed a 14% reduction in the drop-out
rate compared to previous years when the tutoring plan was
implemented. This study is an example that demonstrates
promising results and potential to have improved education
outcomes by early intervention based on student performance
prediction results.
B. Learning Analytics
LA is commonly defined as “the measurement, collection,
analysis and reporting of data about learners and their
contexts, for purposes of understanding and optimising
learning and the environments in which it occurs” (https://tekri.athabascau.ca/analytics/). In LA, the
most common way of reporting is through LA Dashboards
(LADs). An LAD is “a single display that aggregates different
indicators about learners, learning processes and/or learning
contexts into one or multiple visualisations” [5]. In the
literature, a number of LA systems and LADs have been designed and implemented to improve student outcomes.
For example, Wang and Han [6] showed that an LAD
based on process-oriented feedback offers better support and
outcomes compared to product-oriented feedback in an online
learning platform iTutor. Pardo et al. [7] described OnTask,
an LA-based framework that integrates data from numerous educational data sources into a single Student Data Table (SDT) and enables instructors to provide personalised feedback (termed Personalised Learning Support Actions) based on rules of the form IF [CONDITION] THEN [TEXT], where conditions are predicates defined in terms of columns in the SDT. Lim et al.
[8] deployed OnTask to examine its impact on students’ self-regulated learning and academic achievement in a large
course. Their results showed that the personalised feedback
led students to participate in course activities (self-regulated
learning) and, ultimately, improved their academic
performance in the course.
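To make the rule idea concrete, the following is a small illustrative Python sketch (not OnTask's actual API) that applies IF [CONDITION] THEN [TEXT] rules to a Student Data Table held as a pandas DataFrame; the column names and rule texts are invented for the example.

```python
# Illustrative sketch (not OnTask's actual API) of rule-based personalised
# feedback over a Student Data Table. Columns and rule texts are invented.
import pandas as pd

sdt = pd.DataFrame({
    "student_id": ["s1", "s2", "s3"],
    "quiz1": [35, 78, 55],
    "videos_watched": [2, 10, 6],
})

# Each rule: IF [CONDITION] THEN [TEXT], where the condition is a predicate
# over the columns of the Student Data Table.
rules = [
    (lambda r: r.quiz1 < 50, "Your Quiz 1 result was below 50%; please review modules 1-2."),
    (lambda r: r.videos_watched < 5, "You have watched few of the lecture videos so far."),
    (lambda r: r.quiz1 >= 75, "Great work on Quiz 1 - keep it up!"),
]

def feedback_for(row) -> str:
    """Concatenate the text of every rule whose condition holds for this student."""
    return " ".join(text for cond, text in rules if cond(row))

sdt["feedback"] = sdt.apply(feedback_for, axis=1)
print(sdt[["student_id", "feedback"]])
```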
In general, LA has been shown to provide real-time information
on student progress using LADs for ‘actionable’ knowledge
by relevant decision makers – instructors, administrators and
others. A notable effort is Course Signals (CS) at Purdue
University [9], which combines data mining and LA to
provide real-time information. In CS, data is collected across
the institution, mined in real-time using a predictive student
success algorithm to determine which students might be at risk
partially indicated by their effort within a course. CS provided academics with a student risk indicator using traffic light signals (green, yellow, red), which is shared with students and used by instructors for interventions. The system has shown a high success rate in improving student outcomes and retention across many cohorts at Purdue.
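The exact Course Signals algorithm is specific to Purdue, but the traffic-light idea can be illustrated with a simple hedged sketch that maps a model's predicted risk probability and a normalised effort measure to a green/yellow/red signal; the weighting and thresholds below are assumptions, not those used in CS.

```python
# Generic illustration of a Course Signals-style traffic-light indicator.
# The weights and thresholds below are assumptions for illustration only.
def risk_signal(predicted_risk: float, effort_score: float) -> str:
    """Map a model's risk probability (0-1) and a normalised effort score (0-1)
    to a traffic-light signal for an instructor dashboard."""
    combined = 0.7 * predicted_risk + 0.3 * (1.0 - effort_score)  # assumed weights
    if combined >= 0.6:
        return "red"      # likely at risk - intervene
    if combined >= 0.3:
        return "yellow"   # borderline - monitor
    return "green"        # on track

print(risk_signal(predicted_risk=0.8, effort_score=0.2))  # -> red
print(risk_signal(predicted_risk=0.2, effort_score=0.9))  # -> green
```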
C. Research in Education
Research in education has provided a number of pedagogy
principles that have the potential to lead to improved student
outcomes and delivery of courses. Below we outline some
important pedagogy and educational principles.
1) Feedback: A significant factor to improve
educational outcomes
Feedback is an essential aspect of the assessment process.
It has a significant impact on student learning and has been
described as the most significant single moderator for
improving student achievement [10, 11]. Providing formative
(on-going, timely) feedback is an important means for
optimising students’ achievement of learning outcomes.
Formative feedback aims to offer students feedback to reflect
on approaching, orienting, and assessing learning. Also,
providing regular feedback enables educators to identify areas where learners are struggling, which can suggest ways to improve teaching [12]. Feedback helps students to understand, engage with, or develop successful approaches to the knowledge that is meant to be taught [10]. Additionally, continuous feedback
would encourage deeper learning in the course [13].
Effective feedback must be clear, meaningful, relevant and
consistent with students’ prior knowledge. It also needs to
include information relating to the task or learning process that
addresses the gap between what is understood and what is
supposed to be understood [10]. Hattie and Timperley [10]
concluded that effective feedback must answer three major
questions: (i.) “Where am I going?” (What are the goals?); (ii.)
“How am I going?” (What progress is being made toward
achieving the goal?); and (iii.) “Where to next?” (What
activities need to be undertaken to make further progress?).
Several studies have found that feedback is far more useful
when it includes details of how to enhance, rather than just
stating whether the results are correct or not [12, 14].
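As a concrete, purely illustrative example of how these three questions could structure an automatically generated feedback message, the sketch below renders a small report from placeholder fields; the field names and contents are assumptions rather than any published implementation.

```python
# Minimal sketch of a feedback report structured around Hattie and Timperley's
# three questions. The input fields are invented placeholders for whatever the
# course actually records.
from dataclasses import dataclass

@dataclass
class AssessmentFeedback:
    goals: list[str]        # "Where am I going?" - learning outcomes targeted
    mark: float             # "How am I going?" - achieved mark (percent)
    comments: str           #   plus the marker's comments
    next_steps: list[str]   # "Where to next?" - suggested revision activities

def render_report(fb: AssessmentFeedback) -> str:
    lines = ["Where am I going?"]
    lines += [f"  - {g}" for g in fb.goals]
    lines += ["How am I going?", f"  - Mark: {fb.mark:.0f}%", f"  - {fb.comments}"]
    lines += ["Where to next?"]
    lines += [f"  - {s}" for s in fb.next_steps]
    return "\n".join(lines)

report = render_report(AssessmentFeedback(
    goals=["Explain the role of feedback in learning (CLO2)"],
    mark=62,
    comments="Good structure, but the argument needs supporting evidence.",
    next_steps=["Revisit the Week 3 reading", "Attempt the practice questions for Topic 4"],
))
print(report)
```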
2) Constructive Alignment: A Guide to Effective
Curriculum Design
Constructive Alignment (CA) [15] offers a guide for
designing an effective curriculum. In CA, what is intended for
students to learn and how they should express their learning is
clearly defined first (intended learning outcomes – ILOs).
Teaching and learning activities (TLAs) are then designed to
engage students with the aim to optimise their chances of
achieving the outcomes. Assessment tasks (ATs) are designed
to enable clear judgments as to how well those outcomes have
been attained. In summary, ILOs, TLAs and ATs are aligned
in the design of learning experiences.
In the relevant literature, we observe many instances of
CA applied for effective design of tertiary-level courses [13,
16], where a course’s ILOs, which are termed Course
Learning Outcomes (CLOs), are aligned to TLAs and ATs.
Also, efforts to evaluate student achievement of course and
program outcomes using CA have been made [17-19].
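A CA mapping can be represented very simply in code. The sketch below, with invented course identifiers, records which assessment tasks (ATs) and teaching/learning activities (TLAs) align to each CLO and derives a revision plan for a given assessment; it is only an illustration of the idea, not the framework's CA Mapping Module.

```python
# Hedged sketch of a Constructive Alignment mapping: CLOs are linked to the
# assessment tasks that measure them and the teaching/learning activities that
# develop them. Identifiers and activities are invented examples.
clo_to_ats = {
    "CLO1": ["Quiz1", "FinalExam"],
    "CLO2": ["Assignment1", "FinalExam"],
}
clo_to_tlas = {
    "CLO1": ["Week 1 lecture", "Week 2 tutorial"],
    "CLO2": ["Week 3 lecture", "Week 4 lab"],
}

def revision_plan(assessment: str) -> list[str]:
    """List the TLAs aligned (via CLOs) to a given assessment task,
    i.e. what a student should revisit before or after that assessment."""
    clos = [clo for clo, ats in clo_to_ats.items() if assessment in ats]
    plan = []
    for clo in clos:
        plan.extend(clo_to_tlas.get(clo, []))
    return plan

print(revision_plan("FinalExam"))
# -> ['Week 1 lecture', 'Week 2 tutorial', 'Week 3 lecture', 'Week 4 lab']
```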
The next section presents the proposed framework.
III. THE PROPOSED FRAMEWORK
The proposed framework considers the entire lifecycle of
a course: Course Design, Course Delivery and Course
Evaluation. In the Course Design phase, the curriculum is
planned. The course materials, teaching/learning activities and
assessments are designed and prepared for delivery. Next, in
the Course Delivery stage, the course is delivered to students
where teaching/learning activities are performed and students
are assessed. The course in a tertiary education setting is
typically delivered in an academic term. Finally, the course is
evaluated based on students’ performance, feedback,
academics’ experience, and revised/fine-tuned for the next offering of the course.
The proposed framework is novel in that it incorporates all
phases of a course’s life cycle (see Fig. 1). In the course design
phase, the course is designed based on principles of CA and a
CA mapping model is created. Also, student performance
prediction models based on historical assessment data are
created in this phase. In the Course Delivery phase, the
prediction models are deployed to predict student
performance and identify students at risk of failure early on.
LADs provide both academics and students a view of student
progress. LADs provide feedback to students based on
effective feedback principles [10] using the CA mapping
model. Academics can identify potential students at risk early
on and also have access to tools that enable personalised
feedback. Finally, at the end of the academic term, the course
is evaluated using a Course Evaluation Dashboard, which
presents students’ achievement of ILOs, student performance,
impact of interventions and comparison with previous course
offerings.
Fig. 1. Course Lifecycle and Proposed Framework
There are several technology artefacts developed as a part
of the framework (see Fig. 2). Detailed discussion of each
phase and technology artefacts will be presented in the
following sections.
Fig. 2. Technology artefacts of the proposed framework

A. Course Design Phase
There are two major tasks undertaken in the course design phase: (i.) designing and organising the course following CA, to align CLOs, TLAs, ATs and effective on-going feedback; and (ii.) creating student performance prediction models based on historical assessment data for early prediction of student performance.

1) Application of Constructive Alignment
In the course design stage, academics use the principles of CA to design the course. Firstly, CLOs are clearly outlined. Next, TLAs are designed and organised so that student learning is scaffolded to optimise achieving the CLOs. ATs are designed to demonstrate achievement of CLOs and also to provide on-going feedback. A significant consideration is to organise the course's activities to provide on-going early feedback throughout the term via (formative) assessments to students and academics on student progress and achievement of learning outcomes [12]. This approach allows for early intervention based on students' progress and prediction results, as discussed later. A CA Mapping Model is specified during course design, mapping between CLOs, ATs and TLAs using the CA Mapping Module. The CA Mapping Model is used in several instances, including providing recommended revision plans, course evaluation and determining student achievement of CLOs, as will be seen later.

2) Creating Student Performance Prediction Models
In this step, historical assessment data from previous offerings of the course is used to create student performance prediction models for the course. In the literature, many ML algorithms have been successfully used to predict student performance accurately [20]. Early prediction of student performance is a powerful source of information for identifying at-risk (of failure or drop-out) students, from which academics can take pre-emptive actions/interventions for better outcomes [9].
The Student Performance Prediction Module (see Fig. 2) implements five popular ML algorithms to build the student performance prediction models: Linear Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), k-Nearest Neighbours (k-NN), and Naive Bayes (NB). Academics upload historical student data to the system to create the prediction models. The module generates prediction models and evaluates them with well-known evaluation metrics: precision, recall, accuracy, and F-measure. The prediction results are based on binary classification (at-risk/pass) as well as multiclass classification (at-risk, borderline, pass). The system by default selects the most accurate prediction model(s) to predict student performance in the current cohort.

B. Course Delivery Phase
There are a number of modules used by academics and students in the Course Delivery phase. The main modules used by academics are the Academic Dashboard and the Student Feedback Module. Students access the Student Dashboard module during this phase. Also, students in the intervention groups receive personalised feedback from the Student Feedback Module.

1) Academic and Student Dashboards
After each assessment, as the assessment marks become available, the prediction models generate students' performance predictions. The Academic Dashboard (see Fig. 3(a)) allows academics to view students who are at risk, as well as students' progress and achievement of CLOs. Academics can utilise these results to identify groups of students for intervention. Students have access to the Student Dashboard (see Fig. 3(b)), which provides the student's progress in assessments, CLOs and revision plans for assessments based on the CA Mapping Model.

Fig. 3. Examples of (a) Academic Dashboard and (b) Student Dashboard
2) Effective and Personalised Feedback
The framework utilises OnTask’s idea of using a single
table to integrate data for each student. The table contains
assessment data and results from prediction models for current
students. Next, academics are able to use conditions on the
columns of the table to select groups of students for
intervention. Personalised feedback can be generated for these
intervention groups. Fig. 4 shows the workflow of data
collection and feedback and an example of a Student Data
Table.
Fig. 4. (a) Workflow of data collection and feedback; (b) Example of a Student Data Table with assessment data and prediction results.

In addition to allowing instructors to provide personalised messages, the Student Feedback Module generates a Feedback Report for the assessment of each student. The Feedback Report addresses the three critical questions for effective feedback by Hattie and Timperley [10]. Question (i.) "What are the goals?" (of the assessment) is answered using the CLOs mapped to the assessment (in the CA Mapping Model); (ii.) "How am I going?" is answered by assessment marks and tutor feedback; and (iii.) "Where to next?" is addressed by generating a revision plan for the assessment task based on the ATs' mapping to TLAs (in the CA Mapping Model). A sample feedback report is shown in Fig. 5. The feedback report can be attached to the personalised feedback and is also accessible via the Student Dashboard.

Fig. 5. A sample feedback report

C. Course Evaluation Phase
The Course Evaluation is the final phase in the proposed framework. At this stage, the academics will evaluate the course and consider any updates for the next offering of the course. The Course Evaluation Dashboard is used at this stage.
The dashboard contains three main categories of information for course evaluation:
• Descriptive statistics on the pass rate, fail rate, withdrawal rate and drop-out rate, comparing the current cohort to previous cohorts. This information will provide academics with an overall view of this cohort's performance compared with past offerings.
• Achievement of assessment results with regard to CLOs: As the assessments are mapped to the CLOs, the cohort's assessment marks (avg., min, max, percentile) indicate students' demonstrated achievement of CLOs. Reflection on this information can lead to consideration of revising/updating strategies in TLAs, formative assessments and ATs to better achieve these learning outcomes in future course offerings.
• Impact of interventions: Selected student groups were sent personalised messages and feedback reports, based on predicted results and assessment marks achieved. Comparing students in the intervention group who improved (i.e., actual result vs. predicted result), did not improve, or worsened can help in assessing the impact of the interventions.
A design for the Course Evaluation Dashboard is presented in Fig. 6.

Fig. 6. Design of the Course Evaluation Dashboard

Another task completed at this stage is to update the prediction models using the current cohort's assessment data for the next iteration of the course. The increased data set will assist in increasing the prediction models' accuracy.
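A minimal sketch of this retraining step is shown below, assuming the assessment data can be held as pandas DataFrames; the column names, values and the choice of a decision tree are illustrative only, not the framework's actual code.

```python
# Sketch (not the framework's actual code) of folding the current cohort's
# assessment marks into the historical data set and refitting a prediction
# model for the next offering. Column names and values are illustrative.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

historical = pd.DataFrame({
    "prac_test_1": [40, 85, 70, 30, 90, 55],
    "assignment_1": [35, 80, 75, 25, 95, 60],
    "final_result": ["fail", "pass", "pass", "fail", "pass", "pass"],
})
current = pd.DataFrame({
    "prac_test_1": [65, 45],
    "assignment_1": [70, 40],
    "final_result": ["pass", "fail"],
})

# Append the new cohort and retrain on the larger data set.
combined = pd.concat([historical, current], ignore_index=True)
X = combined[["prac_test_1", "assignment_1"]]
y = combined["final_result"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)
```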
The first version of the framework is in development, and
we plan to evaluate it in Semester 1, 2022.
IV. EVALUATION OF FRAMEWORK
To evaluate the framework, we will deploy the framework
on several real-world courses for the entire lifecycle of the
course and gather data. There are two main user groups that
use the framework – academics and the students. Data will be
gathered from both groups of users. The main research
questions (RQs) that guide the evaluation are outlined below:
• RQ1. What are the perceptions of academics who use the framework?
• RQ2. What are the perceptions of students with regard to the student dashboard, personalised feedback and feedback reports?
• RQ3. What impacts, if any, are observed by using the framework on course and student outcomes?
To answer RQ1 and RQ2, a case-study approach is used.
Academics are invited to use the framework in their courses;
a survey and an interview with the academics will be
conducted to gather data to answer RQ1. To answer RQ2, data
is collected from the students enrolled in the course by inviting
them to fill in a survey at the end of the academic term. To
answer RQ3, a quasi-experimental design will be used. Data
is compared between the control and experimental groups.
The experimental group is the cohort where the framework is
used. Previous offerings of the course are considered as the
control group. Propensity score matching will be used, since
it is not feasible to do a randomised controlled trial to assign
students to treatment and control groups [21]. Assessment
data, pass/fail rates and withdrawal rates are taken from
experimental and control groups to identify observable
impacts of the framework. Statistical tests will be performed on assessment results to determine whether any statistically significant differences exist. Data from intervention groups will be used to identify any impacts of the interventions.
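A hedged sketch of how propensity score matching could be carried out is given below, using logistic regression to estimate propensity scores and nearest-neighbour matching on those scores; the covariates and data are invented, and this is not the study's actual analysis.

```python
# Hedged sketch of propensity score matching between an experimental cohort
# (framework used) and a control cohort (previous offering). Covariates and
# data are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "treated": np.repeat([1, 0], 100),                 # 1 = framework cohort
    "gpa": rng.normal(5.0, 1.0, 200),                  # assumed covariates
    "prior_fails": rng.integers(0, 3, 200),
    "final_mark": rng.normal(60, 15, 200),
})

# 1. Estimate propensity scores: P(treated | covariates).
X = df[["gpa", "prior_fails"]]
df["pscore"] = LogisticRegression().fit(X, df["treated"]).predict_proba(X)[:, 1]

# 2. Match each treated student to the control student with the closest score.
treated, control = df[df.treated == 1], df[df.treated == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[idx.ravel()]

# 3. Compare outcomes on the matched sample (e.g., mean final mark difference).
effect = treated["final_mark"].mean() - matched_control["final_mark"].mean()
print(f"Estimated effect on final mark: {effect:.2f}")
```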
V. CONCLUSION AND FUTURE WORK
In this paper, we presented a novel framework aimed at
improving educational outcomes. The framework
incorporates several distinct features that underpin its
potential for success. These include:
(i.) Incorporating well-established pedagogy principles on
curriculum design and feedback. CA is used to design the
curriculum mapping learning outcomes to ATs and TLAs. The
mappings are saved in a CA mapping model and utilised to
provide feedback, evaluate students’ progress, and in course
evaluation. On-going formative feedback is generated
following principles of effective feedback.
(ii.) Utilising ML-based prediction models to identify
student performance. Student prediction models are created
using historical assessment data and used to predict and
identify students at risk of failure or drop-out early on for
interventions.
(iii.) Incorporating LADs to provide real-time, on-going information on students’ progress and feedback. Tools to select student
groups for intervention and provide personalised feedback are
incorporated into the framework.
(iv.) Providing LADs to review, evaluate and assist in
improving course and student outcomes considering the entire
lifecycle of courses.
We also presented a research design for evaluating the
framework in real-world course environments. In future, the
framework, presently being developed, will be deployed and
evaluated in real-world courses with the aim to contribute to
this important field of study.
REFERENCES
[1] J. Jovanovic and R. Chiong, Technological and Social Environments
for Interactive Learning. Santa Rosa, CA: Informing Science, 2013.
[2] A. P. Christy and N. Rama, “Relevance and resonance of data science
in performance prediction and visualization,” Indian Journal of Science
and Technology, vol. 9, no. 19, 93885, 2016, doi:
10.17485/ijst/2016/v9i19/93885.
[3] A. Abu Saa, M. Al-Emran, and K. Shaalan, “Factors affecting students’
performance in higher education: A systematic review of predictive
data mining techniques,” Technology, Knowledge and Learning, vol.
24, no. 4, pp. 567-598, 2019, doi: 10.1007/s10758-019-09408-7.
[4] C. Burgos, M. L. Campanario, D. d. l. Peña, J. A. Lara, D. Lizcano, and
M. A. Martínez, “Data mining for modeling students’ performance: A
tutoring action plan to prevent academic dropout,” Computers &
Electrical Engineering, vol. 66, pp. 541-556, 2018, doi:
https://doi.org/10.1016/j.compeleceng.2017.03.005.
[5] B. A. Schwendimann et al., “Perceiving learning at a glance: A
systematic literature review of learning dashboard research,” IEEE
Transactions on Learning Technologies, vol. 10, no. 1, pp. 30-41,
2016, doi: 10.1109/TLT.2016.2599522.
[6] D. Wang and H. Han, “Applying learning analytics dashboards based
on process-oriented feedback to improve students’ learning
effectiveness,” Journal of Computer Assisted Learning, vol. 37, no. 2,
pp. 487-499, 2021, doi: https://doi.org/10.1111/jcal.12502.
[7] A. Pardo et al., “OnTask: Delivering data-informed, personalised
learning support actions,” Journal of Learning Analytics, vol. 5, no. 3,
pp. 235–249, 2018, doi: 10.18608/jla.2018.53.15.
[8] L.-A. Lim et al., “What changes, and for whom? A study of the impact
of learning analytics-based process feedback in a large course,”
Learning and Instruction, vol. 72, 101202, 2021, doi:
https://doi.org/10.1016/j.learninstruc.2019.04.003.
[9] K. E. Arnold and M. D. Pistilli, “Course signals at Purdue: Using
learning analytics to increase student success,” in Proceedings of the
2nd International Conference on Learning Analytics and Knowledge,
Vancouver, British Columbia, Canada, 2012. [Online]. Available:
https://doi.org/10.1145/2330601.2330666.
[10] J. Hattie and H. Timperley, “The power of feedback,” Review of
Educational Research, vol. 77, no. 1, pp. 81-112, 2007, doi:
10.3102/003465430298487.
[11] P. Black and D. Wiliam, “Developing the theory of formative
assessment,” Educational Assessment, Evaluation and Accountability,
vol. 21, no. 1, pp. 5-31, 2009.
[12] V. J. Shute, “Focus on formative feedback,” Review of educational
research, vol. 78, no. 1, pp. 153-189, 2008, doi:
10.3102/0034654307313795.
[13] F. Lasrado and N. Kaul, “Designing a curriculum in light of
constructive alignment: A case study analysis,” Journal of Education
for Business, vol. 96, no. 1, pp. 60-68, 2021, doi:
10.1080/08832323.2020.1732275.
[14] B. Harks, K. Rakoczy, J. Hattie, M. Besser, and E. Klieme, “The effects
of feedback on achievement, interest and self-evaluation: The role of
feedback’s perceived usefulness,” Educational Psychology, vol. 34, no.
3, pp. 269-290, 2014.
[15] J. Biggs, “Constructive alignment in university teaching,” HERDSA Review of Higher Education, vol. 1, pp. 5-22, 2014.
[16] M. McCann, “Constructive alignment in economics teaching: A
reflection on effective implementation,” Teaching in Higher
Education, vol. 22, no. 3, pp. 336-348, 2017, doi:
10.1080/13562517.2016.1248387.
[17] A. Shanableh, “IT-facilitated student assessment: Outcome-based
student grades,” in Proceedings of the 2011 International Conference
on Information Technology Based Higher Education and Training,
2011: IEEE, pp. 1-6, doi: 10.1109/ITHET.2011.6018687.
[18] R. Lottering, R. Hans, D. Chuene, C. Lepota, and V. Ranko,
“Outcomes-based student performance diagnostic and support model,”
in Proceedings of the 2017 IEEE 6th International Conference on
Teaching, Assessment, and Learning for Engineering (TALE), Hong
Kong, 2017: IEEE, pp. 198-203.
[19] J. Ducrot and V. Shankararaman, “Measuring student performance and
providing feedback using competency framework,” in Proceedings of
the 2014 IEEE 6th Conference on Engineering Education (ICEED),
Kuala Lumpur, 2014: IEEE, pp. 55-60.
[20] R. Al-Shabandar, A. J. Hussain, P. Liatsis, and R. Keight, “Detecting
at-risk students with early interventions using machine learning
techniques,” IEEE Access, vol. 7, pp. 149464-149478, 2019, doi:
10.1109/ACCESS.2019.2943351.
[21] S. Mojarad, A. Essa, S. Mojarad, and R. S. Baker, “Studying adaptive
learning efficacy using propensity score matching,” in Companion
Proceedings of the 8th International Conference on Learning Analytics
and Knowledge (LAK’18), 2018, pp. 5-9.
2021 6th International Conference on Innovative Technology in Intelligent System and Industrial Applications (CITISIA) | 978-1-6654-1784-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/CITISIA53721.2021.9719896
Early Detection of Under-Performing Students
Using Machine Learning Algorithms
Khalid Alalawi
School of Information and Physical
Sciences
The University of Newcastle
Callaghan, Australia
khalid.alalawi@uon.edu.au
Raymond Chiong
School of Information and Physical
Sciences
The University of Newcastle
Callaghan, Australia
raymond.chiong@newcastle.edu.au
Rukshan Athauda
School of Information and Physical Sciences
The University of Newcastle
Callaghan, Australia
rukshan.athauda@newcastle.edu.au
Abstract—Predicting student performance and identifying
under-performing students early is the first step towards
helping students who might have difficulties in meeting learning
outcomes of a course resulting in a failing grade. Early detection
in this context allows educators to provide appropriate
interventions sooner for students facing challenges, which could
lead to a higher possibility of success. Machine learning (ML)
algorithms can be utilized to create an early warning system that
detects students who need assistance and informs both
educators and learners about their performance. In this paper,
we explore the performance of different ML algorithms for
identifying under-performing students in the early stages of an
academic term/semester for a selected undergraduate course.
First, we attempted to identify students who might fail their
course, as a binary classification problem (pass or fail), with
several experiments at different times during the semester. Next,
we introduced an additional group of students who are at the
borderline of failing, resulting in a multiclass classification
problem. We were able to identify under-performing students
early in the semester using only the first assessment in the course
with an accuracy of 95%, and borderline students with an
accuracy of 84%. In addition, we introduce a student
performance prediction system that allows academics to create
ML models and identify under-performing students early on
during the academic term.
Keywords—student performance prediction, academic student
success, classification, educational data mining, early warning
systems, machine learning
I. INTRODUCTION
Identifying students at risk of failing in the early stages is a challenging but crucial task in higher education [1]. Detecting under-performing students in a timely manner could help instructors explore possible intervention strategies to help struggling students [2]. With an appropriate intervention strategy, teachers could provide learners with real-time support and increase success levels [3]. Early warning systems are considered powerful tools to identify students at risk of failure or dropping out [4].
Machine Learning (ML) is a subset of artificial intelligence that focuses on developing models that can provide systems with the capacity to learn and improve from experience without being directly programmed [5]. ML algorithms have been widely used in different applications (e.g., see [6][7]), including for educational purposes to build efficient models for evaluating students' performance and detecting at-risk students through the use of predictive modeling methods [8].
In this paper, we investigate the performance of different ML algorithms that can be used in early warning systems to predict students' performance in terms of detecting students who are at risk of failure. We selected historical data from a Systems and Network Administration course (INFT2031) to build and evaluate our models. We implemented two models, with several experiments at different times during the semester (in weeks 6, 8, 12, and 13, after each assessment task), to find the best model that predicts students who might fail the course. The first model detects under-performing students using binary classification (pass or fail). Next, we introduced an additional class of students who are at the borderline of failing, with the aim of detecting both at-risk and borderline students for intervention.
We compared the performance of five different ML algorithms using various evaluation techniques to identify the best model. The primary goal is to support the academic task of successfully predicting students' performance, so that under-performing students can be accurately identified early on and appropriate interventions can be made. The dataset used in this study was collected from the Systems and Network Administration course, between 2012 and 2018, at the University of Newcastle, Australia. This course is distinct from many other university courses, in that it has a major practical component consisting of two practical exams that are not linked to the final exam assessment. In total, there are five assessment tasks: two practical tests, two assignments and a final exam.
We also present a prediction system developed as a web application that allows academics to detect students at risk of failure early on during the term in their courses. This system uses historical assessment data of courses to develop ML prediction models, which can be deployed by academics during a course offering to predict student performance.
The remainder of this paper is organized as follows. In the next section, we briefly review some important related work. Then, the methodology used in developing the proposed models is described in Section 3. Section 4 discusses the analysis of experimental studies and results. The student performance prediction system is presented in Section 6. Finally, we conclude the paper in Section 7.
II. RELATED WORK
Student learning success and failure is a major concern in higher educational institutions. Several methods have been applied in the literature for the detection of at-risk students, including ML. In this section, we provide a review of the state-of-the-art research on the detection of at-risk students using ML methods.
Early identification of students at risk of withdrawal or failure was presented in the work of Al-Shabandar et al. [3], with the intention of early intervention and provision of timely assistance to those students. They proposed two models: the goal of the first model is the early detection of students who are likely to withdraw from their courses by investigating the relationship between their engagement level and motivational status against past withdrawal rates. The goal of the second model is to predict students in danger of failing their courses.
Additionally, they analyzed the factors influencing student
failure. A variety of ML algorithms were used in this study,
including the Random Forest (RF), Multi-Layer Perceptron
(MLP), Gradient Boosting Machine (GBM), and Generalized
Linear Model (GLM). While all classifiers showed good
accuracy for both models, the GBM provided the highest
accuracy of 0.894 and 0.952 for the first and second models,
respectively.
Liao et al. [9] proposed a methodology based on in-class
clicker questions to predict students’ performance as early as
week 3 in a 12-week course. They created a linear regression
model to predict students’ scores in the final exam, and
classify whether a student was in danger of failing the course.
The proposed model was able to accurately predict 70% of at-risk students. Mishra et al. [10] introduced a model that can
identify at-risk students early based on their social integration,
academic integration, and various emotional skills. Two ML
algorithms J48 and Random Tree were applied to 250 students
to predict their academic performance, and their results
showed that the Random Tree algorithm provided higher
accuracy (94%) than J48 (88%).
A DSS-PSP (Decision Support Software for Predicting
Students’ Performance) was presented by Livieris et al. [11]
to predict students’ exam scores in the second semester based
on their performance in the first semester. They used exam
scores from the first two years of 2206 high school students of
“Algebra” and “Geometry” between 2007-2016. Their DSS-PSP applies a two-level classification, where the first level
predicts whether or not a student will pass the final exam, and
the second level classifies the performance (“Good”, “Very
good”, “Excellent”) of those students who ‘pass’ in the first-level classification. Of the several ML algorithms they tested,
C4.5 achieved the best performance as the first-level
classifier, while Sequential Minimal Optimization (SMO) was
the best classifier for the second-level classification. The
combined classifiers achieved the best accuracy of 90.1%.
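The two-level idea can be illustrated with a short sketch: a first classifier predicts pass/fail, and a second classifier, trained only on passing students, assigns a grade band. Synthetic data is used, scikit-learn's DecisionTreeClassifier and SVC stand in for C4.5 and SMO, and the band cut-offs are assumptions.

```python
# Illustrative sketch of a DSS-PSP-style two-level classification scheme:
# level 1 predicts pass/fail, level 2 grades the predicted passers.
# Data is synthetic and the band cut-offs (65, 80) are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(60, 15, size=(300, 2))                       # e.g. two exam scores
passed = (X.mean(axis=1) >= 50).astype(int)                 # level-1 labels
bands = np.digitize(X.mean(axis=1), [65, 80])               # 0=Good, 1=Very good, 2=Excellent

level1 = DecisionTreeClassifier(random_state=0).fit(X, passed)
level2 = SVC().fit(X[passed == 1], bands[passed == 1])      # trained on passers only

def predict(sample):
    if level1.predict([sample])[0] == 0:
        return "Fail"
    return ["Good", "Very good", "Excellent"][level2.predict([sample])[0]]

print(predict([45, 48]), predict([70, 72]), predict([90, 95]))
```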
Marbouti et al. [12] were able to build a predictive model
that can identify at-risk students based on their academic
performance (i.e., assessments) in the first five weeks of a
semester. Their goals were to compare and find the best model
for identifying students at risk of failure between seven
different predictive modeling methods using an academic
dataset of more than 1600 students. The seven models used in
their study are the Logistic Regression, Support Vector
Machine (SVM), Decision Tree (DT), MLP, Naive Bayes
(NB), K-Nearest Neighbor (KNN), and an ensemble model
consisting of three classifiers (NB, SVM, and KNN). A
comparison of the models’ performance showed that the NB
and ensemble model performed the best in terms of accuracy:
0.817 for NB and 0.846 for the ensemble model.
Similarly, Khan et al. [13] proposed a prediction model
that can inform students about their probable final grades in
an introductory programming course at the early stages of the
semester after 15% of total grades were awarded. They tested
11 different ML algorithms in their model, and the best one
was J48. Their model achieved 88% accuracy, even though
their dataset consisted of only 50 samples.
A Logistic Regression-based model was presented in another work of Marbouti et al. [14] to identify students at risk of failure in a large first-year engineering course. They chose to predict students’ performance at three different times during the semester: week 2, week 4, and week 9. In weeks 2 and 4, they used only attendance records, quiz grades, and assignment grades for prediction. In week 9, they also added mid-term exam grades to their model. Their model was able to identify at-risk and successful students with an accuracy of 79% in week 2, 90% in week 4, and 98% in week 9.
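A generic illustration of such week-by-week early-warning modelling is sketched below: a separate model is evaluated at each checkpoint using only the features available by that week. The data is synthetic, and the feature/checkpoint choices simply mirror the description above rather than reproducing the cited study.

```python
# Generic illustration of week-by-week early-warning models: at each checkpoint
# a model is trained only on the features available by that week. Data is
# synthetic; feature names and checkpoints are assumptions for the example.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 400
data = {
    "attendance": rng.uniform(0, 1, n),
    "quiz_marks": rng.uniform(0, 100, n),
    "assignment_marks": rng.uniform(0, 100, n),
    "midterm_marks": rng.uniform(0, 100, n),
}
X_all = np.column_stack(list(data.values()))
y = (0.5 * data["quiz_marks"] + 0.5 * data["midterm_marks"] > 50).astype(int)

checkpoints = {
    "week 2": ["attendance", "quiz_marks"],
    "week 4": ["attendance", "quiz_marks", "assignment_marks"],
    "week 9": ["attendance", "quiz_marks", "assignment_marks", "midterm_marks"],
}
names = list(data.keys())
for week, feats in checkpoints.items():
    X = X_all[:, [names.index(f) for f in feats]]
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
    print(f"{week}: accuracy with {feats} = {acc:.2f}")
```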
A study was conducted by Trakunphutthirak et al. [15],
using a university log file containing students’ access to
Internet activities and browsing behavior to predict their
academic performance and thereby detect students at risk of
failure. Several ML algorithms were used in this study,
including the DT, NB, Logistic Regression, RF, and Neural
Network. The RF algorithm provided the best results in
predicting students who were at risk of failure, with an
accuracy of 79%. Moreover, their findings suggested that, on
average, at-risk students spent more time on Internet activities
than normal students, especially outside the study period.
The study presented in this paper is inspired by and
expands on the ideas from the abovementioned studies.
Despite the many efforts in identifying at-risk students in
higher education early, a limited number of studies have
investigated multiclass classification models for identifying
students who are at risk of failing their courses or are at the
borderline case. This approach not only helps in detecting
students who might pass or fail (two classes), but also
identifies borderline students who may require assistance to
improve performance. We have therefore implemented binary
and multiclass classifications to address this issue, where we
classify students’ results into three categories (at-risk,
borderline, and pass). Additionally, our study uses historical
assessment data to train the models, thus making it possible
for our approach to be deployed in a variety of situations
where academics only have access to such datasets.
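The label derivation can be sketched as follows; the cut-off values are assumptions for illustration only, since the actual thresholds are defined by the course's grading scheme.

```python
# Sketch of deriving binary (pass/fail) and multiclass (at-risk, borderline,
# pass) target labels from final course marks. The cut-offs below are assumed
# values for illustration, not the thresholds used in the study.
import pandas as pd

marks = pd.Series([32, 48, 55, 72, 90], name="final_mark")

def binary_label(mark: float) -> str:
    return "pass" if mark >= 50 else "fail"           # assumed pass mark

def multiclass_label(mark: float) -> str:
    if mark < 45:                                      # assumed at-risk cut-off
        return "at-risk"
    if mark < 55:                                      # assumed borderline band
        return "borderline"
    return "pass"

labels = pd.DataFrame({
    "final_mark": marks,
    "binary": marks.apply(binary_label),
    "multiclass": marks.apply(multiclass_label),
})
print(labels)
```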
III. METHODOLOGY
The primary objective of this research is to identify under-performing students and students who are at risk of failure in
the early stages of a term. To achieve that, we propose two
student performance prediction models using five
classification algorithms: RF, SVM, DT, KNN, and NB.
These ML algorithms are the most commonly used for
predicting student performance in the literature [16] [17]. The
first is a binary classification model to predict whether or not
a student will pass a course. The second is a multiclass
classification model for identifying students who will pass, or
those who are at risk or in the borderline case. The models
were built using Python. The key steps of the proposed
methodology are shown in Figure 1.
The first step of the proposed methodology involves data
cleansing and pre-processing, including removing missing
data and outliers so that the data is ready for the next step. The
subsequent step is to use 5-fold cross-validation to determine
the settings of the models’ hyperparameters. Then, the optimal
hyperparameters for each ML algorithm are used to create the
prediction models. In the model’s evaluation step, four
standard evaluation metrics are used to evaluate each model’s
performance. Finally, we compare the results of all five ML
algorithms.
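A condensed sketch of this pipeline using scikit-learn is shown below: each of the five classifiers is tuned with a 5-fold cross-validated grid search and then evaluated with accuracy, precision, recall and F-measure on a held-out split. The parameter grids and the synthetic data are assumptions, not the study's actual settings.

```python
# Condensed sketch of the described pipeline with scikit-learn: five classifiers,
# 5-fold cross-validated hyperparameter search, and evaluation with accuracy,
# precision, recall and F-measure. Grids and data are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(3)
X = rng.uniform(0, 100, size=(500, 4))                    # four assessment marks
y = (X[:, :2].mean(axis=1) >= 50).astype(int)             # synthetic pass/fail label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    "SVM": (SVC(), {"C": [0.1, 1, 10]}),
    "DT": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, None]}),
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "NB": (GaussianNB(), {}),
}

for name, (estimator, grid) in models.items():
    search = GridSearchCV(estimator, grid, cv=5)          # 5-fold CV for tuning
    search.fit(X_train, y_train)
    print(f"--- {name} (best params: {search.best_params_}) ---")
    print(classification_report(y_test, search.predict(X_test)))
```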
Fig. 1. A flowchart of the proposed methodology
A. Dataset
The dataset used in this study was collected from the
assessment results obtained by students for the Systems and
Network Administration course (INFT2031) between 2012 and 2018 at the University of Newcastle, Australia. This course
differs from many other courses at the university because it
consists of two practical tests, two assignments, and a final
exam. The practical tests focus on practical knowledge and
skills that are not directly evaluated in the final exam. The
dataset consists of 577 observations (student records) with
five features. These features are Practical Test 1, Assignment
1, Practical Test 2, Assignment 2, and Final Exam. The target
label in the binary classification experiments is a class
denoting the students’ final result for the course (Pass/Fail);
while the target class for the multiclass classification
experiments is the class of students’ performance, which is
either AtRisk (0
