Abstract
Student engagement (SE) is crucial to learning outcomes and teaching effectiveness. High levels of engagement are associated with better academic performance, while disengagement can lead to low achievement, high dropout rates, alienation, and a lack of interest or motivation. Despite its significance, measuring student engagement in a classroom setting remains challenging, especially when class sizes are large. Teachers often rely on traditional methods such as observation, surveys, and feedback, which can be subjective, time-consuming, and prone to bias.
From a teacher's perspective, understanding student engagement levels is essential for improving teaching styles and learning materials. This research aims to bridge that gap by providing educators with a practical tool to assess engagement, enabling them to make informed adjustments to their instructional strategies.
The first step in exploring student engagement is defining the term in a classroom context. Broadly, it is defined as "the level of involvement students have in their learning". Although numerous definitions exist, the one by Fredricks et al. is the most widely accepted in education research. They define engagement as a multidimensional construct comprising behavioural, emotional/affective, and cognitive engagement: three distinct yet interrelated dimensions that affect learning and academic achievement.
This research investigates applying deep learning techniques to detect student engagement levels in a classroom setting non-intrusively. The publicly available DAiSEE dataset, comprising 9068 videos labelled with four affective states (boredom, confusion, engagement, and frustration), was used as the foundation for the initial study.
Inspired by the study of Komaravalli et al., the multi-label DAiSEE annotations are converted into a single-label format, so that each video carries one primary emotion. SMOTE (Synthetic Minority Oversampling Technique) is applied to balance the minority classes. A deep-learning model is then developed using the Keras library. A custom CNN (convolutional neural network)-LSTM (Long Short-Term Memory) model and a MobileNetV2-based LSTM were tested, with the former achieving superior accuracy, particularly when the video data were segmented into shorter temporal windows. The model combines time-distributed convolutional layers for spatial feature extraction with an LSTM layer for capturing temporal dynamics. The study also integrated the Dlib library for face detection to explore the impact of focusing on facial expressions alone versus incorporating broader contextual cues such as body gestures. While models using face detection alone performed strongly, those that included the additional contextual information performed better, underlining the importance of non-facial signals in engagement analysis.
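To illustrate the class-balancing step, the core idea of SMOTE can be sketched in a few lines of NumPy: each synthetic minority sample is a random interpolation between a real minority sample and one of its k nearest neighbours. This is a simplified sketch of the technique, not the implementation used in the study (which would typically rely on a library such as imbalanced-learn); `smote_oversample` and the toy data are illustrative.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=3, rng=None):
    """Generate n_new synthetic minority samples by interpolating
    between each sample and one of its k nearest neighbours (SMOTE)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbours per sample
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)              # pick a random minority sample
        j = nn[i, rng.integers(k)]       # pick one of its neighbours
        lam = rng.random()               # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.stack(synth)

# toy minority class: 5 feature vectors in 2-D
X_min = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
X_new = smote_oversample(X_min, n_new=10, k=3, rng=0)
print(X_new.shape)  # (10, 2)
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the original samples occupy, rather than duplicating them exactly.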
The CNN-LSTM model consistently outperformed the MobileNetV2-based architecture, achieving 97.7% accuracy on 2-second video segments. Confusion, however, remains difficult to classify, as its facial expression closely resembles those of the other affective states.
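The segmentation into short temporal windows can be sketched as follows. Assuming 30 fps video, a 2-second window corresponds to 60 frames; the frame counts and resolution here are illustrative, not the study's actual preprocessing parameters.

```python
import numpy as np

def segment_frames(frames, fps=30, window_s=2):
    """Split a frame sequence into fixed-length temporal windows,
    dropping any incomplete trailing window."""
    win = fps * window_s
    n_windows = len(frames) // win
    return np.stack([frames[i * win:(i + 1) * win] for i in range(n_windows)])

# toy "video": 300 frames of 32x32 grayscale (10 s at 30 fps)
video = np.zeros((300, 32, 32), dtype=np.uint8)
segments = segment_frames(video, fps=30, window_s=2)
print(segments.shape)  # (5, 60, 32, 32)
```

Each resulting window becomes one training sample for the sequence model, which is why shorter windows multiply the number of samples while keeping enough frames for the LSTM to capture motion.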
The results showed that the custom-built CNN-LSTM model outperformed the considerably larger, pre-trained MobileNetV2 model, indicating that pre-training added little value for this task. By wrapping the CNN layers in a time-distributed wrapper, our model processes each frame independently while the LSTM accounts for the sequence, contributing to its strong performance. Additionally, we used Dlib to explore how facial expressions influence the recognition of academic affective states. While facial expressions play a crucial role, non-facial cues such as body movements and gestures significantly enhanced the model's performance. This suggests that relying solely on facial expressions for emotion detection in educational settings may not fully capture students' emotional states.
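The time-distributed idea above can be sketched in plain NumPy: the same spatial feature extractor is applied to every frame independently, producing a sequence of feature vectors for a recurrent layer to consume. This is a conceptual sketch rather than the study's Keras model; `extract_features` (coarse block averages) merely stands in for the shared convolutional stack.

```python
import numpy as np

def extract_features(frame):
    """Stand-in for the shared CNN: maps one frame to a feature vector.
    Here, coarse 4x4 block averages of the frame, flattened."""
    h, w = frame.shape
    blocks = frame.reshape(4, h // 4, 4, w // 4).mean(axis=(1, 3))
    return blocks.ravel()  # 16-dimensional feature vector

def time_distributed(frames):
    """Apply the same extractor to every frame independently
    (the role of Keras's TimeDistributed wrapper)."""
    return np.stack([extract_features(f) for f in frames])

# one 2-second segment: 60 frames of 32x32
segment = np.random.default_rng(0).random((60, 32, 32))
features = time_distributed(segment)   # the sequence fed to the LSTM
print(features.shape)  # (60, 16)
```

The key design point is weight sharing: one extractor, reused per frame, keeps the parameter count independent of sequence length, while the downstream LSTM is what models how those per-frame features evolve over time.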
This work has significant potential in the educational sector, where affective computing could help teachers understand students' emotional states. The capability may be especially beneficial in large classrooms, where such models can offer valuable insights, enabling teachers to respond better to students' needs and informing improved teacher-training materials. The hope is that these advancements will enrich the overall learning experience; however, concerns about privacy and the implications of recording students in classrooms must be addressed. A significant obstacle identified is the scarcity of publicly accessible datasets from real classroom environments, owing to these privacy issues.
Future research will focus on advanced deep learning techniques and larger datasets to further the analysis of students' emotional states in the classroom. This study lays the groundwork for an AI-driven system to assist teachers in monitoring and improving student engagement, ultimately fostering better educational outcomes.
References
[1] Jacob Whitehill, Zewelanji Serpell, Yi-Ching Lin, Aysha Foster, and Javier R Movellan. The faces of engagement: Automatic recognition of student engagement from facial expressions. IEEE Transactions on Affective Computing, 5(1):86–98, 2014.
[2] Chakradhar Pabba and Praveen Kumar. An intelligent system for monitoring students’ engagement in large classroom teaching through facial expression recognition. Expert Systems, 39(1):e12839, 2022.
[3] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
[4] Jennifer A Fredricks, Phyllis C Blumenfeld, and Alison H Paris. School engagement: Potential of the concept, state of the evidence. Review of educational research, 74(1):59–109, 2004.
[5] Abhay Gupta, Arjun D’Cunha, Kamal Awasthi, and Vineeth Balasubramanian. DAiSEE: Towards user engagement recognition in the wild. arXiv preprint arXiv:1609.01885, 2016.
[6] Davis E King. Dlib-ml: A machine learning toolkit. The Journal of Machine Learning Research, 10:1755–1758, 2009.
[7] Purushottama Rao Komaravalli and B Janet. Detecting academic affective states of learners in online learning environments using deep transfer learning. Scalable Computing: Practice and Experience, 24(4):957–970, 2023.
[8] Ömer Sümer, Patricia Goldberg, Sidney D’Mello, Peter Gerjets, Ulrich Trautwein, and Enkelejda Kasneci. Multimodal engagement analysis from facial videos in the classroom. IEEE Transactions on Affective Computing, 14(2):1012–1027, 2021.
[9] Sai Lakshmi Naidu, Hidangmayum Bebina, Piyush Bhatia, Prakash Duraisamy, James Van Haneghan, and Tushar Sandhan. Classroom engagement evaluation using 360-degree view of the camera with deep learning techniques. In Pattern Recognition and Tracking XXXIV, volume 12527, pages 93–103. SPIE, 2023.
[10] Hao Zhang, Xiaofan Xiao, Tao Huang, Sanya Liu, Yu Xia, and Jia Li. An novel end-to-end network for automatic student engagement recognition. In 2019 IEEE 9th International Conference on Electronics Information and Emergency Communication (ICEIEC), pages 342–345. IEEE, 2019.
| Original language | English |
|---|---|
| Publication status | Published - 10 Sept 2025 |
| Event | BERA Annual Conference 2025 - University of Sussex, Brighton, United Kingdom (9 Sept 2025 → 11 Sept 2025) |
Conference
| Conference | BERA Annual Conference 2025 |
|---|---|
| Country/Territory | United Kingdom |
| City | Brighton |
| Period | 9/09/25 → 11/09/25 |
Funding
BERA Annual Conference Funding
Keywords
- student engagement
- feedback