We will present two full papers at EDM 2025 that demonstrate how large language models (LLMs) and large multimodal models (LMMs) can supply data and representations that boost the performance of smaller, task-specific algorithms for student modelling and educational data mining.

Bridging the Data Gap: Using LLMs to Augment Datasets for Text Classification

Full Abstract

We introduce and evaluate a novel, five-stage methodological pipeline to augment educational datasets using Large Language Models, addressing data scarcity and improving text classification accuracy across multiple educational contexts. This pipeline is grounded in a systematic literature review of LLM-driven data augmentation for text classification.

Read the full paper here.

Authors

Seyed Parsa Neshaei
Paola Mejia
Tanya Nazaretsky
Richard Lee Davis
Tanja Käser

Using Large Multimodal Models to Extract Knowledge Components for Knowledge Tracing from Multimedia Question Information

Full Abstract

We propose a novel, zero-shot method using Large Multimodal Models (LMMs) to extract knowledge components (KCs) from multimodal question media. Unlike traditional methods, our approach directly parses text, images, and audio to generate detailed KCs. Experimental evaluations across five domains and multiple knowledge tracing models demonstrate that models using the LMM-generated KCs match, and often exceed, the performance obtained with human-defined KCs.
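To make the role of KCs concrete, here is a minimal sketch of how a question-to-KC mapping feeds a knowledge tracing model. The abstract does not name the knowledge tracing models evaluated, so standard Bayesian Knowledge Tracing (BKT) is used here purely as a familiar stand-in. The question IDs, KC names, and parameter values are invented for illustration, and the mapping is hand-written where the paper's method would have an LMM produce it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BKTParams:
    p_init: float = 0.2    # P(KC mastered before any practice)
    p_learn: float = 0.15  # P(unmastered -> mastered after one attempt)
    p_slip: float = 0.1    # P(wrong answer despite mastery)
    p_guess: float = 0.2   # P(right answer without mastery)


def predict_correct(p_mastery: float, params: BKTParams) -> float:
    """Probability of a correct answer given the current mastery estimate."""
    return p_mastery * (1 - params.p_slip) + (1 - p_mastery) * params.p_guess


def update_mastery(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Bayesian update on the observed answer, then apply the learning step."""
    if correct:
        evidence = p_mastery * (1 - params.p_slip)
        total = evidence + (1 - p_mastery) * params.p_guess
    else:
        evidence = p_mastery * params.p_slip
        total = evidence + (1 - p_mastery) * (1 - params.p_guess)
    posterior = evidence / total
    return posterior + (1 - posterior) * params.p_learn


def trace(
    responses: List[Tuple[str, bool]],
    question_to_kcs: Dict[str, List[str]],
    params: BKTParams,
) -> Dict[str, float]:
    """Update a per-KC mastery estimate for each answered question."""
    mastery: Dict[str, float] = {}
    for question_id, correct in responses:
        for kc in question_to_kcs[question_id]:
            prior = mastery.get(kc, params.p_init)
            mastery[kc] = update_mastery(prior, correct, params)
    return mastery


# Hypothetical mapping of the kind a KC extraction step would output.
question_to_kcs = {
    "q1": ["add fractions"],
    "q2": ["add fractions", "read bar chart"],
}
responses = [("q1", True), ("q2", False)]
mastery = trace(responses, question_to_kcs, BKTParams())
```

The quality of the mapping matters because each response only updates the KCs attached to that question: a finer or more accurate set of KCs changes which mastery estimates move, and therefore how well the model predicts later answers.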

Read the full paper here.

Authors

Hyeongdon Moon
Seyed Parsa Neshaei
Richard Lee Davis
Pierre Dillenbourg