SHAPE Benchmark Unifies Safety, Helpfulness, and Pedagogy for Educational LLMs
Researchers introduce SHAPE, a new benchmark to evaluate educational LLMs under adversarial conditions. The study highlights 'pedagogical jailbreaks' where students manipulate LLMs to provide answers instead of learning guidance.

Researchers have unveiled SHAPE, a comprehensive benchmark designed to evaluate the safety, helpfulness, and pedagogical effectiveness of Large Language Models (LLMs) in educational settings. The benchmark comprises 9,087 student-question pairs and focuses on how LLMs behave under adversarial pressure. The study identifies a critical vulnerability it calls 'pedagogical jailbreaks,' in which students craft prompts to elicit direct answers rather than scaffolded learning guidance.
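To make the setup concrete, here is a minimal sketch of what one adversarially phrased student-question pair might look like, paired with a naive keyword heuristic for answer-seeking prompts. The record fields, cue list, and function names are illustrative assumptions for this article, not the benchmark's actual schema or detection method.

```python
from dataclasses import dataclass

# Hypothetical record shape for one student-question pair; field names
# are assumptions, not SHAPE's actual schema.
@dataclass
class StudentQuestionPair:
    question: str          # the underlying learning task
    student_prompt: str    # possibly adversarial phrasing of the request
    adversarial: bool      # label: does the prompt try to bypass pedagogy?

# Naive keyword heuristic for answer-seeking ("jailbreak"-style) prompts.
# A real benchmark would rely on far more robust labeling than this.
ANSWER_SEEKING_CUES = (
    "just give me the answer",
    "skip the explanation",
    "no hints, only the final result",
)

def looks_like_pedagogical_jailbreak(prompt: str) -> bool:
    """Flag prompts that demand answers instead of guidance."""
    lowered = prompt.lower()
    return any(cue in lowered for cue in ANSWER_SEEKING_CUES)

item = StudentQuestionPair(
    question="Solve 2x + 3 = 11 for x.",
    student_prompt="Just give me the answer, skip the explanation.",
    adversarial=True,
)
print(looks_like_pedagogical_jailbreak(item.student_prompt))  # True
```

Even a toy heuristic like this illustrates the evaluation axis: the benchmark measures whether the model still scaffolds learning when the prompt explicitly pushes for a shortcut.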
The SHAPE benchmark formalizes the evaluation of LLMs in educational scenarios by introducing a knowledge-mastery graph, which enables systematic study of the trade-offs among safe, helpful, and pedagogically sound responses. The researchers also propose a graph-augmented tutoring approach to mitigate pedagogical jailbreaks, ensuring that LLMs foster learning rather than simply handing over answers.
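The idea of grounding tutoring decisions in a graph can be sketched with a toy directed prerequisite graph standing in for a knowledge-mastery graph. The graph contents and the tutoring policy below are assumptions for illustration only, not the paper's actual method.

```python
# Toy prerequisite graph: concept -> list of prerequisite concepts.
# Contents are invented for illustration, not from the SHAPE paper.
PREREQS = {
    "arithmetic": [],
    "linear_equations": ["arithmetic"],
    "quadratic_equations": ["linear_equations"],
}

def unmastered_prereqs(concept: str, mastered: set[str]) -> list[str]:
    """Return prerequisite concepts the student has not yet mastered."""
    return [p for p in PREREQS.get(concept, []) if p not in mastered]

def tutoring_action(concept: str, mastered: set[str]) -> str:
    """Scaffold toward missing prerequisites instead of giving answers.

    A hypothetical policy: if prerequisites are missing, redirect the
    student to them; otherwise guide with a leading question.
    """
    gaps = unmastered_prereqs(concept, mastered)
    if gaps:
        return f"scaffold: review {', '.join(gaps)} first"
    return f"guide: pose a leading question on {concept}"

# A student who knows arithmetic but asks about quadratics gets
# redirected to the missing intermediate concept.
print(tutoring_action("quadratic_equations", {"arithmetic"}))
# scaffold: review linear_equations first
```

The design point is that the graph gives the tutor a principled reason to withhold a direct answer: a response is chosen relative to the student's position in the mastery graph, not just the surface request.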
The introduction of SHAPE is a significant step forward in the development of educational LLMs, addressing growing concern about the misuse of these models in educational settings. Future research will likely focus on refining the benchmark and developing more robust techniques to keep LLMs effective as educational tools. The study also raises questions about the ethical implications of using LLMs in education and the need for continuous evaluation to prevent misuse.