Half of LLMs Fail Safety Tests for Robotic Health Attendant Control

A new study evaluated 72 large language models on safety for robotic health attendants, finding a 54.4% average violation rate. The results highlight significant risks in deploying LLMs for medical applications without rigorous safety protocols.

A recent study published on arXiv benchmarked the safety of 72 large language models (LLMs) for use in robotic health attendants. The researchers created a dataset of 270 harmful instructions across nine categories aligned with the American Medical Association's Principles of Medical Ethics. Using a simulation environment based on the Robotic Health Attendant framework, they found that the mean violation rate was 54.4%, with more than half of the models exceeding safety thresholds.

The findings underscore the critical need for robust safety measures before deploying LLMs in healthcare applications. The high violation rates suggest that current models are not yet reliable for tasks involving direct patient interaction, where ethical and safety concerns are paramount. This study provides a benchmark for future research, emphasizing the importance of developing models that can handle complex ethical dilemmas in medical settings.

The study's authors call for further research to improve the safety and reliability of LLMs in healthcare. They suggest that future work should focus on refining ethical guidelines and developing more sophisticated simulation environments to better prepare models for real-world deployment. The implications of these findings extend beyond robotic health attendants, highlighting broader concerns about the ethical use of AI in sensitive applications.