Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation

Under review at Lancet Digital Health, 2024

Recommended citation: Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation. S Chen, M Gao, K Sasse, T Hartvigsen, B Anthony, L Fan, H Aerts, J Gallifant, D Bitterman. arXiv preprint arXiv:2409.20385, 2024. https://arxiv.org/pdf/2409.20385


Large language models (LLMs) are vulnerable to generating misinformation by blindly complying with illogical user requests, posing significant risks in medicine. This study analyzed LLM compliance with misleading medication-related prompts and explored methods, including in-context directions and instruction-tuning, to enhance logical reasoning and reduce misinformation. Results show that both prompt-based and parameter-based approaches can improve flaw detection and mitigate misinformation risks, highlighting the importance of prioritizing logic over compliance in LLMs to safeguard against misuse.
