Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation
Under review at Lancet Digital Health, 2024
Recommended citation: Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation. S Chen, M Gao, K Sasse, T Hartvigsen, B Anthony, L Fan, H Aerts, J Gallifant, D Bitterman. arXiv preprint arXiv:2409.20385, 2024. https://arxiv.org/pdf/2409.20385
Large language models (LLMs) are vulnerable to generating misinformation by blindly complying with illogical user requests, posing significant risks in medicine. This study analyzed LLM compliance with misleading medication-related prompts and explored methods, including in-context directions and instruction-tuning, to enhance logical reasoning and reduce misinformation. Results show that both prompt-based and parameter-based approaches can improve flaw detection and mitigate misinformation risks, highlighting the importance of prioritizing logic over compliance in LLMs to safeguard against misuse.