Posts by Collection

portfolio

Optimization Algorithms for Subway Challenge

The Subway Challenge is an open record maintained by Guinness World Records: visit all 472 stations of the New York City Subway in the shortest possible time. We transformed the subway system into a graph representation that a computer can solve, then ran two solvers on that representation: a nearest-neighbor heuristic and an ant colony optimizer. Through trial and error, we found multiple tours that beat the world record.
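The nearest-neighbor heuristic described above can be sketched in a few lines. This is a minimal illustration on a toy distance table, not the project's actual solver: the station names and travel times below are invented for the example, and the real challenge graph has 472 stations with transfer and schedule constraints.

```python
def nearest_neighbor_tour(dist, start):
    """Greedy nearest-neighbor tour: from the current station,
    always move to the closest station not yet visited."""
    unvisited = set(dist) - {start}
    tour, total, current = [start], 0, start
    while unvisited:
        # Pick the unvisited station with the smallest travel time.
        nxt = min(unvisited, key=lambda s: dist[current][s])
        total += dist[current][nxt]
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour, total

# Toy symmetric travel-time matrix (minutes) over four hypothetical stations.
d = {
    "A": {"A": 0, "B": 2, "C": 9, "D": 10},
    "B": {"A": 2, "B": 0, "C": 6, "D": 4},
    "C": {"A": 9, "B": 6, "C": 0, "D": 8},
    "D": {"A": 10, "B": 4, "C": 8, "D": 0},
}
tour, minutes = nearest_neighbor_tour(d, "A")
```

Greedy choices like this are fast but can leave long "stranded" legs for the final stations, which is why a global metaheuristic such as ant colony optimization is worth running alongside it.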

publications

To Burst or Not to Burst: Generating and Quantifying Improbable Text

Published in GEM Workshop @ EMNLP, 2023

While large language models (LLMs) are extremely capable text generators, their outputs remain distinguishable from human-authored text. We explore this separation across a range of text metrics, sampling techniques, and types of text data, and across two popular LLMs, LLaMA and Vicuna. Along the way, we introduce a new metric, recoverability, to highlight differences between human and machine text, and we propose a new sampling technique, burst sampling, designed to close this gap.

Recommended citation: To Burst or Not to Burst: Generating and Quantifying Improbable Text. K Sasse, E Kayi, S Barham, E Staley - GEM Workshop @ EMNLP, 2023. https://aclanthology.org/2023.gem-1.24/

Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models

Published in Third Workshop on NLP for Positive Impact @ EMNLP 2024, 2024

In this work, we explore how shots, the in-context examples that directly affect model performance, influence the fairness of LLMs used as NLP classification systems. We consider how different shot selection strategies, both existing methods and new demographically sensitive ones, affect model fairness across three standard fairness datasets, and we discuss how future work can incorporate LLM fairness evaluations.

Recommended citation: Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models. C Aguirre, K Sasse, I Cachola, M Dredze - Third Workshop on NLP for Positive Impact @ EMNLP 2024, 2024. https://aclanthology.org/2024.nlp4pi-1.4/

Wait, but Tylenol is Acetaminophen: Investigating and Improving Language Models' Ability to Resist Requests for Misinformation

Under review at Lancet Digital Health, 2024

Large language models (LLMs) are vulnerable to generating misinformation by blindly complying with illogical user requests, posing significant risks in medicine. This study analyzed LLM compliance with misleading medication-related prompts and explored methods, including in-context directions and instruction-tuning, to enhance logical reasoning and reduce misinformation. Results show that both prompt-based and parameter-based approaches can improve flaw detection and mitigate misinformation risks, highlighting the importance of prioritizing logic over compliance in LLMs to safeguard against misuse.

Recommended citation: Wait, but Tylenol is Acetaminophen: Investigating and Improving Language Models' Ability to Resist Requests for Misinformation. S Chen, M Gao, K Sasse, T Hartvigsen, B Anthony, L Fan, H Aerts, J Gallifant, D Bitterman - arXiv preprint arXiv:2409.20385, 2024. https://arxiv.org/pdf/2409.20385

Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions

Under review at Journal of Biomedical Semantics, 2024

Machine learning methods for Disease Entity Recognition (DER) and Disease Entity Normalization (DEN) struggle with infrequently occurring concepts, which have few mentions in training corpora and sparse knowledge-graph descriptions. Fine-tuning a LLaMA-2 13B Chat LLM to generate synthetic training data significantly improved DEN performance, particularly on out-of-distribution (OOD) data, with accuracy gains of 3-9 points overall and 20-55 points OOD, while DER showed only modest improvements. This study highlights the potential of LLM-generated synthetic mentions for enhancing DEN but reveals limited benefits for DER; all software and datasets are publicly available.

Recommended citation: Disease Entity Recognition and Normalization is Improved with Large Language Model Derived Synthetic Normalized Mentions. K Sasse, S Vadlakonda, R Kennedy, J Osborne - arXiv preprint arXiv:2410.07951, 2024. https://arxiv.org/pdf/2410.07951

Mapping Bias in Vision Language Models: Signposts, Pitfalls, and the Road Ahead

Under review at NAACL 2025, 2024

As Vision Language Models (VLMs) gain widespread use, their fairness remains under-explored. In this paper, we analyze demographic biases across five models and six datasets. We find that portrait datasets like UTKFace and CelebA are the best tools for bias detection, revealing gaps in performance and fairness between LLaVA and CLIP models. However, scene-based datasets like PATA and VLStereoSet fail to serve as useful bias benchmarks because of how they were constructed. For pronoun-based datasets like VisoGender, the signals are mixed: only some subsets of the data yield useful insights. To address this, we introduce a more difficult version of VisoGender to serve as a more rigorous evaluation. Based on these results, we call for more effective and carefully designed datasets to ensure VLMs are both fair and reliable.

Recommended citation: Mapping Bias in Vision Language Models: Signposts, Pitfalls, and the Road Ahead. K Sasse, S Chen, J Pond, D Bitterman, J Osborne - arXiv preprint arXiv:2410.13146, 2024. https://arxiv.org/pdf/2410.13146

Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media

Published in PLOS ONE, 2024

The COVID-19 pandemic highlighted the importance of vaccines as the most effective defense against the virus, yet vaccine hesitancy remains a pressing public health challenge, especially with emerging variants. This study explores the potential of social media data as a complementary source to traditional surveys for predicting vaccine hesitancy in the ten largest U.S. metropolitan areas, using machine learning models that integrate social, demographic, and economic variables. Results show that models incorporating social media or survey data, particularly with the XGBoost algorithm, outperform baseline models, emphasizing the promise of social media data, the variability of influential factors across communities, and the need for tailored, data-driven interventions.

Recommended citation: Understanding the determinants of vaccine hesitancy in the United States: A comparison of social surveys and social media. K Sasse, R Mahabir, O Gkountouna, A Crooks, A Croitoru - PLOS ONE, 2024. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0301488

talks

teaching