Papers
Detection and Measurement of Syntactic Templates in Generated Text
Chantal Shaib, Yanai Elazar, Junyi Jessy Li, Byron C. Wallace
arXiv • 2024
Recent work on evaluating the diversity of text generated by LLMs has focused on word-level features. Here we offer an analysis of syntactic features to characterize general repetition in models, beyond frequent n-grams. Specifically, we define syntactic…

Evaluating n-Gram Novelty of Language Models Using Rusty-DAWG
William Merrill, Noah A. Smith, Yanai Elazar
arXiv • 2024
How novel are texts generated by language models (LMs) relative to their training corpora? In this work, we investigate the extent to which modern LMs generate n-grams from their training data, evaluating both (i) the probability LMs assign to complete…

Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi
NAACL • 2024
Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries for solving user-instructed tasks. Recent work…

The Bias Amplification Paradox in Text-to-Image Generation
P. Seshadri, Sameer Singh, Yanai Elazar
NAACL • 2024
Bias amplification is a phenomenon in which models increase imbalances present in the training data. In this paper, we study bias amplification in the text-to-image domain using Stable Diffusion by comparing gender ratios in training vs. generated images. We…

Applying Intrinsic Debiasing on Downstream Tasks: Challenges and Considerations for Machine Translation
Bar Iluz, Yanai Elazar, Asaf Yehudai, Gabriel Stanovsky
arXiv • 2024
Most works on gender bias focus on intrinsic bias -- removing traces of information about a protected group from the model's internal representation. However, these works are often disconnected from the impact of such debiasing on downstream applications…

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
ICLR • 2024
Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of…

MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chun-yue Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
ICLR • 2024
Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we…

Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
ICLR • 2024
Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that…

SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
ICLR • 2024
The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government…

TRAM: Bridging Trust Regions and Sharpness Aware Minimization
Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng
ICLR • 2024
By reducing the curvature of the loss surface in the parameter space, sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of…