Papers

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min, Kalpesh Krishna, Xinxi Lyu, M. Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi
EMNLP • 2023
Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2…
Machine Reading Comprehension using Case-based Reasoning
Dung Ngoc Thai, Dhruv Agarwal, Mudit Chaudhary, Rajarshi Das, M. Zaheer, J. Lee, Hannaneh Hajishirzi, A. McCallum
EMNLP • 2023
We present an accurate and interpretable method for answer extraction in machine reading comprehension that is reminiscent of case-based reasoning (CBR) from classical AI. Our method (CBR-MRC) builds upon the hypothesis that contextualized answers to similar…
Measuring and Narrowing the Compositionality Gap in Language Models
Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
EMNLP Findings • 2023
We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems. We measure how often models can correctly answer all sub-problems but not generate…
SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks
Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi
EMNLP • 2023
We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments…
TaskWeb: Selecting Better Source Tasks for Multi-task NLP
Joongwon Kim, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi
EMNLP • 2023
Recent work in NLP has shown promising results in training models on large amounts of tasks to achieve better generalization. However, it is not well-understood how tasks are related, and how helpful training tasks can be chosen for a new task. In this work…
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements
Jiacheng Liu, Wenya Wang, Dianzhuo Wang, Noah A. Smith, Yejin Choi, Hanna Hajishirzi
EMNLP • 2023
Despite the much discussed capabilities of today's language models, they are still prone to silly and unexpected commonsense failures. We consider a retrospective verification approach that reflects on the correctness of LM outputs, and introduce Vera, a…
We're Afraid Language Models Aren't Modeling Ambiguity
Alisa Liu, Zhaofeng Wu, Julian Michael, Alane Suhr, Peter West, Alexander Koller, Swabha Swayamdipta, Noah A. Smith, Yejin Choi
EMNLP • 2023
Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our interpretations as listeners. As language models (LMs) are…
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals
Yanai Elazar, Bhargavi Paranjape, Hao Peng, Sarah Wiegreffe, Khyathi Raghavi, Vivek Srikumar, Sameer Singh, Noah A. Smith
arXiv • 2023
The inevitable appearance of spurious correlations in training datasets hurts the generalization of NLP models on unseen data. Previous work has found that datasets with paired inputs are prone to correlations between a specific part of the input (e.g., the…
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback
Nathan Lambert, Roberto Calandra
arXiv • 2023
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to prompt and more capable in complex settings. RLHF at its core is providing a new toolkit to optimize LLMs other than next…
Entangled Preferences: The History and Risks of Reinforcement Learning and Human Feedback
Nathan Lambert, Thomas Krendl Gilbert, Tom Zick
arXiv • 2023
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that…
