Award Winning Papers

See AI2's Award Winning Papers

Learn more about AI2's Lasting Impact Award

Viewing 1-10 of 43 papers

IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions
Wenhao Yu, Meng Jiang, Peter Clark, Ashish SabharwalEMNLP • 2023
Outstanding Paper Award in the QA Track
Although counterfactual reasoning is a fundamental aspect of intelligence, the lack of large-scale counterfactual open-domain question-answering (QA) benchmarks makes it difficult to evaluate and improve models on this ability. To address this void, we…
Related:
Dataset
PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Kyle Lo, Zejiang Shen, Benjamin Newman, Joseph Chee Chang, Russell Authur, Erin Bransom, Stefan Candra, Yoganand Chandrasekhar, Regan Huff, Bailey Kuehl, Amanpreet Singh, Chris Wilhelm, Angele Zamarron, Marti A. Hearst, Daniel S. Weld, Doug Downey, Luca SoldainiEMNLP • 2023
Best Paper Demo Award
Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in difficult-to-use PDF formats, and the ecosystem of models to…
Related:
Demo Code
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization
Hyunwoo Kim, Jack Hessel, Liwei Jiang, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin ChoiEMNLP • 2023
Outstanding Paper Award
We present SODA : the ﬁrst publicly available, million-scale high-quality social dialogue dataset. Using SODA , we train COSMO : a generalizable conversation agent outperforming previous best-performing agents on both in- and out-of-domain datasets. In…
NLPositionality: Characterizing Design Biases of Datasets and Models
Sebastin Santy, Jenny T. Liang, Ronan Le Bras, Katharina Reinecke, Maarten SapACL • 2023
ACL Outstanding Paper Award
Design biases in NLP systems, such as performance differences for different populations, often stem from their creator's positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases…
Related:
Demo
Do Androids Laugh at Electric Sheep? Humor"Understanding"Benchmarks from The New Yorker Caption Contest
Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin ChoiACL • 2023
Best Paper Award
We challenge AI models to “demonstrate un-derstanding” of the sophisticated multimodal humor of The New Yorker Caption Contest. Concretely, we develop three carefully cir-cumscribed tasks for which it sufﬁces (but is not necessary) to grasp potentially…
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta, Aniruddha KembhaviCVPR • 2023
Best Paper Award
We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions. VISPROG avoids the need for any task-speciﬁc training. Instead, it uses the in-context learning ability of large language…
Related:
Demo Code
The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks
Nikil Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, Kai-Wei ChangACL • 2023
ACL Outstanding Paper Award
How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model? In this work, we study this question by contrasting social biases with non-social biases stemming from…
Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia TsvetkovACL • 2023
ACL Outstanding Paper Award
Theory of Mind (ToM)$\unicode{x2014}$the ability to reason about the mental states of other people$\unicode{x2014}$is a key element of our social intelligence. Yet, despite their ever more impressive performance, large-scale neural language models still lack…
LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization
Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle LoEACL • 2023
Outstanding Paper Award
While human evaluation remains best practice for accurately judging the faithfulness of automatically-generated summaries, few solutions exist to address the increased difficulty and workload when evaluating long-form summaries. Through a survey of 162 papers…
Related:
Code
CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context
Joseph Chee Chang, Amy X. Zhang, Jonathan Bragg, Andrew Head, Kyle Lo, Doug Downey, Daniel S. WeldCHI • 2023
Best Paper Award
When reading a scholarly article, inline citations help researchers contextualize the current article and discover relevant prior work. However, it can be challenging to prioritize and make sense of the hundreds of citations encountered during literature…

Natural Language Processing

Computer Vision

AI for the Environment

Experimentation and Communication

Research

Research

Award Winning Papers

IfQA: A Dataset for Open-domain Question Answering under Counterfactual Presuppositions

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

NLPositionality: Characterizing Design Biases of Datasets and Models

Do Androids Laugh at Electric Sheep? Humor"Understanding"Benchmarks from The New Yorker Caption Contest

Visual Programming: Compositional visual reasoning without training

The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization

CiteSee: Augmenting Citations in Scientific Papers with Persistent and Personalized Historical Context