Papers

See AI2's Award Winning Papers

Learn more about AI2's Lasting Impact Award

Viewing 41-50 of 996 papers

Self-Refine: Iterative Refinement with Self-Feedback
Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, Peter ClarkNeurIPS • 2023 Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback…
Related:
Code
A Logic for Expressing Log-Precision Transformers
William Merrill, Ashish SabharwalNeurIPS • 2023 One way to interpret the reasoning power of transformer-based language models is to describe the types of logical rules they can resolve over some input text. Recently, Chiang et al. (2023) showed that finite-precision transformers can be equivalently…
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jian, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, S. Welleck, Xiang Ren, Allyson Ettinger, Zaïd Harchaoui, Yejin ChoiNeurIPS • 2023 Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question…
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hanna HajishirziNeurIPS • 2023 Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a…
Related:
Dataset Code
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hanna HajishirziNeurIPS • 2023 In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied…
Related:
Dataset
RealTime QA: What's the Answer Right Now?
Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai, Xinyan Velocity Yu, Dragomir R. Radev, Noah A. Smith, Yejin Choi, Kentaro InuiNeurIPS • 2023 We introduce R EAL T IME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). R E AL T IME QA inquires about the current world, and QA systems need to answer questions about…
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, Xiang RenNeurIPS • 2023 We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition, designed to excel in action planning for complex interactive reasoning tasks. SwiftSage integrates the strengths of behavior cloning and prompting large…
SciRepEval: A Multi-Format Benchmark for Scientific Document Representations
Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey FeldmanEMNLP • 2023 Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks. In…
Related:
Dataset Demo Code
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents
Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle LoEMNLP • 2023 Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may find snippets difficult to understand as they lack context…
Related:
Code
Crystal: Introspective Reasoners Reinforced with Self-Feedback
Jiacheng Liu, Ramakanth Pasunuru, Hannaneh Hajishirzi, Yejin Choi, Asli CelikyilmazEMNLP • 2023 Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the reasoning process is explicitly verbalized and utilized. However…

1
•••
4
5
6
•••
100

Natural Language Processing

Computer Vision

AI for the Environment

Experimentation and Communication

Research

Research

Papers

Self-Refine: Iterative Refinement with Self-Feedback

A Logic for Expressing Log-Precision Transformers

Faith and Fate: Limits of Transformers on Compositionality

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

RealTime QA: What's the Answer Right Now?

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

Crystal: Introspective Reasoners Reinforced with Self-Feedback