Papers

  • Complexity-Based Prompting for Multi-Step Reasoning

    Yao Fu, Hao-Chun Peng, Ashish Sabharwal, Peter Clark, Tushar Khot. ICLR 2023. We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), i.e., a sequence of short sentences describing intermediate reasoning steps towards a final answer…
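
    A minimal sketch of the complexity-based idea named in the title, under the assumption that sampled chains of thought are plain strings whose non-empty lines approximate reasoning steps; the step-counting heuristic and the extract_answer helper are illustrative stand-ins, not the paper's released implementation.

      from collections import Counter

      def step_count(chain: str) -> int:
          """Approximate a chain's reasoning complexity by its number of non-empty lines."""
          return sum(1 for line in chain.splitlines() if line.strip())

      def complexity_based_answer(chains, extract_answer, k=3):
          """Keep the k most complex sampled chains and majority-vote their answers.

          chains: sampled chain-of-thought completions for one question.
          extract_answer: maps a chain to its final answer (hypothetical helper).
          """
          top = sorted(chains, key=step_count, reverse=True)[:k]
          votes = Counter(extract_answer(c) for c in top)
          return votes.most_common(1)[0][0]

      # Toy usage: the answer is whatever follows the last "Answer:" marker.
      samples = [
          "Step 1: 2 + 3 = 5.\nAnswer: 5",
          "Step 1: add 2 and 3.\nStep 2: the sum is 5.\nAnswer: 5",
          "Answer: 6",
      ]
      print(complexity_based_answer(samples, lambda c: c.rsplit("Answer:", 1)[-1].strip(), k=2))
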
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal. ICLR 2023. Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn…
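
    A hedged sketch of the modular idea in the title: a decomposer produces (handler, sub-question) steps, and each sub-task is routed to its own handler, which can be a separate few-shot prompt or plain code. The fake_llm stub, the handler registry, and the {prev} placeholder convention are assumptions for illustration only.

      from typing import Callable, Iterable

      def fake_llm(prompt: str) -> str:
          """Stand-in for a call to a few-shot prompted LLM (assumption)."""
          return f"<model answer to: {prompt}>"

      # Each sub-task type gets its own handler: a dedicated prompt or plain code.
      HANDLERS: dict[str, Callable[[str], str]] = {
          "qa":    lambda q: fake_llm(f"Answer concisely: {q}"),
          "split": lambda q: " ".join(list(q)),   # e.g. symbolic string manipulation
      }

      def run_decomposition(steps: Iterable[tuple[str, str]]) -> str:
          """Execute (handler, sub_question) steps produced by a decomposer prompt.

          '{prev}' in a sub-question is filled with the previous step's answer,
          so later sub-tasks can build on earlier ones.
          """
          prev = ""
          for handler, sub_question in steps:
              prev = HANDLERS[handler](sub_question.format(prev=prev))
          return prev

      # Toy two-hop plan that a decomposer prompt might emit.
      plan = [
          ("qa", "Who wrote Hamlet?"),
          ("qa", "Where was {prev} born?"),
      ]
      print(run_decomposition(plan))
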
  • Transformers Can Be Expressed In First-Order Logic with Majority

    William Merrill, Ashish Sabharwal. arXiv 2023. Characterizing the implicit structure of the computation within neural networks is a foundational problem in the area of deep learning interpretability. Can the inner decision process of neural networks be captured symbolically in some familiar logic? We show…
  • Do language models have coherent mental models of everyday things?

    Yuling Gu, Bhavana Dalvi Mishra, Peter Clark. arXiv 2022. When people think of everyday things like an “egg,” they typically have a mental image associated with it. This commonsense knowledge helps us understand how these everyday things work and how to interact with them. For example, when someone tries to make a…
  • DISCO: Distilling Phrasal Counterfactuals with Large Language Models

    Zeming Chen, Qiyue Gao, Kyle Richardson, Antoine Bosselut, Ashish Sabharwal. arXiv 2022. Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if…
  • Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

    Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal. arXiv 2022. Recent work has shown that large language models are capable of generating natural language reasoning steps or Chains-of-Thoughts (CoT) to answer a multi-step question when prompted to do so. This is insufficient, however, when the necessary knowledge is not…
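
    A hedged sketch of the interleaving idea named in the title: each newly generated reasoning sentence becomes the query for the next retrieval round, and the growing evidence pool conditions the next sentence. The retrieve and generate_sentence callables and the stopping heuristic are assumptions standing in for a real retriever and LLM.

      def interleaved_retrieval_cot(question, retrieve, generate_sentence, max_steps=4):
          """Alternate one retrieval round with one chain-of-thought sentence.

          retrieve(query) -> list of paragraph strings (assumed retriever interface)
          generate_sentence(question, paragraphs, cot_so_far) -> next CoT sentence
          """
          paragraphs, cot, seen = [], [], set()
          for _ in range(max_steps):
              query = cot[-1] if cot else question           # first query is the question
              for p in retrieve(query):
                  if p not in seen:                           # grow the evidence pool
                      seen.add(p)
                      paragraphs.append(p)
              sentence = generate_sentence(question, paragraphs, cot)
              cot.append(sentence)
              if "answer is" in sentence.lower():             # simple stopping heuristic
                  break
          return paragraphs, cot

      # Toy usage with stub components.
      docs = ["Paris is the capital of France.", "France is a country in Europe."]
      retrieve = lambda q: [d for d in docs if any(w.lower() in d.lower() for w in q.split())]
      generate = lambda q, ps, cot: "The answer is Paris." if ps else "I need more evidence."
      print(interleaved_retrieval_cot("What is the capital of France?", retrieve, generate))
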
  • Lila: A Unified Benchmark for Mathematical Reasoning

    Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan. EMNLP 2022. Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning…
  • Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes

    Kaige Xie, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022. Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs and a thorough understanding of the underlying reasoning chains. Recent work in multi-hop QA has shown that…
  • Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning

    Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark. EMNLP 2022. Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning. Such a capability would allow better understanding of why a model produced the answer it did. Our approach…
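
    A hedged, heavily simplified sketch of a backward-chaining approach of the kind described above: candidate premises are generated for a hypothesis and each is checked against the model's own beliefs, recursing only while a claim is not yet directly believed. The generate_premises and believes callables are stand-ins, not Entailer's actual modules.

      def prove(claim, generate_premises, believes, depth=2):
          """Build a tree of believed premises that together support `claim`.

          generate_premises(c) -> candidate premise statements for c (stand-in)
          believes(statement)  -> True if the model's own beliefs accept it (stand-in)
          """
          if believes(claim) or depth == 0:
              return {"claim": claim, "supported": believes(claim), "premises": []}
          premises = [prove(p, generate_premises, believes, depth - 1)
                      for p in generate_premises(claim)]
          return {"claim": claim,
                  "supported": bool(premises) and all(p["supported"] for p in premises),
                  "premises": premises}

      # Toy usage: the model directly believes only two basic facts.
      facts = {"metal conducts electricity", "a nail is made of metal"}
      rules = {"a nail conducts electricity": ["a nail is made of metal", "metal conducts electricity"]}
      tree = prove("a nail conducts electricity", lambda c: rules.get(c, []), lambda s: s in facts)
      print(tree["supported"])   # True: both generated premises are believed
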
  • Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning

    Xiangyu Peng, Siyan Li, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022. Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generating narratives over time, and critically lack basic commonsense reasoning…