Papers

  • Complexity-Based Prompting for Multi-Step Reasoning

    Yao Fu, Hao-Chun Peng, Ashish Sabharwal, Peter Clark, Tushar Khot. ICLR 2023. We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), i.e., a sequence of short sentences describing intermediate reasoning steps towards a final answer…
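
    A minimal sketch of the complexity-based idea named in the title, under the assumption that sampled chains of thought are plain strings whose non-empty lines approximate reasoning steps; the step-counting heuristic and the extract_answer helper are illustrative stand-ins, not the paper's released implementation.

      from collections import Counter

      def step_count(chain: str) -> int:
          """Approximate a chain's reasoning complexity by its number of non-empty lines."""
          return sum(1 for line in chain.splitlines() if line.strip())

      def complexity_based_answer(chains, extract_answer, k=3):
          """Keep the k most complex sampled chains and majority-vote their answers.

          chains: sampled chain-of-thought completions for one question.
          extract_answer: maps a chain to its final answer (hypothetical helper).
          """
          top = sorted(chains, key=step_count, reverse=True)[:k]
          votes = Counter(extract_answer(c) for c in top)
          return votes.most_common(1)[0][0]

      # Toy usage: the answer is whatever follows the last "Answer:" marker.
      samples = [
          "Step 1: 2 + 3 = 5.\nAnswer: 5",
          "Step 1: add 2 and 3.\nStep 2: the sum is 5.\nAnswer: 5",
          "Answer: 6",
      ]
      print(complexity_based_answer(samples, lambda c: c.rsplit("Answer:", 1)[-1].strip(), k=2))
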
  • Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal. ICLR 2023. Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn…
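
    A hedged sketch of the modular idea in the title: a decomposer produces (handler, sub-question) steps, and each sub-task is routed to its own handler, which can be a separate few-shot prompt or plain code. The fake_llm stub, the handler registry, and the {prev} placeholder convention are assumptions for illustration only.

      from typing import Callable, Iterable

      def fake_llm(prompt: str) -> str:
          """Stand-in for a call to a few-shot prompted LLM (assumption)."""
          return f"<model answer to: {prompt}>"

      # Each sub-task type gets its own handler: a dedicated prompt or plain code.
      HANDLERS: dict[str, Callable[[str], str]] = {
          "qa":    lambda q: fake_llm(f"Answer concisely: {q}"),
          "split": lambda q: " ".join(list(q)),   # e.g. symbolic string manipulation
      }

      def run_decomposition(steps: Iterable[tuple[str, str]]) -> str:
          """Execute (handler, sub_question) steps produced by a decomposer prompt.

          '{prev}' in a sub-question is filled with the previous step's answer,
          so later sub-tasks can build on earlier ones.
          """
          prev = ""
          for handler, sub_question in steps:
              prev = HANDLERS[handler](sub_question.format(prev=prev))
          return prev

      # Toy two-hop plan that a decomposer prompt might emit.
      plan = [
          ("qa", "Who wrote Hamlet?"),
          ("qa", "Where was {prev} born?"),
      ]
      print(run_decomposition(plan))
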
  • Transformers Can Be Expressed In First-Order Logic with Majority

    William Merrill, Ashish Sabharwal. arXiv 2023. Characterizing the implicit structure of the computation within neural networks is a foundational problem in the area of deep learning interpretability. Can the inner decision process of neural networks be captured symbolically in some familiar logic? We show…
  • Do language models have coherent mental models of everyday things?

    Yuling Gu, Bhavana Dalvi Mishra, Peter Clark. arXiv 2022. When people think of everyday things like an “egg,” they typically have a mental image associated with it. This commonsense knowledge helps us understand how these everyday things work and how to interact with them. For example, when someone tries to make a…
  • DISCO: Distilling Phrasal Counterfactuals with Large Language Models

    Zeming Chen, Qiyue Gao, Kyle Richardson, Antoine Bosselut, Ashish Sabharwal. arXiv 2022. Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has a limited scale and diversity if…
  • Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

    Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal. arXiv 2022. Recent work has shown that large language models are capable of generating natural language reasoning steps or Chains-of-Thoughts (CoT) to answer a multi-step question when prompted to do so. This is insufficient, however, when the necessary knowledge is not…
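
    A hedged sketch of the interleaving idea named in the title: each newly generated reasoning sentence becomes the query for the next retrieval round, and the growing evidence pool conditions the next sentence. The retrieve and generate_sentence callables and the stopping heuristic are assumptions standing in for a real retriever and LLM.

      def interleaved_retrieval_cot(question, retrieve, generate_sentence, max_steps=4):
          """Alternate one retrieval round with one chain-of-thought sentence.

          retrieve(query) -> list of paragraph strings (assumed retriever interface)
          generate_sentence(question, paragraphs, cot_so_far) -> next CoT sentence
          """
          paragraphs, cot, seen = [], [], set()
          for _ in range(max_steps):
              query = cot[-1] if cot else question           # first query is the question
              for p in retrieve(query):
                  if p not in seen:                           # grow the evidence pool
                      seen.add(p)
                      paragraphs.append(p)
              sentence = generate_sentence(question, paragraphs, cot)
              cot.append(sentence)
              if "answer is" in sentence.lower():             # simple stopping heuristic
                  break
          return paragraphs, cot

      # Toy usage with stub components.
      docs = ["Paris is the capital of France.", "France is a country in Europe."]
      retrieve = lambda q: [d for d in docs if any(w.lower() in d.lower() for w in q.split())]
      generate = lambda q, ps, cot: "The answer is Paris." if ps else "I need more evidence."
      print(interleaved_retrieval_cot("What is the capital of France?", retrieve, generate))
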
  • Lila: A Unified Benchmark for Mathematical Reasoning

    Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan. EMNLP 2022. Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning…
  • Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes

    Kaige Xie, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022. Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs and a thorough understanding of the underlying reasoning chains. Recent work in multi-hop QA has shown that…
  • Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning

    Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark. EMNLP 2022. Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning. Such a capability would allow better understanding of why a model produced the answer it did. Our approach…
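
    A hedged, heavily simplified sketch of a backward-chaining approach of the kind described above: candidate premises are generated for a hypothesis and each is checked against the model's own beliefs, recursing only while a claim is not yet directly believed. The generate_premises and believes callables are stand-ins, not Entailer's actual modules.

      def prove(claim, generate_premises, believes, depth=2):
          """Build a tree of believed premises that together support `claim`.

          generate_premises(c) -> candidate premise statements for c (stand-in)
          believes(statement)  -> True if the model's own beliefs accept it (stand-in)
          """
          if believes(claim) or depth == 0:
              return {"claim": claim, "supported": believes(claim), "premises": []}
          premises = [prove(p, generate_premises, believes, depth - 1)
                      for p in generate_premises(claim)]
          return {"claim": claim,
                  "supported": bool(premises) and all(p["supported"] for p in premises),
                  "premises": premises}

      # Toy usage: the model directly believes only two basic facts.
      facts = {"metal conducts electricity", "a nail is made of metal"}
      rules = {"a nail conducts electricity": ["a nail is made of metal", "metal conducts electricity"]}
      tree = prove("a nail conducts electricity", lambda c: rules.get(c, []), lambda s: s in facts)
      print(tree["supported"])   # True: both generated premises are believed
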
  • Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning

    Xiangyu Peng, Siyan Li, Sarah Wiegreffe, Mark O. Riedl. Findings of EMNLP 2022. Transformer-based language model approaches to automated story generation currently provide state-of-the-art results. However, they still suffer from plot incoherence when generating narratives over time, and critically lack basic commonsense reasoning…