Papers

  • Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling

    Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hanna Hajishirzi, Sameer Singh, Roy Fox. arXiv, 2023. Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world, which makes learning complex tasks with sparse rewards difficult. If initialized with knowledge of high-level subgoals and transitions between subgoals, RL…
  • Does progress on ImageNet transfer to real-world datasets?

    Alexander W. Fang, Simon Kornblith, Ludwig Schmidt. arXiv, 2023. Does progress on ImageNet transfer to real-world datasets? We investigate this question by evaluating ImageNet pre-trained models with varying accuracy (57%–83%) on six practical image classification datasets. In particular, we study datasets collected with…
  • Reinforced Clarification Question Generation with Defeasibility Rewards for Disambiguating Social and Moral Situations

    Valentina Pyatkin, Jena D. Hwang, Vivek Srikumar, Ximing Lu, Liwei Jiang, Yejin Choi, Chandra Bhagavatula. arXiv, 2022. Context is vital for commonsense moral reasoning. “Lying to a friend” is wrong if it is meant to deceive them, but may be morally okay if it is intended to protect them. Such nuanced but salient contextual information can potentially flip the moral judgment of…
  • SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

    Hyunwoo Kim, Jack Hessel, Liwei Jiang, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi. arXiv, 2022. We present SODA: the first publicly available, million-scale high-quality social dialogue dataset. Using SODA, we train COSMO: a generalizable conversation agent outperforming previous best-performing agents on both in- and out-of-domain datasets. In…
  • Reproducible scaling laws for contrastive language-image learning

    Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, J. Jitsev. arXiv, 2022. Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale…
  • Continued Pretraining for Better Zero- and Few-Shot Promptability

    Zhaofeng Wu, Robert L. Logan IV, Pete Walsh, Akshita Bhagia, Dirk Groeneveld, Sameer Singh, Iz Beltagy. EMNLP, 2022. Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this work…
  • Exploring The Landscape of Distributional Robustness for Question Answering Models

    Anas Awadalla, Mitchell Wortsman, Gabriel Ilharco, Sewon Min, Ian H. Magnusson, Hannaneh Hajishirzi, Ludwig Schmidt. Findings of EMNLP, 2022. We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering. Our investigation spans over 350 models and 16 question answering datasets, including a diverse set of architectures, model sizes, and…
  • Hyperdecoders: Instance-specific decoders for multi-task NLP

    Hamish Ivison, Matthew E. Peters. Findings of EMNLP, 2022. We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder for every input instance…
  • Lila: A Unified Benchmark for Mathematical Reasoning

    Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan. EMNLP, 2022. Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning…
  • Statistical and Computational Guarantees for Influence Diagnostics

    Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui. arXiv, 2022. Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets…