Papers

Viewing 21–30 of 835 papers
  • I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

    Chandra Bhagavatula, Jena D. Hwang, Doug Downey, Ronan Le Bras, Ximing Lu, Keisuke Sakaguchi, Swabha Swayamdipta, Peter West, Yejin Choi • arXiv 2022
    Pre-trained language models, despite their rapid advancements powered by scale, still fall short of robust commonsense capabilities. And yet, scale appears to be the winning recipe; after all, the largest models seem to have acquired the largest amount of…
  • Reproducible scaling laws for contrastive language-image learning

    Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, J. Jitsev • arXiv 2022
    Scaling up neural networks has led to remarkable performance across a wide range of tasks. Moreover, performance often follows reliable scaling laws as a function of training set size, model size, and compute, which offers valuable guidance as large-scale…
  • Continued Pretraining for Better Zero- and Few-Shot Promptability

    Zhaofeng Wu, Robert L. Logan IV, Pete Walsh, Akshita Bhagia, Dirk Groeneveld, Sameer Singh, Iz Beltagy • EMNLP 2022
    Recently introduced language model prompting methods can achieve high accuracy in zero- and few-shot settings while requiring few to no learned task-specific parameters. Nevertheless, these methods still often trail behind full model finetuning. In this work…
  • Exploring The Landscape of Distributional Robustness for Question Answering Models

    Anas Awadalla, Mitchell Wortsman, Gabriel Ilharco, Sewon Min, Ian H. Magnusson, Hannaneh Hajishirzi, Ludwig Schmidt • Findings of EMNLP 2022
    We conduct a large empirical evaluation to investigate the landscape of distributional robustness in question answering. Our investigation spans over 350 models and 16 question answering datasets, including a diverse set of architectures, model sizes, and…
  • Hyperdecoders: Instance-specific decoders for multi-task NLP

    Hamish Ivison, Matthew E. Peters • Findings of EMNLP 2022
    We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder for every input instance…
  • Lila: A Unified Benchmark for Mathematical Reasoning

    Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan • EMNLP 2022
    Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling. Towards evaluating and improving AI systems in this domain, we propose LILA, a unified mathematical reasoning…
  • Statistical and Computational Guarantees for Influence Diagnostics

    Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui • arXiv 2022
    Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets…
  • Abstract Visual Reasoning with Tangram Shapes

    Anya Ji, Noriyuki Kojima, N. Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi • EMNLP 2022
    Best Long Paper Award
    We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines. Drawing on the history of tangram puzzles as stimuli in cognitive science, we build a richly annotated dataset that, with > 1k distinct stimuli, is orders of…
  • Calibrating Trust of Multi-Hop Question Answering Systems with Decompositional Probes

    Kaige Xie, Sarah Wiegreffe, Mark O. Riedl • Findings of EMNLP 2022
    Multi-hop Question Answering (QA) is a challenging task since it requires an accurate aggregation of information from multiple context paragraphs and a thorough understanding of the underlying reasoning chains. Recent work in multi-hop QA has shown that…
  • Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems

    Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti • Findings of EMNLP 2022
    Large transformer models can greatly improve Answer Sentence Selection (AS2) tasks, but their high computational costs prevent their use in many real-world applications. In this paper, we explore the following research question: How can we make the AS2 models…