Papers

Viewing 21-30 of 1033 papers

Digital Socrates: Evaluating LLMs through explanation critiques
Yuling Gu, Oyvind Tafjord, Peter Clark
ACL • 2024
While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood. In response, our goal is to define a detailed way of characterizing the explanation capabilities of modern models…
Related: Dataset
A Design Space for Intelligent and Interactive Writing Assistants
Mina Lee, Katy Ilonka Gero, John Joon Young Chung, S. B. Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L.C. Guo, Md. Naimul Hoque, Yewon Kim, Seyed Parsa Neshaei, Agnia Sergeyuk, A. Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, S. Sterman, Sitong Wang, Antoine Bosselut, Daniel Buschek, Joseph Chee Chang, Sherol Chen, Max Kreminski, Joonsuk Park, Roy Pea, Eugenia H. Rho, Shannon Zejiang Shen, Pao Siangliulue
CHI • 2024
In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine…
Mitigating Barriers to Public Social Interaction with Meronymous Communication
Nouran Soliman, Hyeonsu B. Kang, Matthew Latzke, Jonathan Bragg, Joseph Chee Chang, Amy X. Zhang, David R. Karger
CHI • 2024
Best Paper Award
In communities with social hierarchies, fear of judgment can discourage communication. While anonymity may alleviate some social pressure, fully anonymous spaces enable toxic behavior and hide the social context that motivates people to participate and helps…
PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers
Yoonjoo Lee, Hyeonsu B Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue
CHI • 2024
With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle…
Improving Language Models with Advantage-based Offline Policy Gradients
Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark O. Riedl
ICLR • 2024
Language Models (LMs) achieve substantial language capabilities when finetuned using Reinforcement Learning with Human Feedback (RLHF). However, RLHF is an unstable and data-hungry process that continually requires new high-quality LM-generated data for…
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta, Vaishnavi Shrivastava, A. Deshpande, A. Kalyan, Peter Clark, Ashish Sabharwal, Tushar Khot
ICLR • 2024
Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior…
Related: Dataset, Code
BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
ICLR • 2024
Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of…
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chun-yue Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
ICLR • 2024
Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit impressive problem-solving skills in many tasks and domains, but their ability in mathematical reasoning in visual contexts has not been systematically studied. To bridge this gap, we…
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
ICLR • 2024
Despite their remarkable capabilities, large language models (LLMs) often produce responses containing factual inaccuracies due to their sole reliance on the parametric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad hoc approach that…
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith, Luke Zettlemoyer
ICLR • 2024
The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model performance significantly degrades if trained only on low-risk text (e.g., out-of-copyright books or government…
