Towards Knowledgeable Language Models
@ ACL 2024 Workshop
August 12–17, 2024 (hybrid: Bangkok, Thailand & remote)
Knowledge has been an important prerequisite for various NLP applications and is typically derived from either structured knowledge sources such as knowledge bases and dictionaries or unstructured knowledge sources such as Wikipedia documents and news articles.
It is known that language models already acquire a significant amount of knowledge through pre-training: LLMs can be used to generate commonsense and factual knowledge when prompted to do so. However, beyond the surface, many questions still linger:

- Where does the knowledge come from?
- How do we quantify the amount of knowledge?
- Is the knowledge reliable (and do LMs themselves know)?
- How can we augment LMs with domain-specific knowledge?
- How can we revise knowledge without hurting the reasoning abilities of LMs?
- How can we leverage knowledge to assist the self-correction of LMs?
In this workshop, we want to bring together researchers who focus on different stages and different aspects (structured knowledge, unstructured knowledge, and knowledge acquired from LMs themselves) of the knowledge lifecycle to discuss the role of knowledge in the era of large language models.
Submission Topics
We welcome long (8-page) and short (4-page) paper submissions on all topics related to knowledgeable LMs, including:
- Analysis of knowledge within LMs: how much they know and where that knowledge comes from
- Enhancing LMs with existing knowledge sources (knowledge graphs, domain-specific databases, manuals, rules, etc.), either during training or at inference time
- Analyzing and improving RAG (retrieval-augmented generation) systems (a minimal sketch of the RAG loop follows this list)
- Updating and editing knowledge in LMs
- Knowledge extraction and generation using LMs
- Evaluation of knowledge utilization (faithfulness, truthfulness) by LMs
- Identification and mitigation of LM hallucinations; factual error correction
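For readers new to the area, the sketch below illustrates the basic retrieval-augmented generation loop that several of the topics above target (retrieval quality, faithfulness of the grounded answer). It is a minimal, dependency-free illustration, not any particular system: the bag-of-words `embed` function, the toy corpus, and the prompt template are placeholder assumptions standing in for a dense retriever and an actual LLM call.

```python
# Minimal RAG sketch: retrieve the passages most similar to the query,
# then build a grounded prompt that an LLM would answer from.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Retrieved passages are prepended as grounding context; whether the
    # model's answer stays faithful to them is what evaluation topics probe.
    context = "\n".join(f"- {d}" for d in docs)
    return (f"Answer using only the context below.\nContext:\n{context}\n"
            f"Question: {query}\nAnswer:")

corpus = [
    "The ACL 2024 conference takes place in Bangkok, Thailand.",
    "Knowledge editing updates facts stored in a language model's weights.",
    "Retrieval-augmented generation grounds model outputs in external documents.",
]
query = "Where is ACL 2024 held?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # this prompt would then be sent to an LLM for generation
```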
We will also announce a Best Paper Award, sponsored by Amazon, at the workshop.
Submission Instructions
We welcome two types of papers: regular workshop papers and non-archival submissions. Only regular workshop papers will be included in the workshop proceedings. All submissions should be in PDF format, follow the ACL template, and be made through the OpenReview submission portal (https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/KnowledgeLM).
All deadlines are 11:59 pm UTC-12h (“Anywhere on Earth”).
Event | Date |
---|---|
Submission Deadline | May 30, 2024 |
Decision Notifications | June 30, 2024 |
Camera-Ready Deadline | July 5, 2024 (to match the ACL proceedings deadline) |
Submission Deadline for Presentation-Only (Findings Papers) | July 10, 2024 |
Workshop Date | August 16, 2024 |
Time | Program |
---|---|
09:00-09:05 | Opening Remarks |
09:05-09:40 | Keynote Speech Peter Clark: What do our Machines Believe? Do language models form anything like a ‘mental model’ when reasoning? And do they have coherent ‘beliefs’ about the world? Probing an LLM, we find that its world view is only partially coherent and often contains blatant inconsistencies. Taking this further, I’ll describe how we can extract ‘belief graphs’ from LMs and repair the inconsistencies they uncover. More generally, I’ll promote a two-layered architecture for future systems, consisting of the LM plus a symbolic representation of (parts of) the model’s belief state, supporting systematic reasoning, interaction, addition of external knowledge, and more rational behavior by our future LM companions. |
09:40-10:15 | Keynote Speech Luke Zettlemoyer: Chameleon: Universal Mixed-modal Modeling by Tokenizing Everything. Existing multimodal models typically have custom architectures designed for specific modalities (image->text, text->image, text only, etc.). In this talk, I will present our recent work on Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. The key idea is to tokenize images into a discrete space, wherever they appear within multimodal documents, and then model the resulting sequences of mixed-modal tokens with a single unified transformer. This approach allows us to trivially lift all of the advanced modeling techniques originally developed for text-only models to the multimodal setting, including multi-task alignment and retrieval augmentation, as I will show. The model also performs well overall, demonstrating broad and general capabilities: it achieves state-of-the-art performance on image captioning tasks, outperforms Llama-2 on text-only tasks while remaining competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. |
10:15-10:50 | Keynote Speech Tatsu Hashimoto |
10:50-11:05 | Coffee Break |
11:05-12:25 | Oral Presentation: Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs |
 | Oral Presentation: AcKnowledge: Acquired Knowledge Representation by Small Language Model Without Pre-training |
 | Oral Presentation: Unified Hallucination Detection for Multimodal Large Language Models |
 | Oral Presentation: Is Table Retrieval a Solved Problem? Join-Aware Multi-Table Retrieval |
 | Oral Presentation: Measuring the Inconsistency of Large Language Models in Preferential Ranking |
 | Oral Presentation: Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations |
12:25-12:30 | Best Paper and Outstanding Paper Announcement |
12:30-13:30 | Lunch Break |
13:30-14:05 | Keynote Speech Isabelle Augenstein: Revealing the Parametric Knowledge of Language Models. Language models acquire parametric knowledge from their training process, embedding it within their weights. The increasing scale of LMs, however, poses significant challenges for understanding a model’s inner workings, and further for updating or correcting this embedded knowledge without the significant cost of retraining. Moreover, when using these language models for knowledge-intensive language understanding tasks, LMs have to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. Nevertheless, studies indicate that LMs often ignore the provided context, as it can conflict with the LM’s pre-existing memory learned during pre-training. Conflicting knowledge can also already be present in the LM’s parameters, termed intra-memory conflict. This underscores the importance of unveiling exactly what knowledge is stored, its association with specific model components, and how this knowledge is used for downstream tasks. In this talk, I will present our research on evaluating the knowledge present in LMs through a unified knowledge attribution framework, as well as diagnostic tests that can reveal knowledge conflicts. |
14:05-14:40 | Keynote Speech Eduard Hovy |
14:40-15:15 | Keynote Speech Hannah Rashkin: Challenges in measuring attribution in NLG models. Large language models frequently ‘hallucinate’ information, making claims about the real world that are not supported by background knowledge. I will discuss our recent work, which explores metrics for attribution, a framework for measuring how well information in LLM output is supported by external documents. I will cover our efforts to measure attribution in knowledge-grounded tasks using both human annotators and automatic metrics. Lastly, I will talk about ongoing challenges in measuring attribution and areas in which these metrics need further exploration. |
15:15-15:50 | Panel Discussion |
16:00-17:30 | Poster Session |
PhonologyBench: Evaluating Phonological Skills of Large Language Models
Ashima Suvarna, Harshita Khandelwal and Nanyun Peng
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?
Nishant Balepur and Rachel Rudinger
Reassess Summary Factual Inconsistency Detection with Large Language Model
Jiuding Yang, Hui Liu, Weidong Guo, Zhuwei Rao, Yu Xu and Di Niu
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
Xiao Liu, Jianfeng Lin and Jiawei Zhang
Retrieval-Augmented Knowledge Integration into Language Models: A Survey
Yuxuan Chen, Daniel Roder, Justus-Jonas Erker, Leonhard Hennig, Philippe Thomas, Sebastian Moller and Roland Roller
ClinicalRAG: Enhancing Clinical Decision Support through Heterogeneous Knowledge Retrieval
Yuxing Lu, Xukai Zhao and Jinzhuo Wang
Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs
Ye Liu, Rui Meng, Meghana Moorthy Bhat, Shafiq Joty, Caiming Xiong, Yingbo Zhou and Semih Yavuz
AcKnowledge: Acquired Knowledge Representation by Small Language Model Without Pre-training
Sourav Das, Sanjay Chatterji and Imon Mukherjee
Knowledge Acquisition through Continued Pretraining is Difficult: A Case Study on r/AskHistorians
Jan Vincent Hoffbauer, Sylwester Sawicki, Marc Lenard Ulrich, Tolga Buz, Konstantin Dobler, Moritz Schneider and Gerard De Melo
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu, Minghao Wu and Alham Fikri Aji
PromptRE: Weakly-Supervised Document-Level Relation Extraction via Prompting-Based Data Programming
Chufan Gao, Xulin Fan, Jimeng Sun and Xuan Wang
Patent Response System Optimised for Faithfulness: Procedural Knowledge Embodiment with Knowledge Graph and Retrieval Augmented Generation
Jung-Mei Chu, Hao-Cheng Lo, Jieh Hsiang and Chun-Chieh Cho
Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders
Jinseok Kim, Jaewon Jung, Sangyeop Kim, Sohhyung Park and Sungzoon Cho
Measuring the Inconsistency of Large Language Models in Preferential Ranking
Xiutian Zhao, Ke Wang and Wei Peng
Retrieval-augmented generation in multilingual settings
Nadezhda Chirkova, David Rau, Hervé Déjean, Thibault Formal, Stéphane Clinchant and Vassilina Nikoulina
Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models
Ioana Buhnila, Aman Sinha and Mathieu Constant