Towards Knowledgeable Language Models
@ ACL 2024 Workshop
August 12–17, 2024 (hybrid: Bangkok, Thailand & remote)
Knowledge has been an important prerequisite for various NLP applications and is typically derived from either structured knowledge sources such as knowledge bases and dictionaries or unstructured knowledge sources such as Wikipedia documents and news articles.
It is known that language models already acquire a significant amount of knowledge through pre-training: LLMs can be used to generate commonsense and factual knowledge when prompted to do so. However, beyond the surface, many questions still linger:

- Where does the knowledge come from?
- How do we quantify the amount of knowledge?
- Is the knowledge reliable (and do LMs themselves know)?
- How can we augment LMs with domain-specific knowledge?
- How can we revise knowledge without hurting the reasoning abilities of LMs?
- How can we leverage knowledge to assist the self-correction of LMs?
In this workshop, we want to bring together researchers who focus on different stages and different aspects (structured knowledge, unstructured knowledge, and knowledge acquired from LMs themselves) of the knowledge lifecycle to discuss the role of knowledge in the era of large language models.
Submission Topics
We welcome long (8-page) and short (4-page) paper submissions on all topics related to knowledgeable LMs, including:
- Analysis of knowledge within LMs: how much they know and where that knowledge comes from
- Enhancing LMs with existing knowledge sources (knowledge graphs, domain-specific databases, manuals, rules, etc.), either during training or at inference time
- Analyzing and improving RAG (retrieval-augmented generation) systems (a minimal sketch of the RAG loop follows this list)
- Updating and editing knowledge in LMs
- Knowledge extraction and generation using LMs
- Evaluation of knowledge utilization (faithfulness, truthfulness) by LMs
- Identification and mitigation of LM hallucinations; factual error correction
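For readers new to the area, the sketch below illustrates the basic retrieval-augmented generation loop that several of the topics above target (retrieval quality, faithfulness of the grounded answer). It is a minimal, dependency-free illustration, not any particular system: the bag-of-words `embed` function, the toy corpus, and the prompt template are placeholder assumptions standing in for a dense retriever and an actual LLM call.

```python
# Minimal RAG sketch: retrieve the passages most similar to the query,
# then build a grounded prompt that an LLM would answer from.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Retrieved passages are prepended as grounding context; whether the
    # model's answer stays faithful to them is what evaluation topics probe.
    context = "\n".join(f"- {d}" for d in docs)
    return (f"Answer using only the context below.\nContext:\n{context}\n"
            f"Question: {query}\nAnswer:")

corpus = [
    "The ACL 2024 conference takes place in Bangkok, Thailand.",
    "Knowledge editing updates facts stored in a language model's weights.",
    "Retrieval-augmented generation grounds model outputs in external documents.",
]
query = "Where is ACL 2024 held?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # this prompt would then be sent to an LLM for generation
```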
We will also announce a Best Paper Award, sponsored by Amazon, at the workshop.
Submission Instructions
We welcome two types of papers: regular workshop papers and non-archival submissions. Only regular workshop papers will be included in the workshop proceedings. All submissions should be in PDF format, follow the ACL template, and be made through the OpenReview submission portal (https://openreview.net/group?id=aclweb.org/ACL/2024/Workshop/KnowledgeLM).
All deadlines are 11:59 pm UTC-12h (“Anywhere on Earth”).
Event | Date |
---|---|
Submission Deadline | May 30, 2024 |
Decision Notifications | June 30, 2024 |
Camera-Ready Deadline | July 5, 2024 (to match the ACL proceedings deadline) |
Submission Deadline for Presentation-Only (Findings Papers) | July 10, 2024 |
Workshop Date | August 16, 2024 |
Time | Program |
---|---|
09:00-09:05 | Opening Remarks |
09:05-09:40 | Keynote Speech Peter Clark: What do our Machines Believe? Do language models form anything like a ‘mental model’ when reasoning? And do they have coherent ‘beliefs’ about the world? Probing an LLM, we find that its world view is only partially coherent and often contains blatant inconsistencies. Taking this further, I’ll describe how we can extract ‘belief graphs’ from LMs and repair the inconsistencies they uncover. More generally, I’ll promote a two-layered architecture for future systems, consisting of the LM plus a symbolic representation of (parts of) the model’s belief state, supporting systematic reasoning, interaction, addition of external knowledge, and more rational behavior by our future LM companions. |
09:40-10:15 | Keynote Speech Luke Zettlemoyer: Chameleon: Universal Mixed-modal Modeling by Tokenizing Everything. Existing multimodal models typically have custom architectures designed for specific modalities (image->text, text->image, text only, etc.). In this talk, I will present our recent work on Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. The key idea is to tokenize images into a discrete space, wherever they appear within multimodal documents, and then model the resulting sequences of mixed-modal tokens with a single unified transformer. This approach allows us to trivially lift all of the advanced modeling techniques originally developed for text-only models to the multimodal setting, including multi-task alignment and retrieval augmentation, as I will show. The model also performs well overall, demonstrating broad and general capabilities: it achieves state-of-the-art performance on image captioning tasks, outperforms Llama-2 on text-only tasks while remaining competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. |
10:15-10:50 | Keynote Speech Tatsu Hashimoto |
10:50-11:05 | Coffee Break |
11:05-12:25 | Oral Presentation: Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs |
 | Oral Presentation: AcKnowledge: Acquired Knowledge Representation by Small Language Model Without Pre-training |
 | Oral Presentation: Unified Hallucination Detection for Multimodal Large Language Models |
 | Oral Presentation: Is Table Retrieval a Solved Problem? Join-Aware Multi-Table Retrieval |
 | Oral Presentation: Measuring the Inconsistency of Large Language Models in Preferential Ranking |
 | Oral Presentation: Merging Facts, Crafting Fallacies: Evaluating the Contradictory Nature of Aggregated Factual Claims in Long-Form Generations |
12:25-12:30 | Best Paper and Outstanding Paper Announcement |
12:30-13:30 | Lunch Break |
13:30-14:05 | Keynote Speech Isabelle Augenstein: Revealing the Parametric Knowledge of Language Models. Language models acquire parametric knowledge from their training process, embedding it within their weights. The increasing scale of LMs, however, poses significant challenges for understanding a model’s inner workings, and further for updating or correcting this embedded knowledge without the significant cost of retraining. Moreover, when using these language models for knowledge-intensive language understanding tasks, LMs have to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. Nevertheless, studies indicate that LMs often ignore the provided context, as it can conflict with the LM’s pre-existing memory learned during pre-training. Conflicting knowledge can also already be present in the LM’s parameters, termed intra-memory conflict. This underscores the importance of unveiling exactly what knowledge is stored, its association with specific model components, and how this knowledge is used for downstream tasks. In this talk, I will present our research on evaluating the knowledge present in LMs through a unified knowledge attribution framework, as well as diagnostic tests that can reveal knowledge conflicts. |
14:05-14:40 | Keynote Speech Eduard Hovy |
14:40-15:15 | Keynote Speech Hannah Rashkin: Challenges in measuring attribution in NLG models. Large language models frequently ‘hallucinate’ information, making claims about the real world that are not supported by background knowledge. I will discuss our recent work, which explores metrics for attribution, a framework for measuring how well information in LLM output is supported by external documents. I will cover our efforts to measure attribution in knowledge-grounded tasks using both human annotators and automatic metrics. Lastly, I will talk about ongoing challenges in measuring attribution and areas in which these metrics need further exploration. |
15:15-15:50 | Panel Discussion |
16:00-17:30 | Poster Session |
PhonologyBench: Evaluating Phonological Skills of Large Language Models
Ashima Suvarna, Harshita Khandelwal and Nanyun Peng
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?
Nishant Balepur and Rachel Rudinger
Reassess Summary Factual Inconsistency Detection with Large Language Model
Jiuding Yang, Hui Liu, Weidong Guo, Zhuwei Rao, Yu Xu and Di Niu
Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark
Xiao Liu, Jianfeng Lin and Jiawei Zhang
Retrieval-Augmented Knowledge Integration into Language Models: A Survey
Yuxuan Chen, Daniel Roder, Justus-Jonas Erker, Leonhard Hennig, Philippe Thomas, Sebastian Moller and Roland Roller
ClinicalRAG: Enhancing Clinical Decision Support through Heterogeneous Knowledge Retrieval
Yuxing Lu, Xukai Zhao and Jinzhuo Wang
Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs
Ye Liu, Rui Meng, Meghana Moorthy Bhat, Shafiq Joty, Caiming Xiong, Yingbo Zhou and Semih Yavuz
AcKnowledge: Acquired Knowledge Representation by Small Language Model Without Pre-training
Sourav Das, Sanjay Chatterji and Imon Mukherjee
Knowledge Acquisition through Continued Pretraining is Difficult: A Case Study on r/AskHistorians
Jan Vincent Hoffbauer, Sylwester Sawicki, Marc Lenard Ulrich, Tolga Buz, Konstantin Dobler, Moritz Schneider and Gerard De Melo
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu, Minghao Wu and Alham Fikri Aji
PromptRE: Weakly-Supervised Document-Level Relation Extraction via Prompting-Based Data Programming
Chufan Gao, Xulin Fan, Jimeng Sun and Xuan Wang
Patent Response System Optimised for Faithfulness: Procedural Knowledge Embodiment with Knowledge Graph and Retrieval Augmented Generation
Jung-Mei Chu, Hao-Cheng Lo, Jieh Hsiang and Chun-Chieh Cho
Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders
Jinseok Kim, Jaewon Jung, Sangyeop Kim, Sohhyung Park and Sungzoon Cho
Measuring the Inconsistency of Large Language Models in Preferential Ranking
Xiutian Zhao, Ke Wang and Wei Peng
Retrieval-augmented generation in multilingual settings
Nadezhda Chirkova, David Rau, Hervé Déjean, Thibault Formal, Stéphane Clinchant and Vassilina Nikoulina
Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models
Ioana Buhnila, Aman Sinha and Mathieu Constant