Towards Knowledgeable Foundation Models

@ AAAI 2025 Workshop

March 4, 2025 in Philadelphia, Pennsylvania

Knowledge has been an important prerequisite for a variety of AI applications and is typically sourced from either structured knowledge sources such as knowledge bases and dictionaries, or unstructured knowledge sources such as Wikipedia documents.

More recently, researchers have discovered that language models already possess a significant amount of knowledge through pre-training: LLMs can be used to generate commonsense knowledge and factual knowledge as context for question answering. While the results are encouraging, there are still lingering questions:

  • Where does this knowledge come from?
  • How much do language models know?
  • Is this knowledge reliable?
  • If some knowledge is wrong, can we fix it?

This workshop examines the lifecycle of knowledge within language models:

  • (1) the emergence of knowledge through language model pre-training;
  • (2) the injection of external knowledge;
  • (3) the updating and modification of knowledge;
  • (4) the probing and generation of knowledge.

This is the second edition of the Knowledgeable Foundation Models workshop. The previous edition was hosted as KnowLM@ACL2024.

Stay tuned by following us on Twitter @lm_knowledge.

Call for Papers

Knowledge has been an important prerequisite for various NLP applications and is typically derived from either structured knowledge sources such as knowledge bases and dictionaries or unstructured knowledge sources such as Wikipedia documents and news articles.

It is known that language models already possess a significant amount of knowledge through pre-training: LLMs can generate commonsense and factual knowledge when prompted to do so. Beyond the surface, however, many questions remain open:

  • Where does the knowledge come from?
  • How do we quantify the amount of knowledge a model holds?
  • Is the knowledge reliable, and do LMs themselves know when it is not?
  • How can we augment LMs with domain-specific knowledge?
  • How can we revise knowledge without hurting the reasoning abilities of LMs?
  • How can we leverage knowledge to assist the self-correction of LMs?

In this workshop, we aim to bring together researchers who focus on different stages and different aspects of the knowledge lifecycle (structured knowledge, unstructured knowledge, and knowledge acquired from LMs themselves) to discuss the role of knowledge in the era of large language models.

Submission Topics

We welcome submissions on all topics related to knowledgeable LMs, including:

  • Analysis of knowledge within LMs: how much they know and where that knowledge comes from.
  • Enhancing LMs with existing knowledge sources (knowledge graphs, domain-specific databases, manuals, rules, etc.), either during training or at inference time.
  • Analyzing and improving retrieval-augmented generation (RAG) systems.
  • Updating and editing knowledge in LMs.
  • Knowledge extraction and generation using LMs.
  • Evaluation of knowledge utilization (faithfulness, truthfulness) by LMs.
  • Identification and mitigation of LM hallucinations, and factual error correction.

We will also announce a Best Paper Award, sponsored by Amazon, at the workshop.

Submission Instructions

We solicit long papers (7 pages), short papers (4 pages), and abstract papers (2 pages), all with unlimited references and appendices. Contributions will be non-archival but will be hosted on our workshop website. Papers must be formatted in AAAI two-column, camera-ready style; see the AAAI-25 author kit for details. Please submit through the OpenReview submission portal.

Important Dates

All deadlines are 11:59 pm UTC-12h (“Anywhere on Earth”).

Submission Deadline: December 1, 2024
Decision Notifications: December 15, 2024
Camera-Ready Deadline: December 22, 2024
Workshop Date: March 4, 2025

Speakers

Eric Wong

UPenn

Li Harry Zhang

Drexel University

Huajie Shao

William and Mary

Schedule

Time Program
09:00-09:05 Opening Remarks
09:05-09:50 Keynote Speech Edward Hovy: Declarative and Procedural Knowledge in LLMs
Large Language Models (LLMs) combine (declarative) knowledge and (procedural) inference, and are sometimes able to apply inference to produce new knowledge as well. But their amazing capabilities mask unexpected gaps and inaccuracies. Pinpointing specific bits of knowledge and specific inference pathways (so-called "circuits"), identifying gaps, injecting desired new knowledge, and guiding inference are challenges for NLP. This talk provides background and a few ideas to explore.
09:50-10:35 Keynote Speech Wenpeng Yin: LLM Editing and Unlearning
Large Language Models (LLMs), despite their remarkable success, often contain erroneous, incomplete, inappropriate, or private/confidential knowledge. This raises two key research challenges: i) Updating LLMs with new knowledge when existing information is incorrect or missing. ii) Removing unwanted knowledge, such as inappropriate or private content, without disrupting other capabilities. This talk will cover two key pieces of work: The first explores LLM editing, reviewing prior approaches for handling sequential edits and analyzing factors that can mitigate their negative effects. The second focuses on LLM unlearning, highlighting the failure of existing methods under quantization, explaining the underlying causes, and introducing a new quantization-robust unlearning method that removes knowledge while preserving the model’s general capabilities. Together, these studies provide insights into the strengths and limitations of current LLM editing and unlearning techniques. The talk will conclude with open research questions for the community.
10:35-11:00 Coffee Break
11:00-11:45 Keynote Speech Eric Wong: What does a Foundation Model (not) Know?
Foundation models demonstrate remarkable capabilities, but what specific knowledge do these models truly comprehend? This talk presents our research on understanding the knowledge landscape within foundation models and introduces task elicitation: a technique for systematically identifying their strengths and weaknesses. Our exploration spans a spectrum from concrete physical knowledge—such as leaf morphology and texture characteristics relevant to forestry applications—to abstract reasoning capabilities involving factual accuracy, social norms, and safety.
11:45-12:00 Oral Presentation: Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition
Jiyeon Kim, Hyunji Lee, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo
12:00-12:15 Oral Presentation: IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates
Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller
12:15-12:30 Oral Presentation: Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing
Akshat Gupta, Christine Fang, Atahan Ozdemir, Maochuan Lu, Thomas Hartvigsen, Ahmed Alaa, Gopala Anumanchipalli
12:30-12:35 Best Paper Announcement
12:35-14:00 Lunch Break (Student Mentoring Session + Poster Session)
14:00-14:45 Keynote Speech From Knowing to Reasoning: Teaching Machines to Reason and Understand the World
Large language and multimodal models have amassed vast knowledge, yet they often fail at scientific reasoning, creative problem-solving, and decision-making in the physical and social world. We show that many of these failures stem not from a lack of knowledge, but from an inability to reason about and understand that knowledge in the real world. In this talk, I'll present our recent work to address these gaps, including (1) teaching LLMs to reason structurally to solve complex chemistry reasoning tasks, (2) teaching LLMs to reason divergently for creative problem solving, (3) revealing fundamental limitations of modern vision-language models in understanding and reasoning about the physical world, and (4) building new open-ended world simulators for LLMs/VLMs to interact with and learn from by "experiencing" the world.
14:45-15:30 Keynote Speech Li Harry Zhang: Executable and Trustworthy Planning with Large Language Models
While large language models (LLMs) can provide decent instructions, they are far from being able to come up with an executable and trustworthy plan for a particular user or agent, grounded in their specific situation and needs. To address this, I advocate for using the LLM as a code generator to create a formal representation of the planning environment. In conjunction with tools from classical AI planning, a plan can then be found deterministically and faithfully. In this talk, I will discuss two strands of effort. The first tackles fully observed planning domains, where the model is given complete information and must propose a complete plan that satisfies given constraints. The second tackles partially observed planning domains, where the model makes partial observations about the environment, proposes partial plans, and iteratively acquires knowledge to complete a task. In both settings, we show that state-of-the-art models like DeepSeek-R1 and GPT-4o are heavily challenged by even the simplest tasks, such as rearranging or looking for objects. When prompted to generate Planning Domain Definition Language (PDDL) input for a solver, LLMs outperform generating plans directly. Even so, both syntactic and semantic errors point to LLMs' limited ability to generate formal representations, especially when the language or domain is underrepresented in their pre-training.
15:30-16:00 Coffee Break
16:00-16:45 Keynote Speech Huajie Shao: Physics Knowledge-Guided Foundation Models for Dynamical Systems
Dynamical systems play a crucial role in diverse applications, including autonomous driving, climate science, and brain dynamics. However, achieving long-term dynamic forecasting in complex and uncertain environments remains a significant challenge. In this talk, I will introduce physics-guided foundation models designed to enhance both the generalization capability and accuracy of dynamic forecasting, even in the presence of noisy and incomplete data. First, I will present a general-purpose, physics-enhanced state-space model for real-world dynamical systems with partial physics knowledge. Next, I will discuss variational formulation-based neural ODEs for learning implicit physics from observed data. Finally, I will conclude by exploring future directions for advancing foundational models for dynamical systems.
16:45-16:50 Lightning Talk: Mechanistic Understanding of Language Models in Syntactic Code Completion
Samuel Miller, Daking Rai, Ziyu Yao
16:50-16:55 Lightning Talk: Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games
Nicholas R Waytowich, Devin White, MD Sunbeam, Vinicius G. Goecks
16:55-17:00 Lightning Talk: Knowledge Graph-Enhanced LLM for Food Recommendation through Question Answering
Fnu Mohbat, Mohammed J Zaki
17:00-17:05 Lightning Talk: CricRAG: Retrieval Augmented Vision-Language Models for Personalized Cricket Coaching
Agamdeep Singh, Sujit PB, Mayank Vatsa
17:05-17:10 Lightning Talk: Auto-Q: Automated Domain Questions Generation for Industrial Assets
Christodoulos Constantinides, Vivek Sharma, Shuxin Lin, Nianjun Zhou, Bharathi Chaudhury, Dhaval C Patel
17:10-17:30 Q&A and Closing Remarks

Accepted Papers

Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing (oral)
Akshat Gupta, Christine Fang, Atahan Ozdemir, Maochuan Lu, Thomas Hartvigsen, Ahmed Alaa, Gopala Anumanchipalli

IAO Prompting: Making Knowledge Flow Explicit in LLMs through Structured Reasoning Templates (oral)
Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller

Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition (oral)
Jiyeon Kim, Hyunji Lee, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo

Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models
Yilin Wang, Heng Wang, Minnan Luo

Mechanistic Understanding of Language Models in Syntactic Code Completion
Samuel Miller, Daking Rai, Ziyu Yao

Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games
Nicholas R Waytowich, Devin White, MD Sunbeam, Vinicius G. Goecks

Knowledge Graph-Enhanced LLM for Food Recommendation through Question Answering
Fnu Mohbat, Mohammed J Zaki

CricRAG: Retrieval Augmented Vision-Language Models for Personalized Cricket Coaching
Agamdeep Singh, Sujit PB, Mayank Vatsa

Semiparametric Token-Sequence Co-Supervision
Hyunji Lee, Doyoung Kim, Jihoon Jun, Se June Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo

Auto-Q: Automated Domain Questions Generation for Industrial Assets
Christodoulos Constantinides, Vivek Sharma, Shuxin Lin, Nianjun Zhou, Bharathi Chaudhury, Dhaval C Patel

Organization

Organizing Committee

Manling Li

Northwestern University

Zoey Sha Li

Amazon

Mor Geva

Google DeepMind, Tel Aviv University

Xiaozhi Wang

Tsinghua, UIUC

Chi Han

University of Illinois Urbana-Champaign

Shangbin Feng

University of Washington

Silin Gao

EPFL

Advising Committee

Heng Ji

University of Illinois Urbana-Champaign, Amazon Scholar

Isabelle Augenstein

University of Copenhagen

Mohit Bansal

University of North Carolina at Chapel Hill

Contact

Please email know-fm-aaai25@googlegroups.com if you have any questions.

Support

Amazon