Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Abstract
Large language models exhibit brittle beliefs under contextual perturbations; this brittleness is better measured by structural consistency metrics and mitigated through structure-aware training methods.
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient: reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence measures such as Self-Consistency, which can mask brittle beliefs. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that performance on high-NCB facts is consistently more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes for a context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.
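The abstract does not spell out how NCB is computed, so the sketch below is only one plausible reading: score a target fact not just by repeated sampling on its own question (point-wise self-consistency), but by how coherently the model also answers conceptually neighboring questions. The function `query_model`, the `neighbor_qas` structure, and the mean aggregation are assumptions for illustration, not the authors' released implementation (see https://github.com/zjunlp/belief).

```python
# Minimal sketch of a Neighbor-Consistency Belief (NCB) style score, based only on
# the abstract's description; the paper's exact formulation may differ.
from typing import Callable, Dict, List


def neighbor_consistency_belief(
    target_question: str,
    target_answer: str,
    neighbor_qas: List[Dict[str, str]],   # conceptually related {question, answer} pairs
    query_model: Callable[[str], str],    # hypothetical: prompt in, model answer out
    n_samples: int = 5,
) -> float:
    """Score how coherently a model answers a conceptual neighborhood.

    Assumption: belief robustness is approximated by the fraction of sampled
    answers, over the target fact and its neighbors, that match the expected
    answers. Point-wise self-consistency would look only at `target_question`;
    NCB additionally requires coherence on the neighbors.
    """
    questions = [{"question": target_question, "answer": target_answer}, *neighbor_qas]
    agreements = []
    for qa in questions:
        hits = 0
        for _ in range(n_samples):
            prediction = query_model(qa["question"])
            hits += int(qa["answer"].lower() in prediction.lower())  # crude string match
        agreements.append(hits / n_samples)
    # Aggregate across the neighborhood; the mean is one simple choice.
    return sum(agreements) / len(agreements)
```

Under this reading, a fact can score 1.0 on point-wise self-consistency yet receive a low NCB if the model answers its neighbors incoherently, which is the gap the stress-testing protocol is meant to expose.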
Community
We show that many LLM “beliefs” that look confident collapse under small context changes, and propose Neighbor-Consistency Belief (NCB) and Structure-Aware Training to measure and train models to keep their knowledge stable and robust under such interference.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Fact-Checking with Large Language Models via Probabilistic Certainty and Consistency (2026)
- Red Teaming Large Reasoning Models (2025)
- The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models (2025)
- Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits (2025)
- From Facts to Conclusions: Integrating Deductive Reasoning in Retrieval-Augmented LLMs (2025)
- ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models (2026)
- Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought (2025)