RAGLens Collection
A lightweight hallucination detector that uses sparse autoencoders to identify, explain, and mitigate unfaithful RAG outputs.
Benchmarking Retrieval-Augmented Generation for Medicine • arXiv:2402.13178 • Published Feb 20, 2024
Ensuring Safety and Trust: Analyzing the Risks of Large Language Models in Medicine • arXiv:2411.14487 • Published Nov 20, 2024
Humans Continue to Outperform Large Language Models in Complex Clinical Decision-Making: A Study with Medical Calculators • arXiv:2411.05897 • Published Nov 8, 2024
Demystifying Large Language Models for Medicine: A Primer • arXiv:2410.18856 • Published Oct 24, 2024
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations • arXiv:2406.12036 • Published Jun 17, 2024
RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision • arXiv:2502.13957 • Published Feb 19, 2025
Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models • arXiv:2505.14599 • Published May 20, 2025
Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning • arXiv:2506.02911 • Published Jun 3, 2025
GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability • arXiv:2508.21197 • Published Aug 28, 2025
Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models • arXiv:2601.08058 • Published Jan 12