LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
Abstract
A hierarchical time series reasoning dataset and model are introduced to improve LLM understanding of temporal data through visualized patterns and numerical tables.
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.
Community
- HiTSR: a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories.
- LLaTiSA: a strong Time Series Reasoning Model (TSRM) that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs).
The dual-view input (global pattern perception from plots plus precise point evidence from an index-value table) is the most interesting bit here. Grounding numeric evidence to explicit time indices seems to tame the numeric hallucinations that usually bite vision-language models on time series. I do worry how this holds up with irregular sampling or missing data, such as jittered ECG streams: does the index alignment degrade gracefully, or does it require extra preprocessing? The arXivLens breakdown (https://arxivlens.com/PaperView/Details/llatisa-towards-difficulty-stratified-time-series-reasoning-from-visual-perception-to-semantics-6860-20bccd57) helped me parse this wiring; it does a nice job unpacking where the CoT supervision sits relative to the dual-view grounding.
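To make the "precise point evidence" half of the dual view concrete, here is a minimal sketch of how a series might be serialized into a precision-calibrated index-value table. The function name, table format, and subsampling strategy are my own assumptions for illustration, not taken from the paper:

```python
def to_index_value_table(series, precision=2, max_rows=None):
    """Serialize a numeric series into an index-value table string.

    Rounding to a fixed precision calibrates numeric granularity, and
    explicit indices give the model anchors to ground point evidence.
    """
    if max_rows is not None and len(series) > max_rows:
        # Uniform subsampling keeps the table within a token budget.
        step = len(series) / max_rows
        idx = [round(i * step) for i in range(max_rows)]
    else:
        idx = list(range(len(series)))
    rows = [f"{i}\t{round(series[i], precision)}" for i in idx]
    return "index\tvalue\n" + "\n".join(rows)
```

In a dual-view prompt, this text table would sit alongside the rendered plot image, so global shape comes from vision while exact values come from the indexed rows.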
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering (2026)
- Thoth: Mid-Training Bridges LLMs to Time Series Understanding (2026)
- VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought (2026)
- KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning (2026)
- Seek-and-Solve: Benchmarking MLLMs for Visual Clue-Driven Reasoning in Daily Scenarios (2026)
- Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models (2026)
- Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap (2026)