LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
Abstract
A hierarchical time series reasoning dataset and model are introduced to improve LLM understanding of temporal data through visualized patterns and numerical tables.
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.
Community
- HiTSR: a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories.
- LLaTiSA: a strong Time Series Reasoning Model (TSRM) that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs).
The dual-view input (global pattern perception from plots plus precise point evidence from an index-value table) is the most interesting bit here. Grounding numeric evidence to explicit time indices seems to tame the numeric hallucinations that usually bite vision-language models on time series. I do worry how this holds up with irregular sampling or missing data, such as jittered ECG streams: does the index alignment degrade gracefully, or does it require extra preprocessing? The arXivLens breakdown (https://arxivlens.com/PaperView/Details/llatisa-towards-difficulty-stratified-time-series-reasoning-from-visual-perception-to-semantics-6860-20bccd57) helped me parse this wiring; it does a nice job unpacking where the CoT supervision sits relative to the dual-view grounding.
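To make the "precise point evidence" half of the dual view concrete, here is a minimal sketch of how a series might be serialized into a precision-calibrated index-value table. The function name, table format, and subsampling strategy are my own assumptions for illustration, not taken from the paper:

```python
def to_index_value_table(series, precision=2, max_rows=None):
    """Serialize a numeric series into an index-value table string.

    Rounding to a fixed precision calibrates numeric granularity, and
    explicit indices give the model anchors to ground point evidence.
    """
    if max_rows is not None and len(series) > max_rows:
        # Uniform subsampling keeps the table within a token budget.
        step = len(series) / max_rows
        idx = [round(i * step) for i in range(max_rows)]
    else:
        idx = list(range(len(series)))
    rows = [f"{i}\t{round(series[i], precision)}" for i in idx]
    return "index\tvalue\n" + "\n".join(rows)
```

In a dual-view prompt, this text table would sit alongside the rendered plot image, so global shape comes from vision while exact values come from the indexed rows.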
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering (2026)
- Thoth: Mid-Training Bridges LLMs to Time Series Understanding (2026)
- VisDoT: Enhancing Visual Reasoning through Human-Like Interpretation Grounding and Decomposition of Thought (2026)
- KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning (2026)
- Seek-and-Solve: Benchmarking MLLMs for Visual Clue-Driven Reasoning in Daily Scenarios (2026)
- Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models (2026)
- Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap (2026)