arxiv:2604.17295

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

Published on Apr 19
· Submitted by Rui Dai on Apr 24

Abstract

A hierarchical time series reasoning dataset and model are introduced to improve LLM understanding of temporal data through visualized patterns and numerical tables.

AI-generated summary

Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR, a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. Our code is available at https://github.com/RainingNovember/LLaTiSA.

Community

Paper author and submitter:

HiTSR - a hierarchical time series reasoning dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories.
LLaTiSA - a strong Time Series Reasoning Model (TSRM) that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs).

The dual-view input—global pattern perception from plots and precise point evidence from an index–value table—is the most interesting bit here. Grounding numeric evidence to explicit time indices seems to tame the numeric hallucinations that usually bite vision-language models on time series. I do wonder how this holds up under irregular sampling or missing data, e.g. jittered ECG streams: does the index alignment degrade gracefully, or does it require extra preprocessing? The ArxivLens breakdown (https://arxivlens.com/PaperView/Details/llatisa-towards-difficulty-stratified-time-series-reasoning-from-visual-perception-to-semantics-6860-20bccd57) helped me parse this wiring; it does a nice job of unpacking where the CoT supervision sits relative to the dual-view grounding.
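For intuition, here is a minimal sketch of what the table half of that dual-view input might look like: each value is paired with its explicit time index and rendered at a fixed precision, so the model can cite exact points instead of guessing from the plot. The function name and table format are hypothetical illustrations, not taken from the paper's code; the plot half (rendering the series as an image) is omitted.

```python
def series_to_table(values, precision=2):
    """Format a 1-D time series as an explicit index-value table.

    Pairing each value with its time index is the point of the
    "precision-calibrated numerical table" idea: the model can quote
    "t=3, value=7.25" instead of hallucinating numbers off a plot.
    The markdown layout here is an assumption for illustration.
    """
    header = "| t | value |"
    separator = "|---|---|"
    # Fixed-point formatting pins every value to the same precision.
    rows = [f"| {i} | {v:.{precision}f} |" for i, v in enumerate(values)]
    return "\n".join([header, separator] + rows)


if __name__ == "__main__":
    print(series_to_table([1.5, 2.25, 3.0]))
```

A dual-view prompt would then concatenate this table with the rendered plot image; whether irregular timestamps replace the integer index `t` is exactly the open question raised above.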


Get this paper in your agent:

hf papers read 2604.17295
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0


Datasets citing this paper 1

Spaces citing this paper 0


Collections including this paper 0
