arxiv:2512.07328

ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation

Published on Dec 8 · Submitted by Ziyang Mai on Dec 17

Abstract

AI-generated summary

ContextAnyone uses a diffusion framework with an Emphasize-Attention module and Gap-RoPE positional embeddings to generate consistent character videos from text and a single reference image.

Text-to-video (T2V) generation has advanced rapidly, yet maintaining consistent character identities across scenes remains a major challenge. Existing personalization methods often focus on facial identity but fail to preserve broader contextual cues such as hairstyle, outfit, and body shape, which are critical for visual coherence. We propose ContextAnyone, a context-aware diffusion framework that achieves character-consistent video generation from text and a single reference image. Our method jointly reconstructs the reference image and generates new video frames, enabling the model to fully perceive and utilize reference information. Reference information is effectively integrated into a DiT-based diffusion backbone through a novel Emphasize-Attention module that selectively reinforces reference-aware features and prevents identity drift across frames. A dual-guidance loss combines diffusion and reference reconstruction objectives to enhance appearance fidelity, while the proposed Gap-RoPE positional embedding separates reference and video tokens to stabilize temporal modeling. Experiments demonstrate that ContextAnyone outperforms existing reference-to-video methods in identity consistency and visual quality, generating coherent and context-preserving character videos across diverse motions and scenes. Project page: https://github.com/ziyang1106/ContextAnyone
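The two most concrete mechanisms the abstract names, the Gap-RoPE token separation and the dual-guidance loss, can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the gap width `gap`, the loss weight `lam`, and all function names below are assumptions for illustration; only the high-level ideas (offset the rotary position indices of reference tokens away from video tokens, and sum a diffusion objective with a reference-reconstruction objective) follow the abstract.

```python
import torch
import torch.nn.functional as F

def rope_freqs(dim: int, theta: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies for an even feature dimension.
    return 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))

def apply_rope(x: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    # Rotate token features x (batch, seq, dim) by their position indices.
    angles = positions[:, None].float() * rope_freqs(x.shape[-1])  # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def gap_rope_positions(n_ref: int, n_video: int, gap: int = 256) -> torch.Tensor:
    # Gap-RoPE as described at a high level in the abstract: reference
    # tokens take positions [0, n_ref); video tokens start n_ref + gap
    # later, separating the two streams in positional space.
    # The gap width is a hypothetical hyperparameter.
    ref_pos = torch.arange(n_ref)
    vid_pos = n_ref + gap + torch.arange(n_video)
    return torch.cat([ref_pos, vid_pos])

def dual_guidance_loss(noise_pred, noise_target, ref_recon, ref_latent, lam=0.5):
    # Dual-guidance objective: a denoising loss on the video latents plus a
    # reconstruction loss on the jointly generated reference image.
    # The weight lam is a hypothetical hyperparameter.
    return F.mse_loss(noise_pred, noise_target) + lam * F.mse_loss(ref_recon, ref_latent)

# Example: 64 reference tokens and 1024 video tokens share one sequence,
# but their RoPE indices are separated by the gap.
tokens = torch.randn(2, 64 + 1024, 128)
rotated = apply_rope(tokens, gap_rope_positions(64, 1024))
```

In the actual model these positions would feed the attention layers of the DiT backbone, so that reference and video tokens attend across the gap without being confused for temporal neighbors.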

Community

Ziyang Mai (paper author and submitter):

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/contextanyone-context-aware-diffusion-for-character-consistent-text-to-video-generation-8351-8e66cf82

  • Key Findings
  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

