DriveTiTok-s-640
DriveTiTok-s-640 is an image tokenizer for driving scene images based on the TiTok architecture. It generates seed tokens from past frames and uses self-attention to model temporal differences while integrating global context, which enables efficient compression. The model is trained on driving data collected in Tokyo.
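To make the idea above more concrete, here is a minimal, dependency-free sketch of how tokens seeded from a past frame could attend jointly with patch embeddings of the current frame. It is an illustration under simplifying assumptions, not the model's actual implementation: the function names (`self_attention`, `encode_frame`), the tiny dimensions, the random inputs, and the use of a single attention head with identity projections are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: a real model would use learned projections, several heads,
    and several layers. This only shows the mixing step.
    """
    dim = len(tokens[0])
    transposed = [list(col) for col in zip(*tokens)]
    scores = matmul(tokens, transposed)
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    return matmul(weights, tokens), weights


def encode_frame(seed_tokens, patch_embeddings):
    """Update seed tokens by attending over seeds and current-frame patches.

    Hypothetical helper: the seeds stand in for tokens derived from past
    frames, so attention only has to account for what changed.
    """
    sequence = seed_tokens + patch_embeddings
    attended, weights = self_attention(sequence)
    # Keep only the positions that correspond to the latent (seed) tokens.
    return attended[: len(seed_tokens)], weights


random.seed(0)
DIM, NUM_LATENT, NUM_PATCHES = 8, 4, 16  # illustrative sizes only
seed_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_LATENT)]
patches = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PATCHES)]

latent_tokens, attn = encode_frame(seed_tokens, patches)
print(len(latent_tokens), len(latent_tokens[0]))
```

The output has far fewer positions than the patch sequence (4 latent tokens versus 16 patches in this toy setup), which is where the compression comes from in a 1D tokenizer of this kind.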
Installation
uv sync --all-extras
Usage
We provide sample code for reconstructing images using DriveTiTok-s-640 and visualizing the reconstructed images. See tutorial.ipynb.
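When comparing a reconstructed image against its input, a numeric score can complement visual inspection. The helper below is a generic sketch and is not part of this repository or of tutorial.ipynb: the function name `psnr` and the flat-list pixel representation are assumptions made for illustration. It computes the peak signal-to-noise ratio for 8-bit pixel values.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences.

    Assumption: both inputs are equal-length sequences of pixel values
    in the range [0, max_value], for example a flattened 8-bit image.
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    if not original:
        raise ValueError("inputs must not be empty")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example with three pixels; real images would be flattened first.
score = psnr([10, 20, 30], [12, 18, 30])
print(f"{score:.2f} dB")
```

Higher values indicate a reconstruction closer to the input; identical images give infinity.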
License
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for details.
Acknowledgement
This model was developed as part of the project JPNP20017, which is subsidized by the New Energy and Industrial Technology Development Organization (NEDO), Japan.
The training and inference code is based on ByteDance's 1d-tokenizer (https://github.com/bytedance/1d-tokenizer), which is released under the Apache-2.0 license. The tutorial.ipynb in this repository is also derived from that repository. We acknowledge and appreciate their contribution as the foundation of our development.
