	---
	tags:
	- model_hub_mixin
	- pytorch_model_hub_mixin
	- vision
	- perceiver
	- adaptive-computation
	license: mit
	datasets:
	- timm/imagenet-12k-wds
	---

	# AdaPerceiver (Logit + Feature Distilled from ViT-H CLIP)

	This repository hosts the logit + feature distilled AdaPerceiver model, introduced in
	“AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens”.

	📄 Paper: https://arxiv.org/abs/2511.18105
	📦 Code: https://github.com/pjajal/AdaPerceiver
	📚 Model Collection: https://huggingface.co/collections/pjajal/adaperceiver-v1

	This model is distilled from the [ViT-H/14 CLIP model](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k).

	---

	## Model Description

	AdaPerceiver is a Perceiver-style transformer architecture designed for runtime-adaptive computation.
	A single trained model can dynamically trade off accuracy and compute by adjusting:

	- the number of latent tokens,
	- the effective depth, and
	- the embedding dimension.
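
As a minimal sketch of what "adjusting" these three axes can look like in practice, the snippet below enumerates candidate runtime settings. `AdaConfig` and `config_grid` are hypothetical helpers, not part of the AdaPerceiver code base, and apart from the combination `num_tokens=256, mat_dim=128, depth=12` (taken from the usage example in this card) the values are placeholders; check the paper or repository for the ranges this checkpoint actually supports.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class AdaConfig(NamedTuple):
    """One runtime setting along the three adaptivity axes (hypothetical helper)."""
    num_tokens: int  # number of latent tokens
    mat_dim: int     # embedding dimension
    depth: int       # effective (early-exit) depth


def config_grid(tokens: Sequence[int], dims: Sequence[int], depths: Sequence[int]) -> List[AdaConfig]:
    """Return every combination of the candidate values for the three axes."""
    return [AdaConfig(t, d, k) for t, d, k in product(tokens, dims, depths)]


# Placeholder candidate values; only (256, 128, 12) appears in this model card.
grid = config_grid(tokens=(64, 128, 256), dims=(128, 256), depths=(6, 12))
print(len(grid))
print(grid[0])
```

Each `AdaConfig` can be turned into keyword arguments with `cfg._asdict()`, which matches the `num_tokens`, `mat_dim`, and `depth` parameter names used in the usage example.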

	This checkpoint is the logit + feature distilled AdaPerceiver model, trained on ImageNet-12K with a ViT-H teacher. It exposes both:
	- classification logits, and
	- feature representations.

	---

	## Training Details

	- Training Data: ImageNet-12K
	- Training Objective: Logit distillation + feature distillation
	- Teacher Model: [ViT-H/14 CLIP model](https://huggingface.co/timm/vit_huge_patch14_clip_224.laion2b_ft_in12k)
	- Architecture: Adaptive Perceiver with block-masked attention and Matryoshka FFNs
	- Adaptivity Axes: Tokens, Depth, Width

	For full training details, see Appendix D of the paper.
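
To make the combined objective concrete, here is a generic sketch of a logit + feature distillation loss. It is not the loss from the paper: the KL term, the mean-squared-error term, and the `alpha`, `beta`, and `temperature` parameters are all assumptions chosen for illustration, and the code is framework-free so the arithmetic stays visible.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distill_loss(student_logits, teacher_logits, temperature: float = 1.0) -> float:
    """KL(teacher || student) between temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def feature_distill_loss(student_feat, teacher_feat) -> float:
    """Mean squared error between two feature vectors of equal length."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def distill_loss(s_logits, t_logits, s_feat, t_feat,
                 alpha: float = 1.0, beta: float = 1.0, temperature: float = 1.0) -> float:
    """Weighted sum of the two terms; the weights are illustrative assumptions."""
    return (alpha * logit_distill_loss(s_logits, t_logits, temperature)
            + beta * feature_distill_loss(s_feat, t_feat))


print(distill_loss([1.0, 2.0], [2.0, 1.0], [0.5, 0.5], [1.0, 0.0]))
```

The loss is zero when the student reproduces the teacher's logits and features exactly, and grows as either diverges.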

	---

	## How to Use

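	This model can be loaded with the Hub-compatible `DistillAdaPerceiver` class. The import below assumes that the `hub` package from the code repository linked above is on your Python path.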

	```python
	import torch
	from hub.networks.adaperceiver_distill import DistillAdaPerceiver

	model = DistillAdaPerceiver.from_pretrained("pjajal/adaperceiver-v1")

	# forward(
	#     x: input image tensor (B, C, H, W)
	#     num_tokens: number of latent tokens to process (optional)
	#     mat_dim: embedding dimension (optional)
	#     depth: early-exit depth (optional)
	#     token_grans: block-mask granularities (optional)
	# )
	out = model(
	    torch.randn(1, 3, 224, 224),
	    num_tokens=256,
	    mat_dim=128,
	    depth=12,
	)

	print(out.logits.shape, out.features.shape)
	```
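
Because the same checkpoint accepts different `num_tokens`, `mat_dim`, and `depth` values at call time, it can be useful to time several settings side by side. The sketch below is one way to do that; `sweep` and `fake_model` are hypothetical names introduced here, and `fake_model` is only a stand-in so the snippet runs without downloading the checkpoint.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def sweep(model: Callable[..., Any], x: Any,
          configs: Iterable[Dict[str, int]], repeats: int = 3) -> List[Dict[str, Any]]:
    """Call model(x, **cfg) for each config and keep the best wall-clock time."""
    results = []
    for cfg in configs:
        best, out = float("inf"), None
        for _ in range(repeats):
            start = time.perf_counter()
            out = model(x, **cfg)
            best = min(best, time.perf_counter() - start)
        results.append({**cfg, "seconds": best, "output": out})
    return results


def fake_model(x, num_tokens=None, mat_dim=None, depth=None):
    """Stand-in for the real model: echoes the settings it was called with."""
    return (num_tokens, mat_dim, depth)


configs = [
    {"num_tokens": 256, "mat_dim": 128, "depth": 12},  # setting from the example above
    {"num_tokens": 128, "mat_dim": 128, "depth": 6},   # placeholder values
]
for row in sweep(fake_model, x=None, configs=configs):
    print(row["num_tokens"], row["mat_dim"], row["depth"], f"{row['seconds']:.6f}s")
```

With the real model, pass the loaded `DistillAdaPerceiver` instance and an image tensor instead of `fake_model` and `None`, and run the sweep under `torch.no_grad()`. On a GPU, call `torch.cuda.synchronize()` before reading the timer, since CUDA kernels run asynchronously.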

	## Reference

	If you use this model, please cite the AdaPerceiver paper:

	```bibtex
	@article{jajal2025adaperceiver,
	  title={AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens},
	  author={Jajal, Purvish and Eliopoulos, Nick John and Chou, Benjamin Shiue-Hal and Thiruvathukal, George K and Lu, Yung-Hsiang and Davis, James C},
	  journal={arXiv preprint arXiv:2511.18105},
	  year={2025}
	}
	```