SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
Abstract
SAM-Body4D is a training-free framework that enhances 3D human mesh recovery from videos by ensuring temporal consistency and robustness to occlusions through masklet generation and refinement.
Human Mesh Recovery (HMR) aims to reconstruct 3D human pose and shape from 2D observations and is fundamental to human-centric understanding in real-world scenarios. While recent image-based HMR methods such as SAM 3D Body achieve strong robustness on in-the-wild images, they rely on per-frame inference when applied to videos, leading to temporal inconsistency and degraded performance under occlusions. We address these issues without extra training by leveraging the inherent human continuity in videos. We propose SAM-Body4D, a training-free framework for temporally consistent and occlusion-robust HMR from videos. We first generate identity-consistent masklets using a promptable video segmentation model, then refine them with an Occlusion-Aware module to recover missing regions. The refined masklets guide SAM 3D Body to produce consistent full-body mesh trajectories, while a padding-based parallel strategy enables efficient multi-human inference. Experimental results demonstrate that SAM-Body4D achieves improved temporal stability and robustness in challenging in-the-wild videos, without any retraining. Our code and demo are available at: https://github.com/gaomingqi/sam-body4d.
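The abstract names a padding-based parallel strategy for multi-human inference but does not describe it. The sketch below is one plausible reading, not the authors' implementation: each frame's variable-length list of person inputs is padded to a common length so that the whole clip can go through a single batched model call, and the padded slots are discarded afterwards. The names `pad_frames`, `run_batched`, `model_fn`, and `pad_value` are hypothetical and exist only for this illustration. `model_fn` stands in for any per-person estimator that maps a list of inputs to a list of outputs of the same length.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def pad_frames(
    per_frame: Sequence[Sequence[T]], pad_value: T
) -> Tuple[List[List[T]], List[List[bool]]]:
    """Pad every frame's person list to the same length.

    Returns the padded lists plus a validity mask marking real entries.
    """
    max_people = max((len(items) for items in per_frame), default=0)
    padded: List[List[T]] = []
    valid: List[List[bool]] = []
    for items in per_frame:
        n_pad = max_people - len(items)
        padded.append(list(items) + [pad_value] * n_pad)
        valid.append([True] * len(items) + [False] * n_pad)
    return padded, valid


def run_batched(
    per_frame: Sequence[Sequence[T]],
    model_fn: Callable[[List[T]], List[U]],  # hypothetical batched estimator
    pad_value: T,
) -> List[List[U]]:
    """Run model_fn once over all frames and people, then drop padded slots."""
    padded, valid = pad_frames(per_frame, pad_value)
    width = len(padded[0]) if padded else 0

    # Flatten to one fixed-shape batch of size num_frames * width.
    flat = [x for row in padded for x in row]
    flat_out = model_fn(flat)

    results: List[List[U]] = []
    for i, mask in enumerate(valid):
        row = flat_out[i * width:(i + 1) * width]
        # Keep only outputs that correspond to real (non-padded) people.
        results.append([out for out, keep in zip(row, mask) if keep])
    return results
```

With a toy `model_fn` such as `lambda batch: [s.upper() for s in batch]`, three frames containing two, one, and zero people are processed in a single call, and the per-frame results keep their original lengths. The trade-off of this design is wasted compute on padded slots in exchange for a fixed batch shape.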
Community
Code & Gradio Demo: https://github.com/gaomingqi/sam-body4d

