Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Paper: arXiv:2406.05629
How to use mhamilton723/DenseAV-language with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("mhamilton723/DenseAV-language", dtype="auto")

This model has been pushed to the Hub using the PyTorchModelHubMixin integration.