Instructions to use openai/clip-vit-base-patch32 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers

How to use openai/clip-vit-base-patch32 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = AutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32")
```

- Notebooks
  - Google Colab
  - Kaggle
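The "Load model directly" snippet stops once the processor and model are loaded. As a minimal sketch of how inference might continue from there, the code below scores an image against candidate labels. The helper names `rank_labels` and `classify` and the "a photo of ..." prompt template are assumptions made for illustration, not part of the model card.

```python
from typing import Dict, List


def rank_labels(probs: List[float], labels: List[str]) -> List[Dict[str, object]]:
    """Pair each label with its probability and sort from most to least likely."""
    pairs = [{"label": label, "score": float(p)} for label, p in zip(labels, probs)]
    return sorted(pairs, key=lambda d: d["score"], reverse=True)


def classify(image_url: str, labels: List[str]) -> List[Dict[str, object]]:
    """Hypothetical helper: zero-shot classify one image fetched from a URL."""
    # Heavy imports stay local so rank_labels is usable without them installed.
    import requests
    import torch
    from PIL import Image
    from transformers import AutoModelForZeroShotImageClassification, AutoProcessor

    model_id = "openai/clip-vit-base-patch32"
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForZeroShotImageClassification.from_pretrained(model_id)

    image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
    # Assumed prompt template; any text describing each label would work here.
    prompts = [f"a photo of {label}" for label in labels]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_image has shape (num_images, num_prompts); softmax over prompts.
    probs = outputs.logits_per_image.softmax(dim=-1)[0].tolist()
    return rank_labels(probs, labels)
```

Calling `classify(url, ["animals", "humans", "landscape"])` with the parrots image URL from the pipeline example should return a list of label/score dictionaries in the same spirit as the pipeline output, although the exact scores may differ because the prompt template here is an assumption.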
Improve model card: Add Hear-Your-Click context and refined metadata
#63
by nielsr HF Staff - opened
This PR updates the model card for openai/clip-vit-base-patch32. It clarifies that this CLIP model serves as a crucial component (visual encoder) within the "Hear-Your-Click: Interactive Object-Specific Video-to-Audio Generation" framework.
The changes include:
- Retaining the detailed description of the `openai/clip-vit-base-patch32` model.
- Adding a new section that introduces "Hear-Your-Click", its abstract, a link to its paper (2507.04959), and its GitHub repository (https://github.com/SynapGrid/Hear-Your-Click-2024).
- Updating metadata with `license: mit`, `library_name: transformers`, and confirming `pipeline_tag: zero-shot-image-classification`.
- Adding additional tags like `clip` and `video-to-audio` for better discoverability and context.
- Including the BibTeX citation for the "Hear-Your-Click" paper.
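Model card metadata on the Hub lives in a YAML block at the top of `README.md`, delimited by `---` lines. As a rough sketch of what the metadata described in the list above might look like once rendered, the snippet below builds that block from a plain dictionary. The `render_front_matter` helper is hypothetical, handles only strings and flat lists of strings, and does no YAML quoting or escaping; the actual PR diff may order or format the fields differently.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]


def render_front_matter(metadata: Dict[str, Value]) -> str:
    """Render a flat metadata dict as a YAML front-matter block (no escaping)."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists are written as a YAML block sequence, one item per line.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# Field values are taken from the PR description.
metadata = {
    "license": "mit",
    "library_name": "transformers",
    "pipeline_tag": "zero-shot-image-classification",
    "tags": ["clip", "video-to-audio"],
}

front_matter = render_front_matter(metadata)
print(front_matter)
```

For real edits, a YAML library or the model card utilities in `huggingface_hub` would be a safer choice than hand-built strings, since they handle quoting and validation.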
This update provides valuable context for users interested in the applications of this foundational CLIP model.