KaushalB/ViTForMusicClassification

Image Classification
Transformers
Safetensors
vit
ViT
music
CV
ViTForMusicClassification
1.4 GB
  • 1 contributor
History: 3 commits
KaushalB
This is an implementation of Google's Vision Transformer (large, patch size 32), used to classify music into genres. The dataset used is GTZAN, which provides mel spectrograms of many songs.
faba7b5 verified over 1 year ago
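
The description above can be made concrete with a short sketch. It assumes a 224×224 input resolution (the actual value is set in preprocessor_config.json and is not stated on this page) and assumes the checkpoint loads through the standard `transformers` image-classification pipeline. The function names `num_patches` and `classify_spectrogram` are illustrative, not part of the repository.

```python
def num_patches(image_size: int = 224, patch_size: int = 32) -> int:
    """Count the non-overlapping square patches a ViT cuts from a square image.

    image_size=224 is an assumption; patch_size=32 comes from the description.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def classify_spectrogram(image_path: str,
                         repo_id: str = "KaushalB/ViTForMusicClassification"):
    """Run the checkpoint on one mel-spectrogram image.

    Downloads the weights on first use. Assumes the repository's config.json
    and preprocessor_config.json are sufficient for the pipeline to load it.
    """
    # Imported lazily so num_patches() works without these packages installed.
    from PIL import Image
    from transformers import pipeline

    classifier = pipeline("image-classification", model=repo_id)
    image = Image.open(image_path).convert("RGB")
    # The pipeline returns a list of {"label": ..., "score": ...} dicts.
    return classifier(image)
```

Under the 224×224 assumption, a patch size of 32 gives a 7×7 grid, so the encoder sees 49 patch tokens plus the classification token. Calling `classify_spectrogram("spectrogram.png")` with a hypothetical image path would return the predicted genre labels with their scores.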
  • .gitattributes
    1.56 kB
    Upload 12 files over 1 year ago
  • README.md
    167 Bytes
    This is an implementation of Google's Vision Transformer (large, patch size 32), used to classify music into genres. The dataset used is GTZAN, which provides mel spectrograms of many songs. over 1 year ago
  • all_results.json
    391 Bytes
    Upload 12 files over 1 year ago
  • config.json
    1.1 kB
    Upload 12 files over 1 year ago
  • eval_results.json
    203 Bytes
    Upload 12 files over 1 year ago
  • model.safetensors
    350 MB
    xet
    Upload 12 files over 1 year ago
  • optimizer.pt
    700 MB
    xet
    Upload 12 files over 1 year ago
  • preprocessor_config.json
    578 Bytes
    Upload 12 files over 1 year ago
  • rng_state.pth
    14.2 kB
    xet
    Upload 12 files over 1 year ago
  • scheduler.pt
    1.06 kB
    xet
    Upload 12 files over 1 year ago
  • state.db
    350 MB
    xet
    Upload 12 files over 1 year ago
  • train_results.json
    209 Bytes
    Upload 12 files over 1 year ago
  • trainer_state.json
    44.4 kB
    Upload 12 files over 1 year ago
  • training_args.bin
    4.92 kB
    xet
    Upload 12 files over 1 year ago
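
Most of the 1.4 GB in the listing above is training state (optimizer.pt at 700 MB, plus state.db at 350 MB). A minimal sketch of fetching only the files needed for inference follows. It assumes that config.json, preprocessor_config.json, and model.safetensors are sufficient to load the model; the helper names are hypothetical, and `snapshot_download` with `allow_patterns` is the `huggingface_hub` call used for the filtered download.

```python
from fnmatch import fnmatch

# File names copied from the repository listing.
REPO_FILES = [
    ".gitattributes", "README.md", "all_results.json", "config.json",
    "eval_results.json", "model.safetensors", "optimizer.pt",
    "preprocessor_config.json", "rng_state.pth", "scheduler.pt",
    "state.db", "train_results.json", "trainer_state.json",
    "training_args.bin",
]

# Assumption: these three files are enough to run the model for inference.
INFERENCE_PATTERNS = ["config.json", "preprocessor_config.json", "model.safetensors"]


def select_files(files, patterns):
    """Return the files that match at least one glob pattern, in listing order."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


def download_for_inference(repo_id: str = "KaushalB/ViTForMusicClassification") -> str:
    """Download only the inference files and return the local snapshot path."""
    # Imported lazily so select_files() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=INFERENCE_PATTERNS)
```

`select_files(REPO_FILES, INFERENCE_PATTERNS)` previews what the filter keeps before any network traffic happens. With these patterns the download is roughly 350 MB instead of the full 1.4 GB.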