# Demucs Music Source Separation
**Important:** As I am no longer working at Meta, **this repository is not maintained anymore**.
I've created a fork at [github.com/adefossez/demucs](https://github.com/adefossez/demucs). Note that the project is no longer actively maintained,
and only important bug fixes will be processed on the new repo. Please do not open issues for feature requests or if Demucs doesn't work perfectly for your use case :)

This is the 4th release of Demucs (v4), featuring Hybrid Transformer based source separation.
**For the classic Hybrid Demucs (v3):** [Go to this commit][demucs_v3].
If you are experiencing issues and want the old Demucs back, please file an issue, and then you can get back to Demucs v3 with
`git checkout v3`. You can also go back to [Demucs v2][demucs_v2].
Demucs is a state-of-the-art music source separation model, currently capable of separating
drums, bass, and vocals from the rest of the accompaniment.
Demucs is based on a U-Net convolutional architecture inspired by [Wave-U-Net][waveunet].
The v4 version features [Hybrid Transformer Demucs][htdemucs], a hybrid spectrogram/waveform separation model using Transformers.
It is based on [Hybrid Demucs][hybrid_paper] (also provided in this repo), with the innermost layers
replaced by a cross-domain Transformer Encoder. This Transformer uses self-attention within each domain,
and cross-attention across domains.
The model achieves an SDR of 9.00 dB on the MUSDB HQ test set. Moreover, when using sparse attention
kernels to extend its receptive field and per-source fine-tuning, we achieve a state-of-the-art 9.20 dB of SDR.

Samples are available [on our sample page](https://ai.honu.io/papers/htdemucs/index.html).
Check out [our paper][htdemucs] for more information.
It has been trained on the [MUSDB HQ][musdb] dataset + an extra training dataset of 800 songs.
This model separates drums, bass, vocals and other stems for any song.

As Hybrid Transformer Demucs is brand new, it is not activated by default; you can activate it in the usual
commands described hereafter with `-n htdemucs_ft`.
The single, non fine-tuned model is provided as `-n htdemucs`, and the retrained baseline
as `-n hdemucs_mmi`. The Sparse Hybrid Transformer model described in our paper is not provided, as it
requires custom CUDA code that is not ready for release yet.
We are also releasing an experimental 6-source model that adds `guitar` and `piano` sources.
Quick testing seems to show okay quality for `guitar`, but a lot of bleeding and artifacts for the `piano` source.
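For example, assuming Demucs is installed and a local file `mysong.mp3` is at hand (the file name is illustrative), the models above can be selected directly from the command line:

```bash
# Fine-tuned Hybrid Transformer Demucs: best quality, roughly 4x slower
demucs -n htdemucs_ft mysong.mp3
# Retrained Hybrid Demucs v3 baseline
demucs -n hdemucs_mmi mysong.mp3
# Experimental 6-source model (adds guitar and piano stems)
demucs -n htdemucs_6s mysong.mp3
```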
<p align="center">
<img src="./demucs.png" alt="Schema representing the structure of Hybrid Transformer Demucs,
with a dual U-Net structure, one branch for the temporal domain,
and one branch for the spectral domain. There is a cross-domain Transformer between the Encoders and Decoders."
width="800px"></p>
## Important news if you are already using Demucs

See the [release notes](./docs/release.md) for more details.

- 22/02/2023: added support for the [SDX 2023 Challenge](https://www.aicrowd.com/challenges/sound-demixing-challenge-2023),
  see the dedicated [doc page](./docs/sdx23.md)
- 07/12/2022: Demucs v4 now on PyPI. **htdemucs** model now used by default. Also releasing
  a 6-source model (adding `guitar` and `piano`, although the latter doesn't work so well at the moment).
- 16/11/2022: Added the new **Hybrid Transformer Demucs v4** models.
  Added support for the [torchaudio implementation of HDemucs](https://pytorch.org/audio/stable/tutorials/hybrid_demucs_tutorial.html).
- 30/08/2022: added reproducibility and ablation grids, along with an updated version of the paper.
- 17/08/2022: Releasing v3.0.5: set split segment length to reduce memory. Compatible with PyTorch 1.12.
- 24/02/2022: Releasing v3.0.4: split into two stems (i.e. karaoke mode).
  Export as float32 or int24.
- 17/12/2021: Releasing v3.0.3: bug fixes (thanks @keunwoochoi), memory drastically
  reduced on GPU (thanks @famzah) and new multi-core evaluation on CPU (`-j` flag).
- 12/11/2021: Releasing **Demucs v3** with hybrid domain separation. Strong improvements
  on all sources. This is the model that won the Sony MDX challenge.
- 11/05/2021: Adding support for MusDB-HQ and arbitrary wav sets, for the MDX challenge. For more information
  on joining the challenge with Demucs see [the Demucs MDX instructions](docs/mdx.md)
## Comparison with other models

We provide hereafter a summary of the different metrics presented in the paper.
You can also compare Hybrid Demucs (v3), [KUIELAB-MDX-Net][kuielab], [Spleeter][spleeter], Open-Unmix, Demucs (v1), and Conv-Tasnet on one of my favorite
songs on my [soundcloud playlist][soundcloud].

### Comparison of accuracy

`Overall SDR` is the mean of the SDR for each of the 4 sources, `MOS Quality` is a rating from 1 to 5
of the naturalness and absence of artifacts given by human listeners (5 = no artifacts), and `MOS Contamination`
is a rating from 1 to 5 with 5 being zero contamination by other sources. We refer the reader to our [paper][hybrid_paper]
for more details.
| Model                        | Domain      | Extra data?       | Overall SDR | MOS Quality | MOS Contamination |
|------------------------------|-------------|-------------------|-------------|-------------|-------------------|
| [Wave-U-Net][waveunet]       | waveform    | no                | 3.2         | -           | -                 |
| [Open-Unmix][openunmix]      | spectrogram | no                | 5.3         | -           | -                 |
| [D3Net][d3net]               | spectrogram | no                | 6.0         | -           | -                 |
| [Conv-Tasnet][demucs_v2]     | waveform    | no                | 5.7         | -           | -                 |
| [Demucs (v2)][demucs_v2]     | waveform    | no                | 6.3         | 2.37        | 2.36              |
| [ResUNetDecouple+][decouple] | spectrogram | no                | 6.7         | -           | -                 |
| [KUIELAB-MDX-Net][kuielab]   | hybrid      | no                | 7.5         | **2.86**    | 2.55              |
| [Band-Split RNN][bandsplit]  | spectrogram | no                | **8.2**     | -           | -                 |
| **Hybrid Demucs (v3)**       | hybrid      | no                | 7.7         | **2.83**    | **3.04**          |
| [MMDenseLSTM][mmdenselstm]   | spectrogram | 804 songs         | 6.0         | -           | -                 |
| [D3Net][d3net]               | spectrogram | 1.5k songs        | 6.7         | -           | -                 |
| [Spleeter][spleeter]         | spectrogram | 25k songs         | 5.9         | -           | -                 |
| [Band-Split RNN][bandsplit]  | spectrogram | 1.7k (mixes only) | **9.0**     | -           | -                 |
| **HT Demucs f.t. (v4)**      | hybrid      | 800 songs         | **9.0**     | -           | -                 |
## Requirements

You will need at least Python 3.8. See `requirements_minimal.txt` for requirements for separation only,
and `environment-[cpu|cuda].yml` (or `requirements.txt`) if you want to train a new model.

### For Windows users

Every time you see `python3`, replace it with `python.exe`. You should always run commands from the
Anaconda console.
### For musicians

If you just want to use Demucs to separate tracks, you can install it with

```bash
python3 -m pip install -U demucs
```

For bleeding-edge versions, you can install directly from this repo using

```bash
python3 -m pip install -U git+https://github.com/facebookresearch/demucs#egg=demucs
```

Advanced OS support is provided on the following pages; **you must read the page for your OS before posting an issue**:

- **If you are using Windows:** [Windows support](docs/windows.md).
- **If you are using macOS:** [macOS support](docs/mac.md).
- **If you are using Linux:** [Linux support](docs/linux.md).
### For machine learning scientists

If you have anaconda installed, you can run from the root of this repository:

```bash
conda env update -f environment-cpu.yml   # if you don't have GPUs
conda env update -f environment-cuda.yml  # if you have GPUs
conda activate demucs
pip install -e .
```

This will create a `demucs` environment with all the dependencies installed.

You will also need to install [soundstretch/soundtouch](https://www.surina.net/soundtouch/soundstretch.html): on macOS you can do `brew install sound-touch`,
and on Ubuntu `sudo apt-get install soundstretch`. This is used for the
pitch/tempo augmentation.
### Running in Docker

Thanks to @xserrat, there is now a Docker image definition ready for using Demucs. This can ensure all libraries are correctly installed without interfering with the host OS. See his repo [Docker Facebook Demucs](https://github.com/xserrat/docker-facebook-demucs) for more information.

### Running from Colab

I made a Colab to easily separate tracks with Demucs. Note that
transfer speeds with Colab are a bit slow for large media files,
but it will allow you to use Demucs without installing anything.

[Demucs on Google Colab](https://colab.research.google.com/drive/1dC9nVxk3V_VPjUADsnFu8EiT-xnU1tGH?usp=sharing)

### Web Demo

Integrated to [Hugging Face Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See the [demo](https://huggingface.co/spaces/akhaliq/demucs).

### Graphical Interface

@CarlGao4 has released a GUI for Demucs: [CarlGao4/Demucs-Gui](https://github.com/CarlGao4/Demucs-Gui). Downloads for Windows and macOS are available [here](https://github.com/CarlGao4/Demucs-Gui/releases). Use the [FossHub mirror](https://fosshub.com/Demucs-GUI.html) to speed up your download.

@Anjok07 is providing a self-contained GUI in [UVR (Ultimate Vocal Remover)](https://github.com/facebookresearch/demucs/issues/334) that supports Demucs.

### Other providers

Audiostrip is providing free online separation with Demucs on their website [https://audiostrip.co.uk/](https://audiostrip.co.uk/).

[MVSep](https://mvsep.com/) also provides free online separation; select `Demucs3 model B` for the best quality.

[Neutone](https://neutone.space/) provides a realtime Demucs model in their free VST/AU plugin that can be used in your favorite DAW.
## Separating tracks

In order to try Demucs, you can just run from any folder (as long as you properly installed it)

```bash
demucs PATH_TO_AUDIO_FILE_1 [PATH_TO_AUDIO_FILE_2 ...]  # for Demucs
# If you used `pip install --user` you might need to replace demucs with python3 -m demucs
python3 -m demucs --mp3 --mp3-bitrate BITRATE PATH_TO_AUDIO_FILE_1  # output files saved as MP3
# use --mp3-preset to change the encoder preset, 2 for best quality, 7 for fastest
# If your filename contains spaces, don't forget to quote it!
demucs "my music/my favorite track.mp3"
# You can select different models with `-n`; mdx_q is the quantized model, smaller but maybe a bit less accurate.
demucs -n mdx_q myfile.mp3
# If you only want to separate vocals out of an audio file, use `--two-stems=vocals` (you can also set it to drums or bass)
demucs --two-stems=vocals myfile.mp3
```
If you have a GPU but you run out of memory, please use `--segment SEGMENT` to reduce the length of each split. `SEGMENT` should be an integer giving the length of each segment in seconds.
A segment length of at least 10 is recommended (the bigger the number, the more memory is required, but quality may increase). Note that the Hybrid Transformer models only support a maximum segment length of 7.8 seconds.
Setting the environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` is also helpful. If this still does not help, please add `-d cpu` to the command line. See the section hereafter for more details on the memory requirements for GPU acceleration.
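Putting the memory-saving options together, a low-memory GPU invocation could look like this (a sketch; the file name and segment value are illustrative, and the segment stays under the 7.8-second cap of the Hybrid Transformer models):

```bash
# Limit each split to 7 s and disable the CUDA allocator cache to reduce GPU memory use
PYTORCH_NO_CUDA_MEMORY_CACHING=1 demucs --segment 7 mysong.mp3
# If memory is still insufficient, fall back to the CPU
demucs -d cpu mysong.mp3
```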
Separated tracks are stored in the `separated/MODEL_NAME/TRACK_NAME` folder. There you will find four stereo wav files sampled at 44.1 kHz: `drums.wav`, `bass.wav`,
`other.wav`, `vocals.wav` (or `.mp3` if you used the `--mp3` option).

All audio formats supported by `torchaudio` can be processed (i.e. wav, mp3, flac, ogg/vorbis on Linux/macOS, etc.). On Windows, `torchaudio` has limited support, so we rely on `ffmpeg`, which should support pretty much anything.
Audio is resampled on the fly if necessary.
The output will be a wav file encoded as int16.
You can save as float32 wav files with `--float32`, or 24-bit integer wav with `--int24`.
You can pass `--mp3` to save as mp3 instead, and set the bitrate (in kbps) with `--mp3-bitrate` (default is 320).

It can happen that the output would need clipping, in particular due to some separation artifacts.
Demucs will automatically rescale each output stem so as to avoid clipping. This can however break
the relative volume between stems. If instead you prefer hard clipping, pass `--clip-mode clamp`.
You can also try to reduce the volume of the input mixture before feeding it to Demucs.
Other pre-trained models can be selected with the `-n` flag.
The list of pre-trained models is:

- `htdemucs`: first version of Hybrid Transformer Demucs. Trained on MusDB + 800 songs. Default model.
- `htdemucs_ft`: fine-tuned version of `htdemucs`; separation will take 4 times longer
  but might be a bit better. Same training set as `htdemucs`.
- `htdemucs_6s`: 6-source version of `htdemucs`, with `piano` and `guitar` added as sources.
  Note that the `piano` source is not working great at the moment.
- `hdemucs_mmi`: Hybrid Demucs v3, retrained on MusDB + 800 songs.
- `mdx`: trained only on MusDB HQ, winning model on track A at the [MDX][mdx] challenge.
- `mdx_extra`: trained with extra training data (**including the MusDB test set**), ranked 2nd on track B
  of the [MDX][mdx] challenge.
- `mdx_q`, `mdx_extra_q`: quantized versions of the previous models. Smaller download and storage
  but quality can be slightly worse.
- `SIG`: where `SIG` is a single model from the [model zoo](docs/training.md#model-zoo).

The `--two-stems=vocals` option allows separating vocals from the rest of the accompaniment (i.e., karaoke mode).
`vocals` can be changed to any source in the selected model.
This will mix the files after separating the mix fully, so this won't be faster or use less memory.

The `--shifts=SHIFTS` option performs multiple predictions with random shifts (a.k.a. the *shift trick*) of the input and averages them. This makes prediction `SHIFTS` times
slower. Don't use it unless you have a GPU.

The `--overlap` option controls the amount of overlap between prediction windows. The default is 0.25 (i.e. 25%), which is probably fine.
It can probably be reduced to 0.1 to improve speed a bit.

The `-j` flag allows specifying a number of parallel jobs (e.g. `demucs -j 2 myfile.mp3`).
This will multiply the RAM used by the same amount, so be careful!
### Memory requirements for GPU acceleration

If you want to use GPU acceleration, you will need at least 3 GB of RAM on your GPU for `demucs`. However, about 7 GB of RAM will be required if you use the default arguments. Add `--segment SEGMENT` to change the size of each split. If you only have 3 GB of memory, set `SEGMENT` to 8 (though quality may be worse if this argument is too small). Setting the environment variable `PYTORCH_NO_CUDA_MEMORY_CACHING=1` can help users with even less RAM, such as 2 GB (I separated a 4-minute track using only 1.5 GB), but this will make the separation slower.

If you do not have enough memory on your GPU, simply add `-d cpu` to the command line to use the CPU. With Demucs, processing time should be roughly equal to 1.5 times the duration of the track.
## Calling from another Python program

The main function provides an `opt` parameter as a simple API. You can just pass the parsed command line as this parameter:

```python
# Assume that your command is `demucs --mp3 --two-stems vocals -n mdx_extra "track with space.mp3"`
# The following code is equivalent to the command above:
import demucs.separate
demucs.separate.main(["--mp3", "--two-stems", "vocals", "-n", "mdx_extra", "track with space.mp3"])

# Or like this:
import shlex
demucs.separate.main(shlex.split('--mp3 --two-stems vocals -n mdx_extra "track with space.mp3"'))
```

To use more complicated APIs, see the [API docs](docs/api.md).
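If you drive Demucs from a larger script, building the argument list with plain Python (rather than concatenating a shell string) sidesteps quoting issues entirely. The helper below is a sketch: `build_demucs_args` and the file names are made up for illustration, and only flags shown above are used.

```python
import shlex


def build_demucs_args(files, stem="vocals", model="mdx_extra"):
    """Build an argument list suitable for demucs.separate.main().

    Passing a list of arguments avoids shell-quoting problems with
    file names that contain spaces.
    """
    args = ["--mp3", "--two-stems", stem, "-n", model]
    args.extend(str(f) for f in files)
    return args


args = build_demucs_args(["track with space.mp3", "another track.mp3"])
# Matches what shlex.split() produces from the equivalent quoted shell string:
assert args[:6] == shlex.split('--mp3 --two-stems vocals -n mdx_extra "track with space.mp3"')
# import demucs.separate
# demucs.separate.main(args)  # uncomment to actually run the separation
```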
## Training Demucs

If you want to train (Hybrid) Demucs, please follow the [training doc](docs/training.md).

## MDX Challenge reproduction

In order to reproduce the results from the Track A and Track B submissions, check out the [MDX Hybrid Demucs submission repo][mdx_submission].

## How to cite

```
@inproceedings{rouard2022hybrid,
  title={Hybrid Transformers for Music Source Separation},
  author={Rouard, Simon and Massa, Francisco and D{\'e}fossez, Alexandre},
  booktitle={ICASSP 23},
  year={2023}
}

@inproceedings{defossez2021hybrid,
  title={Hybrid Spectrogram and Waveform Source Separation},
  author={D{\'e}fossez, Alexandre},
  booktitle={Proceedings of the ISMIR 2021 Workshop on Music Source Separation},
  year={2021}
}
```

## License

Demucs is released under the MIT license as found in the [LICENSE](LICENSE) file.
[hybrid_paper]: https://arxiv.org/abs/2111.03600
[waveunet]: https://github.com/f90/Wave-U-Net
[musdb]: https://sigsep.github.io/datasets/musdb.html
[openunmix]: https://github.com/sigsep/open-unmix-pytorch
[mmdenselstm]: https://arxiv.org/abs/1805.02410
[demucs_v2]: https://github.com/facebookresearch/demucs/tree/v2
[demucs_v3]: https://github.com/facebookresearch/demucs/tree/v3
[spleeter]: https://github.com/deezer/spleeter
[soundcloud]: https://soundcloud.com/honualx/sets/source-separation-in-the-waveform-domain
[d3net]: https://arxiv.org/abs/2010.01733
[mdx]: https://www.aicrowd.com/challenges/music-demixing-challenge-ismir-2021
[kuielab]: https://github.com/kuielab/mdx-net-submission
[decouple]: https://arxiv.org/abs/2109.05418
[mdx_submission]: https://github.com/adefossez/mdx21_demucs
[bandsplit]: https://arxiv.org/abs/2209.15174
[htdemucs]: https://arxiv.org/abs/2211.08553