---
pipeline_tag: text-to-audio
library_name: audiocraft
language: en
tags:
- text-to-audio
- musicgen
- songstarter
license: cc-by-nc-4.0
---

# Model Card for musicgen-songstarter-v0.2

[![Replicate demo and cloud API](https://replicate.com/nateraw/musicgen-songstarter-v0.2/badge)](https://replicate.com/nateraw/musicgen-songstarter-v0.2)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/gist/nateraw/0cb4c242b70af10044e9ae73f4617c86/songstarter-v0-2-demo.ipynb)
[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/nateraw/singing-songstarter)

musicgen-songstarter-v0.2 is [`musicgen-stereo-melody-large`](https://huggingface.co/facebook/musicgen-stereo-melody-large) fine-tuned on a dataset of melody loops from my Splice sample library. It's intended for generating song ideas that are useful to music producers. It generates stereo audio at 32 kHz.

**👀 Update:** I wrote a [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html) detailing how and why I trained this model, including training details, the dataset, Weights & Biases logs, etc.
Compared to [`musicgen-songstarter-v0.1`](https://huggingface.co/nateraw/musicgen-songstarter-v0.1), this new version:

- was trained on 3x more unique, manually curated samples that I painstakingly purchased on Splice
- is twice the size, bumped up from a `medium` ➡️ `large` transformer LM

If you find this model interesting, please consider:

- following me on [GitHub](https://github.com/nateraw)
- following me on [Twitter](https://twitter.com/_nateraw)

## Usage

Install [audiocraft](https://github.com/facebookresearch/audiocraft):

```
pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
```

Then you should be able to load this model just like any other MusicGen checkpoint on the Hub:

```python
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('nateraw/musicgen-songstarter-v0.2')
model.set_generation_params(duration=8)  # generate 8 seconds.

wav = model.generate_unconditional(4)  # generates 4 unconditional audio samples

descriptions = ['acoustic, guitar, melody, trap, D minor, 90 bpm'] * 3
wav = model.generate(descriptions)  # generates 3 samples.

# Load a melody to condition on; './assets/bach.mp3' ships in the audiocraft repo,
# so point this at your own audio file if you're running elsewhere.
melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```

## Prompt Format

Use the following prompt format:

```
{tag_1}, {tag_2}, ..., {tag_n}, {key}, {bpm} bpm
```

For example:

```
hip hop, soul, piano, chords, jazz, neo jazz, G# minor, 140 bpm
```

For some example tags, [see the prompt format section of musicgen-songstarter-v0.1's README](https://huggingface.co/nateraw/musicgen-songstarter-v0.1#prompt-format).
The tags there come from the smaller v0.1 dataset, but they should give you an idea of what the model saw.

## Samples
| Audio Prompt | Text Prompt | Output |
| --- | --- | --- |
| *(audio)* | trap, synthesizer, songstarters, dark, G# minor, 140 bpm | *(audio)* |
| *(audio)* | acoustic, guitar, melody, trap, D minor, 90 bpm | *(audio)* |
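The text prompts in the samples above follow the `{tag_1}, ..., {tag_n}, {key}, {bpm} bpm` format from the Prompt Format section. If you're generating prompts programmatically, a minimal sketch of helpers that assemble and split strings in that format might look like this. `build_prompt` and `parse_prompt` are hypothetical helpers, not part of audiocraft, and `parse_prompt` assumes no individual tag contains a comma.

```python
from typing import List, Sequence, Tuple


def build_prompt(tags: Sequence[str], key: str, bpm: int) -> str:
    """Join tags, a musical key, and a tempo as '{tags}, {key}, {bpm} bpm'.

    Hypothetical helper for illustration; not part of audiocraft.
    """
    # Strip whitespace and drop empty tags so the comma separators stay clean.
    cleaned = [t.strip() for t in tags if t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return ", ".join([*cleaned, key.strip(), f"{bpm} bpm"])


def parse_prompt(prompt: str) -> Tuple[List[str], str, int]:
    """Split a prompt back into (tags, key, bpm).

    Hypothetical helper; assumes no tag itself contains a comma.
    """
    parts = [p.strip() for p in prompt.split(",")]
    # Expect at least one tag, then the key, then '<number> bpm' as the last field.
    if len(parts) < 3 or not parts[-1].endswith(" bpm"):
        raise ValueError(f"prompt does not match the expected format: {prompt!r}")
    bpm = int(parts[-1][: -len(" bpm")])
    return parts[:-2], parts[-2], bpm


if __name__ == "__main__":
    prompt = build_prompt(["trap", "synthesizer", "songstarters", "dark"], "G# minor", 140)
    print(prompt)  # trap, synthesizer, songstarters, dark, G# minor, 140 bpm
    print(parse_prompt(prompt))
```

A string built this way can go straight into the `descriptions` list passed to `model.generate(...)` in the Usage section.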
## Training Details

For more details, you can check out the [blogpost](https://nateraw.com/posts/training_musicgen_songstarter.html#training).

- **code**:
  - Repo is [here](https://github.com/nateraw/audiocraft). It's an undocumented fork of [facebookresearch/audiocraft](https://github.com/facebookresearch/audiocraft) where I rewrote the training loop with PyTorch Lightning, which worked a bit better for me.
- **data**:
  - around 1,700-1,800 samples that I manually listened to and purchased via my personal [Splice](https://splice.com) account, about 7-8 hours of audio
  - given the licensing terms, I cannot share the data
- **hardware**:
  - 8x A100 40GB instance from Lambda Labs
- **procedure**:
  - trained for 10k steps, which took about 6 hours
  - reduced the segment duration at train time to 15 seconds
- **hparams/logs**:
  - see the wandb [run](https://wandb.ai/nateraw/musicgen-songstarter-v0.2/runs/63gh4l7m), which includes training metrics, logs, hardware metrics at train time, hyperparameters, and the exact command I used to run the training script

## Acknowledgements

This work would not have been possible without:

- [Lambda Labs](https://lambdalabs.com/), for subsidizing larger training runs by providing some compute credits
- [Replicate](https://replicate.com/), for early development compute resources

Thank you ❤️