Today we're launching one of our most ambitious agents yet: the AI Music Generator. Unlike our text-focused agents, this one crosses into a completely different creative domain — generating actual audio from a text description.
The workflow is simple: describe the music you want, watch the AI compose a detailed blueprint, and receive a playable audio track, all within a few minutes.
How It Works
The Music Generator uses the same 4-stage pipeline architecture as our other agents, adapted for audio generation.
Three Generation Backends
We deliberately built multi-backend support because no single music AI service is right for everyone:
- HuggingFace MusicGen — Completely free with a HuggingFace token. Returns WAV audio directly from Meta's MusicGen model. Best for experimentation; ~30 requests/day on a free token.
- Replicate MusicGen stereo-large — ~$0.002 per generation. Excellent stereo quality, MP3 output. Best for professional use where audio quality matters.
- ModelsLab — Freemium tier available. Async MP3 generation, good for longer tracks up to 60 seconds.
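To make the trade-off concrete, here is a minimal Python sketch of how a backend could be chosen from the three options above. It is not our production code: the `Backend` class, the `Priority` enum, and `pick_backend` are hypothetical names invented for this illustration, and only the formats and pricing notes come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Priority(Enum):
    """What the user cares about most (hypothetical categories)."""
    EXPERIMENT = "experiment"  # free experimentation
    QUALITY = "quality"        # stereo audio quality
    LENGTH = "length"          # longer tracks


@dataclass(frozen=True)
class Backend:
    label: str
    audio_format: str  # output format as described in this post
    pricing: str       # pricing note as described in this post


# Facts below are copied from the backend list; the structure is illustrative.
BACKENDS = {
    "huggingface": Backend("HuggingFace MusicGen", "wav", "free with a token"),
    "replicate": Backend("Replicate MusicGen stereo-large", "mp3", "~$0.002 per generation"),
    "modelslab": Backend("ModelsLab", "mp3", "freemium"),
}

# Assumed mapping that mirrors the "best for" notes in the list.
_BY_PRIORITY = {
    Priority.EXPERIMENT: "huggingface",
    Priority.QUALITY: "replicate",
    Priority.LENGTH: "modelslab",
}


def pick_backend(priority: Priority) -> Backend:
    """Return the backend whose stated strength matches the priority."""
    return BACKENDS[_BY_PRIORITY[priority]]


if __name__ == "__main__":
    chosen = pick_backend(Priority.QUALITY)
    print(f"{chosen.label}: {chosen.audio_format.upper()}, {chosen.pricing}")
```

A lookup table keeps the choice explicit and easy to extend if another backend is added later.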
What We're Proud Of
Two design choices stand out. First, the spinning disc animation: the vinyl record rotates while audio plays and pauses when you do. It's a small touch, but it transforms the experience from "audio file playing" to "music being performed."
Second, the production tags system: the AI generates 10-15 descriptive tags per track (genre, mood, instrument, and era), displayed as pills below the player. These tags make each generated track feel like a real music release, not just a file.
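As a rough illustration of the tags step, the Python sketch below flattens categorized tags into a clean list ready to render as pills. The function names and the normalization rules (lowercasing, de-duplication) are assumptions made for this example, not a description of our actual implementation; only the four categories and the 10-15 range come from the paragraph above.

```python
from typing import Dict, List

# Range and categories as stated in the post.
MIN_TAGS, MAX_TAGS = 10, 15
CATEGORIES = ("genre", "mood", "instrument", "era")


def to_pills(tags_by_category: Dict[str, List[str]]) -> List[str]:
    """Flatten categorized tags into a de-duplicated, display-ready list."""
    seen = set()
    pills: List[str] = []
    for category in CATEGORIES:
        for tag in tags_by_category.get(category, []):
            # Assumed normalization: collapse whitespace and lowercase.
            label = " ".join(tag.split()).lower()
            if label and label not in seen:
                seen.add(label)
                pills.append(label)
    return pills[:MAX_TAGS]  # never show more than the upper bound


def has_expected_count(pills: List[str]) -> bool:
    """Check that the tag count falls in the 10-15 range."""
    return MIN_TAGS <= len(pills) <= MAX_TAGS


if __name__ == "__main__":
    sample = {"genre": ["Lo-fi", "lo-fi ", "Chillhop"], "mood": ["Rainy"]}
    print(to_pills(sample))
```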
The Music Generator is available to all Pro and Business subscribers. Start with a preset (Lo-fi Rain, Epic Cinematic, Jazz Café) or describe exactly the music you want. HuggingFace tokens are free to get.