---
title: 'TanStack AI Just Learned to Compose Music'
published: 2026-04-24
excerpt: TanStack AI adds a new generateAudio activity with streaming, plus fal and Gemini Lyria adapters for music, sound effects, text-to-speech, and transcription. One typed API, any provider.
authors:
  - Alem Tuzlak
---

The AI audio ecosystem is a mess. Gemini's Lyria wants a natural-language prompt and returns raw PCM you have to wrap in a RIFF header yourself. Fal hosts dozens of audio models where one wants `music_length_ms` in milliseconds, the next wants `seconds_total`, and most want plain `duration`. ElevenLabs has its own shape. Whisper has another. Every provider disagrees on whether you get a URL, a base64 blob, or a raw buffer.

If you are shipping an AI product that needs music, sound effects, speech, or transcription, you end up writing the same boring glue code five times.
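
That glue is rarely hard, just tedious. As one concrete example of the kind of code this is, here is a minimal sketch of wrapping raw 16-bit PCM in the canonical 44-byte RIFF/WAV header; the default sample rate and channel count are assumptions you would pull from the model's documentation:

```typescript
// Wrap raw 16-bit PCM samples in a minimal 44-byte RIFF/WAV header.
// Defaults (48 kHz, stereo) are illustrative assumptions, not Lyria's spec.
function pcmToWav(pcm: Uint8Array, sampleRate = 48000, channels = 2): Uint8Array {
  const bytesPerSample = 2 // 16-bit PCM
  const blockAlign = channels * bytesPerSample
  const byteRate = sampleRate * blockAlign
  const header = new ArrayBuffer(44)
  const view = new DataView(header)
  const writeTag = (offset: number, tag: string) => {
    for (let i = 0; i < tag.length; i++) view.setUint8(offset + i, tag.charCodeAt(i))
  }
  writeTag(0, 'RIFF')
  view.setUint32(4, 36 + pcm.length, true) // RIFF chunk size = file size - 8
  writeTag(8, 'WAVE')
  writeTag(12, 'fmt ')
  view.setUint32(16, 16, true) // fmt sub-chunk size
  view.setUint16(20, 1, true) // audio format: 1 = uncompressed PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bytesPerSample * 8, true) // bits per sample
  writeTag(36, 'data')
  view.setUint32(40, pcm.length, true)
  const out = new Uint8Array(44 + pcm.length)
  out.set(new Uint8Array(header), 0)
  out.set(pcm, 44)
  return out
}
```

Multiply that by every provider's quirks and the appeal of a shared abstraction is obvious.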

**TanStack AI just removed that glue.** The latest release lands a full audio stack: a new `generateAudio` activity, streaming support, fal and Gemini Lyria adapters, and framework hooks for React, Solid, Vue, and Svelte. One typed API, any provider.

Here is what shipped and why you should care.

## One activity, any audio model

The new `generateAudio()` activity sits alongside `generateImage`, `generateSpeech`, `generateVideo`, and `generateTranscription` in `@tanstack/ai`. It takes a text prompt, dispatches to whatever adapter you hand it, and returns a `GeneratedAudio` object with exactly one of `url` or `b64Json`.

```typescript
import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini/adapters'

const adapter = geminiAudio('lyria-3-pro-preview')

const result = await generateAudio({
  adapter,
  prompt: 'A cinematic orchestral piece with a rising string motif',
})

// result.audio is { url: string } | { b64Json: string } — exactly one, enforced by the type
```
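
That exactly-one guarantee is the usual discriminated-union pattern. A sketch with hypothetical type names (the library's real definitions may differ), showing how consuming code narrows without casts:

```typescript
// Hypothetical shape of result.audio: a URL or a base64 payload, never both.
type AudioPayload = { url: string } | { b64Json: string }

// Narrowing via the `in` operator; the data-URL MIME type is an assumption.
function toPlayableSrc(audio: AudioPayload): string {
  if ('url' in audio) return audio.url
  return `data:audio/mpeg;base64,${audio.b64Json}`
}
```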

Swap `geminiAudio` for `falAudio` and the exact same call generates music through MiniMax, DiffRhythm, Stable Audio 2.5, or any of the other models in fal's catalog. The adapter translates per-model details (like fal's `music_length_ms` vs `seconds_total` vs `duration` naming) so your app code never sees them.
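
The translation the adapter does is mundane but fiddly. A sketch of the idea with a made-up lookup table; the model IDs and field names mirror the naming differences mentioned above, but the table itself is hypothetical and the real mapping lives inside `falAudio`:

```typescript
// Illustrative only: map one canonical duration to each model's expected field.
const DURATION_FIELD: Record<string, string> = {
  'fal-ai/minimax-music': 'music_length_ms',
  'fal-ai/stable-audio-25': 'seconds_total',
}

function durationParams(modelId: string, seconds: number): Record<string, number> {
  const field = DURATION_FIELD[modelId] ?? 'duration'
  // Millisecond-named fields need a unit conversion too, not just a rename.
  const value = field.endsWith('_ms') ? seconds * 1000 : seconds
  return { [field]: value }
}
```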

## Streaming, because audio generation takes seconds

Music and SFX generation is slow. Lyria 3 Pro takes several seconds. Stable Audio takes longer. If you are building a UI, blocking the request the whole time is a bad experience.

`generateAudio` now supports `stream: true`, returning an `AsyncIterable<StreamChunk>` you can pipe straight through `toServerSentEventsResponse()`:

```typescript
import { generateAudio, toServerSentEventsResponse } from '@tanstack/ai'
// Adapter subpath assumed to mirror the gemini package shown above.
import { falAudio } from '@tanstack/ai-fal/adapters'

export async function POST(req: Request) {
  const { prompt } = await req.json()

  const stream = await generateAudio({
    adapter: falAudio('fal-ai/minimax-music/v2.6'),
    prompt,
    stream: true,
  })

  return toServerSentEventsResponse(stream)
}
```

The client receives progress events and the final audio over a single SSE connection, the same transport model already used by `generateImage` and `generateVideo`. No new infrastructure, no special-case code paths.
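
If you are not using the framework hooks, consuming that stream by hand is ordinary SSE parsing. A minimal sketch, assuming each `data:` line carries a JSON chunk with a `type` field; the `progress` and `audio` type names here are illustrative, not the library's actual chunk schema:

```typescript
// Parse SSE text into its JSON data chunks.
// Real code would read the response body incrementally; this works on a
// captured string for clarity.
function parseSseChunks(sseText: string): Array<{ type: string; [k: string]: unknown }> {
  return sseText
    .split('\n')
    .filter((line) => line.startsWith('data: '))
    .map((line) => JSON.parse(line.slice('data: '.length)))
}
```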

## Framework hooks that feel like the others

Every framework integration gets a new hook matching the existing media-hook shape:

- `@tanstack/ai-react`: `useGenerateAudio`
- `@tanstack/ai-solid`: `useGenerateAudio`
- `@tanstack/ai-vue`: `useGenerateAudio`
- `@tanstack/ai-svelte`: `createGenerateAudio`

The API is identical to `useGenerateImage` and friends:

```tsx
import { useGenerateAudio } from '@tanstack/ai-react'

// `connection` is your SSE connection object, created elsewhere (not shown).
function MusicGen() {
  const { generate, result, isLoading, error, stop, reset } = useGenerateAudio({
    connection,
  })

  return (
    <>
      <button onClick={() => generate({ prompt: 'Lo-fi hip-hop beat' })}>
        Generate
      </button>
      {isLoading && <button onClick={stop}>Stop</button>}
      {result?.audio.url && <audio src={result.audio.url} controls />}
    </>
  )
}
```

Both `connection` (SSE) and `fetcher` (plain HTTP) transports are supported, so this works with TanStack Start, Next.js, Remix, or any backend you already have.
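
With the plain-HTTP `fetcher` transport, the client simply POSTs the prompt to whatever route you mounted. A sketch of building that request, where the `/api/audio` route and the JSON body shape are assumptions about your own backend, not the library's contract:

```typescript
// Build the POST request a plain-HTTP fetcher would send.
// Endpoint and body shape are yours to define server-side.
function buildAudioRequest(endpoint: string, prompt: string): Request {
  return new Request(endpoint, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ prompt }),
  })
}
```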

## Providers that shipped in this release

**Gemini** gets two new entry points:

- `geminiAudio()` for Lyria 3 Pro and Lyria 3 Clip music generation. Lyria Pro reads duration from the natural-language prompt; Clip is fixed at 30 seconds and returns MP3.
- A new `gemini-3.1-flash-tts-preview` TTS model with 70+ languages, 200+ audio tags, and multi-speaker dialogue via `multiSpeakerVoiceConfig`.
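
For the multi-speaker case, the underlying Gemini API takes a speaker-to-voice mapping. A sketch of that configuration, where the field names follow the shape of Google's `speechConfig.multiSpeakerVoiceConfig` and the voice names are examples; how TanStack's adapter surfaces this exact object is an assumption:

```typescript
// Illustrative multi-speaker voice configuration. Speaker labels must match
// the names used in your dialogue prompt.
const multiSpeakerVoiceConfig = {
  speakerVoiceConfigs: [
    { speaker: 'Host', voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Kore' } } },
    { speaker: 'Guest', voiceConfig: { prebuiltVoiceConfig: { voiceName: 'Puck' } } },
  ],
}
```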

**Fal** gets three tree-shakeable adapters:

- `falSpeech()` for TTS via `fal-ai/gemini-3.1-flash-tts`, `fal-ai/minimax/speech-2.6-hd`, and the `fal-ai/kokoro/*` family.
- `falTranscription()` for STT via `fal-ai/whisper`, `fal-ai/wizper`, and `fal-ai/speech-to-text/turbo`.
- `falAudio()` for music, SFX, and the wider fal catalog: audio-to-audio, voice conversion and cloning, enhancement, separation, isolation, understanding, and merge.

All four adapters (`geminiAudio` plus the three fal entry points) follow the tree-shakeable subpath-import pattern, so your bundle only grows by the adapters you actually import.

## Try it

The new activity is live in `@tanstack/ai` and the two provider packages:

```bash
pnpm add @tanstack/ai @tanstack/ai-fal
# or
pnpm add @tanstack/ai @tanstack/ai-gemini
```

Then open the [audio generation guide](/ai/docs/media/audio-generation) for the full adapter matrix, or pull the `ts-react-chat` example to see working TTS and transcription tabs plus a `/generations/audio` route covering Lyria and fal side by side.

**Star [TanStack AI on GitHub](https://github.com/TanStack/ai)** if you want to see where this goes next.