docs(blog): TanStack AI Just Learned to Compose Music (#854)

Announces the new `generateAudio` activity, streaming support, and fal + Gemini Lyria adapters for music, speech, and transcription. Covers:

- One `generateAudio()` call, any adapter
- `stream: true` -> `AsyncIterable<StreamChunk>` via `toServerSentEventsResponse()`
- `useGenerateAudio` (React/Solid/Vue) + `createGenerateAudio` (Svelte)
- `geminiAudio()`: Lyria 3 Pro/Clip + Gemini 3.1 Flash TTS multi-speaker
- `falAudio()` / `falSpeech()` / `falTranscription()` tree-shakeable adapters

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
1 parent b3f2fb8 commit 3cd2b24

---
title: 'TanStack AI Just Learned to Compose Music'
published: 2026-04-24
excerpt: TanStack AI adds a new generateAudio activity with streaming, plus fal and Gemini Lyria adapters for music, sound effects, text-to-speech, and transcription. One typed API, any provider.
authors:
  - Alem Tuzlak
---

![TanStack AI Just Learned to Compose Music](/blog-assets/tanstack-ai-audio-generation/header.png)

The AI audio ecosystem is a mess. Gemini's Lyria wants a natural-language prompt and returns raw PCM you have to wrap in a RIFF header yourself. Fal hosts dozens of audio models where one wants `music_length_ms` in milliseconds, the next wants `seconds_total`, and most want plain `duration`. ElevenLabs has its own shape. Whisper has another. Every provider disagrees on whether you get a URL, a base64 blob, or a raw buffer.

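That RIFF-wrapping chore, for reference, looks roughly like this. The sketch below is self-contained and illustrative; the default 48 kHz stereo 16-bit parameters are assumptions, so check your model's documented output format before reusing them.

```typescript
// Minimal sketch: wrap raw 16-bit PCM in a 44-byte RIFF/WAVE header so it is playable.
// The default sample rate and channel count are assumptions, not Lyria's documented format.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 48000,
  channels = 2,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = channels * (bitsPerSample / 8)
  const header = new ArrayBuffer(44)
  const v = new DataView(header)
  const writeTag = (offset: number, tag: string) => {
    for (let i = 0; i < tag.length; i++) v.setUint8(offset + i, tag.charCodeAt(i))
  }
  writeTag(0, 'RIFF')
  v.setUint32(4, 36 + pcm.length, true) // file size minus the 8-byte RIFF preamble
  writeTag(8, 'WAVE')
  writeTag(12, 'fmt ')
  v.setUint32(16, 16, true) // fmt chunk size
  v.setUint16(20, 1, true) // audio format 1 = linear PCM
  v.setUint16(22, channels, true)
  v.setUint32(24, sampleRate, true)
  v.setUint32(28, sampleRate * blockAlign, true) // byte rate
  v.setUint16(32, blockAlign, true)
  v.setUint16(34, bitsPerSample, true)
  writeTag(36, 'data')
  v.setUint32(40, pcm.length, true)
  const out = new Uint8Array(44 + pcm.length)
  out.set(new Uint8Array(header), 0)
  out.set(pcm, 44)
  return out
}
```

Once wrapped, the bytes can be served as `audio/wav` or turned into a blob URL in the browser. The point of the new adapters is that you never have to write this yourself.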
If you are shipping an AI product that needs music, sound effects, speech, or transcription, you end up writing the same boring glue code five times.

**TanStack AI just removed that glue.** The latest release lands a full audio stack: a new `generateAudio` activity, streaming support, fal and Gemini Lyria adapters, and framework hooks for React, Solid, Vue, and Svelte. One typed API, any provider.

Here is what shipped and why you should care.

## One activity, any audio model

The new `generateAudio()` activity sits alongside `generateImage`, `generateSpeech`, `generateVideo`, and `generateTranscription` in `@tanstack/ai`. It takes a text prompt, dispatches to whatever adapter you hand it, and returns a `GeneratedAudio` object with exactly one of `url` or `b64Json`.

```typescript
import { generateAudio } from '@tanstack/ai'
import { geminiAudio } from '@tanstack/ai-gemini/adapters'

const adapter = geminiAudio('lyria-3-pro-preview')

const result = await generateAudio({
  adapter,
  prompt: 'A cinematic orchestral piece with a rising string motif',
})

// result.audio is { url: string } | { b64Json: string } — exactly one, enforced by the type
```

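The "exactly one, enforced by the type" guarantee is the standard TypeScript mutually-exclusive-union trick. A self-contained sketch of the idea (the type and function names below are illustrative, not TanStack AI's actual exports):

```typescript
// Illustrative union: a payload carries a url or a b64Json string, never both.
// The optional `never` members make TypeScript reject objects that set both keys.
type AudioPayload =
  | { url: string; b64Json?: never }
  | { b64Json: string; url?: never }

// Turn either variant into something an <audio> element can play.
function toSrc(audio: AudioPayload): string {
  if (typeof audio.url === 'string') return audio.url
  return `data:audio/mpeg;base64,${audio.b64Json}`
}
```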
Swap `geminiAudio` for `falAudio` and the exact same call generates music through MiniMax, DiffRhythm, Stable Audio 2.5, or any of the other models in fal's catalog. The adapter translates per-model details (like fal's `music_length_ms` vs `seconds_total` vs `duration` naming) so your app code never sees them.

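One way to picture what the adapter does with those mismatched duration parameters is a simple lookup. The sketch below is illustrative, not TanStack AI's actual internals; only the MiniMax model id comes from this post, and the Stable Audio id is invented for the example.

```typescript
// Illustrative only: map one unified duration option onto each model's
// preferred parameter name, converting units where needed.
type DurationParam = 'music_length_ms' | 'seconds_total' | 'duration'

const DURATION_PARAM_BY_MODEL: Record<string, DurationParam> = {
  'fal-ai/minimax-music/v2.6': 'music_length_ms', // wants milliseconds
  'fal-ai/stable-audio-25': 'seconds_total', // hypothetical id: wants whole seconds
}

function normalizeDuration(
  modelId: string,
  durationSeconds: number,
): Record<string, number> {
  const param = DURATION_PARAM_BY_MODEL[modelId] ?? 'duration'
  const value = param === 'music_length_ms' ? durationSeconds * 1000 : durationSeconds
  return { [param]: value }
}
```

Your app always says "30 seconds"; the adapter decides whether the provider hears `music_length_ms: 30000` or `duration: 30`.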
## Streaming, because audio generation takes seconds

Music and SFX generation is slow. Lyria 3 Pro takes several seconds. Stable Audio takes longer. If you are building a UI, blocking the request the whole time is a bad experience.

`generateAudio` now supports `stream: true`, returning an `AsyncIterable<StreamChunk>` you can pipe straight through `toServerSentEventsResponse()`:

```typescript
// Imports assumed: `toServerSentEventsResponse` from the core package and
// `falAudio` via the adapter subpath, mirroring the gemini example above.
import { generateAudio, toServerSentEventsResponse } from '@tanstack/ai'
import { falAudio } from '@tanstack/ai-fal/adapters'

export async function POST(req: Request) {
  const { prompt } = await req.json()

  const stream = await generateAudio({
    adapter: falAudio('fal-ai/minimax-music/v2.6'),
    prompt,
    stream: true,
  })

  return toServerSentEventsResponse(stream)
}
```

The client receives progress events and the final audio over a single SSE connection, the same transport model already used by `generateImage` and `generateVideo`. No new infrastructure, no special-case code paths.

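Server-sent events are just newline-delimited text frames, so there is nothing TanStack-specific on the wire. A self-contained sketch of pulling `data:` payloads out of an SSE body (the payload strings in the comment are invented for illustration; the real chunk shape is whatever `StreamChunk` serializes to):

```typescript
// Generic SSE parsing: frames are separated by a blank line, and each
// frame's payload lives on lines starting with "data:".
function parseSseData(body: string): string[] {
  return body
    .split('\n\n')
    .flatMap((frame) =>
      frame
        .split('\n')
        .filter((line) => line.startsWith('data:'))
        .map((line) => line.slice(5).trim()),
    )
    .filter((data) => data.length > 0)
}

// e.g. a progress frame followed by a final frame:
// parseSseData('event: progress\ndata: {"pct":40}\n\ndata: done\n\n')
```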
## Framework hooks that feel like the others

Every framework integration gets a new hook matching the existing media-hook shape:

- `@tanstack/ai-react`: `useGenerateAudio`
- `@tanstack/ai-solid`: `useGenerateAudio`
- `@tanstack/ai-vue`: `useGenerateAudio`
- `@tanstack/ai-svelte`: `createGenerateAudio`

The API is identical to `useGenerateImage` and friends:

```tsx
import { useGenerateAudio } from '@tanstack/ai-react'

function MusicGen() {
  // `connection` is your SSE transport, created elsewhere in the app
  const { generate, result, isLoading, error, stop, reset } = useGenerateAudio({
    connection,
  })

  return (
    <>
      <button onClick={() => generate({ prompt: 'Lo-fi hip-hop beat' })}>
        Generate
      </button>
      {isLoading && <button onClick={stop}>Stop</button>}
      {result?.audio.url && <audio src={result.audio.url} controls />}
    </>
  )
}
```

Both `connection` (SSE) and `fetcher` (plain HTTP) transports are supported, so this works with TanStack Start, Next.js, Remix, or any backend you already have.

## Providers that shipped in this release

**Gemini** gets two new entry points:

- `geminiAudio()` for Lyria 3 Pro and Lyria 3 Clip music generation. Lyria Pro reads duration from the natural-language prompt; Clip is fixed at 30 seconds and returns MP3.
- A new `gemini-3.1-flash-tts-preview` TTS model with 70+ languages, 200+ audio tags, and multi-speaker dialogue via `multiSpeakerVoiceConfig`.

**Fal** gets three tree-shakeable adapters:

- `falSpeech()` for TTS via `fal-ai/gemini-3.1-flash-tts`, `fal-ai/minimax/speech-2.6-hd`, and the `fal-ai/kokoro/*` family.
- `falTranscription()` for STT via `fal-ai/whisper`, `fal-ai/wizper`, and `fal-ai/speech-to-text/turbo`.
- `falAudio()` for music, SFX, and the wider fal catalog: audio-to-audio, voice conversion and cloning, enhancement, separation, isolation, understanding, and merge.

All four follow the tree-shakeable subpath-import pattern, so your bundle only grows by the adapters you actually import.

## Try it

The new activity is live in `@tanstack/ai` and the two provider packages:

```bash
pnpm add @tanstack/ai @tanstack/ai-fal
# or
pnpm add @tanstack/ai @tanstack/ai-gemini
```

Then open the [audio generation guide](/ai/docs/media/audio-generation) for the full adapter matrix, or pull the `ts-react-chat` example to see working TTS and transcription tabs plus a `/generations/audio` route covering Lyria and fal side by side.

**Star [TanStack AI on GitHub](https://github.com/TanStack/ai)** if you want to see where this goes next.
