Tone Touch Music — Sakib Hasan

What it is

Tone Touch Music is a Spotify-shaped streaming app — browse, charts, radio, library, communities, made-for-you — built as a single Cloudflare Worker. Next.js 16 App Router renders the frontend, Payload CMS 3 owns the catalog and admin, and the whole bundle compiles through @opennextjs/cloudflare. The data plane is D1 for the catalog, R2 for audio and artwork, Vectorize for taste and lyric embeddings, Workers AI for in-runtime inference, and a long-context external model routed through AI Gateway for the cases where the in-runtime window isn’t enough. Queues drive the ingest pipeline, Durable Objects host the stateful agents, and there’s no Postgres, no Redis, and no separate inference service to operate.

The catalog

Twenty-one Payload collections, organized around the listening loop:

Group	Collections
Identity	`Users`, `Plans`, `Subscriptions`
Catalog	`Artists`, `Albums`, `Songs`, `Genres`, `Lyrics`, `Media`
Library	`Playlists`, `LibraryItems`, `PlayHistory`, `MadeForYouMixes`
Browse	`BrowseCategories`, `Charts`, `RadioStations`
Social	`Communities`, `Posts`, `Comments`, `Follows`
AI	`AIJobs`

AIJobs is the spine of every generative feature. Every embedding call, every external LLM completion, every cover render writes a row with the type, provider, model, gateway, input, output, prompt and cached token counts, cost, and latency. One ledger for usage, billing attribution, and “why does this bio say the band broke up in 2019” debugging.
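
A minimal sketch of what one ledger row and a cost-attribution read might look like. The field names, the `buildJobRow` and `costByType` helpers, and the choice of integer micro-dollars for cost are all assumptions made for illustration; the actual Payload collection schema is not shown here.

```typescript
// Hypothetical row shape: field names are assumptions drawn from the columns
// listed above, not the actual Payload collection schema.
interface AIJobRow {
  type: string;            // e.g. "embedding", "completion", "cover-render"
  provider: string;        // e.g. "workers-ai" or an external provider
  model: string;
  gateway: string | null;  // null when the call stayed on the Workers AI binding
  input: string;
  output: string;
  promptTokens: number;
  cachedTokens: number;
  costMicroUsd: number;    // integer micro-dollars, so sums do not drift
  latencyMs: number;
}

type JobMeta = Pick<AIJobRow, "type" | "provider" | "model" | "gateway" | "input">;
type JobUsage = Pick<AIJobRow, "output" | "promptTokens" | "cachedTokens" | "costMicroUsd">;

// Assemble one ledger row from call metadata, usage, and two timestamps.
function buildJobRow(
  meta: JobMeta,
  usage: JobUsage,
  startedAtMs: number,
  finishedAtMs: number,
): AIJobRow {
  return { ...meta, ...usage, latencyMs: finishedAtMs - startedAtMs };
}

// Cost attribution: total spend per job type, read straight off the ledger.
function costByType(rows: AIJobRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.type] = (totals[row.type] ?? 0) + row.costMicroUsd;
  }
  return totals;
}
```

Because every feature writes the same shape, the billing view and the debugging view can be different queries over one table rather than different tables.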

Where each AI feature runs

The interesting question isn’t which model — it’s where. Workers AI is cheap, in-region, and bound directly to the worker, but the best Llama on it tops out at a 24k context window. An external long-context model is higher quality but pay-per-token. The split below is driven by context length and quality, not preference. If the prompt fits, it stays on Workers AI. If it doesn’t, it goes through AI Gateway to a long-context model.
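
The length half of that rule can be sketched as a small pure function. The 24k ceiling comes from the text; the `chooseRoute` and `estimateTokens` names, the four-characters-per-token estimate, and the 1000-token output reserve are assumptions for illustration only.

```typescript
// The 24k context ceiling described above for the in-runtime model.
const WORKERS_AI_CONTEXT_TOKENS = 24000;

type Route = "workers-ai" | "ai-gateway";

// Rough token estimate at about 4 characters per token. This heuristic is an
// assumption; a real tokenizer would give a different count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Stay on Workers AI when prompt plus reserved output fits the window,
// otherwise route to the long-context model through AI Gateway.
function chooseRoute(promptTokens: number, reservedOutputTokens: number = 1000): Route {
  return promptTokens + reservedOutputTokens <= WORKERS_AI_CONTEXT_TOKENS
    ? "workers-ai"
    : "ai-gateway";
}
```

For example, `chooseRoute(estimateTokens(prompt))` keeps a short moderation prompt in-runtime and sends a 30k-token artist context out through the Gateway. The quality consideration mentioned above is not modeled here.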

Workers AI (in-runtime, no egress)

Feature	Model
Semantic / lyric / multilingual search	`@cf/baai/bge-m3` (1024-d) + Vectorize
Voice search	`@cf/openai/whisper-large-v3-turbo`
Made-for-You, “Because you listened to X”, Smart Radio, mood mixes	Vectorize similarity + `@cf/baai/bge-reranker-base`
Community moderation	`@cf/meta/llama-guard-3-8b`
Image alt-text	`@cf/llava-hf/llava-1.5-7b-hf`
Live-lyric translation	`@cf/meta/m2m100-1.2b`
Auto playlist cover art	`@cf/black-forest-labs/flux-1-schnell`
AI DJ voice	`@cf/deepgram/aura-2-en`

External long-context (via AI Gateway)

Anything that needs more than 24k tokens — artist bios, album reviews, track-by-track notes, the “About this track” sidebar, AI DJ scripts before they go to TTS, thread summaries, weekly listening recaps, “why this is rising” blurbs on the charts page, and Playlist Builder prompt parsing — gets routed through gateway.ai.cloudflare.com/v1/<acct>/tonetouch/.... The Gateway gives caching, rate-limit handling, and per-feature analytics for free, but the lever that makes the economics work is provider-side prompt caching. Artist and album context refreshes nightly, so the same 30k-token block is reused across every “About this track” and review request for that artist until the next cron tick. Cache-hit ratio sits high enough that the external LLM cost per request lands below a typical in-runtime LLM call.
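
Provider-side prompt caching generally rewards requests that share an identical leading block, though the exact matching rules vary by provider. A minimal sketch of a cache-friendly layout under that assumption follows; the `Message` shape and `buildMessages` helper are hypothetical, not the app's actual request builder.

```typescript
// Hypothetical chat message shape; real provider SDKs define their own types.
interface Message {
  role: "system" | "user";
  content: string;
}

// Put the slow-changing artist context first and the per-request part last,
// so every request for the same artist shares one identical prefix until the
// nightly refresh replaces the context block.
function buildMessages(
  artistContext: string,
  feature: string,
  request: string,
): [Message, Message] {
  return [
    { role: "system", content: artistContext },
    { role: "user", content: `[${feature}] ${request}` },
  ];
}
```

The point of the ordering is that anything request-specific (feature name, track title, user locale) stays out of the shared block; interleaving it would change the prefix and forfeit the cache hit.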

Vectorize

Five indexes: tracks, lyrics, user taste, and playlists at 1024-d, plus a 256-d index for audio fingerprints. The centerpiece is the per-user taste index, which holds one running vector per user, updated as a moving average over listening history. Made-for-You and “Because you listened to X” both reduce to a single similarity query against the tracks index with the user’s taste vector and a small metadata filter (region, explicit-allowed, not-recently-played). One vector primitive, two product surfaces.
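
A minimal sketch of the running taste vector, assuming the moving average is exponential. The text only says "moving average", so the exponential form, the `alpha` weight, and both function names are assumptions. The local cosine function is there only to show what a similarity query ranks by, assuming a cosine metric; in the app that ranking happens inside Vectorize.

```typescript
// Blend one listened track's embedding into the user's running taste vector.
// alpha is a hypothetical weight: higher values favor recent listening.
function updateTasteVector(taste: number[], track: number[], alpha: number = 0.1): number[] {
  if (taste.length !== track.length) {
    throw new Error("dimension mismatch");
  }
  return taste.map((t, i) => (1 - alpha) * t + alpha * (track[i] as number));
}

// Cosine similarity between two vectors of equal length.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] as number;
    const y = b[i] as number;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```

Each play nudges the vector toward the track just heard, so the stored vector stays a fixed size no matter how long the listening history grows.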

Stateful agents — Durable Objects via the Agents SDK

Three DO-backed agents own anything stateful or long-running. The DJ holds the queue, the upcoming-track context, and the running persona. The player UI attaches via WebSocket and gets streaming tokens as the LLM writes the next commentary block; the script then goes to Aura-2 for TTS and gets pre-buffered before the current track ends. The Playlist Agent is per-build-session: it takes a natural-language prompt like “60 minutes of ’90s trip-hop into ambient downtempo,” parses the intent into seeds and constraints, walks the tracks index with the reranker for ordering, and streams the playlist back as it builds. The Discovery Agent is per-user and scheduled — it runs nightly through the Agents SDK’s schedule() to refresh daily mixes and the “new for you” row. SQL state on the DO holds last-seen track IDs so the same drop doesn’t surface two days running.
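
The last-seen check in the Discovery Agent can be sketched as a pure function. This is an illustration under assumptions: the `pickFresh` name and `DiscoveryResult` shape are hypothetical, and in the real agent the last-seen list would presumably be read from and written back to the DO's SQL state rather than passed in.

```typescript
interface DiscoveryResult {
  picks: string[];     // track IDs to surface today
  lastSeen: string[];  // what to persist for tomorrow's run
}

// Walk ranked candidates in order, skipping anything surfaced in the previous
// run and any duplicate within this run, until the limit is reached.
function pickFresh(candidates: string[], lastSeen: string[], limit: number): DiscoveryResult {
  const seen = new Set(lastSeen);
  const picks: string[] = [];
  for (const id of candidates) {
    if (picks.length >= limit) break;
    if (!seen.has(id)) {
      picks.push(id);
      seen.add(id);
    }
  }
  return { picks, lastSeen: picks };
}
```

Feeding each run's `lastSeen` into the next run is what keeps the same drop from appearing two days in a row, even when the ranked candidate list has not changed overnight.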

Durable Objects are the right shape for these because they all need memory across calls. The DJ shouldn’t restart its persona every track. The Playlist Builder shouldn’t lose its constraints mid-build. DOs replace what would otherwise be a Redis-backed session store, and the only thing to operate is the binding.

Ingest pipeline

A track upload kicks off a Workflow that stages the audio in R2, runs Chromaprint in a Container for fingerprint-based dedupe, extracts audio features (BPM, key, genre, mood, energy) with Essentia.js in WASM, embeds the lyrics and a synthesized song description into Vectorize, generates LLaVA alt-text for the cover, enqueues a long-context job for the “About this track” copy, and finally marks the song published. Every step writes an AIJobs row, which means the admin can see exactly where in the pipeline a track is stuck. Embeddings get batched through the INGEST_Q queue at up to 25 per batch so we stay under the Workers AI per-minute rate limits.
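
The 25-per-batch limit can be sketched as a plain chunking helper. The `toBatches` name and `MAX_EMBED_BATCH` constant are hypothetical, and the text does not say whether the batching happens on the producer side, as here, or through the queue consumer's batch size setting.

```typescript
// Batch ceiling taken from the text: up to 25 embeddings per batch.
const MAX_EMBED_BATCH = 25;

// Split a list of pending items into consecutive batches of at most `size`.
function toBatches<T>(items: T[], size: number = MAX_EMBED_BATCH): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With this split, a 1500-track import becomes 60 batches instead of 1500 individual embedding calls.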

Three rules the app refuses to break

The first is that Workers AI doesn’t get called from the request path during ingest. Embedding 1500 tracks at once will trip the 1500–3000 rpm cap, so everything ingest-side goes through INGEST_Q. The second is that cover art is content-addressed in R2: a render is keyed by sha256(playlistTitle + sortedTrackIds). Edits that change neither the title nor the track membership, such as reordering, reuse the existing image, because playlist cover regen is the most expensive per-user AI operation and most “edits” don’t actually need a new image. The third is that per-user crons fan out through a Workflow, not a single Worker handler. Daily Discovery for a million users can’t be one cron handler; the cron tick enqueues users in chunks, and a Workflow consumer paces the per-user Discovery calls inside the Workers AI quota.
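
A minimal sketch of the content-addressed cover key from the second rule. The hash inputs follow the text; the `coverKey` name, the `covers/` prefix, the `.png` extension, and the use of JSON serialization to join title and IDs unambiguously are assumptions. It uses `node:crypto` for brevity, which in a Worker would need Node compatibility enabled; the Web Crypto `crypto.subtle.digest` call is the async alternative.

```typescript
import { createHash } from "node:crypto";

// Derive an R2 object key from the playlist title and its track membership.
// Track IDs are sorted first, so track order does not affect the key.
function coverKey(playlistTitle: string, trackIds: string[]): string {
  const sorted = [...trackIds].sort();
  // JSON keeps the title/ID boundary unambiguous (an assumption; the text
  // only specifies title plus sorted track IDs).
  const payload = JSON.stringify([playlistTitle, sorted]);
  const digest = createHash("sha256").update(payload).digest("hex");
  return `covers/${digest}.png`; // hypothetical prefix and extension
}
```

Reordering tracks yields the same key and therefore reuses the stored image, while adding or removing a track, or renaming the playlist, produces a new key and a new render.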

Frontend

The route groups under src/app/(frontend)/ are roughly what you’d expect — Discovery (/, /browse, /charts, /recently-added, /made-for-you, /search), Catalog (/artists/[id], /albums/[id], /songs/[id], /playlists/[id], /radio/[id]), Personal (/library, /account), Social (/communities), Commerce (/pricing, /cart), and Auth (/signin, /signup, /forgot-password). The densest AI surface is the player in src/components/player/now-playing-bar.tsx — synced lyrics with on-the-fly translation, the “About this track” sidebar, DJ commentary intros between tracks, and a frequency-waves visualizer wired to the audio element. Everything else hangs off the existing Spotify-shaped IA, which I won’t claim is original but is the layout users already know how to use.

The bits worth remembering

AIJobs as the single ledger means cost attribution, debugging, and rate-limit forensics all read from the same table. Workers AI is the default and external is the exception, gated by context length. Vectorize is the only recommendation primitive — no separate recsys service. Durable Objects own any AI session with memory. And every generative output keys on its input hash, so a re-render is a cache hit, not a re-bill.