Tunes — Building a Personal Music Recommendation System from Scratch

Live Preview

Your personal driving DJ

Paste your playlist, hit play, and the AI starts curating. Skip a song and it learns; heart a song and it remembers. The longer you drive, the better it gets.

Now playing: "Dancing in the Kitchen" by LANY
"Dreamy indie pop matching your chill, romantic vibe"

Up Next:
"Save Your Tears" by The Weeknd
"Malibu Nights" by LANY
"Redbone" by Childish Gambino

How It Works

From playlist to personalized radio

No training data, no matrix factorization, no user-item graphs. Just a single LLM call that understands music taste from text.

1

Paste Your Playlist

Copy and paste from Spotify, Apple Music, or YouTube Music. Raw text with messy formatting is fine: the AI extracts song titles and artists from whatever format you paste.

2

AI Extracts Taste Profile

Gemini analyzes your songs and builds a structured profile: genres, moods, eras, and energyLevel, plus English and Chinese summaries. Output is constrained to JSON via responseSchema, so there are no parsing surprises.

3

Generate Initial Recommendations

The same API call also returns 10 driving-friendly songs. Each comes with a reason in English and Chinese (reason, reasonCn) and a YouTube search query for instant playback.

4

Resolve & Play via YouTube

Each song's searchQuery is sent to the YouTube Data API v3, filtered to the Music category. The first matching embeddable video is cached and played through the IFrame Player API.

5

Learn from Listening Behavior

Every interaction is feedback. Listened to ≥50% of the song? Enjoyed. Skipped before the halfway mark? Disliked. Hearted? Loved. This data feeds the next recommendation cycle.
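Step 4 boils down to one `search.list` call per song. The sketch below assumes the request is built server-side, behind the page's youtube-search endpoint, and that the first result wins.

- `part`, `q`, `type`, `videoCategoryId`, `videoEmbeddable`, `maxResults` and `key` are real search.list parameters.
- Category `10` is YouTube's Music category.
- The helper names and the `maxResults` default are assumptions.

```typescript
// Sketch of a YouTube Data API v3 search.list request for one song.
// Query parameters are real search.list parameters; helper names,
// the maxResults default, and the response subset are assumptions.
const SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search";
const MUSIC_CATEGORY_ID = "10"; // YouTube's "Music" video category

function buildSearchUrl(searchQuery: string, apiKey: string, maxResults = 5): string {
  const params = new URLSearchParams({
    part: "snippet",
    q: searchQuery,
    type: "video", // required when filtering by videoCategoryId or videoEmbeddable
    videoCategoryId: MUSIC_CATEGORY_ID,
    videoEmbeddable: "true", // only videos the IFrame player may embed
    maxResults: String(maxResults),
    key: apiKey,
  });
  return `${SEARCH_ENDPOINT}?${params.toString()}`;
}

// The only part of the search.list response this sketch reads.
interface SearchListResponse {
  items?: { id?: { videoId?: string } }[];
}

// "First matching embeddable video": the first item that carries a videoId.
function firstVideoId(resp: SearchListResponse): string | null {
  for (const item of resp.items ?? []) {
    const id = item.id?.videoId;
    if (id) return id;
  }
  return null;
}
```

A serverless handler would fetch `buildSearchUrl(...)`, read the JSON and return `firstVideoId(...)` to the client. That keeps the API key out of the browser.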

The Feedback Loop

How the AI learns you

The core innovation: implicit feedback from listening behavior drives an adaptive recommendation cycle. No ratings, no surveys: just play music and the system gets smarter.

Listen threshold: ≥50% of the track
Temperature: 0.9
Songs per batch: 10
Queue refill trigger: fewer than 10 songs queued

// Adaptive prompt — the heart of the recommendation engine
SONGS THEY ENJOYED (listened longer):
  "LANY - Dancing in the Kitchen" (240s/240s, liked)
  "The Weeknd - Save Your Tears" (200s/220s)

SONGS THEY SKIPPED:
  "Sad Indie Song - Unknown" (45s/200s)

INSTRUCTIONS:
  1. Keep the existing taste profile (return it unchanged)
  2. Recommend 10 NEW songs based on:
     - Songs they enjoyed → recommend more like those
     - Songs they skipped → avoid that style/mood
  3. Do NOT repeat any already played songs
    

Key insight: By encoding behavioral signals directly into the prompt text, we turn a stateless LLM into a stateful recommendation engine. No embeddings, no retraining — just natural language describing what the user likes and dislikes.

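The adaptive prompt above can be assembled mechanically from per-song feedback records. This is a sketch under stated assumptions:

- The `FeedbackEntry` shape, the helper names and the `ALREADY PLAYED` section header are hypothetical.
- The 50% threshold and the section wording come from this page.
- Hearted songs are assumed to count as enjoyed even if they were cut short.

```typescript
// Hypothetical per-song feedback record collected by the client.
interface FeedbackEntry {
  artist: string;
  song: string;
  listenedSec: number;
  durationSec: number;
  liked: boolean;
}

const LISTEN_THRESHOLD = 0.5; // ≥50% of the track counts as enjoyed

// Assumption: a heart counts as enjoyed even if the song was cut short.
function wasEnjoyed(e: FeedbackEntry): boolean {
  return e.liked || (e.durationSec > 0 && e.listenedSec / e.durationSec >= LISTEN_THRESHOLD);
}

// Renders one line like: "LANY - Dancing in the Kitchen" (240s/240s, liked)
function formatEntry(e: FeedbackEntry): string {
  const timing = `${Math.round(e.listenedSec)}s/${Math.round(e.durationSec)}s`;
  return `  "${e.artist} - ${e.song}" (${timing}${e.liked ? ", liked" : ""})`;
}

function buildAdaptivePrompt(feedback: FeedbackEntry[], played: string[]): string {
  const enjoyed = feedback.filter(wasEnjoyed);
  const skipped = feedback.filter((e) => !wasEnjoyed(e));
  return [
    "SONGS THEY ENJOYED (listened longer):",
    ...enjoyed.map(formatEntry),
    "",
    "SONGS THEY SKIPPED:",
    ...skipped.map(formatEntry),
    "",
    // Hypothetical section: the played list doubles as a blocklist.
    "ALREADY PLAYED (do not repeat):",
    ...played.map((key) => `  "${key}"`),
    "",
    "INSTRUCTIONS:",
    "  1. Keep the existing taste profile (return it unchanged)",
    "  2. Recommend 10 NEW songs based on:",
    "     - Songs they enjoyed → recommend more like those",
    "     - Songs they skipped → avoid that style/mood",
    "  3. Do NOT repeat any already played songs",
  ].join("\n");
}
```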

Runtime Architecture

How a song flows

From the moment the queue runs low to the next track playing, every recommendation travels through this serverless pipeline:

Feedback → Vercel Function → Gemini 2.0 Flash → YouTube API → IFrame Player

📋 Feedback: the client records skip/listen/like behavior per song.
⚡ Vercel Function: builds the adaptive prompt from taste profile + feedback + played list.
🧠 Gemini 2.0 Flash: returns structured JSON with 10 songs, reasons, and search queries.
▶ YouTube API: resolves each search query to an embeddable video ID.
🔊 IFrame Player: plays the video and reports playback duration for feedback.
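Because the first matching video is cached, the YouTube stage only needs one API call per search query. The sketch below shows one way to build that cache, under these assumptions:

- `search` wraps the app's youtube-search endpoint; this wrapper is hypothetical.
- Queries that differ only in case or surrounding whitespace refer to the same song.

Caching the promise rather than the resolved ID also keeps two concurrent lookups for the same song from both reaching the API.

```typescript
// Stand-in for a call to the youtube-search endpoint (hypothetical): resolves to
// the first embeddable video ID, or null when nothing matched.
type SearchFn = (searchQuery: string) => Promise<string | null>;

// Wraps a search function with a per-query promise cache.
function createVideoResolver(search: SearchFn): (searchQuery: string) => Promise<string | null> {
  const cache = new Map<string, Promise<string | null>>();
  return (searchQuery: string) => {
    // Assumption: case and surrounding whitespace do not change the song.
    const key = searchQuery.trim().toLowerCase();
    const cached = cache.get(key);
    if (cached) return cached;
    const pending = search(searchQuery);
    cache.set(key, pending);
    // Forget failed lookups so a later attempt can retry; callers still see the rejection.
    pending.catch(() => {
      if (cache.get(key) === pending) cache.delete(key);
    });
    return pending;
  };
}
```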

Tech Stack

Built with minimal pieces

No ML frameworks, no training pipelines, no recommendation libraries. Just an LLM, a database, and a video player.

🧠 Gemini 2.0 Flash (AI / Rec Engine)

The recommendation engine. Structured JSON output via responseSchema keeps every result in the expected, typed shape.

▶ YouTube IFrame API (Playback)

Free music playback with full event hooks. Reports play state and duration for implicit feedback collection.

🔴 Upstash Redis (Persistence)

Serverless Redis stores user feedback, queue state, and auth data. Per-request pricing, zero cold starts.

⚛️ React 19 + TypeScript (Frontend)

Refs hold live state (queue, feedback); hooks drive the UI. Built with Vite for instant HMR.

▲ Vercel Functions (Backend)

Four serverless endpoints: recommend, auth, user-data, youtube-search. Auto-scaling, zero config.

🔒 JWT + bcrypt (Auth)

Lightweight email/password auth with 30-day tokens. No OAuth complexity, just enough for personalization.
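The 30-day token lifetime is plain arithmetic on the standard JWT `iat`/`exp` claims. RFC 7519 defines these as seconds since the Unix epoch. The sketch below is hypothetical:

- The claim set and helper names are assumptions.
- Storing the user ID in `sub` is also an assumption.
- Signing is left to whatever JWT library the backend uses. The jsonwebtoken package, for instance, accepts `expiresIn: "30d"`.

```typescript
// JWT time claims (RFC 7519) are NumericDates: whole seconds since the Unix epoch.
const TOKEN_LIFETIME_SEC = 30 * 24 * 60 * 60; // 30 days = 2,592,000 seconds

// Hypothetical claim set; `sub` holds the user ID.
interface AuthClaims {
  sub: string;
  iat: number; // issued at
  exp: number; // expires at
}

function makeClaims(userId: string, nowMs: number): AuthClaims {
  const iat = Math.floor(nowMs / 1000);
  return { sub: userId, iat, exp: iat + TOKEN_LIFETIME_SEC };
}

// A token must not be accepted on or after its `exp` time.
function isExpired(claims: AuthClaims, nowMs: number): boolean {
  return Math.floor(nowMs / 1000) >= claims.exp;
}
```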

Key Design Decisions

What makes it work

The details that turn a basic "ask AI for songs" idea into a system that genuinely improves over a listening session.

📈 Implicit Feedback

No star ratings or thumbs up/down required. Listen duration is the primary signal: ≥50% = enjoyed, <50% = skipped. Hearts add an explicit signal on top. Data collection adds zero friction.
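The classification rule fits in a few lines. This sketch is not the app's code:

- The `Signal` labels are assumptions.
- A heart overriding listen duration is also an assumption.
- The 50% threshold is from the page.

In the app, the inputs would come from the IFrame player's `getCurrentTime()` and `getDuration()` at the moment a track ends or is skipped.

```typescript
type Signal = "loved" | "enjoyed" | "skipped";

const LISTEN_THRESHOLD = 0.5; // ≥50% of the track = enjoyed

// Sketch: a heart wins over listen duration (assumption); otherwise the
// fraction of the track actually played decides.
function classifyListen(listenedSec: number, durationSec: number, hearted: boolean): Signal {
  if (hearted) return "loved";
  // Assumption: with no usable duration (0 or NaN), count it as a skip.
  if (!(durationSec > 0)) return "skipped";
  return listenedSec / durationSec >= LISTEN_THRESHOLD ? "enjoyed" : "skipped";
}
```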

🎯 Structured JSON Output

Gemini's responseSchema constrains the model to emit JSON in a declared shape. No regex parsing, no "please format as JSON" prayers. The schema defines the exact fields for each song: song, artist, reason, reasonCn, searchQuery.
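The sketch below shows what a responseSchema-constrained request could look like against the Gemini REST `generateContent` endpoint.

- `generationConfig.temperature`, `responseMimeType` and `responseSchema` are real fields.
- The top-level keys `tasteProfile` and `recommendations`, the helper names and the exact schema are assumptions. The schema is reconstructed from the fields this page mentions.

```typescript
// Sketch of a generateContent request body for the Gemini REST API.
// Top-level schema keys (tasteProfile, recommendations) are assumptions.
const stringArray = { type: "ARRAY", items: { type: "STRING" } };

const recommendationSchema = {
  type: "OBJECT",
  properties: {
    song: { type: "STRING" },
    artist: { type: "STRING" },
    reason: { type: "STRING" }, // English
    reasonCn: { type: "STRING" }, // Chinese
    searchQuery: { type: "STRING" },
  },
  required: ["song", "artist", "reason", "reasonCn", "searchQuery"],
};

const responseSchema = {
  type: "OBJECT",
  properties: {
    tasteProfile: {
      type: "OBJECT",
      properties: {
        genres: stringArray,
        moods: stringArray,
        eras: stringArray,
        energyLevel: { type: "STRING" },
        summary: { type: "STRING" },
        summaryCn: { type: "STRING" },
      },
      required: ["genres", "moods", "eras", "energyLevel", "summary", "summaryCn"],
    },
    recommendations: { type: "ARRAY", items: recommendationSchema },
  },
  required: ["tasteProfile", "recommendations"],
};

function buildGenerateContentBody(prompt: string) {
  return {
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: {
      temperature: 0.9,
      responseMimeType: "application/json", // responseSchema needs a compatible MIME type
      responseSchema,
    },
  };
}

interface Recommendation {
  song: string;
  artist: string;
  reason: string;
  reasonCn: string;
  searchQuery: string;
}

// The JSON still arrives as text in the first candidate's first part.
interface GenerateContentResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

function parseRecommendations(resp: GenerateContentResponse): Recommendation[] {
  const text = resp.candidates?.[0]?.content?.parts?.[0]?.text;
  if (!text) return [];
  const data = JSON.parse(text) as { recommendations?: Recommendation[] };
  return data.recommendations ?? [];
}
```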

🔄 Deduplication

Every played song is tracked under an "Artist - Song" key. The full played list is sent in the prompt as a blocklist, together with a KNOWN_SONGS constant that covers the user's existing library.
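The played list in the prompt asks the model not to repeat itself, but an LLM can still echo a song. The sketch below is a client-side safety net. It is hypothetical: the page only describes the prompt blocklist.

It normalizes every song to an "Artist - Song" key. It then drops anything already played, anything in a `KNOWN_SONGS`-style library list, and any repeat within the same batch.

```typescript
// Normalizes to an "Artist - Song" key so trivial variants collide
// (lower-casing and whitespace folding are assumptions).
function songKey(artist: string, song: string): string {
  const norm = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");
  return `${norm(artist)} - ${norm(song)}`;
}

interface SongRef {
  artist: string;
  song: string;
}

// Hypothetical filter: drops played songs, library songs (a KNOWN_SONGS-style list),
// and repeats within the batch. playedKeys/knownKeys must be songKey() values.
function filterFresh<T extends SongRef>(batch: T[], playedKeys: string[], knownKeys: string[]): T[] {
  const blocked = new Set<string>(playedKeys.concat(knownKeys));
  const fresh: T[] = [];
  for (const rec of batch) {
    const key = songKey(rec.artist, rec.song);
    if (blocked.has(key)) continue;
    blocked.add(key); // also blocks later duplicates in this batch
    fresh.push(rec);
  }
  return fresh;
}
```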

📋 Queue Pre-fetching

When the queue drops below 10 songs, a new Gemini call fires automatically. Users never wait: the next batch is ready well before the current one runs out.
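The refill trigger itself is the single check `queue.length < 10`. The sketch below wraps that check with two assumptions:

- `fetchBatch` is a hypothetical wrapper around the recommend endpoint.
- An in-flight flag guards against stacking calls. Without it, several skips in a row could fire several Gemini calls at once.

```typescript
const REFILL_THRESHOLD = 10; // refill when fewer than 10 songs remain

// Hypothetical refill controller; `fetchBatch` stands in for the recommend endpoint.
// Call the returned function after every track change; it returns true if it
// started a new fetch.
function createRefiller<T>(queue: T[], fetchBatch: () => Promise<T[]>): () => boolean {
  let inFlight = false;
  return () => {
    if (inFlight || queue.length >= REFILL_THRESHOLD) return false;
    inFlight = true;
    fetchBatch().then(
      (batch) => {
        queue.push(...batch); // append, so the current queue keeps its order
        inFlight = false;
      },
      () => {
        inFlight = false; // keep playing what is queued; the next check retries
      },
    );
    return true;
  };
}
```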

🌐 Bilingual Reasons

Every recommendation includes reason (English) and reasonCn (Chinese). The display language is detected automatically from navigator.language, and the taste profile carries bilingual summaries too (summary, summaryCn).
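Once the browser language is known, choosing which reason to show is a small pure function. The sketch rests on two assumptions:

- Any `zh*` language tag (zh-CN, zh-TW, and so on) selects Chinese.
- An empty `reasonCn` falls back to English.

In the browser you would pass `navigator.language` as the argument.

```typescript
type Lang = "en" | "zh";

// Pass navigator.language in the browser; any "zh*" tag selects Chinese (assumption).
function detectLang(browserLanguage: string | undefined): Lang {
  return browserLanguage?.toLowerCase().startsWith("zh") ? "zh" : "en";
}

interface BilingualReason {
  reason: string; // English
  reasonCn: string; // Chinese
}

// Falls back to English when the Chinese reason is missing or empty.
function pickReason(rec: BilingualReason, lang: Lang): string {
  return lang === "zh" && rec.reasonCn ? rec.reasonCn : rec.reason;
}
```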

💾 Session Persistence

Redis saves the full state: feedback history, queue, liked songs, and the last Gemini prompt and response. Close the tab, come back tomorrow, and pick up exactly where you left off.
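The saved state is easiest to reason about as one typed, versioned JSON blob per user. Everything below is an assumption for illustration: the field names, the `user:<id>:state` key pattern and the version check. The page lists what is saved, not how it is saved. The Redis write itself is left to whichever client the backend uses.

```typescript
// Hypothetical per-user state blob; field names are assumptions mapped onto
// what the page lists (feedback, queue, liked songs, last prompt/response).
interface SessionState {
  version: 1;
  feedback: { key: string; listenedSec: number; durationSec: number; liked: boolean }[];
  queue: { artist: string; song: string; searchQuery: string }[];
  likedSongs: string[];
  lastPrompt: string | null;
  lastResponse: string | null;
}

// Hypothetical key layout: one Redis key per user.
function stateKey(userId: string): string {
  return `user:${userId}:state`;
}

function serializeState(state: SessionState): string {
  return JSON.stringify(state);
}

// Returns null for missing, corrupt, or incompatible blobs, so the app can
// start fresh instead of crashing on bad state.
function deserializeState(raw: string | null): SessionState | null {
  if (!raw) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<SessionState>;
    if (parsed.version !== 1 || !Array.isArray(parsed.feedback) || !Array.isArray(parsed.queue)) {
      return null;
    }
    return parsed as SessionState;
  } catch {
    return null;
  }
}
```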

Trade-offs

LLM-as-RecSys: pros and cons

Using an LLM as the recommendation engine is unconventional. Here's an honest look at what you gain and what you give up.

Dimension	LLM Approach (Tunes)	Traditional RecSys
Cold start	None: works from a single playlist	Needs interaction data from many users
Training data	None required	Massive interaction logs
Explainability	Natural-language reason per song	Opaque similarity scores
Adaptability	Real-time via prompt context	Requires model retraining
Latency	~2s per batch	Sub-millisecond lookups
Consistency	Non-deterministic (temp=0.9)	Reproducible rankings
Cost	Per-call API pricing	Fixed infra cost at scale
Music knowledge	World knowledge baked in	Only items present in its training data


Verdict: For a personal tool with one user, the LLM approach wins hands down: no ML infrastructure, instant cold start, and feedback expressed in plain language. The trade-offs (latency, cost, non-determinism) mostly matter at scale, and this isn't meant to be Spotify.

Data Sync Strategy

Never lose a beat

Driving means unpredictable connectivity. The sync strategy ensures no feedback is lost, even if you close the tab mid-song.

⏱ Debounced Save

After each feedback event, a 2-second debounce timer restarts, so several rapid skips collapse into a single Redis write. It stays efficient without losing data, because the visibility and unload handlers flush anything still pending.
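The sketch below shows the debounce, assuming a hypothetical synchronous `save` callback that performs the Redis write through the user-data endpoint.

- Each `schedule()` call restarts the 2-second timer.
- `flush()` lets the visibility and unload paths write immediately instead of waiting.

```typescript
// Hypothetical debounced saver. `save` stands in for the write to the user-data endpoint.
function createDebouncedSaver(save: () => void, delayMs = 2000) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return {
    // Called after every feedback event; restarts the timer.
    schedule(): void {
      if (timer !== undefined) clearTimeout(timer);
      timer = setTimeout(() => {
        timer = undefined;
        save();
      }, delayMs);
    },
    // Cancels any pending timer and writes immediately.
    flush(): void {
      if (timer !== undefined) {
        clearTimeout(timer);
        timer = undefined;
      }
      save();
    },
    pending(): boolean {
      return timer !== undefined;
    },
  };
}
```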

👁 Visibility Change

When you switch tabs or lock your phone, the visibilitychange event triggers an immediate save. This catches the "phone goes into a pocket" scenario.
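Hooking `visibilitychange` takes one listener that checks whether `visibilityState` is `"hidden"`. To keep the sketch runnable outside a browser, `document` is replaced by a minimal interface; that interface is an assumption. In the app you would pass `document` and a flush function such as the debounced saver's.

```typescript
// Minimal stand-in for `document` (assumption), so the sketch runs anywhere.
interface VisibilityTarget {
  visibilityState: string;
  addEventListener(type: "visibilitychange", listener: () => void): void;
  removeEventListener(type: "visibilitychange", listener: () => void): void;
}

// Flushes pending state whenever the page becomes hidden; returns an unsubscribe function.
function flushWhenHidden(target: VisibilityTarget, flush: () => void): () => void {
  const onChange = () => {
    if (target.visibilityState === "hidden") flush();
  };
  target.addEventListener("visibilitychange", onChange);
  return () => target.removeEventListener("visibilitychange", onChange);
}
```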

📡 sendBeacon Fallback

On beforeunload, the app uses navigator.sendBeacon() to fire a final save, which the browser delivers even as the page is torn down. Because sendBeacon cannot set custom headers, the auth token travels in the request body rather than in a header.
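`navigator.sendBeacon(url, data)` takes only a URL and a body. It has no way to attach headers, which is why the token rides in the body.

The sketch below uses hypothetical names (`BeaconPayload`, `buildBeaconBody`, `finalSave`). `send` stands in for `navigator.sendBeacon` so the code runs outside a browser. A string body is sent as `text/plain;charset=UTF-8`, so the server-side handler should parse the raw body as JSON rather than rely on a JSON content type.

```typescript
// Hypothetical body for the final save; the server reads the token from here
// because sendBeacon(url, data) offers no way to set request headers.
interface BeaconPayload<S> {
  token: string;
  state: S;
}

function buildBeaconBody<S>(token: string, state: S): string {
  const payload: BeaconPayload<S> = { token, state };
  return JSON.stringify(payload);
}

// Stand-in for navigator.sendBeacon, which returns false when the browser
// declines to queue the data.
type BeaconSender = (url: string, data: string) => boolean;

function finalSave<S>(send: BeaconSender, url: string, token: string, state: S): boolean {
  return send(url, buildBeaconBody(token, state));
}
```

In the browser this could be wired as `window.addEventListener("beforeunload", () => finalSave((u, d) => navigator.sendBeacon(u, d), "/api/user-data", token, state))`. Here `/api/user-data` is an assumed path for the page's user-data endpoint.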

🗃 Cached Queue

Upcoming songs are persisted to Redis, so the next session loads them instantly from cache. No Gemini call is needed until the queue drops below the refill threshold again.

Taste Profile

The schema that drives it all

The taste profile is the DNA of the recommendation engine. It is extracted once from your playlist, then passed unchanged through every cycle.

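interface TasteProfile {
  genres: string[];
  moods: string[];
  eras: string[];
  energyLevel: string;
  summary: string;   // English summary
  summaryCn: string; // Chinese summary
}

// Example profile extracted from one playlist (summaries truncated)
const tasteProfile: TasteProfile = {
  genres: ["Pop", "R&B", "Indie Pop", "Hip-Hop",
           "Synth-Pop", "Neo-Soul", "C-Pop", "K-Pop"],
  moods: ["Nostalgic", "Romantic", "Chill",
          "Melancholy", "Uplifting"],
  eras: ["2000s", "2010s", "2020s"],
  energyLevel: "Medium — mix of chill vibes and upbeat grooves",
  summary: "Multilingual listener with a strong taste for " +
    "smooth R&B, dreamy indie pop (big LANY fan), " +
    "and polished synth-pop...",
  summaryCn: "多语言听众，偏爱丝滑 R&B、梦幻独立流行" +
    "（LANY 铁粉）和精致合成器流行...",
};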

Design choice: The profile is immutable after creation. Feedback steers recommendations through the prompt, but the taste profile itself stays constant. This prevents drift: the AI always remembers your core preferences.
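The sketch below shows one way to make "immutable after creation" hold in code as well as in the prompt.

- `freezeProfile` and `applyCycle` are hypothetical helpers.
- The response shape, with an optional echoed `tasteProfile`, mirrors the prompt's "return it unchanged" instruction.

The client keeps its stored, frozen profile and takes only the new batch from each response.

```typescript
interface TasteProfile {
  readonly genres: readonly string[];
  readonly moods: readonly string[];
  readonly eras: readonly string[];
  readonly energyLevel: string;
  readonly summary: string;
  readonly summaryCn: string;
}

// Copies and freezes the profile and its arrays, so accidental writes fail
// (they throw in strict-mode code) instead of silently changing the profile.
function freezeProfile(p: TasteProfile): TasteProfile {
  return Object.freeze({
    ...p,
    genres: Object.freeze([...p.genres]),
    moods: Object.freeze([...p.moods]),
    eras: Object.freeze([...p.eras]),
  });
}

// Hypothetical response shape: the model may echo a profile back, plus a new batch.
interface CycleResponse<Rec> {
  tasteProfile?: unknown;
  recommendations: Rec[];
}

// Keeps the stored profile regardless of what the model echoed; takes only the batch.
function applyCycle<Rec>(stored: TasteProfile, resp: CycleResponse<Rec>): { profile: TasteProfile; batch: Rec[] } {
  return { profile: stored, batch: resp.recommendations };
}
```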
