
Building a RAG Search System for a Japanese Learning App with Next.js and OpenAI


Introduction

Kaiwakai is a Japanese language learning app that helps users practice listening through video content. Each lesson includes a YouTube video, bilingual transcripts, and vocabulary lists organized by JLPT level. While building the app, I faced an interesting challenge: how do you search across different types of content in a way that understands meaning, not just exact words?

For example, when a user searches for "迷信" (superstition), they should find:

  • The vocabulary entry for that word

  • Related words like "信じる" (to believe)

  • Transcript segments discussing superstitions, even if they don't use that exact word

  • An AI-generated explanation with context

Traditional keyword search couldn't do this. I needed something smarter: a RAG (Retrieval-Augmented Generation) system.

What is RAG?

RAG combines two powerful concepts:

Traditional Search: Looks for exact keyword matches. Fast but literal.

Vector Search: Converts text into numerical representations (embeddings) that capture meaning. Searches by semantic similarity, not just exact words.

RAG: Takes vector search results and feeds them to an LLM (Large Language Model) to generate contextual, intelligent responses.
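
To make "semantic similarity" concrete: it is usually cosine similarity between two embedding vectors. pgvector's <=> operator, used later in this post, returns the cosine distance, which is 1 minus this value. A minimal TypeScript sketch:

```ts
// Cosine similarity between two embedding vectors.
// 1.0 = pointing the same way (very similar meaning), ~0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```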

Think of it like this:

  • Traditional search is finding a word in a dictionary

  • Vector search is understanding what concepts are related

  • RAG is having a knowledgeable teacher explain everything in context

For a language learning app, this is perfect. Users don't just get search results—they get explanations, usage examples, and related concepts.

Architecture: The Big Picture

The complete system is built on five major layers:

1. Vercel + Next.js (Deployment & Framework)

Vercel hosts the entire application using Next.js 15. The key here is Next.js API Routes—these are serverless functions that run on Vercel's infrastructure, not in the browser. This is critical for security: our OpenAI API keys never touch the client.

When a user searches, the React component calls /api/search, which runs completely server-side. This means we can safely use API keys, query the database, and call OpenAI without exposing credentials.
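
As a rough sketch, the route looks something like this (the file path is an assumption, and embed, hybridSearch, explain, and formatContext are helpers sketched later in this post):

```ts
// app/api/search/route.ts (App Router style; path is an assumption)
import { NextResponse } from "next/server";
// Hypothetical module collecting the helpers sketched later in this post.
import { embed, hybridSearch, explain, formatContext } from "@/lib/rag";

export async function GET(request: Request) {
  const q = new URL(request.url).searchParams.get("q");
  if (!q) {
    return NextResponse.json({ error: "missing query" }, { status: 400 });
  }

  const queryVec = await embed(q);                 // OpenAI call, server-side only
  const matches = await hybridSearch(q, queryVec); // pgvector query
  const explanation = await explain(q, formatContext(matches)); // gpt-4o-mini

  return NextResponse.json({ query: q, explanation, matches });
}
```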

2. Neon Postgres + pgvector (Vector Database)

Neon provides a serverless Postgres database with a generous free tier. The magic ingredient is pgvector—a PostgreSQL extension that adds vector operations to standard SQL.

Our database has three tables, each with a 1536-dimension vector column:

  • episodes: Video metadata with summary embeddings

  • transcript_segments: Timestamped transcript text with embeddings

  • vocabulary: Japanese words with readings, translations, and embeddings

Instead of storing just text, we store mathematical representations of that text's meaning. This enables semantic search using simple SQL queries.
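
As a sketch of what this looks like, here is the vocabulary table (column names are assumptions; the other two tables follow the same pattern), created through the Neon serverless driver:

```ts
import { neon } from "@neondatabase/serverless";

const sql = neon(process.env.DATABASE_URL!);

// Run once: enable pgvector, then create the table.
await sql`CREATE EXTENSION IF NOT EXISTS vector`;
await sql`
  CREATE TABLE IF NOT EXISTS vocabulary (
    id         serial PRIMARY KEY,
    word       text NOT NULL,
    reading    text,
    meaning_en text,
    jlpt_level text,
    embedding  vector(1536)  -- same dimension as text-embedding-3-small
  )
`;
```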

3. OpenAI API (Embeddings + LLM)

We use two OpenAI models:

text-embedding-3-small ($0.02 per 1M tokens): Converts text into 1536-dimension vectors. We use this for both:

  • Generating embeddings for all our content (one-time)

  • Converting user queries into vectors (every search)
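
Both uses go through the same call; a minimal sketch with the official openai SDK:

```ts
// Turning text into a 1536-dimension vector. This runs server-side,
// so the API key never reaches the browser.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function embed(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding; // 1536 numbers
}
```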

GPT-4o-mini ($0.15/1M input, $0.60/1M output): Takes search results and generates clear, contextual explanations in English. It acts as a Japanese language teacher, explaining concepts with examples.

Total cost: ~$1 per year for a realistic usage pattern.

4. The RAG Pipeline (Intelligence Layer)

This is where the magic happens. Each search goes through five stages:

  1. Query → Embedding: User's search term becomes a vector

  2. Hybrid Search: Search the database using both exact text matching AND vector similarity

  3. Ranking: Prioritize exact matches, then semantic similarity

  4. Context Building: Take top 5 results (vocabulary + transcript segments)

  5. AI Explanation: GPT-4o-mini generates a comprehensive explanation

The hybrid approach was key. Pure vector search sometimes missed exact matches, so we implemented two-stage ranking: exact matches get priority, then semantic similarity fills in the gaps.
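
Here is a hedged sketch of that two-stage ranking over the vocabulary table (table and column names are assumptions):

```ts
import { neon } from "@neondatabase/serverless";

const sql = neon(process.env.DATABASE_URL!);

export async function hybridSearch(query: string, queryVec: number[], k = 5) {
  // pgvector accepts vectors as '[0.1,0.2,...]' text, so JSON works as a literal.
  const vecLiteral = JSON.stringify(queryVec);

  const [exact, semantic] = await Promise.all([
    // Stage 1: exact text matches, distance 0 by definition.
    sql`SELECT word, reading, meaning_en, 0 AS distance
        FROM vocabulary
        WHERE word = ${query}
        LIMIT ${k}`,
    // Stage 2: nearest neighbors by cosine distance (<=>).
    sql`SELECT word, reading, meaning_en,
               embedding <=> ${vecLiteral}::vector AS distance
        FROM vocabulary
        ORDER BY distance
        LIMIT ${k}`,
  ]);

  // Exact matches win; semantic results fill the remaining slots.
  const seen = new Set(exact.map((r) => r.word));
  return [...exact, ...semantic.filter((r) => !seen.has(r.word))].slice(0, k);
}
```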

5. React Frontend (User Interface)

The frontend is intentionally simple:

  • SpotlightSearch component (⌘K to open)

  • Vocabulary cards with search icons

  • Real-time rendering using react-markdown

Everything is designed for minimal friction: click a word, get an instant deep-dive.
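
The ⌘K wiring is only a few lines of React; a sketch (the hook name is an assumption):

```tsx
import { useEffect, useState } from "react";

// Toggles the SpotlightSearch modal on Cmd+K (or Ctrl+K on Windows/Linux).
export function useSpotlightShortcut() {
  const [open, setOpen] = useState(false);

  useEffect(() => {
    const onKeyDown = (e: KeyboardEvent) => {
      if ((e.metaKey || e.ctrlKey) && e.key === "k") {
        e.preventDefault();
        setOpen((o) => !o);
      }
    };
    window.addEventListener("keydown", onKeyDown);
    return () => window.removeEventListener("keydown", onKeyDown);
  }, []);

  return { open, setOpen };
}
```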

How It Works: The Search Flow

Let's walk through what happens when a user searches for "迷信":

Step 1: User Input
The user types "迷信" and hits enter in the SpotlightSearch modal.

Step 2: API Call
The React component calls GET /api/search?q=迷信.

Step 3: Generate Query Embedding (Server-side)
"迷信" → OpenAI API → [0.234, -0.567, 0.891, ...] (1536 numbers)

Step 4: Hybrid Search (Database)
Run two SQL queries in parallel:

  • Exact match: WHERE word = '迷信' OR text_ja LIKE '%迷信%'

  • Vector similarity: ORDER BY embedding <=> query_vector (cosine distance)

Combine results with exact matches ranked higher.

Step 5: Results Retrieved
The top 5 matches:

  • Vocabulary: 迷信 (めいしん) - "superstition"

  • Related words: 信じる (to believe), 真偽 (truth or falsity), 風説 (rumor)

  • Transcript segment: "そして人はなぜ風雪や言い伝えを信じてしまうのか考えます。" ("And we'll consider why people come to believe rumors and old sayings.")

Step 6: Build Context
Format the results as plain text for the LLM:

  Vocabulary: 迷信 (めいしん) - superstition [N2]
  Vocabulary: 信じる (しんじる) - to believe [N4]
  Transcript [02:12]: そして人はなぜ... (And we'll consider why...)

Step 7: AI Explanation
Send the context to GPT-4o-mini with a system prompt:

"You are a Japanese language teacher. Explain this concept clearly..."

Step 8: Return Response
The API returns JSON with:

  • Original query

  • AI-generated explanation (markdown)

  • Vocabulary matches with similarity scores

  • Transcript segments with timestamps
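
Reconstructed from that list, the response shape looks roughly like this (field names are assumptions):

```ts
interface SearchResponse {
  query: string;        // the original query
  explanation: string;  // AI-generated explanation, as markdown
  vocabulary: Array<{
    word: string;
    reading: string;
    meaning_en: string;
    similarity: number; // 1 - cosine distance
  }>;
  segments: Array<{
    text_ja: string;
    text_en: string;
    timestamp: string;  // e.g. "02:12", used to jump the video
  }>;
}
```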

Step 9: Render
The React component displays:

  • Formatted markdown explanation

  • Clickable vocabulary cards

  • Clickable timestamps (jumps to video position)
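
The render step is mostly react-markdown plus a list of buttons; a sketch (the SearchResponse type is the one sketched above, and the SearchResults component and seekTo helper are hypothetical):

```tsx
import ReactMarkdown from "react-markdown";
// Hypothetical module holding the SearchResponse interface from above.
import type { SearchResponse } from "./search-types";

// Hypothetical helper that jumps the YouTube player to a timestamp.
declare function seekTo(timestamp: string): void;

export function SearchResults({ data }: { data: SearchResponse }) {
  return (
    <div>
      {/* The AI explanation arrives as markdown and renders directly */}
      <ReactMarkdown>{data.explanation}</ReactMarkdown>

      {/* Each transcript hit links back into the video */}
      {data.segments.map((s) => (
        <button key={s.timestamp} onClick={() => seekTo(s.timestamp)}>
          [{s.timestamp}] {s.text_ja}
        </button>
      ))}
    </div>
  );
}
```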

Total time: ~1-2 seconds.

Conclusion

We covered how to build a RAG search system using Vercel, Next.js, Neon, and OpenAI, and how the same approach can add intelligent, semantic search to a web application.

The key insights:

  • Hybrid search > pure vector search for most use cases

  • Next.js API Routes solve the security problem elegantly

  • pgvector brings vector search to familiar SQL territory

  • OpenAI embeddings are shockingly cheap at scale

If you're building something similar, start simple: get embeddings working, add basic vector search, then layer in the LLM explanations. The future of search isn't just finding information—it's understanding it.
