How I Built an AI Content Pipeline for Every Writing Post
Using Claude Haiku for topic tags, Supabase vector embeddings for related posts, and build-time scripts to turn 18 blog posts into a connected content product
Ingredients
- Claude Code — terminal-based AI for building scripts and components ($200/yr)
- Claude API — Haiku model — for generating topic tags per post (own API key, pay-per-use)
- Supabase — pgvector embeddings for semantic similarity (free tier)
- Next.js — the framework running the site (free)
- Vercel — hosting and deployment (free)
What I Was Trying to Solve
After 18 posts, the Writing section was a flat list. Every post had a title, a subtitle, and a date. Five manual tags — Website, Features, Automation, Headless Linux, Games — were the only way to filter. No related posts. No way to discover connections between builds. If you finished reading about the market briefing bot, there was nothing pointing you toward the Garmin automation that uses the same architecture.
I wanted three things: richer topic tags generated by AI from a proper taxonomy, related post suggestions based on actual content similarity, and verified reading times. All static. All generated at build time. All following the same pattern I’d already established with TL;DR by Goose.
The Foundation That Already Existed
This build didn’t start from zero. Two earlier projects laid the groundwork:
- TL;DR by Goose (March 5) — established the pattern: a TypeScript script reads post content, calls Claude Haiku, writes results to a static JSON file, and a React component reads from that JSON at render time. Zero runtime API calls.
- Vector Embeddings (April 4) — every post already had a semantic embedding stored in Supabase via `generate-embeddings.ts`. These embeddings power the site’s vector search. I just needed to query them differently.
The architecture decision was already made. I just needed to follow it.
The Build
Part 1: AI Tag Generation
I built scripts/generate-tags.ts following the exact same pattern as the TL;DR script. It reads each post’s JSX file, strips the markup to extract prose, and sends the content to Claude Haiku with a fixed taxonomy of 11 tags:
AI Tools, Backend, Frontend, Automation, Product Thinking, Game Dev, Data, DevOps, Security, API Design, Linux
Haiku picks 3–5 per post. The prompt is strict: return only a JSON array of strings from the approved list. No explanations, no creativity. The model is a classifier here, not a writer.
18 posts tagged in under a minute. Results saved to app/lib/tags.json.
The script also counts words in each post and computes a reading time at 230 words per minute. The original reading times were manually estimated — some were close, some were off by a few minutes. The automated counts replaced all of them.
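The reading-time arithmetic is small enough to show in full. A sketch of what the script presumably does (the 230 wpm rate is from the post; the function name and rounding behavior are my assumptions):

```typescript
// Hypothetical helper: word count plus reading time at 230 wpm,
// run over the prose left after stripping JSX markup.
function readingTime(strippedText: string): { words: number; minutes: number } {
  const words = strippedText.trim().split(/\s+/).filter(Boolean).length;
  // Round up so even a very short post reports at least "1 min read".
  const minutes = Math.max(1, Math.ceil(words / 230));
  return { words, minutes };
}
```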
🔧 Developer section: tag generation
- Script follows the `generate-tldr.ts` pattern: manual `.env.local` parsing, JSX stripping via regex, Claude Haiku API call
- First run had 6 failures — Haiku wrapped the JSON output in markdown code fences (```` ```json ... ``` ````). Fixed by stripping fences before `JSON.parse`
- Tags are validated against the taxonomy after parsing — any hallucinated tags are filtered out
- Output is `app/lib/tags.json`, keyed by slug, containing the tags array, word count, and computed reading time
- Reading time uses the full stripped text (not truncated), while the Haiku prompt gets the first 4,000 characters
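The fence-stripping and taxonomy-validation steps above can be sketched in a few lines. This is my reconstruction, not the script itself — the function name and regexes are assumptions, but the taxonomy is the 11-tag list from the post:

```typescript
// The approved taxonomy; anything Haiku returns outside this set is dropped.
const TAXONOMY = new Set([
  "AI Tools", "Backend", "Frontend", "Automation", "Product Thinking",
  "Game Dev", "Data", "DevOps", "Security", "API Design", "Linux",
]);

// Hypothetical post-processor for Haiku's reply: strip any markdown
// code fences, parse the JSON array, then filter out hallucinated tags.
function parseTags(raw: string): string[] {
  const unfenced = raw
    .replace(/^```(?:json)?\s*/i, "") // leading ```json fence
    .replace(/\s*```$/, "")           // trailing fence
    .trim();
  const parsed: unknown = JSON.parse(unfenced);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array of tags");
  return parsed.filter((t): t is string => typeof t === "string" && TAXONOMY.has(t));
}
```

Validating after parsing means a single made-up tag doesn't fail the whole post; it just gets filtered out silently.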
Part 2: Related Posts via Vector Similarity
The site already had semantic embeddings for every post stored in Supabase, generated by the generate-embeddings.ts script from the vector search build. Those embeddings use the MiniLM-L6-v2 model — each post is represented as a 384-dimensional vector based on its title, description, and TL;DR summary.
I built scripts/generate-related.ts to query all post embeddings from Supabase, compute cosine similarity between every pair, and pick the top 2 most similar posts for each. The results are written to app/lib/related.json — same static JSON pattern as everything else.
The similarity scores make intuitive sense. The cron ops post is most related to the server alerts post (0.600). The market daily briefing maps to the Garmin recaps (0.532) — both are automated email pipelines on the Alienware. The Gemini Grades post maps to the original site build post (0.675) — they’re literally about the same project.
🔧 Developer section: related posts
- Embeddings are stored as JSON strings in Supabase — the script parses them to `number[]` arrays before computing similarity
- Cosine similarity is computed in pure TypeScript (dot product divided by the product of magnitudes) — no external math library
- Output is `app/lib/related.json`, keyed by slug; each value is an array of 2 objects with `slug` and `title`
- The script filters to `content_type = 'post'` only — static pages and features are excluded from related suggestions
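The core of the script is easy to reconstruct: cosine similarity in plain TypeScript, then the top-2 neighbours per post. The types and function names below are mine; the real script reads 384-dimensional MiniLM vectors from Supabase rather than the toy 2-D vectors in the test:

```typescript
type Embedded = { slug: string; title: string; embedding: number[] };

// Cosine similarity: dot product divided by the product of magnitudes.
function cosine(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// For each post, score every other post and keep the k most similar.
function topRelated(
  posts: Embedded[],
  k = 2,
): Record<string, { slug: string; title: string }[]> {
  const related: Record<string, { slug: string; title: string }[]> = {};
  for (const p of posts) {
    related[p.slug] = posts
      .filter((q) => q.slug !== p.slug)
      .map((q) => ({ slug: q.slug, title: q.title, score: cosine(p.embedding, q.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map(({ slug, title }) => ({ slug, title }));
  }
  return related;
}
```

With 18 posts the all-pairs comparison is trivially cheap at build time, which is why no approximate-nearest-neighbour index is needed.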
Part 3: The UI Components
Two new components, both following the TLDRBadge pattern: client components that read from static JSON, no API calls, no loading states.
PostTags renders tag pills below the post meta (date and read time). Each pill is a small rounded badge in the site’s forest-green-on-pale-green color scheme. The tags come directly from the post metadata in posts.ts.
RelatedPosts appears at the bottom of each post, above the back navigation. Two cards with the related post’s title and reading time, styled as bordered links that highlight on hover. It reads from related.json and cross-references posts.ts for the reading time.
The Writing index page also got an upgrade: the tag filter pills are now derived dynamically from the posts array instead of being hardcoded. Adding a new tag to any post automatically adds it to the filter bar — no manual list to maintain.
🔧 Developer section: components
- Both components are `"use client"` — `PostTags` because it imports from the posts module, `RelatedPosts` because it reads static JSON
- `WritingFilter.tsx` derives tags with `Array.from(new Set(posts.flatMap(p => p.tags))).sort()` — deduped and alphabetized
- All 18 post pages were updated programmatically via a Node script that added imports, inserted `<PostTags>` after `.post-meta`, and `<RelatedPosts>` before `.post-back--bottom`
- CSS uses existing design tokens: `--forest-pale` backgrounds, `--forest` text for tags, `--rule` borders for related post cards
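Isolated from the component, the tag-derivation one-liner looks like this. The `Post` shape is an assumption (only `tags` matters here); the expression itself is the one quoted above:

```typescript
type Post = { tags: string[] };

// Flatten every post's tags, dedupe via Set, and alphabetize —
// the filter bar is always in sync with the posts array.
function allTags(posts: Post[]): string[] {
  return Array.from(new Set(posts.flatMap((p) => p.tags))).sort();
}
```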
The Full Pipeline
Here’s what the content pipeline looks like now, from writing a post to deploying it:
1. Write the post as a TSX file in `app/writing/[slug]/page.tsx`
2. Add metadata to `app/lib/posts.ts`
3. Run `npx tsx scripts/generate-tldr.ts [slug]` — AI summary
4. Run `npx tsx scripts/generate-tags.ts [slug]` — AI topic tags + reading time
5. Run `npx tsx scripts/generate-embeddings.ts` — vector embeddings
6. Run `npx tsx scripts/generate-related.ts` — related post suggestions
7. Deploy with `vercel --prod`
Steps 3–6 are all build-time, all idempotent, all writing to static JSON or Supabase. The live site never touches an API at read time. Every piece of AI-generated content is baked into the build.
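Since steps 3–6 are idempotent, they're easy to wrap in a single function. This is a hypothetical convenience wrapper, not something from the repo — the script paths are from the post, the function name is mine, and the `vercel --prod` deploy is left as a separate manual step:

```shell
# run_content_pipeline <slug>: regenerate all AI metadata for one post.
# Safe to re-run; every script writes static JSON (or Supabase rows) in place.
run_content_pipeline() {
  local slug="$1"
  npx tsx scripts/generate-tldr.ts "$slug" &&
  npx tsx scripts/generate-tags.ts "$slug" &&
  npx tsx scripts/generate-embeddings.ts &&
  npx tsx scripts/generate-related.ts
}
```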
What This Unlocks
The Writing section went from a flat blog to a connected content product. Readers can filter by 11 topic tags, see related posts at the bottom of every article, and get accurate reading times. Every post is now enriched with AI-generated metadata that would have taken hours to create manually.
More importantly: all of this data — the TL;DR summaries, the topic tags, the vector embeddings, the related post graph — feeds directly into Ask Goose, the conversational AI assistant coming to the site. When Ask Goose answers a question about what I’ve built, it won’t be searching raw text. It’ll be retrieving semantically similar content from a curated, tagged, summarized knowledge base.
The content pipeline isn’t just a feature. It’s the retrieval layer for everything that comes next.