Skip to content

Client-Side Search for a Hugo Site (No Backend)

The Problem

This site is a Hugo static site deployed to S3. There’s no server to run a search query against. I wanted to be able to search posts by title, tags, and — for novel/manga reviews — the character names in each post.

The standard options are: pay for Algolia, self-host Meilisearch, or add a build-time JSON index and filter it in the browser. With ~1700 posts and no real-time write requirements, the browser option is obviously right.

The Index

Hugo’s output formats let you add a non-HTML output to any page. Adding "JSON" to the home page outputs in config.toml means layouts/index.json gets rendered at build time and served as /index.json.

The template iterates all posts and emits one JSON object per post with title, url, slug, date, description, tags, and characters. The characters field is the interesting one.

My novel and manga review posts have an HTML comment near the top that lists the main characters — something like <!-- 小暮 春斗 -->. The template uses findRE to pull the first matching comment out of RawContent:

{{- $m := findRE `<!--\s*(?:[\p{Han}\p{Hiragana}\p{Katakana}#]|[A-Z]:)[^-\n]*-->` $p.RawContent 1 -}}

The regex is a positive whitelist: the first non-whitespace character inside the comment must be CJK, #, or an X: legend prefix (like K:小暮). This turned out to matter — one post had <!-- TODO get honto link --> before the real character comment, and the original [A-Z] class matched T. Changing to [A-Z]: (uppercase ASCII must be followed by a colon) fixed it. The Python integration test checks this specific post by URL and asserts TODO is absent and マルス is present.

The Filter

static/js/search.js exports a single function, filterPosts(posts, query). It lowercases the query and checks all five fields with Array.some. Results are sorted newest-first and capped at 100.

The function is loaded as a separate file (not inlined) so it can be unit-tested with Node’s built-in test runner without a browser. Thirteen tests cover empty queries, CJK matching, case-insensitive ASCII, sort order, the 100-cap, and invalid inputs.

One thing that came up in review: (p[k] || '').toLowerCase() throws if a field is an object or array (truthy non-strings bypass the || '' fallback). Changed to typeof p[k] === 'string' ? p[k] : ''.

The Page

layouts/page/search.html fetches the index, holds a indexState string ('loading' | 'ready' | 'failed') instead of two booleans, and renders results on each input event. The fetch chain checks r.ok before calling r.json() so a 404 surfaces as “index load failed” rather than a confusing JSON parse error.

Keyboard navigation came up during testing: up/down arrows move through results, Enter follows the active link. The implementation tracks activeEl (a direct DOM reference) so each arrow key touches exactly two elements — the old highlight and the new one — rather than clearing all results on every keypress.

Review Cycles

The PR went through two rounds of Gemini Code Assist review and one Codex pass before merge. Between the three reviews and a /simplify pass, the diff shrank by about 25 lines:

  • Two indexReady/indexFailed booleans → one indexState string
  • A bail(msg) helper collapsed three nearly-identical early-return branches
  • fields.some(...) replaced a chained five-field OR in filterPosts
  • getArticles() was being called twice per keydown event (once in the handler, once inside setActive); collapsed to one query

Codex passed cleanly on the second round. Gemini found two more things — the non-string field guard and the r.ok check — which were fixed TDD-style: failing test first, then the fix.

What I’d Do Differently

The character extraction regex is doing real work but lives in a Go template string, which is hard to test in isolation. The Python integration test covers it end-to-end (build the site, load the JSON, assert on specific posts), but a unit test against raw markdown strings would catch regressions faster. Hugo’s test story for templates is thin enough that the integration test is probably the right tradeoff here.

The search page could also surface which field matched (title vs tag vs character), but that would complicate the result rendering for something that works fine as-is.

Total: one JSON template, one JS file, one Hugo layout page, one test file each. About the right size for the feature.


Implementation, tests, review-response commits, and this post written by Claude Sonnet 4.6 across a single session. The design spec and brainstorming happened in an earlier session; Gemini Code Assist and Codex reviewed the PR and caught the issues described above.

相關文章

  1. Hugo setup notes
  2. Subtitling Cardcaptor Sakura Archive: Three Evenings, Two Pivots
  3. Migrating honto Extraction from gemma3 to gemma4
  4. When Over-Engineering Meets Reality: The Author Database Story
  5. Script for creating New Post