The 5-Minute Problem
I’ve been writing book reviews on this blog for a while, and one thing has always been a drag: the manual, repetitive process of creating a new post. For each review, I had to go to honto.jp, copy the title and author, create a new markdown file, fill in the Hugo front matter, and add all the boilerplate. It’s not hard, but it’s a solid 5-10 minutes of tedious clicking and typing. That small friction adds up, and it often became a barrier to me actually sitting down to write.
I’d been thinking about automating it for a while, and with the recent explosion in local LLMs and coding assistants, I figured now was the time to tackle it. This post is a little story about that project—a journey I took with my new AI pair programmers: Gemini, Claude, and a locally-running gemma3:12b model.
The Plan: An AI-Powered Assembly Line
My goal was simple: take a honto.jp URL and have a script spit out a perfectly formatted, pre-filled Hugo post, ready for me to write the actual review.
The project, which I built as a set of Python scripts, ended up having a few key parts:
- book_review_helper.py: The core tool that does the heavy lifting of extracting metadata.
- create_review.py: A wrapper script that orchestrates the whole process from URL to final .md file.
- validate_random_sample.py: A crucial testing framework to keep me honest and measure my progress.
This wasn’t a solo effort. I acted as the architect, but the implementation was a constant back-and-forth with my AI assistants. I’d ask Gemini to sketch out the overall structure of a Python script, then hand the rough code to Claude for refactoring and adding proper error handling. It was a surprisingly effective workflow.
How It Works
The process is a multi-step pipeline that I pieced together.
Step 1: The Magic of Metadata Extraction
This is the heart of the operation. When I feed it a URL, book_review_helper.py kicks off a sequence:
- HTML Caching: First, it checks a local html_cache/ directory. If it's seen the URL before, it uses the cached version. This was a huge time-saver during development and stopped me from getting rate-limited by honto.jp.
- Content Extraction: It uses BeautifulSoup to parse the HTML and intelligently extracts about 3000 characters of the most relevant text from the main product block.
- LLM Inference: Here's the fun part. The script feeds that text into a locally running gemma3:12b model via Ollama. I worked with Gemini to craft very specific prompts (one for novels, one for manga) that instruct the model to return a clean JSON object with the book's title, author, illustrator, and tags.
- Post-Processing: The raw JSON from the LLM gets cleaned up. For instance, it splits a single author string like "Author Name, Illustrator Name" into separate author and illustrator fields.
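For the curious, here's that whole pipeline condensed into one sketch. The CSS selector, the cache naming, and the single combined prompt are placeholders (the real script uses separate novel and manga prompts, and this isn't honto.jp's actual markup):

```python
import hashlib
import json
from pathlib import Path

import requests
from bs4 import BeautifulSoup
import ollama  # pip install ollama

CACHE_DIR = Path("html_cache")

def fetch_html(url: str) -> str:
    """Fetch a page, reusing the local cache when we've seen the URL before."""
    CACHE_DIR.mkdir(exist_ok=True)
    cached = CACHE_DIR / (hashlib.sha256(url.encode()).hexdigest() + ".html")
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    html = requests.get(url, timeout=30).text
    cached.write_text(html, encoding="utf-8")
    return html

def extract_metadata(url: str) -> dict:
    soup = BeautifulSoup(fetch_html(url), "html.parser")
    # Grab roughly 3000 characters of the most relevant text.
    # The selector is a placeholder, not honto.jp's real markup.
    block = soup.select_one("#productInfo") or soup.body
    text = block.get_text(" ", strip=True)[:3000]

    response = ollama.chat(
        model="gemma3:12b",
        messages=[{
            "role": "user",
            "content": (
                "Extract the book's title, author, illustrator, and tags "
                "from the following text. Respond with a single JSON "
                f"object and nothing else.\n\n{text}"
            ),
        }],
        format="json",  # ask Ollama to constrain the model to valid JSON
    )
    meta = json.loads(response["message"]["content"])

    # Post-processing: split "Author Name, Illustrator Name" apart.
    if meta.get("author") and "," in meta["author"] and not meta.get("illustrator"):
        author, illustrator = (s.strip() for s in meta["author"].split(",", 1))
        meta["author"], meta["illustrator"] = author, illustrator
    return meta
```

The format="json" flag is doing a lot of quiet work here: it's what keeps the model returning parseable output instead of chatty prose.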
Step 2: Building the Post
The create_review.py script then takes over:
- It calls the helper to get the metadata.
- In a flash of “let’s use AI for everything,” I had it call the LLM again with a different prompt to romanize the Japanese title into a URL-friendly slug.
- It then runs hugo new to create the post file from the correct archetype (manga.md or novel.md).
- Finally, it injects all the extracted metadata into the new file.
Voila! A perfectly formatted draft appears, ready for me to write in.
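In sketch form, that orchestration looks something like this. The extract_metadata import, the is_manga field, and the front-matter injection are illustrative assumptions, not the script verbatim:

```python
import subprocess
from pathlib import Path

import ollama

from book_review_helper import extract_metadata  # the helper sketched above

def romanize_title(title: str) -> str:
    """Second LLM call: ask the model for a URL-friendly romanized slug."""
    response = ollama.chat(
        model="gemma3:12b",
        messages=[{"role": "user", "content":
                   f"Romanize this Japanese book title into a short, "
                   f"lowercase, hyphen-separated URL slug: {title}"}],
    )
    return response["message"]["content"].strip()

def create_review(url: str) -> Path:
    meta = extract_metadata(url)
    slug = romanize_title(meta["title"])

    # Pick the archetype; is_manga is an assumed field name.
    kind = "manga" if meta.get("is_manga") else "novel"
    post = Path(f"content/books/{slug}.md")

    # Let Hugo create the file from the matching archetype.
    subprocess.run(["hugo", "new", "--kind", kind, str(post)], check=True)

    # Inject the extracted metadata into the fresh front matter
    # (naive string replacement; the real script is surely more careful).
    text = post.read_text(encoding="utf-8")
    for key in ("title", "author", "illustrator"):
        if meta.get(key):
            text = text.replace(f'{key}: ""', f'{key}: "{meta[key]}"', 1)
    post.write_text(text, encoding="utf-8")
    return post
```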
Choosing the Right Brain: Model Selection
Before settling on gemma3:12b, I actually spent some time testing different models. The goal was to find the right balance of speed, accuracy, and cost for this specific task of structured data extraction.
I considered using powerful cloud-based models, but I wanted the script to be fast and free to run, which pointed me toward a local solution with Ollama. I tested a few different models available at the time. While larger models were slightly more capable, they were also slower. Smaller models were fast but often failed to follow the strict JSON output format I needed.
In the end, gemma3:12b hit the sweet spot. It was fast enough to not feel sluggish (about 1-2 seconds per extraction on my machine), consistently returned valid JSON, and was surprisingly accurate at identifying the correct author and illustrator fields once the prompt was tuned. The model comparison was one of the first phases of the project, and getting it right was key to making the whole idea viable.
“But Does It Actually Work?”
I couldn’t just trust a few successful runs. I needed data. So, I built validate_random_sample.py. This script scans my nearly 300 existing book review posts, extracts the “ground truth” metadata from them, and then runs my new script against their honto URLs to compare the results. It calculates an accuracy score and tracks which posts it has tested in a .validation_state.json file.
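A minimal sketch of that loop, assuming each post keeps its honto URL in a front-matter field (honto_url is an illustrative name) and scoring accuracy per field:

```python
import json
import random
from pathlib import Path

import frontmatter  # pip install python-frontmatter

from book_review_helper import extract_metadata  # hypothetical import

STATE_FILE = Path(".validation_state.json")

def validate(sample_size: int = 20) -> float:
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"tested": []}
    untested = [p for p in Path("content/books").glob("*.md")
                if str(p) not in state["tested"]]

    correct = total = 0
    for path in random.sample(untested, min(sample_size, len(untested))):
        truth = frontmatter.load(path)              # ground truth from the post
        result = extract_metadata(truth["honto_url"])
        for field in ("title", "author", "illustrator"):
            total += 1
            correct += result.get(field) == truth.get(field)
        state["tested"].append(str(path))

    STATE_FILE.write_text(json.dumps(state, ensure_ascii=False, indent=2))
    return correct / total if total else 0.0
```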
This validation framework was probably the most important part of the project. It turned my vague sense of “it seems to work” into hard numbers.
The Rocky Road to 84% Accuracy
The first run of the validation script gave me a score of 77.4%. Not bad, but the failures were almost all the same: it couldn’t find the illustrator for light novels.
I dug into the HTML of the failed pages. Boy, was I wrong about how the data was structured. I assumed there would always be a clear “イラスト” (illustrator) label. But honto.jp often just lists the author and illustrator under a single “著者” (author) tag, separated by a comma. My script was seeing this, getting confused, and incorrectly classifying the book as a manga, so it never even looked for an illustrator.
My first fix was to work with Gemini to improve the prompt for the gemma3:12b model, explicitly telling it how to handle that comma-separated case. I re-ran the test. Accuracy jumped to 82.2%. Better!
But the root cause was still the faulty auto-detection in my Python code. The prompt was a patch, not a fix. So, I went back to the code and made the detection logic smarter. If it sees that 著者 Name1, Name2 pattern, it now confidently classifies the book as a novel.
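The fix boils down to a pattern check along these lines (a simplification of the real detection logic):

```python
import re

# "著者 Name1, Name2": an author label followed by two comma-separated names.
AUTHOR_PAIR = re.compile(r"著者[::\s]*([^,、]+)[,、]\s*([^,、]+)")

def looks_like_novel(text: str) -> bool:
    """Two names under a single 著者 (author) label strongly suggests a
    light novel with author + illustrator, not a manga."""
    return bool(AUTHOR_PAIR.search(text))
```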
After that change, the accuracy hit 83.7%.
Where It Stands Now
Today, the script is holding steady at 83.7% accuracy across about a quarter of my posts. The path to my 90% goal is clear: tackle the remaining edge cases, like manga with three or more creators, and improve the validation logic to handle things like full-width vs. half-width character differences.
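The character-width problem, at least, has a standard-library answer: NFKC normalization folds full-width Latin letters and digits into their half-width forms, so the comparison could normalize both sides before checking equality. A quick sketch:

```python
import unicodedata

def normalize(s: str) -> str:
    """Fold full-width/half-width variants before comparing fields."""
    return unicodedata.normalize("NFKC", s).strip()

print(normalize("ＡＢＣ１２３") == normalize("ABC123"))  # True
```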
This project was a fascinating experiment in human-AI collaboration. I was the architect, debugger, and project manager. Gemini was my brainstorming partner for prompts and high-level code structure. Claude was my tireless pair programmer, refactoring my messy first drafts into clean, robust Python. It’s a workflow I’m sure I’ll be using again.
Update: Breaking Through 95%
After writing this initial post, I continued iterating with Claude. The breakthrough came from fixing how the script extracts content from HTML pages. By prioritizing the product detail section over promotional banners, the LLM stopped hallucinating author names and accuracy jumped to 95.1%.
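The gist of that fix, with placeholder class names and IDs standing in for honto.jp's real markup:

```python
from bs4 import BeautifulSoup

def relevant_text(html: str, limit: int = 3000) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Drop promotional banners before extracting text, so the LLM never
    # sees other books' authors (selectors are illustrative).
    for banner in soup.select(".promo, .banner, .recommend"):
        banner.decompose()
    # Prefer the product detail block; fall back to the whole page.
    detail = soup.select_one("#productDetail") or soup.body
    return detail.get_text(" ", strip=True)[:limit]
```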
I also added automatic slug romanization—the LLM now converts Japanese titles like 痴漢されそうになっているS級美少女... into clean 20-character URLs like chikan-s-rank-bishou. One less manual step!
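The cleanup on top of the LLM's romanization is plain string munging. This sketch reflects my reading of the 20-character rule (truncate, then back off to a word boundary), not the exact code:

```python
import re

def clean_slug(romanized: str, max_len: int = 20) -> str:
    """Turn the LLM's romanization into a short, URL-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", romanized.lower()).strip("-")
    if len(slug) > max_len:
        slug = slug[:max_len].rsplit("-", 1)[0]  # cut at a word boundary
    return slug

print(clean_slug("Chikan Sare Sou ni Natte Iru S-Kyuu Bishoujo"))
# -> "chikan-sare-sou-ni"
```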
The system now reduces what was 5-10 minutes of tedious work down to about 30 seconds—and it’s accurate enough that I trust it for initial drafts. Mission accomplished.
Original post written by Gemini 🤖, updates by me with Claude’s help