add small models rock blog by psschwei · Pull Request #36 · generative-computing/mellea-website

psschwei · 2026-05-05T03:00:47Z

/schedule 2026-05-20

ajbozarth

first pass with technical review, will follow with a review of the blog content later

ajbozarth

Here's a handful of items Claude found. I also opened #39 to address the image issues.

planetf1 · 2026-05-07T08:50:06Z

A broader editorial observation, on top of the inline notes above.

Long prose stretches with no visual breaks. "The Bet", the ALoRA explanation, and the "Why intrinsics are cheap to compose" sections each run 3–5 paragraphs of uninterrupted running text. A reader skimming from the link-sharing site of your choice has nothing to hook onto — no bullet list, no pull-quote, no callout. Consider breaking the longer argumentative sections with either (a) a short bulleted summary of the three differentiators, (b) a margin callout or blockquote for the one-line claim that matters, or (c) an extra H3 that lets the reader resume after an interrupt. The Dijkstra passage is strong enough that it could earn a standalone pull-quote.

Long code blocks without internal narration. Quick scan of the 12 fenced Python blocks:

| Block | Lines | Inline comments |
| --- | --- | --- |
| BOM validator (L155) | 19 | 0 |
| Reformat instruct (L181) | 11 | 0 |
| Async `extract_bom` (L206) | 15 | 0 |
| Catalog load (L232) | 11 | 0 |
| Pricing loop (L332) | 57 | 2 |
| `find_citations` (L405) | 8 | 0 |
| Report generation (L425) | 31 | 0 |
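
For anyone who wants to reproduce a tally like the one above, here is a minimal sketch. It assumes the post is plain Markdown with fences opened by exactly ```` ```python ```` and closed by a bare ```` ``` ````. The function name `summarize_python_blocks` is made up for illustration, and the comment count is a naive heuristic rather than a real tokenizer.

```python
def summarize_python_blocks(markdown: str) -> list[dict]:
    """Return start line, line count, and comment count per fenced Python block."""
    blocks = []
    current = None  # stats for the block we are inside, or None when in prose
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if current is None:
            if stripped == "```python":
                current = {"start": lineno, "lines": 0, "comments": 0}
        elif stripped == "```":
            blocks.append(current)
            current = None
        else:
            current["lines"] += 1
            # Naive: any '#' counts as a comment, including one inside a string.
            if "#" in line:
                current["comments"] += 1
    return blocks


sample = "intro\n```python\nx = 1  # set x\ny = 2\n```\ntext\n```python\nprint(x)\n```\n"
for block in summarize_python_blocks(sample):
    print(block)
```

The `start` value is the line of the opening fence, so it will be one less than the first code line.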

The 57-line pricing loop is the main offender: a reader has to hold the whole thing in their head to reach the "`verdict == "answerable"` is the gate" claim the surrounding prose is building to. A few options that would help without gutting the post:

- Highlight the key line in a preceding sentence, e.g., "The one line that matters is `if verdict == "answerable":`; everything above it is ceremony to get that gate into place," then show the block.
- Split the pricing loop into a short `get_catalog_for(entry)` helper + the actual priced extraction, so each block is ~15 lines.
- Add a handful of inline `#` comments on the non-obvious lines (the `.get(entry.category)` returning `None` for lumber, the `continue` after `append`, the `if catalog:` fallthrough producing `unit_price=None`).

Inline comments in tutorial code are an anti-pattern in production but are the right call in a blog post — readers copy the block into a notebook and the comments are their only in-line teacher.
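
To make the split-plus-comments suggestion concrete, here is a rough sketch of the shape it could take. It is reconstructed only from the description in this comment, not from the post's actual code. `BomEntry`, `PricedEntry`, `price_entries`, and the two callables are hypothetical stand-ins, and the model calls are replaced by plain Python functions so the block runs without Mellea installed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BomEntry:  # hypothetical shape of a bill-of-materials row
    name: str
    category: str
    quantity: int


@dataclass
class PricedEntry:  # hypothetical result row
    entry: BomEntry
    unit_price: Optional[float]
    total: Optional[float]


def get_catalog_for(entry: BomEntry, catalogs: dict) -> Optional[dict]:
    # .get() returns None when there is no catalog for the category (e.g. lumber)
    return catalogs.get(entry.category)


def price_entries(
    entries: list[BomEntry],
    catalogs: dict,
    check_answerable: Callable[[BomEntry, dict], str],    # stand-in for the adapter call
    extract_unit_price: Callable[[BomEntry, dict], float],  # stand-in for the model call
) -> list[PricedEntry]:
    priced = []
    for entry in entries:
        catalog = get_catalog_for(entry, catalogs)
        if catalog is None:
            priced.append(PricedEntry(entry, None, None))
            continue  # no catalog to consult, so skip the model entirely
        if check_answerable(entry, catalog) == "answerable":  # the gate
            unit = extract_unit_price(entry, catalog)
            priced.append(PricedEntry(entry, unit, unit * entry.quantity))
        else:
            # not confident, so record "unknown" instead of a guessed price
            priced.append(PricedEntry(entry, None, None))
    return priced
```

With a dictionary lookup standing in for both callables, an entry whose category has no catalog and an entry the check rejects both come back with `unit_price=None`, which is the behaviour the surrounding prose is arguing for.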

Neither of these is a blocker, just things that would turn this from "good if you read carefully" into "easy to follow on first scroll."

planetf1 · 2026-05-07T08:58:38Z

One more pass, purely editorial — positioning and discoverability asks, not correctness. All are optional polish.

No fast hook for skimmers. The post is ~2,800 words / 14-min read, and the first hands-on code appears at L123. A reader linking in from HN or a social share needs a 30-second on-ramp. Consider a short callout between the excerpt and "The Bet" along these lines:

> What this post does: walks through a construction-cost-estimation pipeline that one-shot prompting needs GPT-5-tier models for, rebuilt on a 3B Granite model running locally: same accuracy, no API keys, ~$0/run. If you're paying frontier-model prices for structured extraction or matching, the same pattern applies.

"Harness" is load-bearing but undefined. The word carries most of the argument (L19, L25, L28, L30, L383, L508) but never gets a definition. A reader who hasn't already absorbed Mellea's framing has to infer it. One sentence near first use — e.g., "By 'harness' we mean the software scaffolding around the model call: decomposition, validation, retries, tool dispatch — the part that isn't the forward pass." — makes the rest of the post land harder.

Pain points skew finance-y; the dev concerns are missing. The three differentiators (cost, data sovereignty, vendor-agnostic) hit procurement and regulated-industries buyers well. The pains that devs themselves feel are under-represented:

- Latency / rate limits: a frontier API can rate-limit you mid-backtest; local inference doesn't.
- Observability in production: when a prompt-pipeline fails at 3am, debugging is about which step went wrong; Mellea's decomposition surfaces the failure point.
- Fine-tuning vs. harness trade-off: the obvious alternative to "better harness + small model" is "fine-tune a small model"; why harness first?

These are one-paragraph each. The "Trade-offs" section at the bottom is a natural home if you don't want to expand the opening argument.

Cost comparison is one-sided. The $1/run vs "no per-token billing" framing is accurate but omits the local side of the ledger: GPU/laptop amortisation, electricity, and the engineering time to build the decomposed pipeline. The "Trade-offs" section admits "decomposition takes engineering effort" but doesn't put a number on it. Even a rough "a senior engineer can port a prompt pipeline in a day or two" would neutralise the "you're hiding the real cost" objection that readers will raise in the comments either way.

Terminology: "intrinsics" vs "adapters". The post uses both terms interchangeably — ten uses of intrinsic(s) (L64, L246, L255, L259, L301, L312, L329, L393, L406, L521) and six uses of adapter(s) (L255, L316, L319, L322, L396, L398), including the mixed phrasing on L319 "Granite intrinsics ship as ALoRA adapters". My understanding is the Mellea/Granite framing has shifted toward adapters as the external-facing term (with intrinsic still in the module path for now). If that's right, it's worth a sweep to standardise — probably adapters everywhere in prose, with one parenthetical acknowledgement that the Python import path uses intrinsic. Also drop intrinsics from the tag list; local-llm would be a sensible addition there:

tags: ["granite", "rag", "adapters", "small-models", "docling", "local-llm"]

(I'll defer to you on whether adapters or intrinsics is the preferred term — the ask is consistency, not a specific choice.)
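
If it helps with the sweep, a small script along these lines would list where each term appears in prose. This is a sketch under assumptions: the function name `term_usage` is invented, it treats any line starting with three backticks as a fence toggle, and it does not skip inline code spans, so a module path quoted in prose would still be counted.

```python
import re
from collections import defaultdict

# Match singular and plural forms of both terms, case-insensitively.
TERMS = re.compile(r"\b(intrinsics?|adapters?)\b", re.IGNORECASE)


def term_usage(markdown: str) -> dict[str, list[int]]:
    """Map each term (singular form) to the prose line numbers that use it."""
    usage = defaultdict(list)
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.strip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced code block
            continue
        if in_fence:
            continue  # import paths inside code are expected to keep the old name
        for match in TERMS.finditer(line):
            key = match.group(1).lower().rstrip("s")  # fold plural into singular
            usage[key].append(lineno)
    return dict(usage)


sample = (
    "Granite intrinsics ship as ALoRA adapters.\n"
    "```python\n"
    "from pkg.intrinsic import x\n"
    "```\n"
    "Each adapter is small.\n"
)
print(term_usage(sample))
```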

None of the above is a blocker; the core argument is strong.

planetf1

as per comments (need evaluation - but your interpretation about what should be changed is fine)

ajbozarth · 2026-05-13T18:24:18Z

So I tried walking through the code in the blog manually and kept hitting issue after issue, tinkering with the code to get it to work. I think a better solution is for you to do it yourself when you next have bandwidth. Since it's your blog, you know best what should be edited, and how, so that a user can iteratively copy, paste, and run the code blocks.

psschwei · 2026-05-14T00:00:41Z

I think we may have gotten a little off-track here. The goal of this post is not to be a tutorial but to showcase the idea that you can get frontier-level performance from small models by using mellea as the harness. The code is meant to be illustrative, but if folks want to actually run the example, then they should pull up the notebook.

planetf1

Five items not covered by the existing reviews, mostly minor polish.

planetf1 · 2026-05-14T10:59:16Z

+Mellea turns a task that needs a frontier model into one a small model can
+handle through three patterns:
+
+![Decompose, externalize control flow, and modularize capabilities](/images/small-models-rock/three-steps.png)


The alt text here repeats the section heading verbatim. A screen reader gets no useful information about what the diagram actually depicts. Something like "Diagram showing the three patterns: decompose the task into narrow steps, externalize control flow into Python, modularise model capabilities with validated components" would let a reader who can't see the image follow the argument.

the image is three mushrooms, each one representing one of the patterns. there really isn't any argument there to follow

and the section heading is also "the three patterns"

planetf1 · 2026-05-14T10:59:32Z

+
+```python
+from mellea.backends.huggingface import LocalHFBackend
+from mellea.stdlib.components.intrinsic.rag import check_answerability


The import path uses intrinsic (`mellea.stdlib.components.intrinsic.rag`) while the prose throughout uses adapters. planetf1's earlier note asked for a parenthetical acknowledging the mismatch in the module path; it wasn't added. One sentence is enough: "(The import path still reflects an older internal term; it will align with adapters in a future release.)"

i'm on the fence about this one. I don't really want to call out implementation details, but it's probably useful to call out more explicitly that check_answerability is an adapter.

planetf1 · 2026-05-14T10:59:40Z

+        return None, None  # adapter isn't confident → unknown, not hallucinated
+
+    unit = extract_unit_price(entry, catalog)          # m.instruct, format=...
+    total = extract_total(unit, entry.quantity)        # m.instruct, format=...


extract_unit_price and extract_total are called here but never shown or linked to the notebook. The note above says the snippets are trimmed for reading, and the inline comments hint at m.instruct, format=..., but a reader adapting this pattern has no idea what Pydantic type to pass as format=. Either name the type (e.g. # m.instruct(..., format=UnitPrice)) or add a sentence before the block pointing at the notebook for the full definitions.

if the user wants to know how to run the code, they should look at the notebook. there is a note about this at the beginning of the code snippets

planetf1 · 2026-05-14T10:59:49Z

+    tool_calls=True,
+    model_options={ModelOption.TOOLS: [MelleaTool.from_callable(local_code_interpreter)]},
+)
+chart.tool_calls["local_code_interpreter"].call_func()


.call_func() appears once with no explanation. A reader can't tell whether this executes the generated code locally, sends it to a subprocess, or is a Mellea API call. One clause resolves it — e.g. "— which executes the generated Python in-process" — or a short inline comment # executes the model-generated code locally.

see previous comment

ajbozarth

> The code is meant to be illustrative

I had claude re-review with that in mind and here's an updated review:

Re-reading under the framing you laid out (snippets are sketches; the notebook is the runnable artifact) — I think that contract is the right one, and the existing callout above the first code block ("trimmed for reading… full runnable notebook has the install commands, sample data, and the glue between steps") already establishes it. Most of my prior blocker-flavored comments dissolve under that lens. Withdrawing those.

Two things to verify before merge:

1. The notebook is the load-bearing artifact now. Last commit to notebooks/atai_2026/tutorial.ipynb was 2026-05-06, but the blog has had code edits since (the `_bom_is_valid` rewrite during review on 2026-05-05, the async block fix on 2026-05-11, etc.). Worth a quick top-to-bottom run of the notebook before merge to make sure the shapes shown in the blog still exist there.
2. Sketch-internal consistency still matters even without runnability. A reader who can't pip install and try will still notice if a name introduced in one block has a different name two blocks later. The earlier `_bom_entries_are_well_formed` (plural) → `_bom_is_valid` rename is fully resolved; nothing else jumps out, but worth one read-through with that lens.

The inline comments below are the other direction: places where the current snippets carry plumbing that was in service of being copy-pasteable, and could simplify now that they don't have to be.

None of these is a blocker. Glad to take "no, the longer form is the lesson" on any of them.

ajbozarth

I think this is pretty much ready to publish, other reviewers might find some other nits or improvements, but I see no reason not to move up the publish date on this to sometime next week

psschwei · 2026-05-15T20:36:12Z

> I see no reason not to move up the publish date on this to sometime next week

works for me. feel free to do a new review, suggest a new date and commit, and merge as you see fit

ajbozarth

Let's put this out next Wednesday. That will give @planetf1 and any other reviewers a couple of days to do final passes and would match our planned every-other-Wednesday-ish release cycle. I'll update the description with a scheduled merge.

psschwei · 2026-05-15T20:49:19Z

/schedule 2026-05-20

ajbozarth · 2026-05-15T20:51:46Z

FYI you don't want to enable auto-merge, that's what the /schedule is for so it doesn't merge until the publish date (otherwise it merges immediately upon approval). Also the /schedule needs to be in the description not a comment to work.

Edit: I looked into catching /schedule in comments and it was non-trivial compared to checking the description content.
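
For readers curious what "checking the description content" might look like, here is a minimal sketch. The repository's actual automation is not shown in this thread, so everything here is an assumption: the names `scheduled_date` and `ready_to_merge` are hypothetical, and the directive is assumed to sit on its own line in `YYYY-MM-DD` form.

```python
import re
from datetime import date
from typing import Optional

# A line consisting only of "/schedule YYYY-MM-DD" (assumed directive format).
SCHEDULE = re.compile(r"^/schedule\s+(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def scheduled_date(description: Optional[str]) -> Optional[date]:
    """Return the publish date named in a PR description, or None if absent or invalid."""
    match = SCHEDULE.search(description or "")
    if match is None:
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:  # right shape but not a real calendar date, e.g. 2026-13-40
        return None


def ready_to_merge(description: Optional[str], today: date) -> bool:
    """True once the scheduled date has arrived; never true without a directive."""
    when = scheduled_date(description)
    return when is not None and today >= when


body = "Adds the small-models blog post.\n\n/schedule 2026-05-20\n"
print(scheduled_date(body), ready_to_merge(body, date(2026, 5, 19)))
```

Reading only the description keeps the check to a single string, whereas comments would mean paging through the whole thread and deciding which `/schedule` wins.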

psschwei requested review from abrahamdaniels and nrfulton May 5, 2026 03:00

psschwei requested review from a team and ajbozarth as code owners May 5, 2026 03:00

psschwei requested a review from serjikibm May 5, 2026 03:00

ajbozarth requested changes May 5, 2026

Comment thread content/blogs/small-models-rock.md Outdated

Comment thread content/blogs/small-models-rock.md Outdated

Comment thread content/blogs/small-models-rock.md Outdated

Comment thread next-env.d.ts Outdated

Comment thread package-lock.json Outdated

This was referenced May 5, 2026

fix: parse blog dates as local time to avoid UTC offset #38

Merged

feat: add rehype-raw and prose image styles #39

Merged

ajbozarth requested changes May 5, 2026

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md Outdated

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md Outdated

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md Outdated

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md Outdated

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md Outdated

planetf1 reviewed May 7, 2026

Comment thread content/blogs/small-models-rock.md Outdated

planetf1 requested changes May 7, 2026

psschwei added 5 commits May 8, 2026 21:54

add small models rock blog

d9826a3

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

add nathan as coauthor

f694d37

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

lint

bc8b395

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

review comments

46c0dcf

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

updates

2d70fe8

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

psschwei force-pushed the blog-small-models branch from 57e876f to 2d70fe8 Compare May 9, 2026 01:57

psschwei added 3 commits May 8, 2026 21:59

Revert next-env.d.ts auto-regenerated change

711b432

Revert package-lock.json peer flag changes

27bdefb

update

c8b85a8

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

ajbozarth reviewed May 12, 2026

Comment thread src/app/globals.css Outdated

ajbozarth reviewed May 12, 2026

Comment thread content/blogs/small-models-rock.md Outdated

ajbozarth reviewed May 12, 2026

Comment thread content/blogs/small-models-rock.md Outdated

reviewer comments

819b9a9

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

ajbozarth reviewed May 12, 2026

Comment thread content/blogs/small-models-rock.md Outdated

wgetting

b2f017d

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

ajbozarth reviewed May 12, 2026

Comment thread content/blogs/small-models-rock.md Outdated

fix

4cb2699

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

psschwei added 2 commits May 13, 2026 19:40

updates

3766117

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

fixes

77eb29f

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

planetf1 reviewed May 14, 2026

Comment thread content/blogs/small-models-rock.md Outdated

updates

ca32fb9

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

ajbozarth reviewed May 14, 2026

Comment thread content/blogs/small-models-rock.md Outdated

Comment thread content/blogs/small-models-rock.md Outdated

Comment thread content/blogs/small-models-rock.md

Comment thread content/blogs/small-models-rock.md

reviews

bc8661e

Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>

ajbozarth added the `blog` label (PRs adding or updating blog posts in content/blogs/) May 15, 2026

ajbozarth approved these changes May 15, 2026

Comment thread content/blogs/small-models-rock.md Outdated

Update content/blogs/small-models-rock.md

3f84f65

psschwei enabled auto-merge May 15, 2026 20:49

ajbozarth disabled auto-merge May 15, 2026 20:50
