add small models rock blog #36
Conversation
ajbozarth
left a comment
first pass with technical review, will follow with a review of the blog content later
A broader editorial observation, on top of the inline notes above.

Long prose stretches with no visual breaks. "The Bet", the ALoRA explanation, and the "Why intrinsics are cheap to compose" sections each run 3–5 paragraphs of uninterrupted running text. A reader skimming from the link-sharing site of your choice has nothing to hook onto — no bullet list, no pull-quote, no callout. Consider breaking the longer argumentative sections with either (a) a short bulleted summary of the three differentiators, (b) a margin callout or blockquote for the one-line claim that matters, or (c) an extra H3 that lets the reader resume after an interrupt. The Dijkstra passage is strong enough that it could earn a standalone pull-quote.

Long code blocks without internal narration. Quick scan of the 12 fenced Python blocks: the 57-line pricing loop is the main offender — a reader has to hold the whole thing in their head to reach the "

Inline comments in tutorial code are an anti-pattern in production but are the right call in a blog post — readers copy the block into a notebook and the comments are their only in-line teacher.

Neither of these is a blocker, just things that would turn this from "good if you read carefully" into "easy to follow on first scroll."
One more pass, purely editorial — positioning and discoverability asks, not correctness. All are optional polish.

No fast hook for skimmers. The post is ~2,800 words / 14-min read, and the first hands-on code appears at L123. A reader linking in from HN or a social share needs a 30-second on-ramp. Consider a short callout between the excerpt and "The Bet" along these lines:
"Harness" is load-bearing but undefined. The word carries most of the argument (L19, L25, L28, L30, L383, L508) but never gets a definition. A reader who hasn't already absorbed Mellea's framing has to infer it. One sentence near first use — e.g., "By 'harness' we mean the software scaffolding around the model call: decomposition, validation, retries, tool dispatch — the part that isn't the forward pass." — makes the rest of the post land harder.

Pain points skew finance-y; the dev concerns are missing. The three differentiators (cost, data sovereignty, vendor-agnostic) hit procurement and regulated-industries buyers well. The pains that devs themselves feel are under-represented:

These are one-paragraph each. The "Trade-offs" section at the bottom is a natural home if you don't want to expand the opening argument.

Cost comparison is one-sided. The $1/run vs "no per-token billing" framing is accurate but omits the local side of the ledger: GPU/laptop amortisation, electricity, and the engineering time to build the decomposed pipeline. The "Trade-offs" section admits "decomposition takes engineering effort" but doesn't put a number on it. Even a rough "a senior engineer can port a prompt pipeline in a day or two" would neutralise the "you're hiding the real cost" objection that readers will raise in the comments either way.

Terminology: "intrinsics" vs "adapters". The post uses both terms interchangeably — ten uses of intrinsic(s) (L64, L246, L255, L259, L301, L312, L329, L393, L406, L521) and six uses of adapter(s) (L255, L316, L319, L322, L396, L398), including the mixed phrasing on L319 "Granite intrinsics ship as ALoRA adapters". My understanding is the Mellea/Granite framing has shifted toward adapters as the external-facing term (with intrinsic still in the module path for now). If that's right, it's worth a sweep to standardise — probably adapters everywhere in prose, with one parenthetical acknowledgement that the Python import path uses intrinsic. The frontmatter already uses the newer term: `tags: ["granite", "rag", "adapters", "small-models", "docling", "local-llm"]`. (I'll defer to you on whether adapters or intrinsics is the preferred term — the ask is consistency, not a specific choice.)

None of the above is a blocker — the core argument is strong.
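For what it's worth, the "harness" definition suggested above (software scaffolding around the model call: decomposition, validation, retries) almost writes itself as code. Here is a minimal sketch of that scaffolding; every name in it is a hypothetical illustration for this review thread, not Mellea's API:

```python
# Minimal harness sketch: decomposition, validation, and retries around a
# model call. All names here are invented for illustration, not Mellea's API.

def fake_model(prompt: str) -> str:
    # Stand-in for the forward pass; a real harness would call an LLM here.
    return prompt.split(":")[-1].strip().upper()

def validate(output: str) -> bool:
    # The harness, not the model, decides what counts as acceptable output.
    return output.isupper() and len(output) > 0

def run_step(prompt: str, retries: int = 3) -> str:
    # Retry loop: re-ask the model until the output passes validation.
    for _ in range(retries):
        out = fake_model(prompt)
        if validate(out):
            return out
    raise ValueError(f"no valid output after {retries} tries: {prompt!r}")

# Decomposition: one broad task becomes several narrow, checkable steps,
# with control flow living in plain Python rather than inside a prompt.
steps = ["normalize: part name", "classify: part category"]
results = [run_step(s) for s in steps]
print(results)
```

The point of the sketch is only that the retry loop, the validator, and the step list are ordinary Python, which is the part of the post's argument the word "harness" is carrying.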
planetf1
left a comment
as per comments (need evaluation - but your interpretation about what should be changed is fine)
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Force-pushed from 57e876f to 2d70fe8
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
So I tried walking through the code in the blog manually and just kept hitting issue after issue, tinkering with the code to get it to work. I think a better solution is for you to do it yourself when you next have bandwidth. Since it's your blog, you would know best what should be edited, and how, so that a user can iteratively copy, paste, and run the code blocks.
I think we may have gotten a little off-track here. The goal of this post is not to be a tutorial but to showcase the idea that you can get frontier-level performance from small models by using mellea as the harness. The code is meant to be illustrative, but if folks want to actually run the example, then they should pull up the notebook.
planetf1
left a comment
Five items not covered by the existing reviews, mostly minor polish.
> Mellea turns a task that needs a frontier model into one a small model can
> handle through three patterns:
>
> (image)
The alt text here repeats the section heading verbatim. A screen reader gets no useful information about what the diagram actually depicts. Something like "Diagram showing the three patterns: decompose the task into narrow steps, externalize control flow into Python, modularise model capabilities with validated components" would let a reader who can't see the image follow the argument.
the image is three mushrooms, each one representing one of the patterns. there really isn't any argument there to follow
and the section heading is also "the three patterns"
```python
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components.intrinsic.rag import check_answerability
```
The import path uses intrinsic (`mellea.stdlib.components.intrinsic.rag`) while the prose throughout uses adapters. planetf1's earlier note asked for a parenthetical acknowledging the mismatch in the module path — it wasn't added. One sentence is enough: "(The import path still reflects an older internal term; it will align with adapters in a future release.)"
i'm on the fence about this one. I don't really want to call out implementation details, but it's probably useful to call out more explicitly that check_answerability is an adapter.
```python
return None, None  # adapter isn't confident → unknown, not hallucinated
```
```python
unit = extract_unit_price(entry, catalog)    # m.instruct, format=...
total = extract_total(unit, entry.quantity)  # m.instruct, format=...
```
extract_unit_price and extract_total are called here but never shown or linked to the notebook. The note above says the snippets are trimmed for reading, and the inline comments hint at m.instruct, format=..., but a reader adapting this pattern has no idea what Pydantic type to pass as format=. Either name the type (e.g. # m.instruct(..., format=UnitPrice)) or add a sentence before the block pointing at the notebook for the full definitions.
if the user wants to know how to run the code, they should look at the notebook. there is a note about this at the beginning of the code snippets
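To make the ask above concrete for readers who land here without the notebook: the shape being gestured at might look like the sketch below. `UnitPrice` and `parse_unit_price` are invented names, the real definitions live in the notebook (and presumably use a Pydantic model for `format=`); this version substitutes a stdlib dataclass and manual JSON parsing so it stays dependency-free:

```python
import json
from dataclasses import dataclass

# Hypothetical structured-output type, standing in for whatever Pydantic
# model the notebook actually passes as format=...
@dataclass
class UnitPrice:
    sku: str
    price_usd: float

def parse_unit_price(raw: str) -> UnitPrice:
    # The harness, not the model, owns validation: parse the model's JSON
    # output into a typed structure and fail loudly if the shape is wrong.
    data = json.loads(raw)
    return UnitPrice(sku=str(data["sku"]), price_usd=float(data["price_usd"]))

# Example: a model response constrained to the UnitPrice schema.
unit = parse_unit_price('{"sku": "BOLT-M8", "price_usd": 0.12}')
print(unit.sku, unit.price_usd)
```

Even a one-line comment naming the type in the blog snippet would give readers this much of the picture without expanding the block.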
```python
    tool_calls=True,
    model_options={ModelOption.TOOLS: [MelleaTool.from_callable(local_code_interpreter)]},
)
chart.tool_calls["local_code_interpreter"].call_func()
```
.call_func() appears once with no explanation. A reader can't tell whether this executes the generated code locally, sends it to a subprocess, or is a Mellea API call. One clause resolves it — e.g. "— which executes the generated Python in-process" — or a short inline comment # executes the model-generated code locally.
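For reference, one plausible reading (an assumption about the tool's behavior, not a statement of Mellea's actual implementation) is that the registered callable simply runs the model-generated Python in-process and `.call_func()` invokes it with the arguments the model produced. A hypothetical stand-in:

```python
import contextlib
import io

def local_code_interpreter(code: str) -> str:
    """Run model-generated Python in the current process, capturing stdout.

    Hypothetical stand-in for the tool registered via
    MelleaTool.from_callable; a production harness would sandbox this
    (subprocess, container, timeout) rather than exec untrusted code.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # fine for a demo, not for untrusted input
    return buf.getvalue()

# Example: the kind of snippet a model might emit for a chart-data request.
output = local_code_interpreter("print(sum([1200, 950, 1810]))")
print(output)
```

If that reading is right, the suggested clause ("which executes the generated Python in-process") or an inline comment on the `.call_func()` line would settle it for the reader.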
ajbozarth
left a comment
> The code is meant to be illustrative
I had claude re-review with that in mind and here's an updated review:
Re-reading under the framing you laid out (snippets are sketches; the notebook is the runnable artifact) — I think that contract is the right one, and the existing callout above the first code block ("trimmed for reading… full runnable notebook has the install commands, sample data, and the glue between steps") already establishes it. Most of my prior blocker-flavored comments dissolve under that lens. Withdrawing those.
Two things to verify before merge:
- The notebook is the load-bearing artifact now. Last commit to `notebooks/atai_2026/tutorial.ipynb` was 2026-05-06, but the blog has had code edits since (the `_bom_is_valid` rewrite during review on 2026-05-05, the async block fix on 2026-05-11, etc.). Worth a quick top-to-bottom run of the notebook before merge to make sure the shapes shown in the blog still exist there.
- Sketch-internal consistency still matters even without runnability. A reader who can't `pip install` and try will still notice if a name introduced in one block has a different name two blocks later. The earlier `_bom_entries_are_well_formed` (plural) → `_bom_is_valid` rename is fully resolved; nothing else jumps out, but worth one read-through with that lens.
The inline comments below are the other direction: places where the current snippets carry plumbing that was in service of being copy-pasteable, and could simplify now that they don't have to be.
None of these is a blocker. Glad to take "no, the longer form is the lesson" on any of them.
ajbozarth
left a comment
I think this is pretty much ready to publish, other reviewers might find some other nits or improvements, but I see no reason not to move up the publish date on this to sometime next week
works for me. feel free to do a new review, suggest a new date and commit, and merge as you see fit
ajbozarth
left a comment
Let's put this out next Wednesday. That will give @planetf1 and any other reviewers a couple of days to do final passes and would match up with our planned every-other-Wednesday-ish release cycle. I'll update the description with a scheduled merge.
/schedule 2026-05-20
FYI you don't want to enable auto-merge, that's what the … Edit: I looked into catching …
/schedule 2026-05-20