add small models rock blog #36

Open
psschwei wants to merge 19 commits into generative-computing:main from psschwei:blog-small-models

Conversation

@psschwei
Member

@psschwei psschwei commented May 5, 2026

/schedule 2026-05-20

@psschwei psschwei requested review from a team and ajbozarth as code owners May 5, 2026 03:00
@psschwei psschwei requested a review from serjikibm May 5, 2026 03:00
Contributor

@ajbozarth ajbozarth left a comment

first pass with technical review, will follow with a review of the blog content later

Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread next-env.d.ts Outdated
Comment thread package-lock.json Outdated
Contributor

@ajbozarth ajbozarth left a comment

Here's a handful of items Claude found. I also opened #39 to address the image issues.

Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md
Comment thread content/blogs/small-models-rock.md
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
@planetf1
Collaborator

planetf1 commented May 7, 2026

A broader editorial observation, on top of the inline notes above.

Long prose stretches with no visual breaks. "The Bet", the ALoRA explanation, and the "Why intrinsics are cheap to compose" sections each run 3–5 paragraphs of uninterrupted running text. A reader skimming from the link-sharing site of your choice has nothing to hook onto — no bullet list, no pull-quote, no callout. Consider breaking the longer argumentative sections with either (a) a short bulleted summary of the three differentiators, (b) a margin callout or blockquote for the one-line claim that matters, or (c) an extra H3 that lets the reader resume after an interrupt. The Dijkstra passage is strong enough that it could earn a standalone pull-quote.

Long code blocks without internal narration. Quick scan of the 12 fenced Python blocks:

| Block | Lines | Inline comments |
| --- | --- | --- |
| BOM validator (L155) | 19 | 0 |
| Reformat instruct (L181) | 11 | 0 |
| Async extract_bom (L206) | 15 | 0 |
| Catalog load (L232) | 11 | 0 |
| Pricing loop (L332) | 57 | 2 |
| find_citations (L405) | 8 | 0 |
| Report generation (L425) | 31 | 0 |
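
For anyone who wants to repeat this scan after later edits, here is a rough sketch of how the counts could be produced. It assumes the post is plain Markdown with `python`-tagged fences; `scan_python_blocks` is a hypothetical helper, not part of the repo, and the comment count is deliberately crude (any line containing `#` counts, including a `#` inside a string).

```python
import re

# Build the fence marker from single backticks so this snippet can itself
# sit inside a fenced block.
TICKS = "`" * 3
FENCE = re.compile(
    rf"^{TICKS}python[ \t]*\n(.*?)^{TICKS}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def scan_python_blocks(markdown: str) -> list[dict]:
    """Return start line, line count, and comment-line count per python block."""
    rows = []
    for match in FENCE.finditer(markdown):
        lines = match.group(1).splitlines()
        # 1-based line number of the opening fence in the source text.
        start = markdown.count("\n", 0, match.start()) + 1
        # Crude: counts every line that contains a '#'.
        comments = sum(1 for line in lines if "#" in line)
        rows.append({"start": start, "lines": len(lines), "comments": comments})
    return rows
```

Sorting the rows by `lines` would surface the longest uncommented blocks first.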

The 57-line pricing loop is the main offender: a reader has to hold the whole thing in their head to reach the "`verdict == "answerable"` is the gate" claim the surrounding prose is building to. A few options that would help without gutting the post:

  1. Highlight the key line in a preceding sentence, e.g., "The one line that matters is `if verdict == "answerable":`; everything above it is ceremony to get that gate into place," then show the block.
  2. Split the pricing loop into a short `get_catalog_for(entry)` helper + the actual priced extraction, so each block is ~15 lines.
  3. Add a handful of inline `#` comments on the non-obvious lines (the `.get(entry.category)` returning `None` for lumber, the `continue` after `append`, the `if catalog:` fallthrough producing `unit_price=None`).
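
To make options 2 and 3 concrete, here is a minimal sketch of the split. It is not the blog's code: `BomEntry`, `CATALOGS`, `check_answerable`, and `price_entry` are hypothetical stand-ins, and the adapter call is replaced by a plain dictionary lookup so the shape is visible without a model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BomEntry:
    name: str
    category: str
    quantity: int


# Hypothetical per-category catalogs; lumber is deliberately missing.
CATALOGS = {"fasteners": {"deck screw": 0.25}}


def get_catalog_for(entry: BomEntry) -> Optional[dict]:
    # .get() returns None for categories with no catalog (e.g. lumber).
    return CATALOGS.get(entry.category)


def check_answerable(entry: BomEntry, catalog: dict) -> str:
    # Stand-in for the answerability adapter: a lookup, not a model call.
    return "answerable" if entry.name in catalog else "unanswerable"


def price_entry(entry: BomEntry) -> Optional[float]:
    catalog = get_catalog_for(entry)
    if not catalog:
        return None  # no catalog: the price stays unknown
    verdict = check_answerable(entry, catalog)
    if verdict == "answerable":  # the gate: only price what the catalog supports
        return catalog[entry.name] * entry.quantity
    return None  # not answerable: unknown rather than a guessed price
```

Each function fits on one screen, and the gate sits on its own commented line instead of at the bottom of a long loop.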

Inline comments in tutorial code are an anti-pattern in production but are the right call in a blog post — readers copy the block into a notebook and the comments are their only in-line teacher.

Neither of these is a blocker, just things that would turn this from "good if you read carefully" into "easy to follow on first scroll."

@planetf1
Collaborator

planetf1 commented May 7, 2026

One more pass, purely editorial — positioning and discoverability asks, not correctness. All are optional polish.

No fast hook for skimmers. The post is ~2,800 words / 14-min read, and the first hands-on code appears at L123. A reader linking in from HN or a social share needs a 30-second on-ramp. Consider a short callout between the excerpt and "The Bet" along these lines:

What this post does: walks through a construction-cost-estimation pipeline that one-shot prompting needs GPT-5-tier models for, rebuilt on a 3B Granite model running locally — same accuracy, no API keys, ~$0/run. If you're paying frontier-model prices for structured extraction or matching, the same pattern applies.

"Harness" is load-bearing but undefined. The word carries most of the argument (L19, L25, L28, L30, L383, L508) but never gets a definition. A reader who hasn't already absorbed Mellea's framing has to infer it. One sentence near first use — e.g., "By 'harness' we mean the software scaffolding around the model call: decomposition, validation, retries, tool dispatch — the part that isn't the forward pass." — makes the rest of the post land harder.
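
If a picture helps alongside the one-sentence definition, the smallest possible harness is a validate-and-retry loop around the model call. The sketch below is generic Python, not Mellea's API: `run_with_harness` and `fake_model` are made-up names, and the "model" is a stub that fails once.

```python
from typing import Callable, Optional


def run_with_harness(
    model_call: Callable[[str], str],
    prompt: str,
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model, validate the output in plain Python, retry on failure."""
    for _ in range(max_attempts):
        output = model_call(prompt)
        if validate(output):
            return output
    return None  # the caller decides what "unknown" means


# Stand-in "model" that returns junk once, then a well-formed answer.
replies = iter(["not a number", "42"])


def fake_model(prompt: str) -> str:
    return next(replies)


result = run_with_harness(fake_model, "How many studs?", str.isdigit)
```

Everything outside `model_call` is the harness: the loop, the check, and the decision about what to do when the check never passes.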

Pain points skew finance-y; the dev concerns are missing. The three differentiators (cost, data sovereignty, vendor-agnostic) hit procurement and regulated-industries buyers well. The pains that devs themselves feel are under-represented:

  • Latency / rate limits — a frontier API can rate-limit you mid-backtest; local inference doesn't
  • Observability in production — when a prompt-pipeline fails at 3am, debugging is about which step went wrong; Mellea's decomposition surfaces the failure point
  • Fine-tuning vs. harness trade-off — the obvious alternative to "better harness + small model" is "fine-tune a small model"; why harness first?

These are one-paragraph each. The "Trade-offs" section at the bottom is a natural home if you don't want to expand the opening argument.

Cost comparison is one-sided. The $1/run vs "no per-token billing" framing is accurate but omits the local side of the ledger: GPU/laptop amortisation, electricity, and the engineering time to build the decomposed pipeline. The "Trade-offs" section admits "decomposition takes engineering effort" but doesn't put a number on it. Even a rough "a senior engineer can port a prompt pipeline in a day or two" would neutralise the "you're hiding the real cost" objection that readers will raise in the comments either way.
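
One way to put a number on it is a break-even calculation. The sketch below is only arithmetic; apart from the post's $1/run API figure, every input is a placeholder to be replaced with real hardware, electricity, and engineering costs.

```python
import math


def break_even_runs(
    fixed_cost: float, api_cost_per_run: float, local_cost_per_run: float
) -> int:
    """Runs needed before local inference, fixed costs included, beats the API."""
    saving = api_cost_per_run - local_cost_per_run
    if saving <= 0:
        raise ValueError("local runs must be cheaper per run to break even")
    return math.ceil(fixed_cost / saving)


# Placeholder inputs: fixed_cost covers hardware amortisation plus porting
# time; local_cost_per_run covers electricity. Both are invented here.
runs = break_even_runs(
    fixed_cost=2000.0, api_cost_per_run=1.0, local_cost_per_run=0.05
)
```

Stating the inputs explicitly lets a skeptical reader plug in their own figures rather than argue with the conclusion.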

Terminology: "intrinsics" vs "adapters". The post uses both terms interchangeably — ten uses of intrinsic(s) (L64, L246, L255, L259, L301, L312, L329, L393, L406, L521) and six uses of adapter(s) (L255, L316, L319, L322, L396, L398), including the mixed phrasing on L319 "Granite intrinsics ship as ALoRA adapters". My understanding is the Mellea/Granite framing has shifted toward adapters as the external-facing term (with intrinsic still in the module path for now). If that's right, it's worth a sweep to standardise — probably adapters everywhere in prose, with one parenthetical acknowledgement that the Python import path uses intrinsic. Also drop intrinsics from the tag list; local-llm would be a sensible addition there:

tags: ["granite", "rag", "adapters", "small-models", "docling", "local-llm"]

(I'll defer to you on whether adapters or intrinsics is the preferred term — the ask is consistency, not a specific choice.)


None of the above is a blocker; the core argument is strong.

Collaborator

@planetf1 planetf1 left a comment

as per comments (need evaluation - but your interpretation about what should be changed is fine)

psschwei added 5 commits May 8, 2026 21:54
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
@psschwei psschwei force-pushed the blog-small-models branch from 57e876f to 2d70fe8 Compare May 9, 2026 01:57
ajbozarth

This comment was marked as outdated.

Comment thread src/app/globals.css Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Comment thread content/blogs/small-models-rock.md Outdated
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Comment thread content/blogs/small-models-rock.md Outdated
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
@ajbozarth
Contributor

So I tried walking through the code in the blog manually and kept hitting issue after issue, tinkering with the code to get it to work. I think a better solution is for you to do it yourself when you next have bandwidth. Since it's your blog, you know best what should be edited, and how, so that a user can copy, paste, and run the code blocks one after another.

psschwei added 2 commits May 13, 2026 19:40
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
@psschwei
Member Author

I think we may have gotten a little off-track here. The goal of this post is not to be a tutorial but to showcase the idea that you can get frontier-level performance from small models by using mellea as the harness. The code is meant to be illustrative, but if folks want to actually run the example, then they should pull up the notebook.

Collaborator

@planetf1 planetf1 left a comment

Five items not covered by the existing reviews, mostly minor polish.

Mellea turns a task that needs a frontier model into one a small model can
handle through three patterns:

![Decompose, externalize control flow, and modularize capabilities](/images/small-models-rock/three-steps.png)
Collaborator

The alt text here repeats the section heading verbatim. A screen reader gets no useful information about what the diagram actually depicts. Something like "Diagram showing the three patterns: decompose the task into narrow steps, externalize control flow into Python, modularise model capabilities with validated components" would let a reader who can't see the image follow the argument.

Member Author

the image is three mushrooms, each one representing one of the patterns. there really isn't any argument there to follow

Member Author

and the section heading is also "the three patterns"


```python
from mellea.backends.huggingface import LocalHFBackend
from mellea.stdlib.components.intrinsic.rag import check_answerability
```
Collaborator

The import path uses intrinsic (mellea.stdlib.components.intrinsic.rag) while the prose throughout uses adapters. planetf1's earlier note asked for a parenthetical acknowledging the mismatch in the module path, but it wasn't added. One sentence is enough: "(The import path still reflects an older internal term; it will align with adapters in a future release.)"

Member Author

i'm on the fence about this one. I don't really want to call out implementation details, but it's probably useful to call out more explicitly that check_answerability is an adapter.

```python
return None, None # adapter isn't confident → unknown, not hallucinated

unit = extract_unit_price(entry, catalog) # m.instruct, format=...
total = extract_total(unit, entry.quantity) # m.instruct, format=...
```
Collaborator

extract_unit_price and extract_total are called here but never shown or linked to the notebook. The note above says the snippets are trimmed for reading, and the inline comments hint at m.instruct, format=..., but a reader adapting this pattern has no idea what Pydantic type to pass as format=. Either name the type (e.g. # m.instruct(..., format=UnitPrice)) or add a sentence before the block pointing at the notebook for the full definitions.

Member Author

if the user wants to know how to run the code, they should look at the notebook. there is a note about this at the beginning of the code snippets

```python
tool_calls=True,
model_options={ModelOption.TOOLS: [MelleaTool.from_callable(local_code_interpreter)]},
)
chart.tool_calls["local_code_interpreter"].call_func()
```
Collaborator

.call_func() appears once with no explanation. A reader can't tell whether this executes the generated code locally, sends it to a subprocess, or is a Mellea API call. One clause resolves it — e.g. "— which executes the generated Python in-process" — or a short inline comment # executes the model-generated code locally.

Member Author

see previous comment

Comment thread content/blogs/small-models-rock.md Outdated
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Contributor

@ajbozarth ajbozarth left a comment

> The code is meant to be illustrative

I had Claude re-review with that in mind and here's an updated review:

Re-reading under the framing you laid out (snippets are sketches; the notebook is the runnable artifact) — I think that contract is the right one, and the existing callout above the first code block ("trimmed for reading… full runnable notebook has the install commands, sample data, and the glue between steps") already establishes it. Most of my prior blocker-flavored comments dissolve under that lens. Withdrawing those.

Two things to verify before merge:

  1. The notebook is the load-bearing artifact now. Last commit to notebooks/atai_2026/tutorial.ipynb was 2026-05-06, but the blog has had code edits since (the _bom_is_valid rewrite during review on 2026-05-05, the async block fix on 2026-05-11, etc.). Worth a quick top-to-bottom run of the notebook before merge to make sure the shapes shown in the blog still exist there.
  2. Sketch-internal consistency still matters even without runnability. A reader who can't pip install and try will still notice if a name introduced in one block has a different name two blocks later. The earlier _bom_entries_are_well_formed (plural) → _bom_is_valid rename is fully resolved; nothing else jumps out, but worth one read-through with that lens.
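
For item 2, a rough mechanical check can back up the read-through. The sketch below is regex-based on purpose, since trimmed snippets often won't parse with `ast`. `undefined_calls` is a hypothetical helper, and it only understands `def` and simple `from ... import a, b` lines, so aliased imports and names the post never defines will show up as false positives.

```python
import builtins
import keyword
import re

DEF = re.compile(r"^\s*(?:async\s+)?def\s+(\w+)\s*\(", re.MULTILINE)
IMPORT = re.compile(r"^\s*from\s+[\w.]+\s+import\s+([\w, ]+)$", re.MULTILINE)
# A bare name followed by "(", not preceded by a dot (skips method calls).
CALL = re.compile(r"(?<![\w.])(\w+)\s*\(")


def undefined_calls(snippets: list[str]) -> set[str]:
    """Names called somewhere in the snippets but never defined or imported."""
    defined: set[str] = set()
    called: set[str] = set()
    for code in snippets:
        defined.update(DEF.findall(code))
        for names in IMPORT.findall(code):
            defined.update(name.strip() for name in names.split(","))
        called.update(CALL.findall(code))
    return {
        name
        for name in called - defined
        if not hasattr(builtins, name) and not keyword.iskeyword(name)
    }
```

Run over the post's snippets in order, a leftover like the old `_bom_entries_are_well_formed` name would appear in the result set once its definition had been renamed.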

The inline comments below are the other direction: places where the current snippets carry plumbing that was in service of being copy-pasteable, and could simplify now that they don't have to be.

None of these is a blocker. Glad to take "no, the longer form is the lesson" on any of them.

Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md Outdated
Comment thread content/blogs/small-models-rock.md
Comment thread content/blogs/small-models-rock.md
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
@ajbozarth ajbozarth added the blog PRs adding or updating blog posts in content/blogs/ label May 15, 2026
Contributor

@ajbozarth ajbozarth left a comment

I think this is pretty much ready to publish, other reviewers might find some other nits or improvements, but I see no reason not to move up the publish date on this to sometime next week

@psschwei
Member Author

> I see no reason not to move up the publish date on this to sometime next week

works for me. feel free to do a new review, suggest a new date and commit, and merge as you see fit

Contributor

@ajbozarth ajbozarth left a comment

Let's put this out next Wednesday. That will give @planetf1 and any other reviewers a couple of days to do final passes and would match up with our planned every-other-Wednesday-ish release cycle. I'll update the description with a scheduled merge.

Comment thread content/blogs/small-models-rock.md Outdated
@psschwei
Member Author

/schedule 2026-05-20

@psschwei psschwei enabled auto-merge May 15, 2026 20:49
@ajbozarth ajbozarth disabled auto-merge May 15, 2026 20:50
@ajbozarth
Contributor

ajbozarth commented May 15, 2026

FYI you don't want to enable auto-merge, that's what the /schedule is for so it doesn't merge until the publish date (otherwise it merges immediately upon approval). Also the /schedule needs to be in the description not a comment to work.

Edit: I looked into catching /schedule in comments and it was non-trivial compared to checking the description content.
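
For context on why the description is the easy place to look, the check reduces to one regex over a single string. The sketch below is an illustration only, not the repo's actual automation: `scheduled_date` and `ready_to_merge` are hypothetical names, and it assumes the directive sits on its own line in `YYYY-MM-DD` form.

```python
import re
from datetime import date
from typing import Optional

# The directive must be the only thing on its line.
SCHEDULE = re.compile(r"^/schedule\s+(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def scheduled_date(description: str) -> Optional[date]:
    """Return the publish date named in the PR description, if any."""
    match = SCHEDULE.search(description)
    if match is None:
        return None
    try:
        return date.fromisoformat(match.group(1))
    except ValueError:
        return None  # right shape, impossible date (e.g. month 13)


def ready_to_merge(description: str, today: date) -> bool:
    """True once the scheduled date has arrived; False if none is set."""
    when = scheduled_date(description)
    return when is not None and today >= when
```

Doing the same for comments would mean fetching and ordering every comment and deciding which one wins, which is where the extra complexity comes from.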
