136 changes: 136 additions & 0 deletions posts/pablocalofatti/3.better-together.md
@@ -0,0 +1,136 @@
---
title: 'The Memory and the Muscle: When AI Agents Get Both'
published: false
description: 'I wrote about building bicycles for AI agents. Then I actually built two. Here is what happens when persistent memory meets parallel execution.'
tags: 'ai, opensource, developer-tools, productivity'
cover_image: ./assets/better-together-cover.png
---

A few months ago I wrote a post called *Bicycles Are All Your AI Agents Need*. The thesis was simple: the best way to make AI agents more capable isn't to wait for smarter models. Give them better tools. Bicycles. Force multipliers that turn a walking human into something five times faster.

People seemed to like it. Some folks DM'd me asking what tools I meant. Others asked if I was just philosophizing.

Fair question.

So I built two.

## The Two Bottlenecks Nobody Talks About

If you've spent serious time with AI coding agents (not toy demos, but real multi-file, multi-day projects) you've hit two walls:

**Wall 1: Amnesia.** Your agent forgets everything between sessions. That architectural decision you spent 40 minutes discussing? Gone after a context compaction. The workaround for that weird API behavior? Lost. You end up re-explaining the same things, re-debugging the same bugs, watching the agent confidently re-introduce the exact mistake you fixed yesterday.

**Wall 2: Serialization.** Your agent does one thing at a time. Need to scaffold three microservices? It writes one, then the next, then the next. Need five independent bug fixes? One by one. You're sitting there watching a machine that could parallelize... not parallelize.

These aren't minor inconveniences. They're fundamental capability gaps. An amnesiac agent is an agent that never learns. A sequential agent is an agent that wastes the one thing you can't get back: your time.

I decided to fix both.

## cortexmem: The Memory

[cortexmem](https://github.com/pablocalofatti/cortexmem) is persistent memory for AI coding agents. It's a single Rust binary (no servers, no API keys, no cloud dependencies) that gives your agent a memory system loosely modeled on how human memory works: things you keep coming back to stick, and things you never revisit fade.

When your agent discovers something important (an architectural decision, a bug fix, a pattern, a gotcha) it saves an observation. Those observations go through a lifecycle: **buffer** (just saved, unproven), **working** (accessed again, probably useful), and **core** (frequently referenced, institutional knowledge). Observations that nobody ever looks at again naturally decay and archive. The important stuff floats to the top.
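To make the lifecycle concrete, here is a minimal Python sketch of the buffer, working, core, and archive states. It is not cortexmem's implementation (which is written in Rust): the `Observation` class, the access thresholds, and the idle cutoff are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the tiers come from the post, the numbers do not.
WORKING_AFTER_ACCESSES = 1
CORE_AFTER_ACCESSES = 5
ARCHIVE_AFTER_IDLE_DAYS = 30


@dataclass
class Observation:
    topic: str
    content: str
    access_count: int = 0
    idle_days: int = 0
    archived: bool = False

    @property
    def tier(self) -> str:
        """Derive the lifecycle tier from how often the observation was read."""
        if self.archived:
            return "archived"
        if self.access_count >= CORE_AFTER_ACCESSES:
            return "core"
        if self.access_count >= WORKING_AFTER_ACCESSES:
            return "working"
        return "buffer"

    def touch(self) -> None:
        """Record an access: resets idleness and may promote the observation."""
        self.access_count += 1
        self.idle_days = 0
        self.archived = False

    def age(self, days: int) -> None:
        """Advance time; observations nobody reads eventually archive."""
        self.idle_days += days
        if self.idle_days >= ARCHIVE_AFTER_IDLE_DAYS:
            self.archived = True
```

The tier is derived from access history rather than stored, so promotion is just a side effect of the agent reading an observation again.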

The search is hybrid: FTS5 keyword matching for when you know what you're looking for, vector similarity for when you don't, and Reciprocal Rank Fusion to blend them together. The agent asks "have I seen this before?" and gets ranked, relevant context in milliseconds.
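Reciprocal Rank Fusion itself is small enough to show in full. The sketch below is a generic Python version of the technique, not cortexmem's code: the observation IDs are made up, and `k = 60` is the constant commonly used with RRF rather than a value taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of IDs into a single ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists, best match first.
keyword_hits = ["obs-12", "obs-7", "obs-3"]  # e.g. from a keyword (FTS5) query
vector_hits = ["obs-7", "obs-9", "obs-12"]   # e.g. from vector similarity
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['obs-7', 'obs-12', 'obs-9', 'obs-3']
```

Because RRF only looks at ranks, it never has to reconcile keyword scores with cosine similarities, which live on different scales. Observations that both searches agree on rise to the top.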

It connects to any MCP-compatible agent: Claude Code, Cursor, Windsurf, Gemini CLI, you name it. One `cortexmem setup` and it's wired in.

I wrote a [full deep-dive on cortexmem here](https://dev.to/pablocalofatti) if you want the architecture details. But the important thing for this post is what it represents: **an agent that remembers**.

## minion-toolkit: The Muscle

[minion-toolkit](https://github.com/pablocalofatti/minion-toolkit) is a parallel task orchestrator for Claude Code. You give it a list of tasks, and it spawns isolated workers (each in its own git worktree, its own branch, its own sandbox) to build them simultaneously.

The core idea is what I call the **blueprint pattern**: deterministic guardrails wrapping agentic work. Each worker follows a strict sequence (branch, gather context, implement, lint, test, commit) but the *implementation* step inside that sequence is fully agentic. The worker can explore, search, reason, make decisions. It just can't skip the lint check or forget to run the tests.
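minion-toolkit expresses this in markdown prompts, not code, but the shape of the pattern is easy to sketch in Python. Everything below is a hypothetical illustration: the step names come from the sequence above, while `run_blueprint` and its callable-per-step interface are assumptions.

```python
from typing import Callable

# Fixed step order taken from the post; the runner itself is a hypothetical sketch.
BLUEPRINT = ["branch", "gather_context", "implement", "lint", "test", "commit"]


def run_blueprint(steps: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every blueprint step in order and stop at the first failure.

    `steps` maps step names to callables that return True on success. Only
    "implement" is expected to be agentic; the others are deterministic.
    Returns the names of the steps that completed.
    """
    missing = [name for name in BLUEPRINT if name not in steps]
    if missing:
        raise ValueError(f"blueprint steps cannot be skipped: {missing}")

    completed = []
    for name in BLUEPRINT:
        if not steps[name]():
            break  # e.g. a failing lint means test and commit never run
        completed.append(name)
    return completed
```

The worker has full freedom inside `implement`, but the runner owns the order: there is no code path that reaches `commit` without passing `lint` and `test` first.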

Think of it as "one brain, many hands." The orchestrator is the brain: it parses tasks, resolves dependencies, discovers domain-specific agents (got a `cloudx-backend` agent? it'll route backend tasks there automatically), and manages the lifecycle. The workers are the hands: focused, isolated, following the blueprint.

It supports workflows (plan-implement-review phases), dependency chains between tasks, dry-run mode, cost tracking, and automatic conflict detection. It's pure markdown: zero code, zero dependencies, just prompt engineering that works.

I wrote about [minion-toolkit in detail here](https://dev.to/pablocalofatti) too. But again, for this post, what matters is the capability: **an agent that parallelizes**.

## What Happens When You Give an Agent Both

Here's where it gets interesting. cortexmem and minion-toolkit were designed independently, but they compose in ways that I didn't fully anticipate until I started using them together on real projects. The combination is more than the sum of its parts.

### Workers That Remember

A minion worker, by default, is stateless. It spawns, builds its thing, reports back, and dies. But when cortexmem is available, that worker can search for past decisions before implementing.

Say you're building a new API endpoint and a worker gets assigned the task. Before writing a single line, it queries cortexmem: *"What patterns do we use for validation in this project? Have there been any gotchas with the ORM?"* And cortexmem returns the observation from three weeks ago where you debugged a timezone issue in the date serializer, or the decision to use Zod schemas at the controller boundary.

The worker doesn't just follow the task description. It implements with institutional context. "Have we solved this before?" becomes automatic.

### The Orchestrator Learns From Failure

This one emerged from pain. I ran a minion session that hit a git corruption issue: a worktree operation left behind a spurious ref file with a space in the name. It took me twenty minutes to diagnose. So I saved an observation in cortexmem:

```text
topic: minion/git-corruption
type: bug_fix
content: "Worktree operations can create spurious ref files
(e.g., 'feat/phase-d-tier2 2' with space). Delete manually
from .git/refs/heads/. Symptom: git checkout fails with
ambiguous ref error."
```

Next time minion ran and hit a similar git error, the orchestrator's context already included that observation. It didn't spend twenty minutes diagnosing. It recognized the pattern and told me exactly what to clean up.

Past failures, gotchas, and workarounds accumulate in cortexmem. Each minion run starts slightly smarter than the last.

### Cross-Session Recovery

This is the one that convinced me the combination was more than a convenience. AI agents have context windows. Context windows get compacted. When a long minion run hits compaction mid-flight, the agent loses track of which tasks completed, which failed, and why.

cortexmem's session tracking was designed exactly for this. The orchestrator saves workflow state as observations with structured topic keys (`workflow/tdd/state`, `workflow/tdd/task-3/result`) so when a session resumes or a new session picks up where the last one left off, all the state is recoverable. The memory persists even when the context window doesn't.

I've had minion runs that spanned three sessions due to compaction. Without cortexmem, each restart would have been a cold start: re-reading task files, guessing which branches were already merged, re-running checks on completed work. With cortexmem, the agent picks up exactly where it left off.
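The recovery idea boils down to writing state under predictable topic keys and reading it back by prefix. Here is a minimal Python sketch using SQLite from the standard library. The table layout and the `save` and `load_prefix` helpers are assumptions for illustration and are not cortexmem's schema or API; only the topic keys mirror the ones above.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny observation store keyed by topic."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "topic TEXT PRIMARY KEY, content TEXT NOT NULL)"
    )
    return conn


def save(conn: sqlite3.Connection, topic: str, value: dict) -> None:
    """Write or overwrite the observation stored under a topic key."""
    conn.execute(
        "INSERT OR REPLACE INTO observations (topic, content) VALUES (?, ?)",
        (topic, json.dumps(value)),
    )
    conn.commit()


def load_prefix(conn: sqlite3.Connection, prefix: str) -> dict[str, dict]:
    """Return every observation whose topic starts with the given prefix."""
    rows = conn.execute(
        "SELECT topic, content FROM observations "
        "WHERE instr(topic, ?) = 1 ORDER BY topic",
        (prefix,),
    )
    return {topic: json.loads(content) for topic, content in rows}


# Before compaction: the orchestrator records progress as it goes.
conn = open_store()
save(conn, "workflow/tdd/state", {"completed": [1, 2], "pending": [3]})
save(conn, "workflow/tdd/task-3/result", {"status": "failed", "reason": "type error"})

# After compaction: a fresh session rebuilds its picture from the prefix.
recovered = load_prefix(conn, "workflow/tdd/")
```

With a file path instead of `:memory:`, the same calls survive a process restart, which is the property that matters when the context window is gone.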

### Team Knowledge That Compounds

Here's the long game. When multiple developers on a team use minion+cortexmem, and the memory syncs (cortexmem supports git-based sync), the knowledge base grows beyond any one person's experience.

Developer A debugs a CORS issue with the staging proxy. Saves an observation. Developer B's agent, running a minion task that touches the API gateway two weeks later, gets that context automatically. Nobody had to write a wiki page. Nobody had to remember to mention it in standup. The knowledge just *was there* when it was needed.

This is what institutional knowledge looks like when it's machine-accessible.

## A Real Example: The CloudX Audit Log Sprint

Let me make this concrete. Last month I was building an audit log system for a project called CloudX. I had a task file with five tasks: database migration, event capture service, API endpoints, UI table component, and E2E tests.

I kicked off `/minion tasks.md` and four workers spun up in parallel (task 5 depended on 1-4, so it waited). Workers 1 and 2 finished clean. Worker 3 hit a type error in the DTO validation, a known NestJS issue where nested objects validated with class-validator's `@ValidateNested()` need an explicit `@Type()` annotation from class-transformer.

Here's the thing: cortexmem already had an observation about that exact pattern from a previous sprint. But Worker 3 was a fresh agent, no prior context. Without cortexmem, it would have spent cycles diagnosing the issue from scratch. With cortexmem, the blueprint's context-gathering step found the relevant observation, and the worker applied the fix pattern immediately.

Worker 4 (UI table) hit a different issue: a scroll overflow bug in the MUI DataGrid when columns exceeded the container width. The worker fixed it, and the blueprint automatically saved the fix as a new observation. Next time any worker touches DataGrid layout, that context will be waiting.

Five tasks, four in parallel, one drawing on past memory, one generating new memory. The whole thing took about twelve minutes wall-clock time. Sequential, without memory, it would have been closer to forty-five.
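The "four in parallel, one waiting" schedule falls straight out of the dependency graph. As a rough sketch, assuming tasks are described as a mapping from task name to the set of tasks it depends on (the task labels below are hypothetical names for the five tasks in this sprint), Python's standard `graphlib` module can group them into waves:

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave can run at the same time."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished, unlocking the next
    return waves


# Hypothetical labels for the five audit log tasks.
tasks = {
    "1-migration": set(),
    "2-event-capture": set(),
    "3-api-endpoints": set(),
    "4-ui-table": set(),
    "5-e2e-tests": {"1-migration", "2-event-capture", "3-api-endpoints", "4-ui-table"},
}
waves = parallel_waves(tasks)
```

A real orchestrator would start each newly ready task as soon as its own dependencies finish instead of waiting for a full wave, but the wave view is enough to see where the wall-clock savings come from.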

## What's Next

Both tools are open source and actively developed. Some things I'm working on:

- **Docker isolation for minion workers**: full container sandboxing instead of worktree isolation, for projects where git worktrees aren't sufficient
- **cortexmem cloud sync**: for teams that want shared memory without a git repository
- **Cross-agent memory**: letting cortexmem observations flow between different AI agents (your Claude Code agent's discoveries available to your Cursor agent)
- **Smarter decay**: using access patterns to predict which observations will be useful for upcoming tasks, not just which ones were useful in the past

## The Bigger Picture

I started this trilogy with a metaphor. AI agents are pedestrians. They're powerful, they're smart, but they're walking. Give them a bicycle and they become something qualitatively different.

cortexmem is the navigation system. It knows where you've been, what you learned along the way, which routes have potholes. minion-toolkit is the drivetrain. It multiplies your pedaling into real forward motion, parallel and powerful. Together, they make the bicycle complete.

But here's what I've learned building these tools: the bicycle metaphor goes beyond performance. A pedestrian is reactive: they deal with what's in front of them. A cyclist plans routes, builds momentum, carries knowledge from ride to ride. That's the shift. Not just faster agents, but agents that *compound*.

Both projects are MIT licensed, both welcome contributions, and both are designed to work with any MCP-compatible AI agent, not just Claude Code.

- [cortexmem on GitHub](https://github.com/pablocalofatti/cortexmem)
- [minion-toolkit on GitHub](https://github.com/pablocalofatti/minion-toolkit)

If you're building tools for AI agents, I'd love to hear what bicycles you're working on. And if you try either of these, open an issue. The feedback from the first two posts shaped half the features in the current releases.

The age of amnesiac, sequential AI agents is ending. Not because the models got smarter, but because the tools got better.

That was always the point.