scientific-python · henryiii · Jun 25, 2026 · Jun 25, 2026 · Jun 25, 2026 · Jun 25, 2026
diff --git a/docs/guides/ai.md b/docs/guides/ai.md
@@ -0,0 +1,390 @@
+---
+short_title: AI
+---
+
+# Agentic AI
+
+Around November of 2025, agentic AI exploded in usefulness, and has changed how
+a lot of software is written, reviewed, and maintained. "Agentic" AI is more
+than a chatbot; it has access to "tool calls", which can read and write files,
+and most importantly it runs in a loop so it can verify that code passes
+checks. This is closer to how a human codes; we run code and verify outputs,
+we do not write working code from scratch without running it.
+
+It helps to separate two very different things that often get lumped together
+as "Agentic AI":
+
+- A developer driving an interactive AI harness with a capable model, reading
+  the output, and taking responsibility for the result. This is a power tool,
+  much like an editor or a linter.
+- Low-cost models running unattended in automated systems that mass-produce
+  pull requests. This is what most people mean by "AI slop", and it is the
+  source of most of the frustration maintainers feel about AI contributions.
+
+The recommendations below are aimed at the first case, and at keeping your
+project from being overwhelmed by the second.
+
+:::{note}
+The first point does hide something: the tool depends on the developer guiding
+it (just like any other tool). You will also see users with very little coding
+experience using these tools to produce low quality contributions. How someone
+learns to code in this new era is still something unsolved.
+
+If you maintain a project: Try to engage with the human. If they are willing to
+interact (and not just type "address the review" into their harness), treat
+them like a human, even if you also see the AI working on their behalf. They
+also may use AI to address a language barrier.
+:::
+
+## Disclosure and transparency
+
+We recommend **full disclosure**. Knowing what model was used lets a reviewer
+run a model from a different model family to help them review the contribution.
+A maintainer has a better idea of what to expect based on the model used. And
+it's generally more respectful to not keep your process hidden when
+contributing to open source - maybe the maintainer would like to try that model
+too. If you heavily edit the model output, then use your discretion; but being
+open about the whole process is generally better!
+
+**Credit AI in commits.** Follow the convention used by the Linux kernel and
+add a trailer. Never allow the model to add itself as a co-author. The code is
+still yours (and your responsibility); the AI is a tool, not an author or
+copyright holder, which is what co-authored-by is for. A growing number of
+projects will close a PR with an AI co-author out of licensing concerns.
+
+The Linux kernel trailer looks like this:
+
+```text
+Assisted-by: <harness>:<model>
+```
+
+You can usually customize your harness to include this, either in an agents
+file (below), or via specific settings.
+
+**Write your own PR descriptions.** Generated PR summaries tend to be verbose,
+impersonal, and a chore to read. Write the description yourself. If a PR or
+comment does contain AI-generated prose, mark it clearly, for example with a
+short disclaimer line at the top - and you can still write a human written
+message above that disclaimer.
+
+**Keep human review human-to-human.** Maintainers should never have to argue
+with a bot. Don't make a reviewer talk to an AI without knowing it; if an AI is
+responding on your behalf, say so (e.g. with an AI disclaimer at top).
+You are accountable for every change you submit.
+
+**Don't submit slop.** Don't open a PR that a maintainer could finish faster
+than they can review it, and don't mass-file unsolicited PRs. Reviewing an
+AI-generated PR can take far longer than writing it did -- effectively a
+denial-of-service on volunteer maintainers. If the change is trivial with AI,
+the maintainers probably could just trigger the AI themselves. Make sure the
+pull request is welcome -- check issues, ask first, etc.
+
+## `AI_POLICY.md`
+
+A growing convention is to add an [`AI_POLICY.md`][ai-pr-policy] to your
+repository so contributors know what is expected of AI-assisted work. There is
+no single right answer; pick the stance that matches your project's tolerance
+and capacity. The tabs below sketch three levels you can adapt.
+
+::::{tab-set}
+:::{tab-item} All in
+
+AI-assisted contributions are welcome on the same footing as any other, as long
+as they meet the project's quality bar and are disclosed.
+
+```markdown
+# AI Policy
+
+AI-assisted contributions are welcome. We ask that you:
+
+- Disclose that AI was used and name the tool/model.
+- Review and understand every line you submit; you are responsible for it.
+- Meet the same quality, testing, and style standards as any contribution.
+```
+
+:::
+:::{tab-item} Moderate
+
+AI assistance is fine, but the burden is on the contributor to show real human
+involvement and prior buy-in before opening a PR. This mirrors the
+[original proposal][ai-pr-policy].
+
+```markdown
+# AI Policy
+
+AI-assisted contributions are accepted only if:
+
+- The PR fills out the pull request template.
+- It clearly states that it is AI-assisted and names the tool used.
+- It links to an issue or discussion where a maintainer agreed to the
+  proposed change beforehand.
+
+Unsolicited, undisclosed, or low-effort AI PRs will be closed.
+```
+
+:::
+:::{tab-item} Minimal
+
+AI-generated PRs are discouraged or restricted. Use this if you have limited
+review capacity.
+
+```markdown
+# AI Policy
+
+We do not accept unsolicited AI-generated pull requests. Please open an issue
+to discuss before contributing. Fully-reviewed, disclosed AI-assisted fixes may
+be considered case by case.
+```
+
+:::
+::::
+
+## `AGENTS.md`
+
+Harnesses read a project context file to learn how your repository works --
+preferred command runners, architecture notes, conventions, and gotchas. A good
+context file makes the AI far more effective without bloating every prompt. The
+cross-tool standard is [`AGENTS.md`][agents-md]; most harnesses can generate a
+first draft for you (often via an `/init` command).
+
+Keep it focused on what is *not* obvious from the code: how to run the tests,
+which tools to prefer, where generated files live, and any traps. Treat it as
+documentation you maintain, not a dumping ground.
+
+:::{note} Claude Code and `AGENTS.md`
+
+Claude Code is the only major harness to *not* read `AGENTS.md`. You can support
+both with a symlink, keeping a single source of truth:
+
+```bash
+ln -s AGENTS.md CLAUDE.md
+```
+
+You can also mention `@AGENTS.md` inside `CLAUDE.md` if you want to add
+specific instructions; this is true for all the other harnesses too
+(`copilot-instructions.md`, etc).
+
+:::
+
+How you track the file is a separate decision:
+
+::::{tab-set}
+:::{tab-item} Commit it
+
+Commit `AGENTS.md` so every contributor (and their harness) shares the same
+project context. This is a good default for projects with at least one
+maintainer also using AI harnesses. (Ignoring `CLAUDE.md` and `.claude/` is
+also a good idea, due to that not supporting standards and being fairly
+common.)
+
+:::
+:::{tab-item} Ignore it
+
+Add `AGENTS.md` to your `.gitignore` if you'd rather each contributor maintain
+their own. The ignore entry signals that the file is expected but personal.
+
+:::
+:::{tab-item} Leave it out
+
+Don't reference it at all. Contributors who want a personal context file can
+keep it out of version control locally by adding it to `.git/info/exclude`,
+which (unlike `.gitignore`) is never shared. Some projects don't want to mention
+AI at all, even in a `.gitignore`.
+
+:::
+::::
+
+## User-level configuration
+
+Beyond per-project context, most harnesses support a user-level config that
+applies everywhere (for example `~/.claude/CLAUDE.md` or
+`~/.config/opencode/AGENTS.md`). This is the place for your personal,
+cross-project preferences, such as:
+
+- Your environment (System setup, GitHub username).
+- Tool preferences, e.g. "use `uv run` in Python projects".
+- Your commit and PR conventions, including the disclosure trailers above.
+- If you use local or small models, you can request relative paths be used
+  (easier for them to write).
+
+Here's an example file:
+
+```markdown
+You are on macOS, but have GNU sed. `python3` can be used if python without
+dependencies is needed. Use `uv run` if in a python package.
+
+Use `prek -a --quiet` instead of `pre-commit run -a` for linting.
+
+If you make a commit, follow conventional commits and add a trailer:
+`Assisted-by: <harness>:<model>`, where `<harness>` is the current agent
+harness, and `<model>` is the AI model.
+
+Prefix PR descriptions and comments on PRs with the line ":robot: _AI text
+below_ :robot:" to indicate you are an agent speaking on a user's behalf.
+```
+
+## Skills
+
+Skills are reusable, named sets of instructions for repetitive workflows that
+you can invoke on demand: dropping a Python version, checking trusted
+publishing, applying a project's changelog style, and so on. They follow a
+shared [skills standard][agentskills], so a skill you write can work across
+multiple tools. See [skills.sh][] for a catalog and more background.
+
+If you find yourself giving the AI the same multi-step instructions repeatedly,
+that's a good candidate for a skill. AI can help you write skills. You can store
+skills (like changelog skills) in a repository at `.agents/skills`, or for your
+user at `~/.agents/skills`. The `gh skills` command can help you manage them.
+
+:::{note}
+Yes, you probably guessed by now, Claude Code does not respect the standard
+location. You have to symlink `.agents/skills` to `.claude/skills`, of course.
+:::
+
+## A few harness features worth knowing
+
+The details vary by tool, but most modern harnesses share a common vocabulary:
+
+- **Slash commands** for built-in actions (e.g. initialize context, plan, or
+  review). `/init`, `/review`, `/diff`, `/skills`, `/compact`, etc.
+- **`@`-mentions** to pull specific files into context.
+- **Planning mode**, where the AI proposes an approach and asks clarifying
+  questions before editing. Valuable for anything non-trivial.
+- **Subagents**, which run a sub-task in their own context and report back a
+  summary, useful for research and parallel work, and keeping your context
+  managed.
+- **Model tiers**, letting you match a cheap, fast model to simple tasks and a
+  frontier model to hard ones. Use good models at first, then you'll learn what
+  is easy and hard for an AI, and can match better.
+
+As you'll learn, effective use of AI is often about managing context; loading
+the context with things the model needs to work on your problem (like design
+spec documents, etc) is important, as is also keeping the context short
+(limiting tool output, compacting, etc) to avoid giving the model too much to
+think about.
+
+## Common concerns
+
+- **Don't try one-shot.** Watch what the AI is doing and steer it.
+  Planning mode and a quick read of the diff catch most problems early. It's
+  fine to iterate, you aren't trying to make an AI commercial!
+- **Verify, don't trust.** Models hallucinate; confirm invented explanations
+  and APIs. Make sure the model validated with testing, ask it to if it doesn't
+  first try. Reviewing with a *different* model family can catch issues a model
+  won't flag in its own work.
+- **You own the result.** AI proposes; you decide. It does not know your
+  project's best practices unless you tell it, and it can't judge what is
+  "best".
+- **Mind security.** Code sent to a hosted model leaves your machine; avoid
+  sending confidential code to providers you don't trust, and never grant an
+  agent destructive access (for example, to production data). AI tools are
+  themselves a supply-chain target; see the [security guide][security] for
+  dependency pinning, cooldowns, and CI hardening.
+- **Beware untrusted content.** Anything an agent reads can carry instructions:
+  issue text, PR comments, a fetched web page, CI logs. A model might confuse
+  instructions from a payload buried in the content it was asked to
+  process - even in hidden comments. When you point an agent at outside
+  material (e.g. "triage these issues" or a CI run URL), review what it does
+  rather than letting it act unattended, and don't combine untrusted input with
+  destructive or credentialed access. This is unfortunately a big issue with
+  setting up an automated issue processing system.
+
+## What AI is good at
+
+AI is fantastic at anything that has a clear pass/fail condition. This means
+it's great at fixing up a failing PR, addressing linter failures, polishing off
+anything that's failing tests into making it pass tests. That's why good tests
+and strong linters and type checking are so helpful to AI, they give it a
+better pass/fail to work with. Do keep an eye on it, though, sometimes it will
+skip something instead of fixing it; sometimes that's correct, but decision
+making is not as strong of an AI skill as pass/fail checks!
+
+AI knows a massive library of tricks and details. It will hallucinate ones
+sometimes, of course (that's why the pass/fail is important above!). Make it
+validate anything (newer models often have this in the system prompts, so it is
+model and harness dependent - for example, Claude Opus 4.8+ is paranoid
+and validates without request).
+
+AI doesn't mind long or annoying tasks - iterating with a CI that takes minutes
+or hours, running things though docker, figuring out how to build projects,
+etc. You'll realize that things you know are good ideas, but you were too time
+constrained to try before are perfect candidates for AI. Want to find the 20
+most important downstream projects and test them all before and after some
+change you made? AI is happy to do it!
+
+As new models are coming that are better than humans at finding and exploiting
+vulnerabilities, we need to be running those models on our code to find and fix
+bugs before they can be exploited.
+
+## What should you try?
+
+Regardless of what AI companies tell you, one of the hardest things to do with
+a model is write new code. Especially from scratch (it will mimic the current
+style). That's also something that tends to be fairly enjoyable: Don't make AI
+do stuff you'd rather do yourself! Start by using the AI to do the stuff you
+*don't* like. Then start having it do things you wouldn't do because you don't
+have time to do it. Here are some suggestions for prompts to try:
+
+:::{note} Disclaimer
+These suggestions are for *your* projects. Never do this to someone else
+without them asking for it!
+:::
+
+- "Review this project for bugs, performance, simplifications, and
+  modernizations" - you might be shocked at how much it can find!
+  - Make sure you use a good model, and have it validate the findings (some
+    do not need extra prompting to do this).
+  - Followup: Put this into an issue, then open up draft PRs for these.
+    Group several into one PR when it makes sense. The PRs should reference
+    the issue.
+- "Categorize all open issues. Highlight issues that can be easily closed,
+  and issues that are bugs that you can reproduce."
+  - Followup: "Launch subagents to fix all the reproduced bugs in worktrees,
+    and open a PR for each"
+
+Smaller ideas:
+
+- "Explain the structure and design of this project."
+- "What's new since last release? Changelog style."
+- "Review the documentation for this project. Look for typos and gaps in
+  coverage."
+- "Rebase this PR"
+- "Review PR #123" (most harnesses provide a `/review` command too).
+- Give it the URL to a flaky CI run and ask it to investigate it.
+- Ask it to revive an old outdated PR based on the current codebase.
+- Write something then ask it to apply what you did to something else similar.
+- Point it at a bug report and ask it to reproduce it as a failing test, then
+  fix it.
+- "Bisect this regression" - finding the commit that broke something is a
+  tedious mechanical loop AI is happy to run.
+- "Add tests for the change I just made" - good tests and coverage give it a
+  clear pass/fail to work against.
+- "Add type annotations here until the type checker passes."
+- Ask it to draft release notes or a changelog from the git log between two
+  tags. It will try to mimic the existing style if there is one.
+
+## Tips
+
+If you want to see your usage across harnesses, Wes McKinney (of Pandas fame)
+has [AgentsView][], which reads local files from most harnesses and summarizes
+for you. Try `uvx agentsview usage daily`, for example. A similar tool is
+`npx ccusage`, which despite the name supports multiple harnesses too.
+
+If you use Claude Code, `npx ccstatusline` is much better than having the AI
+try to write its own status line.
+
+A very powerful technique is "rubber duck", where you develop code with one
+model, then review it with a different model, feeding the review back into the
+original model, and iterate. This can provide a significantly better result
+than either model on its own, moving up
+[about 74% to the next model class in some tests][rubberduck]. (This is also
+why model disclosure is important). You don't need a specialized mode (copilot
+has one), you can do this yourself if you have access to two model families.
+
+[ai-pr-policy]: https://willmcgugan.github.io/ai-pr-policy/
+[agents-md]: https://agents.md
+[agentskills]: https://agentskills.io
+[agentsview]: https://www.agentsview.io
+[rubberduck]: https://github.blog/ai-and-ml/github-copilot/github-copilot-cli-combines-model-families-for-a-second-opinion/
+[skills.sh]: https://www.skills.sh
+[security]: guides/security