Claude Code, Part 5: Skills & Systems

Parts 1-4 gave you command-level control of a single session. Part 5 is about encoding how you want Claude Code to behave so you don't retype it next week.

Six concepts do the work. Four teach, one enforces, and one packages the rest:

CLAUDE.md: always-loaded project instructions.
Rules: instructions scoped to file paths.
Skills: invocable procedures for specific tasks.
Subagents: isolated workers with their own context.
Hooks: deterministic scripts on lifecycle events.
Plugins: packaging for sharing the above.

This part covers the first four: the teaching layer, what Claude reads. Part 6 covers hooks and the enforcement layer: what Claude cannot bypass.

Where Should This Live? The 4-Way Decision

Every instruction, procedure, or safety gate you want Claude Code to follow lands in one of four places. Where you put it decides when it loads, how scoped it is, and whether it's advisory or enforced.

If the knowledge is…	It lives in…	When it loads	Enforcement
Always relevant, every session	CLAUDE.md	Session start	Advisory
Only relevant for specific file paths	`.claude/rules/*.md` with `paths:`	When Claude reads a matching file	Advisory
A procedure for a specific task	`.claude/skills/<name>/SKILL.md`	When invoked by you or Claude	Advisory
A non-negotiable on every matching event	Hooks in `settings.json`	On the matching lifecycle event	Deterministic

One B2B example per layer:

CLAUDE.md: "We use pnpm. Postgres runs in docker-compose. Drizzle for migrations. Every tenant query filters on tenant_id."
Rule at .claude/rules/api-handlers.md scoped to src/api/**/*.ts: "All API handlers validate input with Zod and return { data, error, meta }."
Skill prove-the-fix/SKILL.md: a procedure that runs when the user reports a bug.
Hook: a PreToolUse script blocking writes to .env* and .github/workflows/** regardless of what Claude decides.

The first three layers are context Claude reads. They shape behavior but cannot force it. Hooks run as shell scripts outside Claude's decision loop, which is why they go in Part 6.

Rule of thumb: if you ever need to say "this must happen every time, no exceptions", it belongs in a hook. If it's conditional on task or judgment, pick the smallest of CLAUDE.md / rule / skill that matches the scope.
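The decision table and rule of thumb above can be sketched as a small function. This is only an illustration: the `Guidance` fields and the order of the checks are assumptions made for the sketch, not anything Claude Code itself defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    """One piece of guidance you want Claude Code to follow (hypothetical model)."""
    must_always_happen: bool  # "every time, no exceptions"
    is_procedure: bool        # ordered steps for a specific task
    path_scoped: bool         # only matters for certain file paths


def choose_layer(g: Guidance) -> str:
    """Return the layer from the decision table that fits this guidance."""
    if g.must_always_happen:
        return "hook"       # deterministic, runs outside Claude's decision loop
    if g.is_procedure:
        return "skill"      # loads when invoked
    if g.path_scoped:
        return "rule"       # loads when Claude reads a matching file
    return "CLAUDE.md"      # always relevant, loads at session start


examples = {
    "block writes to .env*": Guidance(True, False, True),
    "prove-the-fix procedure": Guidance(False, True, False),
    "API handlers validate with Zod": Guidance(False, False, True),
    "we use pnpm": Guidance(False, False, False),
}
for name, g in examples.items():
    print(f"{name}: {choose_layer(g)}")
```

The hook check comes first on purpose: a non-negotiable stays a hook even when it is also path-scoped, as with the `.env*` example.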

Reference docs: memory, skills, hooks, settings.

CLAUDE.md and Auto-Memory, Precisely

These are two different mechanisms. Guides often conflate them; the memory doc is explicit about where the line sits.

CLAUDE.md is instructions you write. It loads in full at session start, regardless of length. The memory doc recommends keeping each file under 200 lines, not because of a load cap but because longer files get followed less consistently.

Scopes, in load order, broadest first; more specific files load later and win on conflict:

Scope	Path	Shared with
Managed policy	System-wide location, deployed by IT	Everyone on the machine
User	`~/.claude/CLAUDE.md`	You, all projects
Project	`./CLAUDE.md` or `./.claude/CLAUDE.md`	Team (via git)
Local	`./CLAUDE.local.md`	You, this project only (gitignored)

Auto-memory is notes Claude writes itself as it works. It lives at ~/.claude/projects/<project>/memory/ with a MEMORY.md index. Only MEMORY.md has the "first 200 lines or 25 KB" cap that some guides misattribute to CLAUDE.md. Topic files like debugging.md are not loaded at startup; Claude reads them on demand.
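To make the MEMORY.md cap concrete, here is a rough sketch of what "first 200 lines or 25 KB" could mean in practice. It assumes that whichever limit is hit first wins, that the cut falls on a line boundary, and that 25 KB means 25 * 1024 bytes. None of those details come from the text above, so treat the function as a mental model rather than Claude Code's actual behavior.

```python
MAX_LINES = 200
MAX_BYTES = 25 * 1024  # assumption: "25 KB" read as 25 * 1024 bytes


def startup_slice(memory_md: str) -> str:
    """Return the prefix of MEMORY.md that fits both caps, whole lines only."""
    kept = []
    used = 0
    for line in memory_md.splitlines(keepends=True)[:MAX_LINES]:
        size = len(line.encode("utf-8"))
        if used + size > MAX_BYTES:
            break  # the byte cap was reached before the line cap
        kept.append(line)
        used += size
    return "".join(kept)


index = "".join(f"- note {i}\n" for i in range(300))
print(len(startup_slice(index).splitlines()))
```

Either way, the practical point is the same: keep the index short and push detail into topic files, which are read on demand.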

When CLAUDE.md grows past 200 lines, split it: by topic into .claude/rules/*.md, by import via @docs/<file>.md, or by scope into CLAUDE.local.md. Verify what's loaded in a session with /memory.
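A quick way to notice when a file has crossed the 200-line guideline is a small audit script. The sketch below checks only the project-level paths from the scope table; the function names are made up for illustration, and the threshold is the recommendation, not an enforced limit.

```python
from pathlib import Path

RECOMMENDED_MAX_LINES = 200  # guideline from the memory doc, not a hard cap


def oversized(files: dict[str, str], limit: int = RECOMMENDED_MAX_LINES) -> list[tuple[str, int]]:
    """Return (name, line_count) for every file whose text exceeds the limit."""
    report = []
    for name, text in files.items():
        count = len(text.splitlines())
        if count > limit:
            report.append((name, count))
    return report


def read_instruction_files(root: Path) -> dict[str, str]:
    """Collect the project-level instruction files that exist under root."""
    candidates = [
        root / "CLAUDE.md",
        root / ".claude" / "CLAUDE.md",
        root / "CLAUDE.local.md",
    ]
    return {str(p): p.read_text(encoding="utf-8") for p in candidates if p.is_file()}


if __name__ == "__main__":
    for name, count in oversized(read_instruction_files(Path("."))):
        print(f"{name}: {count} lines, consider splitting")
```

Run it from the repository root; no output means every file found is within the guideline.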

Rules: The Scoped Middle Layer

Rules are instructions scoped to file paths. Use them when a constraint only matters for part of the codebase.

.claude/rules/api-handlers.md:

---
paths:
  - "src/api/**/*.ts"
---

# API handler conventions
- Validate input with Zod before the first DB call.
- Return `{ data, error, meta }`. Never raw arrays.
- Errors include `code`, `message`, `requestId`.

A rule without a paths: field loads in every session, with the same priority as CLAUDE.md. A rule with one loads only when Claude reads a matching file, so it costs zero context on unrelated work.

Typical rule set for a B2B project: api-handlers.md → src/api/**, db-migrations.md → drizzle/**, tenant-isolation.md → src/db/**, testing.md → **/*.test.ts. Stack-agnostic: swap glob patterns for Python (**/*.py), Go (internal/**/*.go), or Rails (app/**/*.rb).
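To check which rules a given file would pull in, it can help to test your globs offline. The sketch below uses a hand-written glob translator in which `**` crosses directories and `*` stays within one path segment. That is a common convention, but Claude Code's own matcher may differ in edge cases, so this is a sanity check rather than a reimplementation. The rule names mirror the example set above.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a path glob into a regex: ** crosses directories, * does not."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")           # trailing **: anything below this point
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")        # any run of characters in one segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")         # exactly one character in one segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


RULES = {
    "api-handlers.md": ["src/api/**"],
    "db-migrations.md": ["drizzle/**"],
    "tenant-isolation.md": ["src/db/**"],
    "testing.md": ["**/*.test.ts"],
}


def rules_for(path: str) -> list[str]:
    """List the rule files whose globs match the given repo-relative path."""
    return [
        name
        for name, globs in RULES.items()
        if any(glob_to_regex(g).fullmatch(path) for g in globs)
    ]


print(rules_for("src/db/tenant.test.ts"))
print(rules_for("README.md"))
```

A file can match more than one rule: a test under `src/db/` picks up both the tenant-isolation and testing rules, while `README.md` picks up none.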

Skills: Encode the Procedures You Keep Repeating

A skill is a folder with a SKILL.md file that teaches Claude a specific task. You invoke it with /skill-name, or Claude invokes it automatically when your request matches its description.

Skills use progressive disclosure. Skill descriptions stay in context so Claude knows what's available (they're trimmed only if you install enough skills to overflow the listing budget). Skill bodies load only when invoked. You can have dozens installed with near-zero cost until used.

When to Write a Skill

A heuristic I use: three strikes, then encode.

Time	Action
1st	Handle it manually
2nd	Note the pattern
3rd	Write a skill

Never pre-create skills. Let them emerge from repetition. If you explain something three times, encode it.

Before you encode, run it past three tests; a skill earns its place only if all three hold: (1) the task is procedural (ordered steps, branching), not a declarative rule; (2) it's sometimes-relevant (always-relevant belongs in CLAUDE.md, deterministic-always belongs in a hook); (3) skipping a step produces a materially worse outcome. Fail any one and it's a rule, a hook, or a one-off plan, not a skill.
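The three tests are easy to write down as a checklist function. This is only a restatement of the paragraph above in code; the parameter names and messages are invented for the sketch.

```python
def failed_tests(procedural: bool, sometimes_relevant: bool, skipping_hurts: bool) -> list[str]:
    """Return a reason for each of the three tests the candidate fails."""
    failed = []
    if not procedural:
        failed.append("not procedural: a declarative rule fits better")
    if not sometimes_relevant:
        failed.append("not sometimes-relevant: CLAUDE.md or a hook fits better")
    if not skipping_hurts:
        failed.append("skipping a step costs little: a one-off plan is enough")
    return failed


def earns_a_skill(procedural: bool, sometimes_relevant: bool, skipping_hurts: bool) -> bool:
    """A candidate earns a skill only if all three tests hold."""
    return not failed_tests(procedural, sometimes_relevant, skipping_hurts)


# "API handlers use Zod" is declarative and relevant on every handler edit.
print(failed_tests(procedural=False, sometimes_relevant=True, skipping_hurts=True))
# A bug-reproduction procedure passes all three.
print(earns_a_skill(procedural=True, sometimes_relevant=True, skipping_hurts=True))
```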

What a SKILL.md Needs

Per the official skills reference, all frontmatter is optional; only description is recommended so Claude knows when to load it. Beyond that, the doc says SKILL.md can contain anything. The structure I've found works best is three parts: a process (numbered steps), stop conditions (when to escalate instead of pressing on), and a verification checklist (how you know it worked).

---
description: >
  Reproduce a bug in a failing test under .claude/proofs/ before
  fixing it. Invoke when the user reports a bug or pastes a stack trace.
disable-model-invocation: true   # has side effects: user-invoked only, never auto-fired
allowed-tools: Read, Write, Edit, Bash   # restrict to what the steps actually need
---

Then a short body with the three sections above. Per the skills reference, keep SKILL.md under about 500 lines; move detailed reference material into supporting files in the same directory.

Two frontmatter fields do real safety work, so set them from the start. disable-model-invocation: true makes a skill user-invoked only, so a procedure with side effects (data mutations, a migration, anything that writes) never auto-fires on a loose description match. allowed-tools restricts the skill to the tools its steps need. And end the body with a hard Refuse to proceed if … section listing preconditions (dirty working tree, missing backup, no human approval), treated as gates rather than suggestions. The rule of thumb from running this in production: every side-effecting skill carries disable-model-invocation: true and a refuse-to-proceed block, and the only skills left model-invocable are the fully git-undoable ones.
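The conventions in this section can be checked mechanically before a skill is committed. The sketch below is a hypothetical linter, not a Claude Code feature. It parses frontmatter naively (top-level `key: value` lines only, no real YAML), and the `has_side_effects` flag is something you supply yourself, because nothing in the file declares it.

```python
import re


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md into top-level frontmatter keys and the body (naive, not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter: treat everything as body
    fields = {}
    for line in lines[1:end]:
        m = re.match(r"([A-Za-z][\w-]*):\s*(.*)", line)
        if m:  # indented continuation lines do not match and are skipped
            fields[m.group(1)] = m.group(2).split("#", 1)[0].strip()
    return fields, "\n".join(lines[end + 1:])


def lint_skill(text: str, has_side_effects: bool) -> list[str]:
    """Return a list of problems; an empty list means the conventions are met."""
    fields, body = split_frontmatter(text)
    problems = []
    if "description" not in fields:
        problems.append("no description: Claude cannot tell when to load this skill")
    if len(text.splitlines()) > 500:
        problems.append("over 500 lines: move reference material to supporting files")
    if has_side_effects:
        if fields.get("disable-model-invocation") != "true":
            problems.append("side effects but model-invocable: set disable-model-invocation: true")
        if "refuse to proceed" not in body.lower():
            problems.append("side effects but no 'Refuse to proceed if' section")
    return problems


SAMPLE = """---
description: >
  Reproduce a bug in a failing test before fixing it.
disable-model-invocation: true   # user-invoked only
allowed-tools: Read, Write, Edit, Bash
---

## Process
1. Write a failing test.

## Refuse to proceed if
- The working tree is dirty.
"""

print(lint_skill(SAMPLE, has_side_effects=True))
```

A check like this fits naturally in CI or a pre-commit step, so a side-effecting skill cannot land without its gates.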

Anti-Rationalization Tables

LLMs skip steps by rationalizing ("I'll add tests later", "there's already a similar test", "TypeScript would catch this"). Pre-empt it by naming the rationalizations inside the skill:

Excuse	Rebuttal
"A test for this would be flaky."	Then the fix is unverifiable. Write it anyway, mark `@flaky`.
"There's already a similar test."	Not a proof for this regression. Add one for the exact conditions.
"TypeScript would have caught this."	It didn't. The bug is here.

This works because it names the rationalizations explicitly, making them harder for the model to follow silently.

Naming Skills

A convention I've found useful: name by methodology, not by action-noun. prove-the-fix beats add-tests. surface-assumptions beats plan-feature. A quick check: if the name could describe a rule ("API handlers use Zod"), it probably should be one. Scaffolding belongs in rules; methodology belongs in skills.

An MVP Skill Set for B2B SaaS

These six are the set I run, not a universal must-have list. One honest caveat up front: the skills developers install most are task-named (commit messages, code review, test generation, document-format handling), and those are genuinely useful. This set is different on purpose. It encodes methodology (how to approach a recurring class of problem), the part that doesn't already ship as a built-in and that a scaffolding rule can't capture. All six are methodology-named and load only when their description matches the conversation. Adopt the ones that fit how your team works; drop the rest.

Skill	When it fires	Why it earns its place
`surface-assumptions`	Multi-file change, ambiguous spec	Forces silent assumptions onto the table before coding.
`prove-the-fix`	Bug report, pasted stack trace	A failing test written before the fix is a regression test forever after.
`review-diff`	Before commit, or on request	Tiered severity (BLOCKER / WARN / NOTE) surfaces what matters.
`plan-breaking-change`	Edits to public API, DB schema, shared types	Classifies changes as ADDITIVE / NON-BREAKING / BREAKING / DESTRUCTIVE and requires a spec for the last two.
`investigate-incident`	Production error, pasted log or trace	Structured triage: evidence → ranked hypotheses → cheapest test first.
`spec-before-build`	Feature request	Requirements → acceptance criteria → plan → approval gate before any code.

What's deliberately not on this list: add-endpoint, add-tests, new-service. These are scaffolding, which belongs in a path-scoped rule (.claude/rules/api-handlers.md with paths: src/api/** covers it without loading a skill every time). Also not here: review-before-commit as a single monolithic skill, because the built-in /review, /code-review (including /code-review ultra, formerly /ultrareview), /security-review, and bundled /simplify already cover that ground.

Subagents: Three Use Cases, Not One

A subagent is a separate Claude Code instance. It runs one focused task in its own context window and returns results.

What a Subagent Inherits

Per the official subagents doc, a subagent starts in the main session's working directory with its own fresh context window.

Starts with: its own system prompt from the definition, basic environment details (working directory, git status), the full CLAUDE.md / rules / CLAUDE.local.md memory hierarchy the main session loads, and skills listed in its skills: frontmatter (preloaded into context).

Does not inherit: the parent's conversation history, skills not listed in frontmatter, or the parent session's auto-memory notes. One nuance worth knowing: custom subagents defined in .claude/agents/ load your CLAUDE.md hierarchy by default; only the built-in Explore and Plan agents skip it. The same exception applies when a skill forks into a subagent (context: fork): CLAUDE.md is injected unless the chosen agent is Explore or Plan.

Two frontmatter fields are worth knowing before you write your first custom subagent:

skills: preloads the full content of named skills into the subagent's context at startup (not just their descriptions). Use it to give a code-reviewer your review-diff playbook without it having to discover the skill mid-task.
memory: gives the subagent a persistent notes directory that survives across conversations. The value sets the scope: user stores notes at ~/.claude/agent-memory/<agent-name>/ (for broadly applicable knowledge); project stores them at .claude/agent-memory/<agent-name>/ and commits with the repo (the recommended default); local stores them at .claude/agent-memory-local/<agent-name>/, the gitignored equivalent. The subagent curates its own MEMORY.md the same way main-session auto-memory works, so over time a reviewer learns your codebase's recurring patterns.

Three Reasons to Spawn One

The subagents doc lists five reasons subagents help: preserve context, enforce constraints, reuse configurations, specialize behavior, and control costs (routing to a cheaper model). In practice, three cases cover most of my use, and they're the ones where "just prompt harder" fails:

Context isolation. A search or analysis task fills your context with file contents you won't reference again. Delegate it, get a summary back, keep the main session clean.
Bias-free review. Reviewing code in the session that wrote it invites anchoring bias: that session "remembers" why each decision was made. A fresh-context review agent catches what self-review misses.
Restricted tools as safety. A tools: allowlist constrains a subagent only when it excludes the powerful tools; granting Bash (or Write/Edit) re-opens nearly everything, so "fewer tools" is not automatically "safer." Exclude write-capable tools entirely where you can. When a task genuinely needs Bash (a DB investigator, say), the capability boundary has to live outside the tools list: enforce read-only at the database itself (below), and treat any PreToolUse hook (Part 6) as one more layer, not the guarantee.

Design Around Context, Not Roles

This is the rule that prevents the most common subagent antipattern: modeling them as workflow stages.

ANTIPATTERN: subagents as workflow stages

   planner  →  implementer  →  reviewer

Each phase spawns a fresh subagent. Context passes as summaries.

Why it fails: the implementer needs context the planner already had
(why decisions were made, what alternatives were rejected). Passing
only a summary loses that. The implementer re-derives, badly.

Split subagents only when context can be genuinely isolated. A subagent implementing a feature should also write its tests: it already has the context. A subagent reviewing a diff should not have written it. Isolation is the whole point.

An MVP Subagent Set

Three worked examples, one per use case above. They're teaching examples, not the list every team should install. In practice the subagents teams converge on are boring and task-specific: code-reviewer is near-universal, usually joined by a test-runner, a debugger, a docs agent, and a security reviewer. Of the three below, only code-reviewer is a common pick; safe-sql-runner and research-orchestrator are deliberately less typical, chosen because they show the restricted-tools and context-isolation ideas cleanly.

code-reviewer (bias-free review). Read-only tools (Read, Grep, Glob), Sonnet or Haiku. System prompt: review the staged diff with fresh eyes, output tiered severity (BLOCKER / WARN / NOTE), do not rubber-stamp. Fresh context is the whole point, so keep it lean.

safe-sql-runner (restricted tools as safety). Give it tools: Read, Bash, then enforce read-only at the database: connect as a role granted only SELECT/USAGE, or set default_transaction_read_only = on. That is the only control psql can't talk its way around. A hook that allows only commands starting with psql -c "SELECT is not a read-only guarantee: psql runs multiple semicolon-separated statements in one -c (psql -c "SELECT 1; DROP TABLE users"), a SELECT can call write functions or wrap INSERT … RETURNING in a CTE, and \copy / -f / COPY … TO PROGRAM write files or run commands. Keep such a hook only as defense-in-depth on top of the read-only role, never as the boundary itself.
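To see why the prefix check fails, here is a minimal sketch of such a guard and a few commands that pass it. The guard function is hypothetical, and the table, sequence, and file names are made up. The point is only that a string prefix says nothing about what the rest of the command does.

```python
def naive_guard(command: str) -> bool:
    """Allow only commands that begin with a SELECT passed to psql -c (prefix check)."""
    return command.startswith('psql -c "SELECT')


# Every command below starts the "right" way and still is not read-only.
bypasses = [
    # Two statements in one -c string: the second drops a table.
    'psql -c "SELECT 1; DROP TABLE users"',
    # A SELECT that calls a function with side effects (resets a sequence).
    "psql -c \"SELECT setval('users_id_seq', 1)\"",
    # A harmless -c followed by -f, which runs an arbitrary SQL file.
    'psql -c "SELECT 1" -f migrate.sql',
]

for cmd in bypasses:
    verdict = "ALLOWED" if naive_guard(cmd) else "blocked"
    print(f"{verdict}: {cmd}")
```

All three are allowed by the guard. A stricter parser could close some of these holes, but the durable fix is the one described above: a database role that cannot write, whatever command reaches it.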

research-orchestrator (context isolation). Claude Code already ships a built-in Explore agent for exactly this, so reach for that first; build a custom one only when you need specific tooling or a cheaper model pinned to it. If you do, give it full read tools plus Bash for rg/fd, and delegate to it any task likely to produce more than 50 tool calls of search noise. The subagent returns a short summary; your main session stays focused.

What's not here: a planner, an implementer, a tester. Those are workflow stages, and subagents are the wrong tool (see the antipattern above).

What's Next

You now have the teaching layer: CLAUDE.md, rules, skills, subagents. Claude reads all of these. On a good day it follows them closely. On a bad day it doesn't.

Part 6 is about the bad day. It covers hooks, the Prove-It pattern for bug fixes, the escalation format that turns ⛔ GATE from prose into something Claude actually honors, and tiered classification for decisions where "pass/fail" is too coarse. It's how the system stops relying on Claude choosing to do the right thing and starts enforcing it.

Part 6: From Guidance to Guarantees →

This lesson is part of the Claude Code Field Guide, a free companion to Claude and the Code. The book teaches the thinking behind the commands.
