Anthropic Agent Skills explained, with a PDF generation example

What Agent Skills are, how they differ from MCP servers and system prompts, and a worked example of shipping a Skill that generates PDFs from prompts.

Axel · 13 min read

Anthropic launched Agent Skills as an open standard on October 16, 2025, and OpenAI added compatible support a few weeks later. Skills are folders of instructions, scripts, and resources that an LLM agent loads on demand when the user's task matches a description in the skill's manifest. They sit between system prompts and MCP servers, and they have quietly become one of the most important shapes for shipping product expertise to coding agents.

This article covers what Skills actually are, how they differ from MCP and system prompts, why progressive disclosure matters, and how PDF4.dev's pdf4-generate Skill is structured. The audience is developers integrating LLMs into products and AI tooling builders who already know what an agent is.

What Agent Skills are

A Skill is a directory with a SKILL.md manifest at its root, optional supporting files, and a YAML frontmatter that tells the agent when to load it. Anthropic's release note describes them as "organized folders of instructions, scripts, and resources that Claude loads dynamically to perform specialized tasks." That phrasing is exact: instructions live next to the code and templates the agent will use, and the bundle is treated as a single unit.

The minimal manifest looks like this:

---
name: pdf4
description: Generate PDFs via the PDF4.dev API. Use when the user asks to create, render, or generate a PDF document.
---
 
# PDF4.dev — PDF Generation Skill
 
You are helping the user generate PDFs using the PDF4.dev API.

Two fields are required: name (max 64 characters, lowercase letters, numbers, and hyphens) and description (max 1,024 characters). The description is the most important line in the file. It is the only piece the agent reads at idle, and it decides whether the rest of the file is loaded into context. Optional fields like disable-model-invocation, user-invocable, and context: fork control invocation rules and isolation, as documented in a comprehensive Skills guide.
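These constraints are mechanical enough to check before shipping. A minimal sketch in Python (the field names and limits come from the spec above; the validator itself is ours):

```python
import re

# name: max 64 characters, lowercase letters, numbers, and hyphens
NAME_RE = re.compile(r"^[a-z0-9-]{1,64}$")

def validate_manifest(frontmatter: dict) -> list[str]:
    """Return a list of problems with a SKILL.md frontmatter dict."""
    problems = []
    if not NAME_RE.match(frontmatter.get("name", "")):
        problems.append("name must be 1-64 chars of [a-z0-9-]")
    desc = frontmatter.get("description", "")
    if not desc:
        problems.append("description is required")
    elif len(desc) > 1024:
        problems.append("description exceeds 1,024 characters")
    return problems

print(validate_manifest({"name": "pdf4", "description": "Generate PDFs."}))  # []
```

An empty list means the manifest passes both required-field checks.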

Beyond the manifest, a Skill folder can contain helper scripts, reference documents, prompt templates, and example data. The agent can read or execute any of these as it works, but only when the body of SKILL.md instructs it to.

Skills vs MCP servers vs system prompts

Most readers will conflate these three shapes the first time they see Skills. The differences matter:

| Shape | What it is | Where it runs | Loaded when | Best for |
| --- | --- | --- | --- | --- |
| System prompt | Static text injected at the top of every request | Client side, every turn | Always in context | Persona, global rules, brand voice |
| Agent Skill | Folder with manifest plus optional scripts and assets | Bundled with the agent or uploaded via API | On demand, when description matches | Procedures, templates, domain expertise |
| MCP server | Wire protocol exposing tools, resources, prompts | Remote process the agent calls | When the agent invokes a tool | External system access, live data, side effects |

A system prompt is "things the agent should always remember." An MCP server is "things the agent can do at runtime." A Skill is "the right way to do a specific class of task." You can ship a Skill that calls MCP tools, and you can ship an MCP server with no Skill, but the two answer different questions.

The line that helps most: MCP gives the agent a verb, a Skill teaches it the recipe. Calling render_pdf is the verb. Knowing what HTML to write, which Handlebars helpers exist, when to use a template versus raw HTML, and how to save the response as a .pdf file is the recipe.

The progressive disclosure pattern

Skills load in three tiers, and the budget at each tier is small. The metadata tier costs roughly 100 tokens per skill: the agent only sees the name and description. The instructions tier loads the body of SKILL.md, capped at a few thousand tokens by convention. The resources tier loads scripts and reference files only when the body of the manifest tells the agent to read them.

This is why a project with fifty installed skills does not consume fifty times the tokens of one skill. The tiers compound:

Tier 1 — metadata           ~100 tokens per skill, scanned at every turn
Tier 2 — SKILL.md body      Loaded once when the description matches
Tier 3 — scripts + assets   Loaded when the body says to read them

Two consequences fall out of this design:

The description carries activation. If the description does not say when to use the skill, the agent will never load it. Burying activation criteria in the body is the single most common mistake authors make; the body is only read after the skill has already been selected.

Skills scale where system prompts cannot. A system prompt with fifty domains of expertise quickly hits context limits. A skill directory with fifty entries costs roughly 5,000 tokens at idle. Anthropic's enterprise offering exposes admin controls so that an organization can curate which skills are provisioned to which workspaces, with the cost staying flat as the catalog grows.
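The flat-cost claim is simple arithmetic. A sketch, assuming the ~100-token metadata figure quoted above:

```python
METADATA_TOKENS = 100  # rough per-skill cost of name + description at idle

def idle_cost(num_skills: int) -> int:
    """Tier-1 cost: every installed skill's metadata is scanned each turn."""
    return num_skills * METADATA_TOKENS

# Idle cost grows with the catalog size, not with the size of the skill
# bodies, which stay unloaded until a description matches.
print(idle_cost(1), idle_cost(50))  # 100 5000
```

Fifty skills at idle cost about as much context as two paragraphs of system prompt.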

Anatomy of a great Skill

A high-quality Skill makes four things obvious to the agent:

The trigger. The description names both the verb (what the skill does) and the trigger (when to load it). "Use when the user asks to create, render, or generate a PDF" is the trigger. Without it, the agent has to infer relevance, and inference is unreliable.

The contract. The body opens with a one-paragraph description of the API, the auth model, and the response format. The agent should not have to scroll to know whether it has the credentials it needs.

The happy path. A complete worked example, usually with a curl or fetch snippet, that the agent can copy and adapt. If the example covers the most common case end to end, the agent will not invent a wrong call.

The escape hatches. Edge cases, optional parameters, and related capabilities, listed but not over-explained. The agent does not need an essay; it needs a list it can scan.

Common pitfalls authors hit:

  • Putting activation rules in the body, where the agent never sees them at idle.
  • Writing the body as a tutorial for humans, with narrative flow and digressions, instead of as a reference for an agent that will pattern-match.
  • Skipping a complete example, forcing the agent to assemble fragments.
  • Not specifying error shapes, so the agent has no idea what a failure looks like.

Worked example: PDF4.dev's pdf4-generate Skill

The PDF4.dev repository ships a Skill at .claude/skills/pdf4-generate/SKILL.md. Its frontmatter is a single tight description:

---
name: pdf4
description: Generate PDFs via the PDF4.dev API. Use when the user asks to create, render, or generate a PDF document — invoices, certificates, receipts, reports, letters, or any HTML-to-PDF conversion.
argument-hint: "[description of the PDF to generate]"
---

The trigger is explicit: any phrasing about creating, rendering, or generating a PDF lights up the skill. The list of document types (invoices, certificates, receipts, reports, letters) is there so the agent matches when the user says "generate me a receipt" rather than the literal word "PDF".

The body opens with the contract:

POST https://pdf4.dev/api/v1/render
Authorization: Bearer <API_KEY>
Content-Type: application/json

Then it shows the two ways to call the endpoint. From a saved template:

{
  "template_id": "invoice",
  "data": {
    "company_name": "Acme Corp",
    "total": "$4,500"
  }
}

Or from raw HTML:

{
  "html": "<h1>Hello {{name}}</h1><p>Your order #{{order_id}} is confirmed.</p>",
  "data": {
    "name": "Alice",
    "order_id": "ORD-42"
  }
}
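Either payload shape goes to the same endpoint. A hedged sketch of the call in Python, using only the contract shown above (the `API_KEY` placeholder and the helper function names are ours; error handling beyond the happy path is omitted):

```python
import json
import urllib.request

API_URL = "https://pdf4.dev/api/v1/render"
API_KEY = "YOUR_API_KEY"  # placeholder; supply a real key via Authorization

def make_request(payload: dict) -> urllib.request.Request:
    """Build the POST request for either payload shape (template_id or html)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def render_pdf(payload: dict, out_path: str) -> None:
    """Send the request and write the binary PDF response to disk.

    On failure the API returns a structured JSON error instead of PDF bytes;
    this sketch only covers the success path."""
    with urllib.request.urlopen(make_request(payload)) as resp:
        pdf_bytes = resp.read()
    with open(out_path, "wb") as f:
        f.write(pdf_bytes)
```

With a valid key, `render_pdf({"html": "<h1>Hello {{name}}</h1>", "data": {"name": "Alice"}}, "hello.pdf")` would save the rendered document.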

After the contract, the body lists the format presets (a4, a4-landscape, letter, letter-landscape, square, custom), the response shape (binary PDF on success, structured JSON error on failure), and a complete curl example with a full HTML invoice. The Handlebars helpers section is a single table:

| Helper | Example | Output |
| --- | --- | --- |
| formatNumber | formatNumber 10000 "en-US" | 10,000 |
| formatDate | formatDate "2026-03-08" "short" | Mar 8, 2026 |
| formatCurrency | formatCurrency 1500 "EUR" "fr-FR" | 1 500,00 € |

A scannable table beats prose every time, because the agent can copy the cell contents directly into its template.
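The helpers slot directly into the `html` field of the payload. A sketch assembling one (the invoice snippet is illustrative, and the `{{helper args}}` invocation syntax assumes standard Handlebars; the helper names are from the table):

```python
# Hypothetical invoice fragment using the documented helpers; the agent
# would send this string as the "html" field of the render payload.
TEMPLATE = (
    '<p>Total: {{formatCurrency total "EUR" "fr-FR"}}</p>'
    '<p>Due: {{formatDate due_date "short"}}</p>'
)

payload = {
    "html": TEMPLATE,
    "data": {"total": 4500, "due_date": "2026-04-30"},
}
```

The server resolves the helpers at render time; the agent only has to interpolate the variable names.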

The Skill closes with a section on PDF4.dev's MCP server. If the agent is running in a host that supports MCP (Claude Code, Claude Desktop, Cursor, VS Code, Windsurf), the Skill tells it to prefer the MCP path, and links to the per-client setup. This is the recipe pointing at the verb. The Skill encodes the procedure; MCP exposes the tools the procedure uses.

A typical run looks like this:

  1. User says "generate an invoice PDF for client Acme, total 4500 euros, due April 30."
  2. The agent's idle scan sees the pdf4 description match. It loads the body.
  3. The body says: write clean HTML with Handlebars variables, include CSS in <style>, call the API, save the response as a .pdf.
  4. The agent writes the HTML, calls render_pdf (via MCP) or POST /api/v1/render (via raw HTTP), and writes the bytes to disk.

No system prompt edits, no tool wiring, no per-project config. The Skill is the integration.

Invoking Skills across hosts

Skills work the same way logically across hosts, but the activation surface differs. In Claude Code, installing a Skill is a filesystem operation:

# Drop the skill folder into the project
mkdir -p .claude/skills/pdf4-generate
cp SKILL.md .claude/skills/pdf4-generate/
 
# Edits to SKILL.md are picked up on the next prompt
# (hot-reload landed in Claude Code 2.1.0, January 2026)

The Anthropic API release note specifies the /v1/skills endpoints and the skills-2025-10-02 beta header, and notes that Skills require the code execution tool to be enabled. The OpenAI Responses API documentation confirms the tools[].environment.skills shape and the 50 MB zip / 500 files per version limits.
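The per-version limits are worth enforcing before upload. A sketch that packages a skill folder and checks both limits (the packaging logic is ours; the actual upload request shape is documented in the API references above and not reproduced here):

```python
import os
import zipfile

MAX_BYTES = 50 * 1024 * 1024  # 50 MB zip per version
MAX_FILES = 500               # 500 files per version

def package_skill(skill_dir: str, out_zip: str) -> None:
    """Zip a skill folder and enforce the published per-version limits."""
    paths = []
    for root, _dirs, files in os.walk(skill_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    if len(paths) > MAX_FILES:
        raise ValueError(f"{len(paths)} files exceeds the {MAX_FILES}-file limit")
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store paths relative to the skill root so SKILL.md sits at the top
            zf.write(path, os.path.relpath(path, skill_dir))
    if os.path.getsize(out_zip) > MAX_BYTES:
        raise ValueError("zip exceeds the 50 MB limit")
```

Running `package_skill(".claude/skills/pdf4-generate", "pdf4.zip")` would yield an upload-ready bundle with SKILL.md at its root.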

Shipping a Skill that lives well in the directory

Anthropic seeded its Skills directory with partners across the productivity stack: Atlassian, Canva, Cloudflare, Figma, Notion, Ramp, and Sentry. The directory pattern signals that Skills are a distribution channel, not just a local file format.

The directory matters because it makes Skills discoverable the same way npm makes libraries discoverable. A user installing the Notion Skill in Claude Code does not need to know how to write a manifest. They get the right procedure for talking to Notion, vetted by Notion itself, with the same activation surface as any other skill on their machine.

Four properties make a Skill ship-worthy:

Cross-host compatibility. The manifest format is shared between Anthropic and OpenAI. Skills authored to the published SKILL.md shape work in Claude apps, Claude Code, the Claude API, and the OpenAI Responses API. Avoid host-specific syntax in the body.

Tight description, broad triggers. The description must say what the skill does in one sentence and list the obvious user phrasings that should activate it. PDF4.dev's description names six document types in addition to "PDF" and "HTML-to-PDF" because users rarely say "render a PDF" verbatim.

Self-contained body. The agent should not need network access to a documentation site to use the skill. Inline the API contract, the format presets, the helper functions, and a worked example. Reference docs are fine for edge cases; the happy path goes in the manifest.

Companion MCP server when relevant. If the skill calls a remote system that has any state (templates, accounts, files), expose that system over MCP and have the skill point at it. The skill teaches the agent how to use the tools; the MCP server provides the tools.

A practical checklist before publishing:

  • Description fits in 1,024 characters.
  • Body opens with the contract, not the narrative.
  • At least one complete worked example with input and output.
  • All variables in templates use inline backticks so the agent does not confuse them with prose.
  • Errors are documented with their structured shape, not just prose.
  • If a companion MCP server exists, the Skill links to it and explains when to prefer it.

The bigger picture: Skills + MCP + Apps SDK

The 2026 agent stack has three layers that map cleanly onto the three shapes in the comparison table earlier in this article.

Connection layer: MCP servers. The agent reaches into systems via a wire protocol. Tools, resources, and prompts move across the network. This is where read and write happens.

Knowledge layer: Skills. The agent loads procedural knowledge on demand. Recipes, templates, conventions, and code snippets travel with the agent or sit in a directory it can reach. This is where know-how lives.

Surface layer: Apps and harnesses. Claude Code, Claude Desktop, Cursor, the OpenAI Responses API, custom clients. The user interacts with one of these, and the surface is responsible for making both Skills and MCP available with the right activation rules.

The interesting move in 2026 is that the knowledge layer is now public infrastructure. A vendor can publish a Skill once and reach every coding agent without per-host integration work. That is what "open standard" means in practice: the manifest format is published, the directory is open, the API endpoints exist on both sides, and the bundle that worked in Claude Code on Monday works in the OpenAI Responses API on Friday with no code change.

For PDF4.dev specifically, this is why we ship both. The MCP server at /api/mcp is the connection layer; it exposes render_pdf, list_templates, create_template, and the rest of the API as MCP tools. The pdf4-generate Skill is the knowledge layer; it teaches the agent which tool to call, what arguments to pass, how to write the HTML, and how to handle the response. Together they turn "I want a PDF" into a saved file in two model turns, with no glue code on the user's side.

If you are integrating an LLM into your product in 2026, the question is no longer "MCP or system prompt." It is "which layer does this belong in, and which other layers need to know about it." Skills make that question answerable.

Start generating PDFs

Build PDF templates with a visual editor. Render them via API from any language in ~300ms.