A comprehensive collection of Agent Skills for context engineering, multi-agent architectures, and production agent systems. Use when building, optimizing, or debugging agent systems that require effective context management.
A comprehensive, open collection of Agent Skills focused on context engineering and harness engineering principles for building production-grade AI agent systems. These skills teach the art and science of curating context, designing agent operating loops, and evaluating agent behavior across any agent platform.
Context engineering is the discipline of managing the language model's context window. Unlike prompt engineering, which focuses on crafting effective instructions, context engineering addresses the holistic curation of all information that enters the model's limited attention budget: system prompts, tool definitions, retrieved documents, message history, and tool outputs.
The fundamental challenge is that context windows are constrained not by raw token capacity but by attention mechanics. As context length increases, models exhibit predictable degradation patterns: the "lost-in-the-middle" phenomenon, U-shaped attention curves, and attention scarcity. Effective context engineering means finding the smallest possible set of high-signal tokens that maximizes the likelihood of the desired outcome.
Recognition
This repository is cited in academic research as foundational work on static skill architecture:
"While static skills are well-recognized [Anthropic, 2025b; Muratcan Koylan, 2025], MCE is among the first to dynamically evolve them, bridging manual skill engineering and autonomous self-improvement."
These skills establish the foundational understanding required for all subsequent context engineering work.
| Skill | Description |
|-------|-------------|
| | Understand what context is, why it matters, and the anatomy of context in agent systems |
| | Recognize patterns of context failure: lost-in-the-middle, poisoning, distraction, and clash |
| | Design and evaluate compression strategies for long-running sessions |
These skills cover the patterns and structures for building effective agent systems.
| Skill | Description |
|-------|-------------|
| multi-agent-patterns | Master orchestrator, peer-to-peer, and hierarchical multi-agent architectures |
| memory-systems | Design short-term, long-term, and graph-based memory architectures |
| tool-design | Build tools that agents can use effectively |
| filesystem-context | Use filesystems for dynamic context discovery, tool output offloading, and plan persistence |
| hosted-agents | NEW Build background coding agents with sandboxed VMs, pre-built images, multiplayer support, and multi-client interfaces |
Operational Skills
These skills address the ongoing operation and optimization of agent systems.
| Skill | Description |
|-------|-------------|
| context-optimization | Apply compaction, masking, and caching strategies |
| latent-briefing | Share task-relevant orchestrator state with workers via task-guided KV cache compaction when the worker runtime is controllable |
| evaluation | Build evaluation frameworks for agent systems |
| advanced-evaluation | Master LLM-as-a-Judge techniques: direct scoring, pairwise comparison, rubric generation, and bias mitigation |
| harness-engineering | Design autonomous agent harnesses with locked metrics, durable logs, novelty gates, rollback, and human approval boundaries |
Development Methodology
These skills cover the meta-level practices for building LLM-powered projects.
| Skill | Description |
|-------|-------------|
| project-development | Design and build LLM projects from ideation through deployment, including task-model fit analysis, pipeline architecture, and structured output design |
Cognitive Architecture Skills
These skills cover formal cognitive modeling for rational agent systems.
| Skill | Description |
|-------|-------------|
| bdi-mental-states | NEW Transform external RDF context into agent mental states (beliefs, desires, intentions) using formal BDI ontology patterns for deliberative reasoning and explainability |
Design Philosophy
Progressive Disclosure
Each skill is structured for efficient context use. At startup, agents load only skill names and descriptions. Full content loads only when a skill is activated for relevant tasks.
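A minimal sketch of this loading pattern, assuming each skill lives in its own directory with a SKILL.md whose leading `---` block holds simple `name:` and `description:` lines. The `SkillIndex` class, its methods, and the frontmatter layout are hypothetical illustrations, not this repository's or any agent platform's actual loader.

```python
from pathlib import Path


def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse simple `key: value` pairs between the leading '---' markers."""
    meta: dict[str, str] = {}
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return meta  # no frontmatter block (assumed layout not present)
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


class SkillIndex:
    """Hypothetical index: summaries are kept up front, bodies are fetched on demand."""

    def __init__(self, root: Path) -> None:
        self._paths: dict[str, Path] = {}
        self.summaries: dict[str, str] = {}  # the only text exposed at startup
        for skill_file in sorted(root.glob("*/SKILL.md")):
            meta = read_frontmatter(skill_file)
            name = meta.get("name", skill_file.parent.name)
            self._paths[name] = skill_file
            self.summaries[name] = meta.get("description", "")

    def activate(self, name: str) -> str:
        """Return the full skill text only when the skill is actually needed."""
        return self._paths[name].read_text(encoding="utf-8")
```

Under these assumptions, only `summaries` would be placed in the agent's context at startup, and `activate` would be called once a task matches a skill's description.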
Platform Agnosticism
These skills focus on transferable principles rather than vendor-specific implementations. The patterns work across Claude Code, Cursor, and any agent platform that supports skills or allows custom instructions.
Conceptual Foundation with Practical Examples
Scripts and examples demonstrate concepts in Python pseudocode that reads the same across environments and does not require installing specific dependencies.
Usage
Usage with Claude Code
This repository is a Claude Code Plugin Marketplace containing context engineering skills that Claude automatically discovers and activates based on your task context.
Installation
Step 1: Add the Marketplace
Run this command in Claude Code to register this repository as a plugin source:
The .plugin/plugin.json manifest follows the Open Plugins standard, so the repo also works with any conformant agent tool (Codex, GitHub Copilot, etc.).
Using Individual Skills
To use a single skill without installing the full plugin, copy its SKILL.md directly into your project's .claude/skills/ directory:
# Example: add just the context-fundamentals skill
mkdir -p .claude/skills
curl -o .claude/skills/context-fundamentals.md \
  https://raw.githubusercontent.com/muratcankoylan/Agent-Skills-for-Context-Engineering/main/skills/context-fundamentals/SKILL.md
EvaluatorAgent: High-level agent combining all evaluation capabilities
Book SFT Pipeline Example
The book-sft-pipeline example demonstrates training small models (8B parameters) to write in any author's style:
Intelligent Segmentation: Two-tier chunking with overlap for maximum training examples
Prompt Diversity: 15+ templates to prevent memorization and force style learning
Tinker Integration: Complete LoRA training workflow with $2 total cost
Validation Methodology: Testing on modern scenarios distinguishes style transfer from content memorization
Integrates with context engineering skills: project-development, context-compression, multi-agent-patterns, evaluation.
Researcher Operating System
The researcher directory is a file-based operating system for turning external research into skill changes. It exists so this repository can act as a compounding source of truth instead of an anthology.
Measured router-benchmark results
The skill router (which decides whether the right skill gets loaded for a given task) has been benchmarked end-to-end against four frontier models via the Cursor SDK. Three full sweeps (50 prompts × 4 models × 3 replications = 600 calls each):
Mechanism registry (researcher/mechanisms/registry.jsonl + ledgers/): 16 accepted behavior changes used as the primary novelty signal, with append-only accepted/rejected ledgers for institutional memory.
Claim provenance (researcher/claims/index.jsonl): 12 provenance-tracked claims with source URL, evidence strength, volatility, and last reviewed date.
Corpus index (researcher/corpus/index.json): canonical machine-readable map of skills, activation scenarios, mechanism IDs, and claim IDs.
Run state machine (researcher/runs/<run-id>/run-state.json): initialized -> retrieved -> evaluated -> proposed -> novelty_checked -> validated -> pr_ready -> closed.
Skill health gate (researcher/scripts/skill_health.py): deterministic body-quality scoring; current strict corpus score is 0.9117 with 0 flagged skills.
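The run states listed above form a linear sequence, which can be enforced with a small transition check. The sketch below is illustrative only: the `advance` helper is hypothetical, and it assumes a run may move only to the state directly after its current one. The repository's own scripts may permit other transitions, such as closing a run early.

```python
# State order copied from the run state machine description.
RUN_STATES = [
    "initialized", "retrieved", "evaluated", "proposed",
    "novelty_checked", "validated", "pr_ready", "closed",
]


def advance(current: str, target: str) -> str:
    """Return `target` if it directly follows `current`; raise otherwise."""
    if current not in RUN_STATES or target not in RUN_STATES:
        raise ValueError(f"unknown state: {current!r} -> {target!r}")
    # Assumption: no skipping ahead and no moving backwards.
    if RUN_STATES.index(target) != RUN_STATES.index(current) + 1:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

For example, `advance("initialized", "retrieved")` returns `"retrieved"`, while `advance("initialized", "evaluated")` raises `ValueError` because it skips a step.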
Operator commands
# Deterministic gates (also run in CI on every PR)
python3 researcher/scripts/validate_repo.py --strict
python3 researcher/scripts/skill_health.py --strict --no-history
python3 researcher/scripts/run_benchmarks.py
python3 researcher/scripts/check_activation_cases.py
# Per-run readiness (active runs only)
python3 researcher/scripts/validate_run.py --run-dir researcher/runs/<run-id>
# Continuous loop, manual
python3 researcher/scripts/loop_discover.py
python3 researcher/scripts/loop_step.py --allow-fetch
python3 researcher/scripts/loop_daily.py
python3 researcher/scripts/loop_status.py
# Continuous loop, daemon (macOS)
researcher/orchestration/launchd/install.sh # install launchd jobs (10-min step, 12h discover, daily ops)
researcher/orchestration/launchd/uninstall.sh # remove launchd jobs
See the template folder for the canonical skill structure.
Contributing
This repository follows the Agent Skills open development model. Contributions are welcome from the broader ecosystem. When contributing:
Follow the skill template structure
Provide clear, actionable instructions
Include working examples where appropriate
Document trade-offs and potential issues
Keep SKILL.md under 500 lines for optimal performance
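The 500-line guideline is easy to check locally before opening a pull request. A minimal sketch, assuming skills are laid out as `skills/<name>/SKILL.md` (the layout seen in the raw URL for context-fundamentals); the `oversized_skills` function is hypothetical and is not one of the repository's validation scripts.

```python
from pathlib import Path

MAX_LINES = 500  # guideline from the contributing list: stay under this


def oversized_skills(root: Path, limit: int = MAX_LINES) -> list[tuple[str, int]]:
    """Return (path, line_count) for each SKILL.md with `limit` lines or more."""
    offenders: list[tuple[str, int]] = []
    for skill_file in sorted(root.glob("*/SKILL.md")):
        count = len(skill_file.read_text(encoding="utf-8").splitlines())
        if count >= limit:
            offenders.append((str(skill_file), count))
    return offenders
```

Calling `oversized_skills(Path("skills"))` from the repository root would list any skill files that need trimming; an empty list means every skill is within the guideline.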
Feel free to contact Muratcan Koylan for collaboration opportunities or any inquiries.
License
MIT License - see LICENSE file for details.
References
The principles in these skills are derived from research and production experience at leading AI labs and framework developers. Each skill includes references to the underlying research and case studies that inform its recommendations.