Install once.
Every AI tool gets CogOS.
Run cog init and CogOS registers as an MCP server. Your AI tool discovers CogOS and gets instant access to 55 domain expert modules. No API key needed — the AI tool IS the LLM, CogOS just makes it smarter.
MCP Integration — Automatic Setup
CogOS registers itself as an MCP (Model Context Protocol) server. This means your AI tool discovers CogOS automatically — no config files to edit, no instructions to paste, no existing files overwritten.
# Install, then register with a single command
$ pip install -e .
$ cog init
Registered CogOS MCP server with:
- claude (~/.claude.json)
- codex (~/.codex/config.toml)
- gemini (~/.gemini/settings.json)
CogOS is ready. Your AI tool will discover it automatically via MCP.
After registration, your AI tool sees these CogOS tools:
| MCP Tool | What it does |
|---|---|
| cog_run | Returns relevant expert knowledge from matching modules. The AI uses this expertise to complete the task correctly. |
| cog_chat | Ask follow-up questions about domain topics |
| cog_status | Show active modules, tools, and provider info |
| cog_modules | List available domain modules and capabilities |
Supported AI tools
| Tool | How it discovers CogOS | Config location |
|---|---|---|
| Claude Code | Auto: registers in ~/.claude.json | ~/.claude.json |
| OpenAI Codex CLI | Auto: registers in ~/.codex/config.toml | ~/.codex/config.toml |
| Gemini CLI | Auto: registers in ~/.gemini/settings.json | ~/.gemini/settings.json |
| opencode | Auto: registers in config, reads provider | ~/.config/opencode/opencode.json |
| Cursor | Auto: creates .cursor/mcp.json | .cursor/mcp.json |
| VS Code / Cline / Roo | Auto: creates .vscode/mcp.json | .vscode/mcp.json |
| Goose | Auto: registers in config | ~/.config/goose/config.yaml |
Manual registration: Run cog register at any time to (re)register with all detected AI tools. To add CogOS by hand instead, configure a server entry whose command is python -m cog.mcp_server and whose transport is stdio.
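For tools that read an mcpServers-style JSON file (Cursor's .cursor/mcp.json is one example), a manual entry might look like the sketch below. The server name "cogos" and both helper functions are assumptions made for illustration; only the command and the stdio transport come from the text above. Other tools use different schemas (TOML for Codex, YAML for Goose), so check each tool's own documentation.

```python
import json

# Assumption: "cogos" is a hypothetical server name chosen for this sketch.
def build_mcp_entry(server_name: str = "cogos") -> dict:
    """Return an mcpServers-style entry that launches CogOS over stdio."""
    return {
        "mcpServers": {
            server_name: {
                "command": "python",
                "args": ["-m", "cog.mcp_server"],
            }
        }
    }

def merge_into_config(existing: dict, entry: dict) -> dict:
    """Merge the entry without dropping servers that are already configured."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(entry["mcpServers"])
    merged["mcpServers"] = servers
    return merged

# Example: a config that already lists another (made-up) server.
config = merge_into_config(
    {"mcpServers": {"other": {"command": "node"}}},
    build_mcp_entry(),
)
print(json.dumps(config, indent=2))
```

Merging rather than overwriting keeps any servers you have already configured, which matches the "no existing files overwritten" behavior described earlier.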
AGENTS.md — Auto-Instructions
When you run cog init, CogOS also writes a small instruction block to your project's AGENTS.md file. This tells AI tools (Claude Code, Codex, etc.) how to use CogOS without reading its source code.
How it works
- No AGENTS.md exists: CogOS creates one with just the CogOS instruction block
- AGENTS.md already exists: CogOS prepends its block above your existing content (nothing is overwritten)
- AGENTS.md already has the CogOS block: CogOS skips it (idempotent, so it is safe to run cog init multiple times)
# First run — creates AGENTS.md
$ cog init
Created AGENTS.md
# Second run — skips (already has CogOS block)
$ cog init
AGENTS.md already has CogOS section (skipped)
# With existing AGENTS.md that has other content
$ cog init
Prepended CogOS info to AGENTS.md
The CogOS block is wrapped in <!-- cogos:start --> / <!-- cogos:end --> markers so it can be detected and updated independently. Your existing project instructions, framework configs, and any other content in AGENTS.md are never touched.
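The three cases (create, prepend, skip) can be sketched as a pure function over the file's text. This is a minimal illustration, not CogOS's actual implementation: the function name and the returned action labels are hypothetical, and only the two marker strings come from the text above.

```python
from typing import Optional, Tuple

START = "<!-- cogos:start -->"
END = "<!-- cogos:end -->"

def upsert_block(existing: Optional[str], block_body: str) -> Tuple[str, str]:
    """Return (new_text, action) for the three AGENTS.md cases.

    existing is None when the file does not exist yet.
    """
    block = f"{START}\n{block_body}\n{END}\n"
    if existing is None:
        return block, "created"
    if START in existing and END in existing:
        # Block already present: leave the file untouched (idempotent).
        return existing, "skipped"
    # Keep the user's content intact and place the block above it.
    return block + "\n" + existing, "prepended"

text, action = upsert_block(None, "Use cog_run for domain expertise.")
print(action)
```

Because the function returns the original text unchanged when the markers are found, running it repeatedly never duplicates the block.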
What the block contains
The prepended block tells AI tools three things:
- What CogOS is: a modular cognitive runtime with modules, tools, and multi-agent orchestration
- How to use it: call cog_run instead of reading CogOS source code
- Available tools: a table of cog_run, cog_chat, cog_status, cog_modules
Why this matters: Without AGENTS.md instructions, AI tools default to reading CogOS source code (~2500 lines) to understand how it works. The AGENTS.md block short-circuits that — they see the instructions and immediately call cog_run instead. Faster, cheaper, and more reliable.
How It Works — Zero Config
CogOS uses a simple but powerful architecture: the AI tool is the LLM. When Claude Code, Cursor, or any MCP-compatible AI calls cog_run(), CogOS finds relevant expert modules and returns their knowledge. The AI then uses that expertise to complete the task.
- You run cog init: registers CogOS as an MCP server with your AI tools and auto-detects your LLM provider from the environment, opencode config, or Ollama.
- Your AI calls cog_run(task): CogOS queries a chunk-level index built from all 7,275+ prompt extensions across 55 modules.
- CogOS returns targeted expertise: the highest-scoring individual knowledge chunks are returned, capped to ~1,500 tokens, with session deduplication so the AI never receives the same knowledge twice.
- Your AI completes the task: using the expert context, the AI executes with deep domain knowledge it didn't have before, without burning through your token budget.
No API key needed for MCP usage. The AI tool is already authenticated and running. CogOS enriches it with expertise. Provider config is only needed for standalone CLI usage.
Token Efficiency — Built In
CogOS holds 7,275+ prompt extensions across 55 modules. Without safeguards, a single cog_run call could push 10,000–15,000 tokens of expertise into your context window before your AI writes a single line of code. That would make CogOS a liability, not an asset.
Three mechanisms work together automatically to prevent this. No configuration is required — they are active from the moment you install CogOS.
1. Chunk-level indexing
The old approach returned every extension from the top-matching modules as a block dump. The current approach treats each of the 7,275+ extensions as an individually scored document. When your AI calls cog_run("build a REST API with FastAPI"), CogOS scores every extension in the library against that specific task and returns only the highest-relevance ones — regardless of which module they came from.
A typical call now delivers ~1,500 tokens of precisely targeted knowledge instead of a broad module dump. The AI gets what it actually needs, not everything that's loosely related.
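To make the idea of scoring individual extensions concrete, here is a toy sketch. The scoring function (word overlap) is a stand-in chosen for brevity; the text does not say how CogOS actually scores chunks. The Chunk type, the function names, and the sample chunk texts are all made up for illustration.

```python
import re
from typing import List, NamedTuple, Set

class Chunk(NamedTuple):
    module: str
    text: str

def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(task: str, chunk: Chunk) -> float:
    """Fraction of task words that also appear in the chunk (toy relevance)."""
    task_words = tokenize(task)
    if not task_words:
        return 0.0
    return len(task_words & tokenize(chunk.text)) / len(task_words)

def top_chunks(task: str, chunks: List[Chunk], k: int = 2) -> List[Chunk]:
    """Rank every chunk against the task, regardless of its module."""
    ranked = sorted(chunks, key=lambda c: score(task, c), reverse=True)
    return [c for c in ranked[:k] if score(task, c) > 0]

# Hypothetical library contents, invented for this example.
library = [
    Chunk("cog-code-python", "FastAPI: declare REST API routes with typed path parameters"),
    Chunk("cog-code-python", "Use virtual environments to isolate dependencies"),
    Chunk("cog-infra-docker", "Build a slim image for a FastAPI service"),
    Chunk("cog-db-postgres", "Create indexes for frequent lookups"),
]
hits = top_chunks("build a REST API with FastAPI", library)
for chunk in hits:
    print(chunk.module, "|", chunk.text)
```

Note that the two selected chunks come from different modules, while an unrelated chunk from the same module as the top hit is left out. That is the difference between chunk-level selection and a module dump.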
2. Session deduplication
Every chunk of expertise returned to your AI is fingerprinted with a content hash. On every subsequent cog_run or cog_chat call in the same session, already-returned chunks are excluded from the results. The response includes a chunks_skipped_dedup field so your AI knows its prior context is still valid and can build on it.
This compounds throughout a session. By the third or fourth call on the same project, CogOS is only returning knowledge that is genuinely new — not re-injecting context the AI already has. The AI's context window grows in value without growing proportionally in size.
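A minimal sketch of hash-based session deduplication follows. SHA-256 is an assumption (the text only says "content hash"), and the SessionDedup class is a hypothetical name used for illustration rather than a CogOS API.

```python
import hashlib
from typing import List, Set, Tuple

class SessionDedup:
    """Remembers fingerprints of chunks already returned in this session."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def filter_new(self, chunks: List[str]) -> Tuple[List[str], int]:
        """Return (chunks not seen before, number skipped as duplicates)."""
        fresh: List[str] = []
        skipped = 0
        for text in chunks:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest in self._seen:
                skipped += 1
            else:
                self._seen.add(digest)
                fresh.append(text)
        return fresh, skipped

session = SessionDedup()
first, skipped_first = session.filter_new(["chunk A", "chunk B"])
second, skipped_second = session.filter_new(["chunk B", "chunk C"])
print(first, skipped_first)
print(second, skipped_second)
```

The skipped count in this sketch plays the role of the chunks_skipped_dedup field: the second call reports one duplicate and returns only the chunk that is new.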
3. Character budget
Total expertise per call is capped at a configurable character limit. The default is 6,000 characters (~1,500 tokens). Chunks are collected greedily in relevance order — the most important content first — so the budget cut always removes the lowest-value tail, not random content.
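The greedy collection can be sketched as below. The function name is hypothetical, and whether CogOS stops at the first chunk that does not fit (as this sketch does) or keeps looking for smaller ones is an assumption; the text only says the cut removes the lowest-value tail.

```python
from typing import List, Tuple

def collect_within_budget(ranked_chunks: List[str], max_chars: int = 6000) -> Tuple[List[str], int]:
    """Take chunks in relevance order until the next one would exceed the budget."""
    selected: List[str] = []
    used = 0
    for text in ranked_chunks:
        if used + len(text) > max_chars:
            break  # everything after this point is the lower-value tail
        selected.append(text)
        used += len(text)
    return selected, used

# Chunks are assumed to be sorted already, most relevant first.
ranked = ["a" * 2500, "b" * 2500, "c" * 2500, "d" * 100]
selected, used = collect_within_budget(ranked)
print(len(selected), used)
```

Stopping at the first overflow, rather than skipping ahead to smaller chunks, keeps the result a strict prefix of the relevance ranking.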
The budget is tunable for projects that genuinely need richer context per call:
# cog.yaml
max_expertise_chars: 6000 # default — ~1,500 tokens per call
# Or via environment variable
export COG_MAX_EXPERTISE_CHARS=10000
What the AI sees
Every cog_run response includes metadata so your AI can reason about what it received and what's been deduplicated:
{
  "chunks_returned": 12,
  "chunks_skipped_dedup": 8,
  "total_chars": 5840,
  "modules_contributing": [
    { "name": "cog-code-python", "chunks": 7 },
    { "name": "cog-infra-docker", "chunks": 5 }
  ],
  "expertise": "..."
}
When all relevant chunks for a task have already been returned this session, CogOS says so explicitly rather than returning an empty response, so the AI knows to proceed using the expertise already in its context.
The goal: CogOS should make your AI measurably better at complex technical tasks — without meaningfully increasing your token bill. These three mechanisms are our commitment to that. If you observe unexpected token spikes, run cog status to see session chunk counts and current budget settings.
Host AI Mode — The Default
By default, CogOS uses host AI mode: the AI tool you are already running is the LLM. CogOS does not need its own model, API key, or separate configuration.
This is the setting you get out of the box. In cog.yaml it looks like this:
provider: host # use whatever AI is already running cog
When your AI calls cog_run(), CogOS finds the relevant expert modules and returns their knowledge as context. Your AI (Claude Code, Cursor, Gemini CLI, opencode, or any other MCP-compatible tool) then uses that context to complete the task. No second API key. No second model. No extra cost.
You never need to touch cog.yaml for this to work. Host AI mode is automatic. The only reason to create a cog.yaml is to point specific internal agents at a cheaper model — which is entirely optional.
Per-Agent Model Configuration — Optional Cost Savings
CogOS has several internal agents that work on different parts of a task. Some of them — like the planner that breaks a task into steps, or the document writer that generates summaries — don't need your premium model. You can point them at a smaller, cheaper model while your main AI handles the actual execution.
Create a cog.yaml in your project root and add an agents: block:
# provider: host = use my current AI (the default)
provider: host
# Per-agent overrides: each role can use its own model
agents:
  planner:
    provider: openai
    model: gpt-4o-mini # cheap model for task decomposition
    api_key: YOUR_KEY_HERE
    # base_url: optional, for custom / self-hosted endpoints
  document_writer:
    provider: openai
    model: gpt-4o-mini # cheap model for doc generation
    api_key: YOUR_KEY_HERE
  # executor (writes and runs code)
  # executor:
  #   provider: openai
  #   model: gpt-4o
  #   api_key: YOUR_KEY_HERE
  # researcher (web search and analysis)
  # researcher:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # coder (writes implementation code)
  # coder:
  #   provider: openai
  #   model: gpt-4o
  #   api_key: YOUR_KEY_HERE
  # reviewer (reviews and critiques code)
  # reviewer:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # critic (finds flaws and improvements)
  # critic:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # tester (tests and validates output)
  # tester:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # documenter (writes inline documentation)
  # documenter:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # optimizer (optimizes performance)
  # optimizer:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # security (security analysis)
  # security:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # architect (system design decisions)
  # architect:
  #   provider: openai
  #   model: gpt-4o
  #   api_key: YOUR_KEY_HERE
memory_backend: sqlite
modules_path: modules
memory_path: cog_memory.db
log_level: INFO
max_agent_iterations: 20
Any agent role you leave out inherits the global provider. With provider: host, that means your current AI handles it at no extra cost.
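The inheritance rule can be illustrated with a small lookup over an already-parsed config. This is a sketch under stated assumptions: the resolve_agent function is hypothetical, and the config is shown as a plain dict so the example needs no YAML library.

```python
from typing import Any, Dict

def resolve_agent(config: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Return the provider settings a role would use under the inheritance rule."""
    override = config.get("agents", {}).get(role)
    if override:
        return dict(override)
    # No override for this role: fall back to the global provider.
    return {"provider": config.get("provider", "host")}

# Equivalent of a cog.yaml with one per-agent override.
config = {
    "provider": "host",
    "agents": {
        "planner": {
            "provider": "openai",
            "model": "gpt-4o-mini",
            "api_key": "YOUR_KEY_HERE",
        },
    },
}
print(resolve_agent(config, "planner"))
print(resolve_agent(config, "executor"))
```

Here the planner resolves to the cheaper model, while the executor, which has no entry, falls back to the host AI.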
Known agent roles
| Role | What it does | Recommendation |
|---|---|---|
| planner | Decomposes tasks into ordered steps | Small model: short structured prompts |
| document_writer | Generates docs, summaries, and reports | Small model: templated output |
| executor | Writes and runs code, calls tools | Leave unset: use your best model (host AI) |
| researcher | Web search and multi-source analysis | Small model or leave unset |
| coder | Writes implementation code | Leave unset: use your best model |
| reviewer | Reviews and critiques code | Small model or leave unset |
| critic | Finds flaws and suggests improvements | Small model or leave unset |
| tester | Tests and validates output | Small model or leave unset |
| documenter | Writes inline documentation | Small model: templated output |
| optimizer | Optimizes performance | Small model or leave unset |
| security | Security analysis and review | Small model or leave unset |
| architect | System design decisions | Leave unset: use your best model |
Works with any OpenAI-compatible endpoint. Set base_url on any agent to point at DeepSeek, Zhipu/GLM, OpenRouter, Ollama, LM Studio, or any self-hosted model — mix and match per role.
Advanced: Standalone Provider Configuration
The following options are for CLI usage (cog run, cog chat) or for embedding CogOS as a library. For MCP usage — Claude Code, Cursor, Gemini CLI, etc. — host AI mode handles everything automatically.
There are four ways to configure a standalone provider:
- Pass-through: hand an existing provider object to CogOS
- String-based: provide model name + API key to the constructor
- Environment variables: CogOS reads standard env vars
- Config file: set agents.executor in cog.yaml
Key principle: CogOS never stores or transmits your API keys. They live in your environment, your config file (which is gitignored), or your host tool. CogOS just uses them to make LLM calls on your behalf.
Pass-Through Mode (Recommended)
When you're already using an AI tool (Claude Code, Codex CLI, Gemini CLI, etc.), that tool already has an LLM client configured. Instead of duplicating configuration, just pass the provider directly:
from cog import CogOS
from cog.providers.openai_provider import OpenAIProvider
# Your host tool already built this provider
provider = OpenAIProvider(model="gpt-4o", api_key="sk-...")
# Hand it to CogOS — no second config needed
cog = CogOS(provider=provider)
result = cog.run("Analyze this codebase for security issues")
print(result["output"])
With pass-through mode:
- No API keys are duplicated or stored by CogOS
- The host tool's provider, model, and base_url are used directly
- Works with any provider that implements the LLMProvider interface
- Zero additional configuration required
Ideal for: AI tools that embed CogOS as a library. The host tool manages the LLM connection; CogOS provides the cognitive runtime (modules, agents, tools, memory).
For MCP usage, pass-through happens automatically — the AI tool IS the LLM.
String-Based Initialization
For standalone scripts or quick prototyping, provide the model name and API key directly:
from cog import CogOS
# OpenAI
cog = CogOS(llm="gpt-4o", api_key="sk-...")
# Anthropic (auto-detected from model name)
cog = CogOS(llm="claude-sonnet-4-20250514", api_key="sk-ant-...")
# Any OpenAI-compatible endpoint
cog = CogOS(llm="glm-4", api_key="...", base_url="https://open.bigmodel.cn/api/paas/v4")
The provider is auto-detected from the model name:
- Model contains "claude" → Anthropic provider
- Everything else → OpenAI-compatible provider (works with any endpoint via base_url)
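The detection rule is simple enough to sketch in a few lines. The function name and the returned labels are hypothetical, and matching case-insensitively is an assumption; the text only states that a model name containing "claude" selects the Anthropic provider.

```python
def detect_provider(model: str) -> str:
    """Pick a provider label from the model name, following the rule above."""
    # Assumption: the match ignores case.
    return "anthropic" if "claude" in model.lower() else "openai"

for name in ("gpt-4o", "claude-sonnet-4-20250514", "glm-4"):
    print(name, "->", detect_provider(name))
```

Any model served from an OpenAI-compatible endpoint falls through to the second branch, which is why a base_url is enough to reach other vendors.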
Environment Variables
Set standard environment variables and CogOS will detect them automatically. This is the easiest way to configure CogOS without modifying code:
# CogOS-specific (highest priority)
export COG_PROVIDER=openai
export COG_MODEL=gpt-4o
export COG_API_KEY=sk-...
export COG_BASE_URL=https://api.openai.com/v1 # optional
# Or use standard provider env vars (auto-detected)
export OPENAI_API_KEY=sk-... # → provider=openai
export OPENAI_BASE_URL=https://... # → custom endpoint
export ANTHROPIC_API_KEY=sk-ant-... # → provider=anthropic
Use CogOS.from_env() to create an instance from environment variables:
from cog import CogOS
cog = CogOS.from_env()
result = cog.run("Deploy this to AWS")
Resolution order
When multiple sources are set, CogOS uses this priority (highest wins):
1. COG_* environment variables
2. OPENAI_* / ANTHROPIC_* environment variables
3. cog.yaml config file (walks up from CWD)
4. Hardcoded defaults (all None, no silent fallback)
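A layered lookup is one way to picture this priority order. The sketch below is illustrative only: resolve_settings is a hypothetical function, the config file is passed in as an already-parsed dict, and the tie-break when both OPENAI_API_KEY and ANTHROPIC_API_KEY are set is an assumption, since the text does not specify it.

```python
from typing import Dict, Optional

def resolve_settings(env: Dict[str, str],
                     file_config: Optional[Dict[str, str]] = None) -> Dict[str, Optional[str]]:
    """Apply sources from lowest to highest priority so later ones win."""
    # Lowest priority: hardcoded defaults, all None.
    settings: Dict[str, Optional[str]] = {
        "provider": None, "model": None, "api_key": None, "base_url": None,
    }
    # Next: values from the config file.
    for key, value in (file_config or {}).items():
        if key in settings:
            settings[key] = value
    # Next: standard provider variables.
    # Assumption: if both keys are set, OpenAI is applied last and wins here.
    if "ANTHROPIC_API_KEY" in env:
        settings.update(provider="anthropic", api_key=env["ANTHROPIC_API_KEY"])
    if "OPENAI_API_KEY" in env:
        settings.update(provider="openai", api_key=env["OPENAI_API_KEY"])
    if "OPENAI_BASE_URL" in env:
        settings["base_url"] = env["OPENAI_BASE_URL"]
    # Highest priority: COG_PROVIDER, COG_MODEL, COG_API_KEY, COG_BASE_URL.
    for key in settings:
        value = env.get(f"COG_{key.upper()}")
        if value is not None:
            settings[key] = value
    return settings

resolved = resolve_settings(
    env={"OPENAI_API_KEY": "sk-example", "COG_MODEL": "gpt-4o"},
    file_config={"provider": "anthropic", "model": "claude-example", "api_key": "file-key"},
)
print(resolved)
```

In real use you would pass dict(os.environ) as env. With nothing set anywhere, every value stays None, which mirrors the "no silent fallback" default.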
Config File (cog.yaml)
For MCP usage (Claude Code, Cursor, Gemini CLI, etc.), no config file is needed at all. The AI tool you're already running IS the LLM — CogOS uses it automatically in host AI mode. cog.yaml is only needed if you want to assign specific agents to cheaper models, or for standalone CLI usage.
The default cog.yaml generated by cog init reflects the host AI default and the per-agent pattern:
# cog.yaml (never committed to git)
# "host" means: use whatever AI is already running cog
provider: host
# Optional: point specific agents at cheaper models
agents:
  planner:
    provider: openai
    model: gpt-4o-mini
    api_key: YOUR_KEY_HERE
  document_writer:
    provider: openai
    model: gpt-4o-mini
    api_key: YOUR_KEY_HERE
  # executor:
  #   provider: openai
  #   model: gpt-4o
  #   api_key: YOUR_KEY_HERE
  # researcher:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # coder:
  #   provider: openai
  #   model: gpt-4o
  #   api_key: YOUR_KEY_HERE
  # reviewer:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # critic:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # tester:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # documenter:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # optimizer:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # security:
  #   provider: openai
  #   model: gpt-4o-mini
  #   api_key: YOUR_KEY_HERE
  # architect:
  #   provider: openai
  #   model: gpt-4o
  #   api_key: YOUR_KEY_HERE
memory_backend: sqlite
modules_path: modules
memory_path: cog_memory.db
log_level: INFO
max_agent_iterations: 20
Generate one with cog init:
$ cog init
Created cog.yaml with host AI mode defaults.
Edit agents: to assign specific roles to cheaper models.
Security: cog.yaml is listed in .gitignore by default. Never commit this file if it contains API keys. For team projects, use environment variables or a secrets manager instead.
CLI Usage
The CLI reads from the config file and environment variables automatically:
# Create config
$ cog init
# Run a task
$ cog run "Create a REST API with authentication"
# Override provider/model per command
$ cog run "Debug this function" --provider openai --model gpt-4o
# Interactive chat
$ cog chat
# Show status (provider, model, modules, tools)
$ cog status
Both cog and cogos commands work identically.
Creating Custom Modules
Use cog create to scaffold a new module with the correct structure:
# Scaffold a new module
$ cog create infra-pulumi --description "Pulumi infrastructure as code"
Created modules/cog-infra-pulumi/
modules/cog-infra-pulumi/manifest.json
modules/cog-infra-pulumi/module.py
# Or use the full cog- prefix
$ cog create cog-lang-rust --description "Rust language module"
This generates two files:
manifest.json: module metadata
{
  "name": "cog-infra-pulumi",
  "version": "0.1.0",
  "description": "Pulumi infrastructure as code",
  "capabilities": ["pulumi_operations"],
  "requires": [],
  "permissions": ["shell.execute"],
  "entrypoint": "module.py"
}
module.py: your tools, verifiers, and domain knowledge
The scaffolded module.py includes a working Tool, Verifier, and CogModule class with TODO comments. Each module can provide:
| Component | What it does | Method |
|---|---|---|
| Tools | Actions CogOS can take (run commands, call APIs, etc.) | register_tools() |
| Verifiers | Health checks (is the tool installed? are creds valid?) | register_verifiers() |
| Prompt extensions | Domain expertise injected into the LLM context | get_prompt_extensions() |
| Capabilities | Tags for task routing (e.g. s3_operations) | get_capabilities() |
| Lifecycle hooks | Setup/teardown logic (on_load, pre_execute, etc.) | Override on CogModule |
Module naming convention
Module names follow the pattern cog-{category}-{topic}:
# Category examples
cog-cloud-aws # cloud providers
cog-lang-rust # programming languages
cog-infra-docker # infrastructure / DevOps
cog-db-postgres # databases
cog-framework-nextjs # web frameworks
cog-testing-playwright # testing tools
Sharing modules
Modules are just directories with manifest.json + module.py. To share:
- Git repo: push your module directory, users clone into their modules/
- Pip package: wrap the module in a Python package with a modules/ entry point
- Registry: publish to a CogOS module registry (coming soon)
Tip: The requires field in manifest.json lets you declare dependencies on other modules (e.g. "requires": ["tool-core"]). CogOS resolves activation order automatically.
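Resolving activation order from requires fields is a topological sort. The sketch below uses the standard library's graphlib (Python 3.9+) and is only an illustration of the idea: the activation_order function is hypothetical, and the text does not say how CogOS implements this internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

def activation_order(manifests: List[Dict]) -> List[str]:
    """Order module names so every module comes after the modules it requires."""
    # Map each module to the set of modules that must be activated before it.
    graph = {m["name"]: set(m.get("requires", [])) for m in manifests}
    return list(TopologicalSorter(graph).static_order())

manifests = [
    {"name": "cog-infra-pulumi", "requires": ["tool-core"]},
    {"name": "tool-core", "requires": []},
]
order = activation_order(manifests)
print(order)
```

If two modules require each other, static_order() raises graphlib.CycleError, which is a convenient point to report a circular dependency to the module author.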
Supported Providers
Two provider classes are included. The OpenAI provider works with any OpenAI-compatible endpoint via the base_url parameter:
| Provider | Class | Env vars | Notes |
|---|---|---|---|
| OpenAI | OpenAIProvider | OPENAI_API_KEY | GPT-4o, GPT-4, etc. |
| Anthropic | AnthropicProvider | ANTHROPIC_API_KEY | Claude Sonnet, Haiku, etc. |
| DeepSeek | OpenAIProvider | COG_API_KEY + COG_BASE_URL | OpenAI-compatible endpoint |
| Zhipu / GLM | OpenAIProvider | COG_API_KEY + COG_BASE_URL | OpenAI-compatible endpoint |
| OpenRouter | OpenAIProvider | COG_API_KEY + COG_BASE_URL | OpenAI-compatible endpoint |
| Local (Ollama, etc.) | OpenAIProvider | COG_BASE_URL | base_url=http://localhost:11434/v1 |
| Any custom | LLMProvider subclass | Custom | Implement complete() and get_model_name() |
Integrating With AI Tools
CogOS is designed to be embedded by other AI tools. Here's how different tools would integrate:
Claude Code / Anthropic tools
from cog import CogOS
from cog.providers.anthropic_provider import AnthropicProvider
provider = AnthropicProvider(model="claude-sonnet-4-20250514", api_key="sk-ant-...")
cog = CogOS(provider=provider)
OpenAI Codex / GPT tools
from cog import CogOS
from cog.providers.openai_provider import OpenAIProvider
provider = OpenAIProvider(model="gpt-4o", api_key="sk-...")
cog = CogOS(provider=provider)
Custom / OpenAI-compatible endpoints
from cog import CogOS
from cog.providers.openai_provider import OpenAIProvider
# Works with DeepSeek, Zhipu, OpenRouter, Ollama, vLLM, etc.
provider = OpenAIProvider(
    model="deepseek-chat",
    api_key="...",
    base_url="https://api.deepseek.com/v1",
)
cog = CogOS(provider=provider)
Custom provider class
Implement the LLMProvider interface for any LLM not covered by the built-in providers:
from cog import CogOS
from cog.providers.base import LLMProvider, LLMResponse, LLMMessage
class MyProvider(LLMProvider):
    def complete(self, messages, tools=None, temperature=0.1,
                 max_tokens=4096, system=None) -> LLMResponse:
        # Call your LLM here and return its reply as an LLMResponse
        ...
    def get_model_name(self) -> str:
        return "my-custom-model"
cog = CogOS(provider=MyProvider())
Summary: CogOS is provider-agnostic by design. Pass your existing provider in, and CogOS adds 55 domain modules, multi-agent orchestration, memory, caching, and 70 tools on top. No lock-in, no duplicated config.