# agent-architecture-analysis

Claude Code plugin for code review skills and verification workflows. Supports Python, Go, React, FastAPI, BubbleTea, and AI frameworks (Pydantic AI, LangGraph, Vercel AI SDK).

```bash
npx skills add https://github.com/existential-birds/beagle --skill agent-architecture-analysis
```
# 12-Factor Agents Compliance Analysis

Reference: [12-Factor Agents](https://github.com/humanlayer/12-factor-agents)
## Input Parameters

| Parameter | Description | Required |
|---|---|---|
| `codebase_path` | Root path of the codebase to analyze | Required |
| `docs_path` | Path to documentation directory (for existing analyses) | Optional |
## Analysis Framework

### Factor 1: Natural Language to Tool Calls

**Principle:** Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.
Search Patterns:
# Look for Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"
# Look for JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"
# Look for structured output generation
grep -r "output_type\|response_model" --include="*.py"
**File Patterns:** `**/agents/*.py`, `**/schemas/*.py`, `**/models/*.py`

**Compliance Criteria:**
| Level | Criteria |
|---|---|
| Strong | All LLM outputs use Pydantic/dataclass schemas with validators |
| Partial | Some outputs typed, but dict returns or unvalidated strings exist |
| Weak | LLM returns raw strings parsed manually or with regex |
**Anti-patterns:**

- `json.loads(llm_response)` without schema validation
- `output.split()` or regex parsing of LLM responses
- `dict[str, Any]` return types from agents
- No validation between LLM output and handler execution
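
For reference, a minimal sketch of the Strong level using the Pydantic v2 API; `ToolCall`, `parse_tool_call`, and the tool names are hypothetical, not part of this skill:

```python
from pydantic import BaseModel, ValidationError, field_validator


class ToolCall(BaseModel):
    """Structured output the LLM must produce."""

    tool_name: str
    arguments: dict[str, str]

    @field_validator("tool_name")
    @classmethod
    def known_tool(cls, v: str) -> str:
        # Reject hallucinated tools before anything executes.
        if v not in {"search", "summarize"}:
            raise ValueError(f"unknown tool: {v}")
        return v


def parse_tool_call(raw: str) -> ToolCall:
    # Parse and validate in one step; invalid LLM output never
    # reaches a handler.
    try:
        return ToolCall.model_validate_json(raw)
    except ValidationError as exc:
        raise ValueError(f"LLM output failed validation: {exc}") from exc
```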
### Factor 2: Own Your Prompts

**Principle:** Treat prompts as first-class code you control, version, and iterate on.
Search Patterns:
# Look for embedded prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"
# Look for template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"
# Look for prompt directories
find . -type d -name "prompts"
**File Patterns:** `**/prompts/**`, `**/templates/**`, `**/agents/*.py`

**Compliance Criteria:**
| Level | Criteria |
|---|---|
| Strong | Prompts in separate files, templated (Jinja2), versioned |
| Partial | Prompts as module constants, some parameterization |
| Weak | Prompts hardcoded inline in functions, f-strings only |
**Anti-patterns:**

- `f"You are a {role}..."` inline in agent methods
- Prompts mixed with business logic
- No way to iterate on prompts without code changes
- No prompt versioning or A/B testing capability
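
A minimal sketch of the Strong level, assuming Jinja2 and a hypothetical `prompts/reviewer.j2` template file:

```python
from jinja2 import Environment, FileSystemLoader

# Templates live in a versioned prompts/ directory, separate from code.
env = Environment(loader=FileSystemLoader("prompts"))


def render_system_prompt(role: str, constraints: list[str]) -> str:
    # Editing prompts/reviewer.j2 needs no code change, and the file's
    # git history doubles as prompt versioning.
    return env.get_template("reviewer.j2").render(
        role=role, constraints=constraints
    )
```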
### Factor 3: Own Your Context Window

**Principle:** Control how history, state, and tool results are formatted for the LLM.
Search Patterns:
# Look for context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"
# Look for custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"
# Look for token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"
**File Patterns:** `**/context/*.py`, `**/state/*.py`, `**/core/*.py`

**Compliance Criteria:**
| Level | Criteria |
|---|---|
| Strong | Custom context format, token optimization, typed events, compaction |
| Partial | Basic message history with some structure |
| Weak | Raw message accumulation, standard OpenAI format only |
**Anti-patterns:**
- Unbounded message accumulation
- Large artifacts embedded inline (diffs, files)
- No agent-specific context filtering
- Same context for all agent types
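
As a sketch of the Strong level, a compacted, agent-specific context compiler; the event type and the XML-ish rendering are illustrative, not from this skill:

```python
from dataclasses import dataclass


@dataclass
class ToolEvent:
    tool: str
    summary: str  # compacted result, not the full artifact


def compile_context(events: list[ToolEvent], max_events: int = 20) -> str:
    # Keep only recent events and render them in a compact,
    # agent-specific format instead of raw provider messages.
    recent = events[-max_events:]
    return "\n".join(
        f'<event tool="{e.tool}">{e.summary}</event>' for e in recent
    )
```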
### Factor 4: Tools Are Structured Outputs

**Principle:** Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.
Search Patterns:
# Look for tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"
# Look for deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"
# Look for validation layer
grep -r "model_validate\|parse_obj" --include="*.py"
**File Patterns:** `**/tools/*.py`, `**/handlers/*.py`, `**/agents/*.py`

**Compliance Criteria:**
| Level | Criteria |
|---|---|
| Strong | All tool outputs schema-validated, handlers type-safe |
| Partial | Most tools typed, some loose dict returns |
| Weak | Tools return arbitrary dicts, no validation layer |
**Anti-patterns:**

- Tool handlers that directly execute LLM output
- `eval()` or `exec()` on LLM-generated code
- No separation between decision (LLM) and execution (code)
- Magic method dispatch based on string matching
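
A minimal sketch of the decision/execution split; `DeployRequest`, `handle_deploy`, and `HANDLERS` are hypothetical names (Pydantic v2 API):

```python
from pydantic import BaseModel


class DeployRequest(BaseModel):
    service: str
    version: str


def handle_deploy(req: DeployRequest) -> str:
    # Deterministic code path: testable, no LLM past this point.
    return f"deploying {req.service}@{req.version}"


# Explicit registry: the LLM's validated output selects a handler;
# there is no eval(), exec(), or string-based method dispatch.
HANDLERS = {"deploy": (DeployRequest, handle_deploy)}


def execute(tool_name: str, payload: dict) -> str:
    schema, handler = HANDLERS[tool_name]
    return handler(schema.model_validate(payload))
```

The registry keeps the LLM's role to producing a validated payload; everything after validation is plain code.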
### Factor 5: Unify Execution State

**Principle:** Merge execution state (step, retries) with business state (messages, results).
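
A minimal sketch of what unified state can look like, assuming Pydantic; the field names are illustrative:

```python
from pydantic import BaseModel, Field


class AgentThread(BaseModel):
    # Execution state
    current_step: str = "start"
    retry_count: int = 0
    # Business state, in the same serializable unit, so the whole
    # thread can be checkpointed and resumed together
    messages: list[dict] = Field(default_factory=list)
    results: dict[str, str] = Field(default_factory=dict)
```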
Search Patterns:
# Look for state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"
# Look for dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep
...