agent-architecture-analysis

from existential-birds/beagle

Claude Code plugin for code review skills and verification workflows. Covers Python, Go, React, FastAPI, BubbleTea, and AI frameworks (Pydantic AI, LangGraph, Vercel AI SDK).

15 stars · 3 forks · Updated Jan 24, 2026
npx skills add https://github.com/existential-birds/beagle --skill agent-architecture-analysis

SKILL.md

12-Factor Agents Compliance Analysis

Reference: 12-Factor Agents

Input Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| `docs_path` | Path to documentation directory (for existing analyses) | Optional |
| `codebase_path` | Root path of the codebase to analyze | Required |

Analysis Framework

Factor 1: Natural Language to Tool Calls

Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.

Search Patterns:

```bash
# Look for Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"

# Look for JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"

# Look for structured output generation
grep -r "output_type\|response_model" --include="*.py"
```

File Patterns: `**/agents/*.py`, `**/schemas/*.py`, `**/models/*.py`

Compliance Criteria:

| Level | Criteria |
| --- | --- |
| Strong | All LLM outputs use Pydantic/dataclass schemas with validators |
| Partial | Some outputs typed, but dict returns or unvalidated strings exist |
| Weak | LLM returns raw strings parsed manually or with regex |

Anti-patterns:

  • json.loads(llm_response) without schema validation
  • output.split() or regex parsing of LLM responses
  • dict[str, Any] return types from agents
  • No validation between LLM output and handler execution
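
To make the Strong level concrete, here is a minimal sketch of schema-validated output using Pydantic v2. The `CreateTicket` model, its fields, and `parse_llm_output` are hypothetical illustrations, not part of this skill:

```python
from pydantic import BaseModel, Field, ValidationError


class CreateTicket(BaseModel):
    """Structured output the LLM must produce instead of free text."""

    title: str = Field(min_length=1, max_length=120)
    priority: int = Field(ge=1, le=5)
    assignee: str | None = None


def parse_llm_output(raw_json: str) -> CreateTicket:
    """Validate raw model output against the schema before anything runs."""
    try:
        return CreateTicket.model_validate_json(raw_json)
    except ValidationError as exc:
        # Invalid output never reaches a handler -- retry or surface the error.
        raise ValueError(f"LLM output failed schema validation: {exc}") from exc
```

Output that fails validation is rejected before any handler executes, which is exactly the boundary the anti-patterns above erase.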

Factor 2: Own Your Prompts

Principle: Treat prompts as first-class code you control, version, and iterate on.

Search Patterns:

```bash
# Look for embedded prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"

# Look for template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"

# Look for prompt directories
find . -type d -name "prompts"
```

File Patterns: `**/prompts/**`, `**/templates/**`, `**/agents/*.py`

Compliance Criteria:

| Level | Criteria |
| --- | --- |
| Strong | Prompts in separate files, templated (Jinja2), versioned |
| Partial | Prompts as module constants, some parameterization |
| Weak | Prompts hardcoded inline in functions, f-strings only |

Anti-patterns:

  • f"You are a {role}..." inline in agent methods
  • Prompts mixed with business logic
  • No way to iterate on prompts without code changes
  • No prompt versioning or A/B testing capability
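
As a sketch of the Strong level, prompts can live in a versioned prompts/ directory and be rendered with Jinja2. The directory layout and the reviewer.j2 template name are assumptions for illustration:

```python
from jinja2 import Environment, FileSystemLoader

# Prompts are files under version control, not string literals in agent code.
env = Environment(loader=FileSystemLoader("prompts"), autoescape=False)


def render_system_prompt(role: str, tools: list[str]) -> str:
    """Render the system prompt from prompts/reviewer.j2 (hypothetical file)."""
    template = env.get_template("reviewer.j2")
    return template.render(role=role, tools=tools)
```

Iterating on the prompt then means editing a template file, with no code change and a clean diff history.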

Factor 3: Own Your Context Window

Principle: Control how history, state, and tool results are formatted for the LLM.

Search Patterns:

```bash
# Look for context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"

# Look for custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"

# Look for token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"
```

File Patterns: `**/context/*.py`, `**/state/*.py`, `**/core/*.py`

Compliance Criteria:

| Level | Criteria |
| --- | --- |
| Strong | Custom context format, token optimization, typed events, compaction |
| Partial | Basic message history with some structure |
| Weak | Raw message accumulation, standard OpenAI format only |

Anti-patterns:

  • Unbounded message accumulation
  • Large artifacts embedded inline (diffs, files)
  • No agent-specific context filtering
  • Same context for all agent types
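
A minimal compaction sketch for the Strong level; `AgentMessage` and the whitespace-based token estimate are illustrative stand-ins (a real implementation would use the model's tokenizer):

```python
from dataclasses import dataclass


@dataclass
class AgentMessage:
    role: str
    content: str


def compact_context(
    messages: list[AgentMessage], max_tokens: int = 8_000
) -> list[AgentMessage]:
    """Keep the system prompt plus the newest messages that fit the budget."""
    if not messages:
        return []
    system, rest = messages[0], messages[1:]
    # Whitespace split is a rough token stand-in; swap in a real tokenizer.
    budget = max_tokens - len(system.content.split())
    kept: list[AgentMessage] = []
    for msg in reversed(rest):  # walk newest-first
        cost = len(msg.content.split())
        if cost > budget:
            break
        budget -= cost
        kept.append(msg)
    kept.reverse()  # restore chronological order
    return [system, *kept]
```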

Factor 4: Tools Are Structured Outputs

Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.

Search Patterns:

```bash
# Look for tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"

# Look for deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"

# Look for validation layer
grep -r "model_validate\|parse_obj" --include="*.py"
```

File Patterns: `**/tools/*.py`, `**/handlers/*.py`, `**/agents/*.py`

Compliance Criteria:

| Level | Criteria |
| --- | --- |
| Strong | All tool outputs schema-validated, handlers type-safe |
| Partial | Most tools typed, some loose dict returns |
| Weak | Tools return arbitrary dicts, no validation layer |

Anti-patterns:

  • Tool handlers that directly execute LLM output
  • eval() or exec() on LLM-generated code
  • No separation between decision (LLM) and execution (code)
  • Magic method dispatch based on string matching
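
A minimal sketch of the decision/execution split; the `ListFiles` and `ReadFile` tools and the dispatch table are hypothetical:

```python
import os

from pydantic import BaseModel


class ListFiles(BaseModel):
    """Hypothetical tool call: the LLM fills this schema and nothing else."""

    directory: str


class ReadFile(BaseModel):
    path: str


def handle_list_files(call: ListFiles) -> list[str]:
    # Deterministic execution: the LLM decided, plain code does the work.
    return sorted(os.listdir(call.directory))


def handle_read_file(call: ReadFile) -> str:
    with open(call.path) as f:
        return f.read()


# Static dispatch table: no eval(), no string-matching magic.
HANDLERS = {
    ListFiles: handle_list_files,
    ReadFile: handle_read_file,
}


def execute(call: BaseModel):
    return HANDLERS[type(call)](call)
```

The LLM's only job is to pick and fill one of these schemas; everything past `execute()` is ordinary, testable code.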

Factor 5: Unify Execution State

Principle: Merge execution state (step, retries) with business state (messages, results).
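
As an illustration of what a unified model can look like (the field names here are assumptions, not this skill's requirements):

```python
from pydantic import BaseModel, Field


class Message(BaseModel):
    role: str
    content: str


class Thread(BaseModel):
    """One serializable snapshot holding execution and business state."""

    # Execution state
    current_step: str = "start"
    retry_count: int = 0
    # Business state
    messages: list[Message] = Field(default_factory=list)
    results: dict[str, str] = Field(default_factory=dict)
```

Because everything lives in one model, `Thread.model_dump_json()` is a complete checkpoint and `Thread.model_validate_json()` restores it.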

Search Patterns:

```bash
# Look for state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"

# Look for dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep
```

...