scorable-integration
from root-signals/scorable-skills

Skills for using the Scorable evaluation platform

```bash
npx skills add https://github.com/root-signals/scorable-skills --skill scorable-integration
```
# Scorable Integration
Guides integration of Scorable's LLM-as-a-Judge evaluation system into codebases with LLM interactions. Scorable creates custom evaluators (judges) that assess LLM outputs for quality, safety, and policy adherence.
## Overview
Your role is to:
- Analyze the codebase to identify LLM interactions
- Create judges via Scorable API to evaluate those interactions
- Integrate judge execution into the code at appropriate points
- Provide usage documentation for the evaluation setup
## Workflow

### Step 1: Analyze the Application
Examine the codebase to understand:
- What LLM interactions exist (prompts, completions, agent calls)
- What the application does at each interaction point
- Which interactions are most critical to evaluate
If multiple LLM interactions exist, help the user prioritize; recommend starting with the most critical one.
### Step 2: Get Scorable API Key
Ask the user if they want to:
- Create a temporary API key (for quick testing). Warn them that the judge will be public and visible to everyone
- Use an existing API key (for production)
#### Creating a Temporary API Key
```bash
curl --request POST \
  --url https://api.scorable.ai/create-demo-user/ \
  --header 'accept: application/json' \
  --header 'content-type: application/json'
```
The response includes an `api_key` field. Warn the user that:
- The judge will be public and visible to everyone
- The key only works for a limited time
- For private judges, they should create a permanent key at https://scorable.ai/register
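For reference, the demo-key call above as a minimal Python sketch (assuming the `requests` library; only the documented `api_key` field of the response is used):

```python
import requests

# Request a temporary demo key; no authentication is required.
resp = requests.post(
    "https://api.scorable.ai/create-demo-user/",
    headers={"accept": "application/json", "content-type": "application/json"},
    timeout=30,
)
resp.raise_for_status()

# The response is documented to include an `api_key` field.
api_key = resp.json()["api_key"]
print("Temporary Scorable API key:", api_key)
```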
### Step 3: Generate a Judge

Call `/v1/judges/generate/` with a detailed intent string describing what to evaluate.
#### Intent String Guidelines
- Describe the application context and what you're evaluating
- Mention the specific execution point (stage name)
- Include critical quality dimensions you care about
- Add examples, documentation links, or policies if relevant
- Be specific and detailed (multiple sentences/paragraphs are good)
- Code-level details (frameworks, libraries, etc.) do not need to be mentioned
#### Example Request
```bash
curl --request POST \
  --url https://api.scorable.ai/v1/judges/generate/ \
  --header 'accept: application/json' \
  --header 'content-type: application/json' \
  --header 'Authorization: Api-Key <SCORABLE_API_KEY>' \
  --data '{
    "visibility": "unlisted",
    "intent": "An email automation system that creates summary emails using an LLM based on database query results and user input. Evaluate the LLM output for: accuracy in summarizing data, appropriate tone for the audience, inclusion of all key information from queries, proper formatting, and absence of hallucinations. The system is used for customer-facing communications.",
    "generating_model_params": {
      "temperature": 0.2,
      "reasoning_effort": "medium"
    }
  }'
```
Note: This can take up to 2 minutes to complete.
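The same request as a Python sketch (a direct translation of the cURL example above, assuming the `requests` library; the generous timeout covers the generation time just noted):

```python
import requests

SCORABLE_API_KEY = "..."  # real or temporary key from Step 2

resp = requests.post(
    "https://api.scorable.ai/v1/judges/generate/",
    headers={
        "accept": "application/json",
        "content-type": "application/json",
        "Authorization": f"Api-Key {SCORABLE_API_KEY}",
    },
    json={
        "visibility": "unlisted",
        "intent": (
            "An email automation system that creates summary emails using an "
            "LLM based on database query results and user input. Evaluate the "
            "LLM output for: accuracy in summarizing data, appropriate tone "
            "for the audience, inclusion of all key information from queries, "
            "proper formatting, and absence of hallucinations. The system is "
            "used for customer-facing communications."
        ),
        "generating_model_params": {"temperature": 0.2, "reasoning_effort": "medium"},
    },
    timeout=180,  # generation can take up to ~2 minutes
)
result = resp.json()
```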
#### Handling API Responses

1. `missing_context_from_system_goal` - additional context is needed:
```json
{
  "missing_context_from_system_goal": [
    {
      "form_field_name": "target_audience",
      "form_field_description": "The intended audience for the content"
    }
  ]
}
```
Ask the user for these details (if not evident from the codebase), then call `/v1/judges/generate/` again with:
```json
{
  "judge_id": "existing-judge-id",
  "stage": "Stage name",
  "extra_contexts": {
    "target_audience": "Enterprise customers"
  }
}
```
2. `multiple_stages` - the judge generator detected multiple evaluation points:
```json
{
  "error_code": "multiple_stages",
  "stages": ["Stage 1", "Stage 2", "Stage 3"]
}
```
Ask the user which stage to focus on, or whether they have a custom stage name. Each judge evaluates one stage; you can create additional judges later for other stages.
3. Success - the judge was created:
```json
{
  "judge_id": "abc123...",
  "evaluator_details": [...]
}
```
Proceed to integration.
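A non-authoritative Python sketch of driving these three outcomes (`call_generate` is a hypothetical helper that re-issues the `/v1/judges/generate/` request with the given fields merged in; the field names follow the response shapes above):

```python
def handle_generate_response(result: dict, call_generate) -> str:
    """Return a judge_id, handling the three documented response cases."""
    # Case 1: the API needs additional context fields.
    if "missing_context_from_system_goal" in result:
        extra_contexts = {}
        for field in result["missing_context_from_system_goal"]:
            # Ask the user for each field (skip ones evident from the codebase).
            question = f"{field['form_field_name']} ({field['form_field_description']}): "
            extra_contexts[field["form_field_name"]] = input(question)
        retry = call_generate({
            "judge_id": result.get("judge_id"),  # assumed to be echoed back
            "stage": result.get("stage"),
            "extra_contexts": extra_contexts,
        })
        return handle_generate_response(retry, call_generate)

    # Case 2: multiple stages detected; each judge evaluates exactly one.
    if result.get("error_code") == "multiple_stages":
        print("Available stages:", result["stages"])
        stage = input("Which stage should this judge evaluate? ")
        return handle_generate_response(call_generate({"stage": stage}), call_generate)

    # Case 3: success.
    return result["judge_id"]
```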
### Step 4: Integrate Judge Execution
Add code to evaluate LLM outputs at the appropriate execution point(s).
Check whether the codebase uses a framework with integration instructions in the Scorable docs (use curl to fetch https://docs.scorable.ai/llms.txt if needed).
#### Language-Specific Integration
Choose the appropriate integration guide based on the codebase language:
- Python: see `references/python.md` for installation, sync/async usage, multi-turn conversations, and common patterns
- TypeScript/JavaScript: see `references/typescript.md` for npm installation and usage examples
- Other languages: see `references/other-languages.md` for REST API integration via a cURL template
#### Integration Points
- Insert evaluation code where LLM outputs are generated (e.g., after OpenAI response calls)
- `response` parameter: The text you want to evaluate
- `request` parameter: The input that prompted the response
- `messages` parameter: The multi-turn conversation
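As an illustrative sketch only (the execution endpoint path below is an assumption, not a documented route; use the cURL template in `references/other-languages.md` or the SDKs above for the real call), evaluation can be wired in right after the LLM call:

```python
import requests

SCORABLE_API_KEY = "..."  # from Step 2
JUDGE_ID = "abc123..."    # returned by /v1/judges/generate/ in Step 3

def evaluate_llm_output(request_text: str, response_text: str) -> dict:
    # NOTE: this endpoint path is a placeholder assumption; check the
    # language-specific references for the actual judge execution call.
    resp = requests.post(
        f"https://api.scorable.ai/v1/judges/{JUDGE_ID}/execute/",
        headers={
            "accept": "application/json",
            "content-type": "application/json",
            "Authorization": f"Api-Key {SCORABLE_API_KEY}",
        },
        json={
            "request": request_text,    # the input that prompted the response
            "response": response_text,  # the text to evaluate
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Typical placement: immediately after the LLM call that produced the output.
# completion = llm_client.generate(user_prompt)           # your existing call
# verdict = evaluate_llm_output(user_prompt, completion)  # judge the output
```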
...