apify-actor-development
from apify/agent-skills
Collection of Apify Agent Skills
npx skills add https://github.com/apify/agent-skills --skill apify-actor-development
Apify Actor Development
Important: Before you begin, fill in the generatedBy property in the meta section of .actor/actor.json. Replace it with the tool and model you're currently using, such as "Claude Code with Claude Sonnet 4.5". This helps Apify monitor and improve AGENTS.md for specific AI tools and models.
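A minimal Python sketch of that edit, assuming .actor/actor.json is a JSON object whose meta section is itself an object. The helper name set_generated_by is hypothetical and not part of any Apify tool:

```python
import json
from pathlib import Path


def set_generated_by(actor_json_path, generated_by):
    """Write `generated_by` into meta.generatedBy and return the updated config."""
    path = Path(actor_json_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: "meta" is a JSON object; create it if the file has none yet.
    config.setdefault("meta", {})["generatedBy"] = generated_by
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

For example, set_generated_by(".actor/actor.json", "Claude Code with Claude Sonnet 4.5") leaves every other key in the file untouched.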
What are Apify Actors?
Actors are serverless programs inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems. They're packaged as Docker images and run in isolated containers in the cloud.
Core Concepts:
- Accept well-defined JSON input
- Perform isolated tasks (web scraping, automation, data processing)
- Produce structured JSON output to datasets and/or store data in key-value stores
- Can run from seconds to hours or even indefinitely
- Persist state and can be restarted
Prerequisites & Setup (MANDATORY)
Before creating or modifying actors, verify that the Apify CLI is installed by running apify --help.
If it is not installed, you can run:
curl -fsSL https://apify.com/install-cli.sh | bash
# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli
When the apify CLI is installed, check that it is logged in with:
apify info # Should return your username
If it is not logged in, check if the APIFY_TOKEN environment variable is defined (if not, ask the user to generate one on https://console.apify.com/settings/integrations and then define APIFY_TOKEN with it).
Then run:
apify login -t $APIFY_TOKEN
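The setup checks above can be read as a small decision procedure. The sketch below is one way to express it in Python. The function names and the returned step labels are hypothetical, and it assumes that apify info exits with a non-zero status when no user is logged in, which you should confirm against your CLI version:

```python
import os
import shutil
import subprocess


def next_setup_step(cli_installed, logged_in, token):
    """Return a label for the next setup action (labels are illustrative only)."""
    if not cli_installed:
        return "install-cli"
    if logged_in:
        return "ready"
    if token:
        return "login-with-token"  # i.e. run: apify login -t $APIFY_TOKEN
    return "ask-user-for-token"


def detect_setup_step():
    """Inspect the local machine and decide what to do next."""
    cli_installed = shutil.which("apify") is not None
    logged_in = False
    if cli_installed:
        # Assumption: a non-zero exit code from `apify info` means "not logged in".
        result = subprocess.run(["apify", "info"], capture_output=True, text=True)
        logged_in = result.returncode == 0
    return next_setup_step(cli_installed, logged_in, os.environ.get("APIFY_TOKEN"))
```

Keeping the decision in a pure function makes the order of the checks easy to verify without touching the CLI.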
Template Selection
IMPORTANT: Before starting actor development, always ask the user which programming language they prefer:
- JavaScript
- TypeScript
- Python
Templates for each language are available in the references/ directory. Use the appropriate template based on the user's language choice. Additional packages (Crawlee, Playwright, etc.) can be installed later as needed.
Quick Start Workflow
- Use language template - Copy the appropriate template from the references/ directory based on the user's language preference
- Install dependencies
  - JavaScript/TypeScript: npm install
  - Python: pip install -r requirements.txt
- Implement logic - Write the actor code in src/main.py, src/main.js, or src/main.ts
- Configure schemas - Update input/output schemas in .actor/input_schema.json, .actor/output_schema.json, .actor/dataset_schema.json
- Configure platform settings - Update .actor/actor.json with actor metadata (see references/actor-json.md)
- Write documentation - Create comprehensive README.md for the marketplace
- Test locally - Run apify run to verify functionality (see Local Testing section below)
- Deploy - Run apify push to deploy the actor on the Apify platform (actor name is defined in .actor/actor.json)
Best Practices
✓ Do:
- Use Apify SDK (apify) for code running ON Apify platform
- Validate input early with proper error handling and fail gracefully
- Use CheerioCrawler for static HTML (10x faster than browsers)
- Use PlaywrightCrawler only for JavaScript-heavy sites
- Use router pattern (createCheerioRouter/createPlaywrightRouter) for complex crawls
- Implement retry strategies with exponential backoff
- Use proper concurrency: HTTP (10-50), Browser (1-5)
- Set sensible defaults in .actor/input_schema.json
- Define output schema in .actor/output_schema.json
- Clean and validate data before pushing to dataset
- Use semantic CSS selectors with fallback strategies
- Respect robots.txt, ToS, and implement rate limiting
- Always use apify/log package - censors sensitive data (API keys, tokens, credentials)
- Implement readiness probe handler (required if your Actor uses standby mode)
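As an illustration of the "retry strategies with exponential backoff" item, here is a generic Python sketch that uses only the standard library. The function names, default delays, and jitter range are assumptions chosen for the example. It is not an Apify SDK or Crawlee API:

```python
import random
import time


def backoff_delays(max_retries, base_delay=1.0, max_delay=30.0):
    """Delays that double on each retry and are capped at max_delay."""
    return [min(max_delay, base_delay * 2 ** attempt) for attempt in range(max_retries)]


def retry_with_backoff(operation, max_retries=3, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call operation(); on failure wait with exponential backoff and try again."""
    delays = backoff_delays(max_retries, base_delay, max_delay)
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # Out of retries: surface the last error to the caller.
            # Random jitter (50-100% of the delay) avoids synchronized retries.
            sleep(delays[attempt] * random.uniform(0.5, 1.0))
```

In a real Actor you would probably narrow the except clause to the transient errors you expect, such as timeouts, so that programming errors fail immediately.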
✗ Don't:
- Rely on Dataset.getInfo() for final counts on Cloud
- Use browser crawlers when HTTP/Cheerio works
- Hard code values that should be in input schema or environment variables
- Skip input validation or error handling
- Overload servers - use appropriate concurrency and delays
- Scrape prohibited content or ignore Terms of Service
- Store personal/sensitive data unless explicitly permitted
- Use deprecated options like requestHandlerTimeoutMillis on CheerioCrawler (v3.x)
- Use additionalHttpHeaders - use preNavigationHooks instead
- Disable standby mode without explicit permission
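To make the input validation guidance concrete, the following Python sketch checks a raw input object before any work starts. The field names startUrls and maxItems, the default of 100, and the function name are hypothetical; use the fields declared in your own .actor/input_schema.json:

```python
def validate_input(actor_input):
    """Return a cleaned copy of the input or raise ValueError with a clear message."""
    if not isinstance(actor_input, dict):
        raise ValueError("Input must be a JSON object")

    # Hypothetical field: a non-empty list of URLs or {"url": ...} objects.
    start_urls = actor_input.get("startUrls")
    if not isinstance(start_urls, list) or not start_urls:
        raise ValueError("startUrls must be a non-empty list")
    urls = []
    for entry in start_urls:
        url = entry.get("url") if isinstance(entry, dict) else entry
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            raise ValueError(f"Invalid start URL: {entry!r}")
        urls.append(url)

    # Hypothetical field: positive integer with an assumed default of 100.
    max_items = actor_input.get("maxItems", 100)
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    return {"startUrls": urls, "maxItems": max_items}
```

Failing with a specific message before the first request is made keeps a misconfigured run short and easy to diagnose.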
Logging
See references/logging.md for complete logging documentation including available log levels and best practices for JavaScript/TypeScript and Python.
Check usesStandbyMode in .actor/actor.json - only implement the readiness probe handler if it is set to true.
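A small Python sketch of that check, assuming usesStandbyMode is a top-level boolean key in .actor/actor.json. The helper name uses_standby_mode is hypothetical:

```python
import json
from pathlib import Path


def uses_standby_mode(actor_json_path=".actor/actor.json"):
    """Return True only when usesStandbyMode is explicitly set to true."""
    config = json.loads(Path(actor_json_path).read_text(encoding="utf-8"))
    # A missing key or any non-boolean value is treated as "standby mode off".
    return config.get("usesStandbyMode") is True
```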
Commands
apify run # Run Actor locally
apify login # Authenticate account
apify push # Deploy to Apify platform (uses name from .actor/actor.json)
apify help # List all commands
L
...