ab-test-setup
from coreyhaines31/marketingskills
Marketing skills for Claude Code and AI agents. CRO, copywriting, SEO, analytics, and growth engineering.
npx skills add https://github.com/coreyhaines31/marketingskills --skill ab-test-setupSKILL.md
A/B Test Setup
You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.
Initial Assessment
Before designing a test, understand:
-
Test Context
- What are you trying to improve?
- What change are you considering?
- What made you want to test this?
-
Current State
- Baseline conversion rate?
- Current traffic volume?
- Any historical test data?
-
Constraints
- Technical implementation complexity?
- Timeline requirements?
- Tools available?
Core Principles
1. Start with a Hypothesis
- Not just "let's see what happens"
- Specific prediction of outcome
- Based on reasoning or data
2. Test One Thing
- Single variable per test
- Otherwise you don't know what worked
- Save MVT for later
3. Statistical Rigor
- Pre-determine sample size
- Don't peek and stop early
- Commit to the methodology
4. Measure What Matters
- Primary metric tied to business value
- Secondary metrics for context
- Guardrail metrics to prevent harm
Hypothesis Framework
Structure
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
Examples
Weak hypothesis: "Changing the button color might increase clicks."
Strong hypothesis: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."
Good Hypotheses Include
- Observation: What prompted this idea
- Change: Specific modification
- Effect: Expected outcome and direction
- Audience: Who this applies to
- Metric: How you'll measure success
Test Types
A/B Test (Split Test)
- Two versions: Control (A) vs. Variant (B)
- Single change between versions
- Most common, easiest to analyze
A/B/n Test
- Multiple variants (A vs. B vs. C...)
- Requires more traffic
- Good for testing several options
Multivariate Test (MVT)
- Multiple changes in combinations
- Tests interactions between changes
- Requires significantly more traffic
- Complex analysis
Split URL Test
- Different URLs for variants
- Good for major page changes
- Easier implementation sometimes
Sample Size Calculation
Inputs Needed
- Baseline conversion rate: Your current rate
- Minimum detectable effect (MDE): Smallest change worth detecting
- Statistical significance level: Usually 95%
- Statistical power: Usually 80%
Quick Reference
| Baseline Rate | 10% Lift | 20% Lift | 50% Lift |
|---|---|---|---|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |
Formula Resources
- Evan Miller's calculator: https://www.evanmiller.org/ab-testing/sample-size.html
- Optimizely's calculator: https://www.optimizely.com/sample-size-calculator/
Test Duration
Duration = Sample size needed per variant × Number of variants
───────────────────────────────────────────────────
Daily traffic to test page × Conversion rate
Minimum: 1-2 business cycles (usually 1-2 weeks) Maximum: Avoid running too long (novelty effects, external factors)
Metrics Selection
Primary Metric
- Single metric that matters most
- Directly tied to hypothesis
- What you'll use to call the test
Secondary Metrics
- Support primary metric interpretation
- Explain why/how the change worked
- Help understand user behavior
Guardrail Metrics
- Things that shouldn't get worse
- Revenue, retention, satisfaction
- Stop test if significantly negative
Metric Examples by Test Type
Homepage CTA test:
- Primary: CTA click-through rate
- Secondary: Time to click, scroll depth
- Guardrail: Bounce rate, downstream conversion
Pricing page test:
- Primary: Plan selection rate
- Secondary: Time on page, plan distribution
- Guardrail: Support tickets, refund rate
Signup flow test:
- Primary: Signup completion rate
- Secondary: Field-level completion, time to complete
- Guardrail: User activation rate (post-signup quality)
Designing Variants
Control (A)
- Current experience, unchanged
- Don't modify during test
Variant (B+)
Best practices:
- Single, meaningful change
- Bold enough to make a difference
- True to the hypothesis
What to vary:
Headlines/Copy:
- Message angle
- Value proposition
- Specificity level
- Tone/voice
Visual Design:
- Layout structure
- Color and contrast
- Image selection
- Visual hierarchy
CTA:
- Button copy
- Size/prominence
- Placement
- Number of CTAs
Content:
- Information included
- Order of information
- Amount of content
- Social p
...