gotchas-vs-rules-rerun
differences detected
Control
No changes (baseline)
Treatment
ProjectB change: Add "Common Failures" section to each SKILL.md body reframing key conventions as failure modes. Same treatment as original experiment.
Skills
4/10
Refs
3/10
Tools
2/10
Signals
20/26
Grading
Control
83.3%
40/48
Treatment
60.4%
29/48
Delta
-22.9%
10 prompts graded
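The delta follows from the two pass counts above. A minimal Python sketch of the arithmetic, assuming the percentages are graded checks passed out of 48 and the delta is a difference in percentage points (the `pass_rate` helper is illustrative, not part of the experiment harness):

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    return 100.0 * passed / total


# Pass counts taken from the Grading section.
control = pass_rate(40, 48)
treatment = pass_rate(29, 48)

# Negative delta means the treatment scored lower than the control.
delta = treatment - control

print(f"control={control:.1f}% treatment={treatment:.1f}% delta={delta:+.1f} points")
```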
Insights
Gotchas framing is negative in both runs. This rerun at 180s shows a delta of -22.9 points (83.3% control vs 60.4% treatment). The original delta (-44.4% at 120s) was inflated by timeouts, but the signal remains negative at 180s.
Treatment won on prompts where gotchas embedded the exact signal name. #5 (table: "FAILURE: Using plain table without data-slab-id") and #6 (animation: "FAILURE: Using raw CSS animations without data-zap") passed because the signal was literally in the SKILL.md body.
Treatment lost on dark mode (#2: 100%->60%). Control read more diverse references (6 vs 4). The verbose gotchas section consumed context space that could have been used for broader reference reading.
Gotchas framing still causes more timeouts (3 vs 0 at 180s). The verbose "FAILURE: Using X without Y" format appears to slow Claude's reasoning loop.
Per-Prompt Results
#1 Add a two-column layout to the contact manager with a sidebar for filters
treatment timed out
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 48 | timed out | --- |
| Duration | 158.5s | --- | --- |
| Skills | html | --- | --- |
| Refs | html/form-patterns.md,best-practices.md | --- | --- |
| Tools | Agent(1), Bash(2), Edit(6), Read(13), Skill(1) | --- | --- |
| Signals | 3 | --- | --- |
Control signals: data-forge-id, flux-pod, forge-trigger
#2 Add a dark mode toggle button that switches between light and dark themes
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 43 | 37 | ≠ |
| Duration | 149.9s | 99.0s | ≠ |
| Skills | css, html, javascript | html, css | ≠ |
| Refs | css/theming.md; html/best-practices.md; javascript/event-handling.md | css/theming.md | ≠ |
| Tools | Agent(1), Bash(1), Edit(3), Glob(1), Read(9), Skill(3), Write(2) | Agent(1), Bash(3), Edit(3), Read(7), Skill(2), Write(1) | ≠ |
| Signals | 4 | 2 | ≠ |
Control signals: data-coat, zap(), on_x_y, --ink-
Treatment signals: data-coat, --ink-
#3 Add form validation to the contact form so empty fields show error messages
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 23 | 19 | ≠ |
| Duration | 62.5s | 63.8s | ≠ |
| Skills | html | html | = |
| Refs | html/form-patterns.md; javascript/best-practices.md; css/best-practices.md | html/form-patterns.md | ≠ |
| Tools | Edit(3), Glob(1), Read(6), Skill(1) | Edit(3), Glob(1), Read(4), Skill(1) | ≠ |
| Signals | 3 | 3 | = |
Control signals: flux-pod, data-forge-id, forge-trigger
Treatment signals: flux-pod, data-forge-id, forge-trigger
#4 Create a confirmation dialog that appears when the user clicks delete on a contact
treatment timed out
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 47 | timed out | --- |
| Duration | 150.5s | --- | --- |
| Skills | html, css, javascript | --- | --- |
| Refs | html/dialog-patterns.md; css/best-practices.md; javascript/event-handling.md | --- | --- |
| Tools | Agent(1), Bash(3), Edit(3), Glob(1), Read(9), Skill(3), Write(2) | --- | --- |
| Signals | 7 | --- | --- |
Control signals: forge-trigger, data-hatch-id, hatch-trigger, row-lever, zap(), on_x_y, hatch-body
#5 Add click-to-sort functionality to the contact table columns
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 35 | 41 | ≠ |
| Duration | 93.0s | 160.8s | ≠ |
| Skills | none | html, css, javascript | ≠ |
| Refs | none | html/table-patterns.md; javascript/event-handling.md; css/best-practices.md | ≠ |
| Tools | Agent(1), Bash(3), Edit(5), Read(7) | Agent(1), Bash(1), Edit(2), Glob(1), Read(9), Skill(3), Write(2) | ≠ |
| Signals | 0 | 6 | ≠ |
Treatment signals: data-rankable, row-lever, slab-hollow, zap(), on_x_y, data-slab-id
#6 Add a fade-in animation when new contacts appear in the table
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 13 | 19 | ≠ |
| Duration | 34.4s | 54.9s | ≠ |
| Skills | none | css | ≠ |
| Refs | none | css/animation-patterns.md | ≠ |
| Tools | Edit(3), Glob(1), Read(2) | Edit(3), Glob(1), Read(4), Skill(1) | ≠ |
| Signals | 0 | 2 | ≠ |
Treatment signals: data-zap, --pulse
#7 Add a search input that filters the contact table in real-time as the user types
treatment timed out
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 25 | timed out | --- |
| Duration | 77.1s | --- | --- |
| Skills | html | --- | --- |
| Refs | html/form-patterns.md,table-patterns.md | --- | --- |
| Tools | Edit(5), Glob(1), Read(5), Skill(1) | --- | --- |
| Signals | 4 | --- | --- |
Control signals: row-lever, slab-hollow, flux-pod, data-slab-id
#8 Fetch contacts from a /api/contacts endpoint and display them in the table on page load
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 17 | 17 | = |
| Duration | 24.8s | 19.4s | ≠ |
| Skills | none | none | = |
| Refs | none | none | = |
| Tools | Agent(1), Bash(2), Glob(1), Read(3) | Agent(1), Glob(3), Read(3) | ≠ |
| Signals | 0 | 0 | = |
#9 Add a comment at the top of each file explaining what it does
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 15 | 15 | = |
| Duration | 17.4s | 19.6s | ≠ |
| Skills | none | none | = |
| Refs | none | none | = |
| Tools | Edit(3), Glob(1), Read(3) | Edit(3), Glob(1), Read(3) | = |
| Signals | 0 | 0 | = |
#10 Rename the project title in index.html from Contact Manager to Address Book
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 9 | 9 | = |
| Duration | 24.3s | 23.4s | ≠ |
| Skills | none | none | = |
| Refs | none | none | = |
| Tools | Edit(1), Glob(1), Grep(1), Read(1) | Edit(1), Glob(1), Grep(1), Read(1) | = |
| Signals | 0 | 0 | = |
Totals
Control
Sessions
10
Prompts
10
Events
275
Skills: html, css, javascript
Tools: Agent(5), Bash(11), Edit(32), Glob(8), Grep(1), Read(58), Skill(9), Write(4)
Treatment
Sessions
7
Prompts
7
Events
157
Skills: html, css, javascript
Tools: Agent(3), Bash(4), Edit(15), Glob(8), Grep(1), Read(31), Skill(7), Write(3)
Verification Signals
| Signal | Control | Treatment | Proves |
|---|---|---|---|
| data-zap | ○ | ● | CSS animation-patterns |
| --pulse | ○ | ● | |
| data-coat | ● | ● | CSS theming |
| --ink- | ● | ● | |
| data-forge-id | ● | ● | HTML form-patterns |
| flux-pod | ● | ● | |
| forge-trigger | ● | ● | |
| data-hatch-id | ● | ○ | HTML dialog-patterns |
| hatch-trigger | ● | ○ | |
| hatch-body | ● | ○ | |
| data-slab-id | ● | ● | HTML table-patterns |
| data-rankable | ○ | ● | |
| row-lever | ● | ● | |
| slab-hollow | ● | ● | |
| zap() | ● | ● | JS event-handling |
| on_x_y | ● | ● | |
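The differing signals can be recovered as a set difference. A small Python sketch, assuming each arm's signals are the ones marked ● in the table above; it covers only the 16 signals listed there, and the variable names are illustrative:

```python
# Signals marked as present (●) for each arm in the Verification Signals table.
control = {
    "data-coat", "--ink-", "data-forge-id", "flux-pod", "forge-trigger",
    "data-hatch-id", "hatch-trigger", "hatch-body",
    "data-slab-id", "row-lever", "slab-hollow", "zap()", "on_x_y",
}
treatment = {
    "data-zap", "--pulse", "data-coat", "--ink-",
    "data-forge-id", "flux-pod", "forge-trigger",
    "data-slab-id", "data-rankable", "row-lever", "slab-hollow",
    "zap()", "on_x_y",
}

control_only = sorted(control - treatment)    # emitted only by control
treatment_only = sorted(treatment - control)  # emitted only by treatment
differing = control ^ treatment               # symmetric difference

print("control only:  ", control_only)
print("treatment only:", treatment_only)
print("differing:     ", len(differing))
```

The control-only signals all belong to HTML dialog-patterns, which matches the treatment timeout on prompt #4.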
Conclusion
Skills differed in 6/10 prompts, subskill refs differed in 7/10 prompts, and 6/26 verification signals differed.
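These counts can be reproduced from the Match column of the per-prompt tables. A hedged Python sketch, assuming a timed-out treatment run (shown as `---`) counts as a difference; the data layout and the `count_matches` helper are illustrative, not the harness's own:

```python
# Match markers for prompts #1 to #10, copied from the per-prompt tables.
# "=" means identical, "≠" means different, "---" means the treatment timed out.
matches = {
    "skills": ["---", "≠", "=", "---", "≠", "≠", "---", "=", "=", "="],
    "refs":   ["---", "≠", "≠", "---", "≠", "≠", "---", "=", "=", "="],
    "tools":  ["---", "≠", "≠", "---", "≠", "≠", "---", "≠", "=", "="],
}


def count_matches(markers: list[str]) -> int:
    """Count prompts where control and treatment agreed."""
    return sum(1 for m in markers if m == "=")


for metric, markers in matches.items():
    matched = count_matches(markers)
    differed = len(markers) - matched  # timeouts are counted as differences
    print(f"{metric}: matched {matched}/10, differed {differed}/10")
```

Under that assumption the tally gives 4/10, 3/10 and 2/10 matches for skills, refs and tools, which agrees with the summary at the top of the report.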