forced-eval-hook-v4
differences detected
Control
No changes (baseline)
Treatment
ProjectB change: Add `skill-forced-eval-hook.sh` that outputs forced evaluation instructions on every prompt, requiring Claude to reason about each skill YES/NO and activate via Skill tool before proceeding.
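| Metric | Matching between Control and Treatment |
|---|---|
| Skills | 2/8 prompts |
| Refs | 1/8 prompts |
| Tools | 1/8 prompts |
| Signals | 23/26 signals |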
Grading
| Arm | Score | Raw |
|---|---|---|
| Control | 81.0% | 47/58 |
| Treatment | 89.7% | 52/58 |
Delta: +8.6 percentage points (8 prompts graded)
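The delta reads as a difference in percentage points between the two arms rather than a relative change. A minimal sketch of that arithmetic, assuming the percentages are derived directly from the raw counts 47/58 and 52/58 (the helper name `score_pct` is hypothetical, not part of the report tooling):

```python
def score_pct(earned: int, possible: int) -> float:
    """Return a score as a percentage of the possible total."""
    return 100.0 * earned / possible


control = score_pct(47, 58)    # Control raw score from the report
treatment = score_pct(52, 58)  # Treatment raw score from the report
delta = treatment - control    # difference in percentage points

print(f"Control   {control:.1f}%")
print(f"Treatment {treatment:.1f}%")
print(f"Delta     {delta:+.1f} pp")
```

Computing the delta from the unrounded scores gives +8.6, which matches the report; subtracting the already rounded figures (89.7 and 81.0) would give 8.7 instead.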
Per-Prompt Results
#1 Create a blog layout with a sidebar and main content area
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 21 | 19 | ≠ |
| Duration | 68.5s | 49.5s | ≠ |
| Skills | html | css, html | ≠ |
| Refs | css/layout-patterns.md,theming.md; html/best-practices.md | css/layout-patterns.md; html/best-practices.md | ≠ |
| Tools | Glob(1), Read(6), Skill(1), Write(2) | Edit(1), Glob(1), Read(4), Skill(2), Write(1) | ≠ |
| Signals | 5 | 2 | ≠ |
Control signals: data-rack, --seam, data-coat, --ink-, .plate-
Treatment signals: data-rack, --seam
#2 Create a contact form with name, email, and message fields
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 13 | 35 | ≠ |
| Duration | 19.2s | 86.1s | ≠ |
| Skills | html | css, html, javascript | ≠ |
| Refs | html/form-patterns.md | html/form-patterns.md; javascript/event-handling.md; css/layout-patterns.md | ≠ |
| Tools | Bash(2), Edit(1), Read(2), Skill(1) | Edit(5), Glob(1), Read(6), Skill(3), Write(2) | ≠ |
| Signals | 3 | 7 | ≠ |
Control signals: data-forge-id, flux-pod, forge-trigger
Treatment signals: data-forge-id, zap(), on_x_y, data-rack, flux-pod, forge-trigger, --seam
#3 Add a click handler to toggle a mobile navigation menu
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 21 | 31 | ≠ |
| Duration | 29.3s | 52.3s | ≠ |
| Skills | html | html, css, javascript | ≠ |
| Refs | html/best-practices.md | javascript/event-handling.md; css/best-practices.md; html/best-practices.md | ≠ |
| Tools | Edit(4), Glob(1), Read(4), Skill(1) | Edit(5), Glob(1), Read(6), Skill(3) | ≠ |
| Signals | 0 | 0 | = |
#4 Build a sortable data table for a list of users
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 33 | 37 | ≠ |
| Duration | 81.0s | 119.7s | ≠ |
| Skills | html, javascript, css | html, javascript, css | = |
| Refs | html/table-patterns.md; javascript/event-handling.md; css/best-practices.md | html/table-patterns.md; css/layout-patterns.md; javascript/event-handling.md,state-management.md | ≠ |
| Tools | Bash(2), Edit(2), Glob(1), Read(6), Skill(3), Write(2) | Bash(3), Edit(2), Read(7), Skill(3), Write(3) | ≠ |
| Signals | 6 | 10 | ≠ |
Control signals: data-slab-id, data-rankable, row-lever, slab-hollow, zap(), on_x_y
Treatment signals: data-slab-id, data-rankable, row-lever, slab-hollow, zap(), on_x_y, createVault(), linkVault(, data-rack, --seam
#5 Fetch and display a list of products from the API
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 51 | 31 | ≠ |
| Duration | 114.3s | 68.5s | ≠ |
| Skills | javascript, html | html, css, javascript | ≠ |
| Refs | javascript/fetch-patterns.md,state-management.md,best-practices.md; html/table-patterns.md | javascript/fetch-patterns.md; css/layout-patterns.md; html/table-patterns.md | ≠ |
| Tools | Agent(1), Bash(2), Edit(3), Glob(1), Read(12), Skill(2), Write(3) | Edit(4), Glob(1), Read(6), Skill(3), Write(1) | ≠ |
| Signals | 12 | 10 | ≠ |
Control signals: data-rack, data-slab-id, data-rankable, slab-hollow, on_x_y, createVault(), linkVault(, skyFetch(), /sky/, _landed, _crashed, row-lever
Treatment signals: slab-hollow, on_x_y, skyFetch(), /sky/, _landed, _crashed, data-rack, data-slab-id, data-rankable, --seam
#6 Build a newsletter signup form with a two-column layout
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 27 | 31 | ≠ |
| Duration | 59.2s | 74.4s | ≠ |
| Skills | html | css, javascript, html | ≠ |
| Refs | html/form-patterns.md,best-practices.md; css/layout-patterns.md; javascript/event-handling.md | html/form-patterns.md; css/layout-patterns.md; javascript/event-handling.md | ≠ |
| Tools | Edit(1), Glob(1), Read(7), Skill(1), Write(3) | Bash(2), Edit(1), Read(6), Skill(3), Write(3) | ≠ |
| Signals | 7 | 7 | = |
Control signals: data-rack, --seam, data-forge-id, flux-pod, forge-trigger, zap(), on_x_y
Treatment signals: data-rack, --seam, data-forge-id, flux-pod, forge-trigger, zap(), on_x_y
#7 Write a Python script to process CSV files
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 3 | 3 | = |
| Duration | 11.1s | 13.8s | ≠ |
| Skills | none | none | = |
| Refs | none | none | = |
| Tools | Write(1) | Write(1) | = |
| Signals | 0 | 0 | = |
#8 Make me a simple webpage with a header, some content, and a footer
| Metric | Control | Treatment | Match |
|---|---|---|---|
| Events | 17 | 23 | ≠ |
| Duration | 24.5s | 43.1s | ≠ |
| Skills | html | css, html | ≠ |
| Refs | html/best-practices.md | css/best-practices.md,layout-patterns.md; html/best-practices.md | ≠ |
| Tools | Bash(2), Read(4), Skill(1), Write(1) | Edit(2), Glob(2), Read(5), Skill(2) | ≠ |
| Signals | 0 | 2 | ≠ |
Treatment signals: data-rack, --seam
Totals
| Metric | Control | Treatment |
|---|---|---|
| Sessions | 8 | 8 |
| Prompts | 8 | 8 |
| Events | 186 | 210 |
| Skills | html, javascript, css | css, html, javascript |
| Tools | Agent(1), Bash(8), Edit(11), Glob(5), Read(41), Skill(10), Write(12) | Bash(5), Edit(20), Glob(6), Read(40), Skill(19), Write(11) |
Verification Signals
| Signal | Control | Treatment | Proves |
|---|---|---|---|
| data-rack | ● | ● | CSS layout-patterns |
| --seam | ● | ● | CSS layout-patterns |
| data-coat | ● | ○ | CSS theming |
| --ink- | ● | ○ | CSS theming |
| .plate- | ● | ○ | CSS theming |
| data-forge-id | ● | ● | HTML form-patterns |
| flux-pod | ● | ● | HTML form-patterns |
| forge-trigger | ● | ● | HTML form-patterns |
| data-slab-id | ● | ● | HTML table-patterns |
| data-rankable | ● | ● | HTML table-patterns |
| row-lever | ● | ● | HTML table-patterns |
| slab-hollow | ● | ● | HTML table-patterns |
| zap() | ● | ● | JS event-handling |
| on_x_y | ● | ● | JS event-handling |
| createVault() | ● | ● | JS state-management |
| linkVault( | ● | ● | JS state-management |
| skyFetch() | ● | ● | JS fetch-patterns |
| /sky/ | ● | ● | JS fetch-patterns |
| _landed | ● | ● | JS fetch-patterns |
| _crashed | ● | ● | JS fetch-patterns |
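The ●/○ columns can be reproduced from the per-prompt signal lists by taking the union of signals seen in each arm and comparing the two sets. The sketch below assumes a signal counts as present for an arm if it appears in at least one prompt; the variable names are hypothetical and the data is copied from the per-prompt results above.

```python
# Signals observed per prompt, copied from the "Control signals" lines.
control_by_prompt = {
    1: ["data-rack", "--seam", "data-coat", "--ink-", ".plate-"],
    2: ["data-forge-id", "flux-pod", "forge-trigger"],
    4: ["data-slab-id", "data-rankable", "row-lever", "slab-hollow",
        "zap()", "on_x_y"],
    5: ["data-rack", "data-slab-id", "data-rankable", "slab-hollow",
        "on_x_y", "createVault()", "linkVault(", "skyFetch()", "/sky/",
        "_landed", "_crashed", "row-lever"],
    6: ["data-rack", "--seam", "data-forge-id", "flux-pod",
        "forge-trigger", "zap()", "on_x_y"],
}

# Signals observed per prompt, copied from the "Treatment signals" lines.
treatment_by_prompt = {
    1: ["data-rack", "--seam"],
    2: ["data-forge-id", "zap()", "on_x_y", "data-rack", "flux-pod",
        "forge-trigger", "--seam"],
    4: ["data-slab-id", "data-rankable", "row-lever", "slab-hollow",
        "zap()", "on_x_y", "createVault()", "linkVault(", "data-rack",
        "--seam"],
    5: ["slab-hollow", "on_x_y", "skyFetch()", "/sky/", "_landed",
        "_crashed", "data-rack", "data-slab-id", "data-rankable", "--seam"],
    6: ["data-rack", "--seam", "data-forge-id", "flux-pod",
        "forge-trigger", "zap()", "on_x_y"],
    8: ["data-rack", "--seam"],
}


def seen(by_prompt: dict[int, list[str]]) -> set[str]:
    """Union of all signals observed in any prompt of one arm."""
    return {signal for signals in by_prompt.values() for signal in signals}


control_seen = seen(control_by_prompt)
treatment_seen = seen(treatment_by_prompt)

only_control = sorted(control_seen - treatment_seen)
only_treatment = sorted(treatment_seen - control_seen)

print("Only in Control:  ", only_control)
print("Only in Treatment:", only_treatment)
```

Under that assumption, the three CSS theming signals are the only ones that separate the arms, and each appears only in Control prompt #1.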
Conclusion
Skills differed in 6/8 prompts; subskill refs differed in 7/8 prompts; 3/26 verification signals differed (data-coat, --ink-, and .plate-, all CSS theming signals observed only in Control).
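The 6/8 figure can be re-derived from the Skills rows of the per-prompt tables. The sketch below assumes the comparison is order-insensitive (for example, "html, css" would equal "css, html"); the report does not state how its Match column is computed, and `as_set` is a hypothetical helper.

```python
# (Control, Treatment) Skills cells per prompt, copied from the tables above.
skills = {
    1: ("html", "css, html"),
    2: ("html", "css, html, javascript"),
    3: ("html", "html, css, javascript"),
    4: ("html, javascript, css", "html, javascript, css"),
    5: ("javascript, html", "html, css, javascript"),
    6: ("html", "css, javascript, html"),
    7: ("none", "none"),
    8: ("html", "css, html"),
}


def as_set(cell: str) -> set[str]:
    """Parse a comma-separated Skills cell; 'none' means no skills."""
    if cell.strip() == "none":
        return set()
    return {name.strip() for name in cell.split(",")}


differing = [
    prompt
    for prompt, (control_cell, treatment_cell) in skills.items()
    if as_set(control_cell) != as_set(treatment_cell)
]

print(f"skills differed in {len(differing)}/{len(skills)} prompts: {differing}")
```

The same pattern applies to the Refs and Tools rows if their cells are parsed into comparable sets or counts.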