triage-ci-flake
payloadcms
Use when CI tests fail on main branch after PR merge, or when investigating flaky test failures in CI environments
bunx add-skill payloadcms/payload -s triage-ci-flake
Systematic workflow for triaging and fixing CI test failures, especially flaky tests that pass locally but fail in CI. Failures that only surface on main are usually flakes caused by timing, bundling, or environment differences.
CRITICAL RULE: You MUST run the reproduction workflow before proposing any fixes. No exceptions.
This applies especially to failures on the main branch after a PR was merged. YOU MUST EXECUTE THESE COMMANDS. Reading code or analyzing logs does NOT count as reproduction.
- pnpm dev $SUITE_NAME (use run_in_background=true)
- pnpm prepare-run-test-against-prod
- pnpm dev:prod $SUITE_NAME, then run the test again

Only after EXECUTING these commands and seeing their output can you proceed to analysis and fixes.
"Analysis from logs" is NOT reproduction. You must RUN the commands.
digraph triage_ci {
"CI failure reported" [shape=box];
"Extract details from CI logs" [shape=box];
"Identify suite and test name" [shape=box];
"Run dev server: pnpm dev $SUITE" [shape=box];
"Run specific test by name" [shape=box];
"Did test fail?" [shape=diamond];
"Debug with dev code" [shape=box];
"Run prepare-run-test-against-prod" [shape=box];
"Run: pnpm dev:prod $SUITE" [shape=box];
"Run specific test again" [shape=box];
"Did test fail now?" [shape=diamond];
"Debug bundling issue" [shape=box];
"Unable to reproduce - check logs" [shape=box];
"Fix and verify" [shape=box];
"CI failure reported" -> "Extract details from CI logs";
"Extract details from CI logs" -> "Identify suite and test name";
"Identify suite and test name" -> "Run dev server: pnpm dev $SUITE";
"Run dev server: pnpm dev $SUITE" -> "Run specific test by name";
"Run specific test by name" -> "Did test fail?";
"Did test fail?" -> "Debug with dev code" [label="yes"];
"Did test fail?" -> "Run prepare-run-test-against-prod" [label="no"];
"Run prepare-run-test-against-prod" -> "Run: pnpm dev:prod $SUITE";
"Run: pnpm dev:prod $SUITE" -> "Run specific test again";
"Run specific test again" -> "Did test fail now?";
"Did test fail now?" -> "Debug bundling issue" [label="yes"];
"Did test fail now?" -> "Unable to reproduce - check logs" [label="no"];
"Debug with dev code" -> "Fix and verify";
"Debug bundling issue" -> "Fix and verify";
}
From CI logs or GitHub Actions URL, identify:
- The test suite name (e.g. i18n, fields, lexical)
- The test file (e.g. test/i18n/e2e.spec.ts)
- The exact test name and error message

CRITICAL: Always run the specific test by name, not the full suite.
SERVER MANAGEMENT RULES: only one server may run on port 3000 at a time, so always kill existing servers before starting a new one, start servers in the background, and wait for readiness before running tests.
# ========================================
# STEP 2A: STOP ALL SERVERS
# ========================================
lsof -ti:3000 | xargs kill -9 2>/dev/null || echo "Port 3000 clear"
# ========================================
# STEP 2B: START DEV SERVER
# ========================================
# Start dev server with the suite (in background with run_in_background=true)
pnpm dev $SUITE_NAME
# ========================================
# STEP 2C: WAIT FOR SERVER READY
# ========================================
# Wait for server to be ready (REQUIRED - do not skip)
until curl -s http://localhost:3000/admin > /dev/null 2>&1; do sleep 1; done && echo "Server ready"
# ========================================
# STEP 2D: RUN SPECIFIC TEST
# ========================================
# Run ONLY the specific failing test using Playwright directly
# For E2E tests (DO NOT use pnpm test:e2e as it spawns its own server):
pnpm exec playwright test test/$SUITE_NAME/e2e.spec.ts -g "exact test name"
# For integration tests:
pnpm test:int $SUITE_NAME -t "exact test name"
Did the test fail?
If test passed with dev code, the issue is likely in bundled/production code.
IMPORTANT: You MUST stop the dev server before starting prod server.
# ========================================
# STEP 3A: STOP ALL SERVERS (INCLUDING DEV SERVER FROM STEP 2)
# ========================================
lsof -ti:3000 | xargs kill -9 2>/dev/null || echo "Port 3000 clear"
# ========================================
# STEP 3B: BUILD AND PACK FOR PROD
# ========================================
# Build all packages and pack them (this takes time - be patient)
pnpm prepare-run-test-against-prod
# ========================================
# STEP 3C: START PROD SERVER
# ========================================
# Start prod dev server (in background with run_in_background=true)
pnpm dev:prod $SUITE_NAME
# ========================================
# STEP 3D: WAIT FOR SERVER READY
# ========================================
# Wait for server to be ready (REQUIRED - do not skip)
until curl -s http://localhost:3000/admin > /dev/null 2>&1; do sleep 1; done && echo "Server ready"
# ========================================
# STEP 3E: RUN SPECIFIC TEST
# ========================================
# Run the specific test again using Playwright directly
pnpm exec playwright test test/$SUITE_NAME/e2e.spec.ts -g "exact test name"
# OR for integration tests:
pnpm test:int $SUITE_NAME -t "exact test name"
Did the test fail now?
If you cannot reproduce locally after both attempts, re-check the CI logs for environment-specific details and run the test in a loop locally to surface intermittent failures (e.g. for i in {1..10}; do pnpm test:e2e...; done).

Fix patterns (a sketch follows the eslint rules below):
- Prefer retryable assertions (toBeVisible(), toHaveText())
- Use waitForFunction() with condition checks
- Clean up test state in afterEach, but avoid deleteAll that affects other tests
- Replace setTimeout/sleep with condition-based waiting
- Use the waitForPageStability() helper

When fixing e2e tests, be aware of these eslint rules:
- playwright/no-networkidle - Avoid waitForLoadState('networkidle') (use condition-based waiting instead)
- payload/no-wait-function - Avoid custom wait() functions (use Playwright's built-in waits)
- payload/no-flaky-assertions - Avoid non-retryable assertions
- playwright/prefer-web-first-assertions - Use built-in Playwright assertions

Existing code may violate these rules - when adding new code, follow the rules even if existing code doesn't.
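As a rough illustration only (not taken from the Payload test suites: the route, selectors, locale strings, and test name are made-up assumptions), a fix that follows these rules replaces networkidle waits and fixed sleeps with web-first assertions and waitForFunction():

```ts
// Minimal sketch, not from the Payload repo: the URL, selectors, and expected
// text are placeholder assumptions used only to illustrate the patterns above.
import { expect, test } from '@playwright/test'

test('locale switcher updates translated labels', async ({ page }) => {
  await page.goto('http://localhost:3000/admin')

  // Web-first assertion: retries until the condition holds instead of
  // asserting once against a snapshot of the DOM.
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible()

  await page.getByRole('button', { name: 'English' }).click()
  await page.getByRole('option', { name: 'Español' }).click()

  // Condition-based waiting: poll for an observable state change rather than
  // using waitForLoadState('networkidle') or a custom wait()/sleep helper.
  await page.waitForFunction(() => document.documentElement.lang === 'es')

  await expect(page.getByRole('heading', { name: 'Escritorio' })).toBeVisible()
})
```

Because the assertions and waitForFunction() retry on their own, no fixed timeouts are needed.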
After fixing:
# Ensure dev server is running on port 3000
# Run test multiple times to confirm stability
for i in {1..10}; do
pnpm exec playwright test test/$SUITE_NAME/e2e.spec.ts -g "exact test name" || break
done
# Run full suite
pnpm exec playwright test test/$SUITE_NAME/e2e.spec.ts
# If you modified bundled code, test with prod build
lsof -ti:3000 | xargs kill -9 2>/dev/null
pnpm prepare-run-test-against-prod
pnpm dev:prod $SUITE_NAME
until curl -s http://localhost:3000/admin > /dev/null; do sleep 1; done
pnpm exec playwright test test/$SUITE_NAME/e2e.spec.ts
NO FIX WITHOUT REPRODUCTION FIRST
If you propose a fix before completing steps 1-3 of the workflow, you've violated this skill.
This applies even when the error looks obvious, you have seen it before, or you are under time pressure. No exceptions. Run the reproduction workflow first.
Every excuse for skipping reproduction, and why it's wrong:
| Rationalization | Reality |
|---|---|
| "The logs show the exact error" | Logs show symptoms, not root cause. Reproduce. |
| "I can see the problem in the code" | You're guessing. Reproduce to confirm. |
| "This is obviously a race condition" | Maybe. Reproduce to be sure. |
| "I've seen this error before" | This might be different. Reproduce. |
| "The stack trace is clear" | Stack trace shows where, not why. Reproduce. |
| "Time pressure - need to fix fast" | Reproducing IS fast. Guessing wastes time. |
| "The test file shows the issue" | Reading β running. Execute the commands. |
| "I'll analyze the code first" | Code analysis comes AFTER reproduction. |
| "Let me investigate the root cause" | Reproduction IS the investigation. |
| "I need to understand the error" | Understanding comes from seeing it fail. |
Thoughts like the rationalizations above mean you're about to violate the skill. Stop immediately. Execute the reproduction commands. No analysis without execution.
CORRECT approach - executes commands first:
User: CI failed on i18n suite - https://github.com/.../actions/runs/XXX
Agent:
1. Extracted details from CI logs:
- Suite: i18n
- Test: "ensure translations update correctly when switching language"
- File: test/i18n/e2e.spec.ts:124
- Error: "Error loading language: en" TypeError: network error
2. Executing reproduction workflow:
[Uses Bash tool to kill ports]
$ lsof -ti:3000 | xargs kill -9 2>/dev/null || echo "Port 3000 clear"
[Uses Bash tool with run_in_background=true]
$ pnpm dev i18n
[Waits for server to start with curl check]
$ until curl -s http://localhost:3000/admin > /dev/null; do sleep 1; done && echo "Server ready"
[Uses Bash tool to run Playwright directly]
$ pnpm exec playwright test test/i18n/e2e.spec.ts -g "ensure translations update correctly when switching language"
Output shows: Test PASSED locally
3. Test passed with dev code, trying bundled code:
[Kills ports again]
$ lsof -ti:3000 | xargs kill -9 2>/dev/null
[Uses Bash tool]
$ pnpm prepare-run-test-against-prod
[Uses Bash tool with run_in_background=true]
$ pnpm dev:prod i18n
[Waits for server]
$ until curl -s http://localhost:3000/admin > /dev/null; do sleep 1; done
[Uses Bash tool]
$ pnpm exec playwright test test/i18n/e2e.spec.ts -g "ensure translations update correctly when switching language"
Output shows: Test FAILED - reproduced!
4. Now that I've reproduced it, analyzing the failure...
[Code analysis, root cause identification]
5. Implementing fix: [specific changes]
6. Verification:
[Uses Bash tool]
$ for i in {1..10}; do pnpm exec playwright test test/i18n/e2e.spec.ts -g "ensure translations update correctly when switching language"; done
All 10 runs passed!
INCORRECT approach - analyzes without executing:
User: CI failed on i18n suite
Agent: Looking at the error logs, I can see this is a race condition...
[Analyzes code, proposes fix without running any commands]
❌ WRONG - This violates the skill. Must execute reproduction commands first.
| Mistake | Fix |
|---|---|
| Running full test suite first | Run specific test by name |
| Skipping dev code reproduction | Always try dev code first |
| Not testing with bundled code | If dev passes, test with prepare-run-test-against-prod |
| Proposing fix without reproducing | Follow the workflow - reproduce first |
| Using networkidle in new code | Use condition-based waiting with waitForFunction() |
| Adding arbitrary wait() calls | Use Playwright's built-in assertions and waits |
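For the isolation-related mistakes, here is a minimal sketch of cleanup scoped to the documents a test created, assuming a Vitest-style runner, a hypothetical posts collection, and a local ./config.js import; Payload's own integration harness may differ:

```ts
// Hypothetical integration-test isolation pattern; the collection slug, config
// import, and test runner are assumptions for illustration only.
import { afterEach, beforeAll, expect, test } from 'vitest'
import { getPayload, type Payload } from 'payload'

import config from './config.js'

let payload: Payload
const createdIDs: Array<number | string> = []

beforeAll(async () => {
  payload = await getPayload({ config })
})

afterEach(async () => {
  // Delete only the documents this file created, instead of a collection-wide
  // deleteAll that can interfere with other tests sharing the database.
  const ids = createdIDs.splice(0)
  await Promise.all(ids.map((id) => payload.delete({ collection: 'posts', id })))
})

test('creates a post', async () => {
  const post = await payload.create({ collection: 'posts', data: { title: 'Hello' } })
  createdIDs.push(post.id)

  await expect(
    payload.findByID({ collection: 'posts', id: post.id }),
  ).resolves.toMatchObject({ title: 'Hello' })
})
```

Scoping cleanup to the IDs a test created keeps suites independent of execution order.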
After you have reproduced the failure, implemented the fix, and verified it across repeated runs, you MUST prompt the user to create a PR:
The fix has been verified and is ready for review. Would you like me to create a PR with these changes?
Summary of changes:
- [List files modified]
- [Brief description of the fix]
- [Verification results]
IMPORTANT: Do not create the PR until the user confirms. This ensures the user has visibility and control over what gets submitted for review.