The Test Healing Agent — Mukesh Rajput

A UI change ships — a button gets a new data-test attribute, a modal gets a new class name. Three tests start failing with "Element not found." The healing agent finds the broken locator, identifies the correct selector, updates the Java page object, verifies the fix with Maven, and raises a GitHub PR. Engineers review the diff. They don't do the detective work anymore.

The Problem

Locator failures are the most common, most tedious category of automation breakage. A UI change ships — a button gets a new data-test attribute, a modal gets a new class name, a form field gets a new id. Three tests start failing with Element not found.

A QA engineer then has to: find which page object has the stale locator, figure out the new selector by opening DevTools and inspecting the element in staging, update the Java file, re-run the test locally to verify the fix works, then commit and push. That's 30–45 minutes per locator — multiplied by however many broke in this deployment.

The healing agent does all of this autonomously. It reads the failing test, finds the page object, asks Claude to propose a fix based on the error message, applies it, verifies it with Maven against the actual test, and commits or rolls back depending on the result.

30–45 min

Manual fix time per broken locator

~3 min

Agent fix time per locator (end-to-end)

5 max

Fixes per session
AUTO_FIX_MAX_FIXES_PER_RUN

2 tries

Max attempts before skip
MAX_FIX_ATTEMPTS

How It Gets Triggered

The healing agent is triggered by the triaging agent. After the 5-step triaging pipeline finishes, any failures classified as AUTOMATION_ISSUE + HIGH confidence + ELEMENT_NOT_FOUND are written to a JSON handoff file at:

agents/test-healing-agent/queue/<build-tag>.json

The healing agent reads this file and processes each failure in order, up to AUTO_FIX_MAX_FIXES_PER_RUN=5 per session. This prevents runaway commits on large deployments where many locators broke at once.

The triage report that feeds the healing agent. Failures classified as AUTOMATION_ISSUE with HIGH confidence and ELEMENT_NOT_FOUND root cause are automatically queued for the healing agent.

The Fix Pipeline (Per Failure)

For each failure in the handoff file, the agent runs the following sequence. If any step fails, it skips to the next failure and records the reason.

Locate the source files code_analyzer.py

lib/code_analyzer.py takes the failing test class name and method name from the handoff file. It walks the Jarvis source tree to find:

The test class file — e.g. PaymentsWebTest.java
The page object file referenced in the failing method — e.g. PaymentsPage.java

This is where the broken locator lives. The analyzer uses static analysis — parsing the test method body to find which page object methods are called, then resolving those methods to their file.

Build the fix prompt

Claude Opus receives everything it needs to reason about the fix:

The failing test method code — so it knows what action was being performed
The full page object file — so it can see the stale locator in context with its surrounding methods
The actual error message: "Element not found: [data-test='submit-payment']"
The CONVENTIONS.md / CLAUDE.md framework rules — so it knows the correct Playwright selector patterns and naming conventions used in Jarvis

Apply the fix

Claude returns a corrected version of the page object file — not a diff, the full file with the fix applied. The agent writes it to disk, replacing the original. The original is backed up to a temp location before the write, so rollback is lossless.

Verify with Maven

The agent runs the exact failing test method to verify the fix:

mvn test -Dtest=PaymentsWebTest#checkoutFlow

It reads test-results/report.json written by Jarvis's JsonTestReporter.java to determine pass or fail — not just the Maven exit code, which can be misleading.

Commit or rollback

Two outcomes:

Test passes: git add PaymentsPage.java && git commit -m "fix(payments): update stale locator in PaymentsPage"
Test fails: The original file is restored from the backup. The failure output — the new error message from Maven — is injected into the next Claude prompt so it can try a different selector approach. Up to MAX_FIX_ATTEMPTS=2 retries per test.

Healing agent fix pipeline terminal output

The agent processing three broken locators in sequence — locating source files, extracting element names, calling Claude Opus for a fix proposal, applying it, and verifying with Maven. Each ✅ means the fixed test passed before the change was committed.

The GitHub PR

Once all fixable tests are committed, 02_ship.py packages everything into a PR that an engineer can review and merge without needing to understand what changed or why:

Create branch

A new branch is created: chore/qa-autofix/<build-tag>-<timestamp>. The branch name makes it immediately clear this is an automated fix and which build triggered it.

Push to GitHub

The branch is pushed to the Jarvis repository. All commits from the fix session are included — one commit per successfully fixed test.

gh pr create

The GitHub CLI creates the PR with a description listing every fix: which test, which file, what the old selector was, what the new one is, and whether it was verified by Maven. Tests that couldn't be fixed after 2 attempts are listed with the failure reason and marked NEEDS-REVIEW.

Slack notification

A message is posted to the QA channel: "Auto-fixed 3/4 locator failures in payments module — PR #47 ready for review". The PR link is included so an engineer can click straight to the diff.

The pull request raised automatically — showing the exact diff for each locator fix, the Maven verification result per test, and a summary comment from the agent. Engineers see what changed, why, and that it was verified before they review.

The Closed Loop

The full autonomous cycle

CI runs Jarvis tests

→

Jarvis writes results to MySQL

→

Triaging Agent classifies failures

→

Healing Agent auto-fixes ELEMENT_NOT_FOUND locators

→

PR raised on Jarvis repo

→

Engineer reviews and merges

→

Next CI run passes

This is the loop that closes. The engineer touches two points: the merge decision and the product bug investigation. Everything in between — triage, fix, verification, PR — is autonomous.

The cycle time from CI failure to mergeable PR is typically under 15 minutes for a batch of 3–5 locator failures. That same batch manually would take a QA engineer 2–3 hours — find the failures, inspect the selectors, write the fixes, verify locally, open PRs.

The complete session summary — Ship step pushes the branch, opens the PR, posts the Slack notification, and prints timing per step. The entire cycle from queue pickup to mergeable PR: 1 minute 27 seconds.

Limits and Guardrails

The healing agent is deliberately conservative. Every design decision prioritises engineer visibility and control over raw throughput.

MAX 5 FIXES AUTO_FIX_MAX_FIXES_PER_RUN — prevents runaway commits on large deployments. If 20 locators broke, the first 5 are fixed autonomously. The rest appear in the PR with a note that they need manual review.
MAX 2 TRIES MAX_FIX_ATTEMPTS — if two attempts fail, the test is skipped with both attempts' failure outputs included in the PR description. Engineers get full context to fix it themselves.
ELEM NOT FOUND ONLY Only ELEMENT_NOT_FOUND root cause qualifies for auto-healing. Timing and assertion failures require human judgement about whether to increase timeouts or revisit assertion logic.
HIGH CONFIDENCE ONLY Medium and low confidence classifications from the triaging agent do not qualify. If the triaging agent isn't sure, the healing agent doesn't attempt it.
PR ALWAYS RAISED Even if some fixes fail, a PR is created with what succeeded and detailed notes on what didn't. Engineers always have a single place to see the full picture of automated fix activity.
DRY RUN MODE AUTO_PUSH=false skips the PR and logs what would have been fixed. Useful for validating the agent's behaviour in a new environment before enabling live commits.

The principle behind the guardrails

An agent that fixes tests perfectly 90% of the time but sometimes commits wrong changes in a way engineers can't trace is worse than no agent at all. Every guardrail here is about maintaining trust: the PR always shows what changed and why, the agent never commits silently, and the conservative qualification criteria mean false positives are extremely rare.

Engineers who review these PRs quickly learn to trust them — because the track record is clean and the failure cases are always surfaced transparently.

Next: Talk to Tests

The loop is closed. Now explore how your whole team can query the entire QA knowledge base in plain English — test plans, specs, runbooks, and live TestRail data, answered in seconds.

Feature 5 — Talk to Tests Feature 3 — Test Triaging GitHub

#ClaudeOpus #SelfHealingTests #PageObjectModel #Maven #GitHubCLI #QAAutomation #AIAgents #Playwright