Diagnosing Agent Coordination Failures

A Vocabulary for What's Going Wrong

Your multi-agent system is failing. But how do you describe what's wrong?

"The agents are... not working together" doesn't help your debugging session. You need a diagnostic vocabulary—precise terms that map symptoms to root causes to solutions.

The Problem with Ad-Hoc Diagnosis

Most teams describe agent failures vaguely:

"It's stuck"
"The responses conflict"
"It's too slow"
"Something's wrong with the handoffs"

These descriptions don't point to solutions. They're symptoms, not diagnoses.

Musical Diagnostics

We developed a vocabulary borrowed from music, a field that has spent centuries working out how to coordinate 100+ performers in real time.

Here are the core diagnostic terms:

Timing Failures

Symptom	Diagnosis	What's Happening
Deadlock / Hanging	Stuck Fermata	Agent is waiting for a signal that never comes
Skipping steps, hallucinating	Rushing	Agent proceeds without required context
Can't make decisions	Dragging	Over-deliberation, no urgency signal
Acts too early	False Entry	Missing barrier synchronization
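
The False Entry row points at missing barrier synchronization. As a minimal sketch of what such a barrier looks like, using only the Python standard library (the agent names and the `run_agents` helper are hypothetical, not part of any library mentioned in this post), every agent waits at the barrier until all of them are ready:

```python
import threading

def run_agents(names):
    """Run one thread per agent; nobody acts until everyone is ready."""
    barrier = threading.Barrier(len(names))
    events = []              # ordered record of what happened
    lock = threading.Lock()  # protects the shared events list

    def agent(name):
        with lock:
            events.append(("ready", name))
        barrier.wait(timeout=5)  # block until all agents have arrived
        with lock:
            events.append(("act", name))

    threads = [threading.Thread(target=agent, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events

events = run_agents(["research", "legal", "finance"])
```

Because each agent records "ready" before it reaches the barrier, every "ready" event lands in the log before the first "act" event, which is the ordering a False Entry violates.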

Coherence Failures

Symptom	Diagnosis	What's Happening
Contradictory outputs	Harmonic Clash	No shared constraint on who "wins"
Repeating same error	Deaf Agent	Not receiving feedback from verifier
Context mismatch	Dissonance	Different agents have different reference frames
Wanders off-task	Improvisation Drift	Missing explicit constraints/boundaries

From Diagnosis to Fix

Each diagnosis maps to a pattern—a named solution you can implement.
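
One way to make that mapping usable during triage is a pair of lookup tables. The sketch below is illustrative only: the symptom keys are shortened forms of the table entries above, the `triage` helper is hypothetical, and only the three patterns named in this post are mapped.

```python
# Symptom -> diagnosis, taken from the two tables above (keys shortened).
SYMPTOM_TO_DIAGNOSIS = {
    "deadlock": "Stuck Fermata",
    "skipping steps": "Rushing",
    "can't make decisions": "Dragging",
    "acts too early": "False Entry",
    "contradictory outputs": "Harmonic Clash",
    "repeating same error": "Deaf Agent",
    "context mismatch": "Dissonance",
    "wanders off-task": "Improvisation Drift",
}

# Diagnosis -> pattern, limited to the three patterns this post names.
DIAGNOSIS_TO_PATTERN = {
    "Stuck Fermata": "Escalation",
    "Harmonic Clash": "Shared Progressions",
    "Rushing": "Tempo Control",
}

def triage(symptom):
    """Return (diagnosis, pattern); either may be None when unmapped."""
    diagnosis = SYMPTOM_TO_DIAGNOSIS.get(symptom.strip().lower())
    pattern = DIAGNOSIS_TO_PATTERN.get(diagnosis)
    return diagnosis, pattern

stuck = triage("Deadlock")
mismatch = triage("Context mismatch")
```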

Stuck Fermata → Escalation Pattern

Problem: Agent A sends a fermata (hold) to Agent B, but B never responds.

Fix: Add timeout with automatic escalation.

from hct_mcp_signals import fermata

# Place a hold that escalates on its own if nobody releases it in time.
hold = fermata(
    source="legal_agent",
    reason="Compliance review required",
    hold_type="human",
    timeout_ms=300_000,     # 5 minutes
    on_timeout="escalate",  # Auto-route to manager
)
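
To show the idea behind a timeout with escalation independently of any particular library, here is a minimal standard-library sketch built on `asyncio.wait_for`. It is not how `hct_mcp_signals` implements `on_timeout`; the `hold_with_escalation`, `silent_reviewer`, and `route_to_manager` names are hypothetical.

```python
import asyncio

async def hold_with_escalation(wait_for_release, timeout_s, escalate):
    """Wait for a hold to be released; escalate if nothing arrives in time."""
    try:
        return await asyncio.wait_for(wait_for_release(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The hold was never released: hand off instead of waiting forever.
        return await escalate()

async def silent_reviewer():
    await asyncio.sleep(10)  # simulates an agent that does not answer in time
    return "released"

async def route_to_manager():
    return "escalated"

result = asyncio.run(
    hold_with_escalation(silent_reviewer, 0.05, route_to_manager)
)
```

The key design choice is that the waiting side owns the deadline, so a silent counterpart cannot stall the workflow.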

Harmonic Clash → Shared Progressions Pattern

Problem: Finance says "approve" while Legal says "reject".

Fix: Define a voice hierarchy in your reference frame.

# reference_frame.yaml
shared_progressions:
  compliance: "Legal has final voice"
  budget: "Finance has final voice"
  product: "Product has final voice"
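
A voice hierarchy like the one above only helps if something consults it when verdicts disagree. The following is a hedged sketch of such a resolver in plain Python; the `FINAL_VOICE` table mirrors the YAML by hand, and the `resolve` function and agent keys are assumptions for illustration.

```python
# Mirrors the shared_progressions mapping: domain -> agent with final voice.
FINAL_VOICE = {
    "compliance": "legal",
    "budget": "finance",
    "product": "product",
}

def resolve(domain, verdicts):
    """Return the verdict of the agent that has final voice for this domain."""
    owner = FINAL_VOICE.get(domain)
    if owner is None:
        raise KeyError(f"no final voice defined for domain {domain!r}")
    if owner not in verdicts:
        raise ValueError(f"{owner!r} has final voice on {domain!r} but gave no verdict")
    return verdicts[owner]

# Finance and Legal disagree; the domain decides who wins.
verdicts = {"finance": "approve", "legal": "reject"}
compliance_decision = resolve("compliance", verdicts)
budget_decision = resolve("budget", verdicts)
```

The same pair of conflicting verdicts resolves to "reject" for a compliance question and "approve" for a budget question, so the clash is settled by the shared rule instead of by whichever agent answered last.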

Rushing → Tempo Control Pattern

Problem: Agent skips steps or hallucinates because it proceeds before it has the context it needs, as if it were "rushed".

Fix: Use explicit urgency signals and Chain-of-Verification.

# Assumption: cue is exported by the same package as fermata
from hct_mcp_signals import cue

# Instead of just sending content, send with tempo
signal = cue(
    source="orchestrator",
    targets=["analyst"],
    urgency=5,        # Not urgent - take your time
    tempo="andante",  # Walking pace
)
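
Since Rushing is described earlier as an agent proceeding without required context, a complementary guard is to hold a step until that context is present. This is a minimal sketch in plain Python, not part of the tempo signal API; the `run_step` helper, the context keys, and the `summarize` action are all hypothetical.

```python
def missing_context(required, context):
    """Return the required keys that are absent or empty in the context."""
    return [key for key in required if not context.get(key)]

def run_step(step_name, required, context, action):
    """Run `action` only when every required context key is present."""
    missing = missing_context(required, context)
    if missing:
        # Hold the step instead of letting the agent guess.
        return {"step": step_name, "status": "held", "missing": missing}
    return {"step": step_name, "status": "done", "output": action(context)}

def summarize(ctx):
    return f"{len(ctx['contract_text'])} chars reviewed for {ctx['jurisdiction']}"

required = ["contract_text", "jurisdiction"]
context = {"contract_text": "Sample clause", "jurisdiction": ""}

held = run_step("risk_summary", required, context, summarize)
context["jurisdiction"] = "DE"
done = run_step("risk_summary", required, context, summarize)
```

The first call is held with `jurisdiction` reported as missing; once the value is supplied, the same step runs to completion.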

The Full Library

We've documented 15 diagnostic patterns covering:

5 Timing failures
5 Coherence failures
5 Resource failures

Each includes:

How to spot it in your logs
The root cause
Code to fix it

Explore the Patterns Library →

Naming Things Matters

The act of naming a problem is half the battle. When you can say "we have a Stuck Fermata problem," your team immediately knows:

What symptom to look for
What the root cause is
Which pattern to apply

That's the power of a shared diagnostic vocabulary.

Next time your agents misbehave, don't say "it's broken." Say "it's a Harmonic Clash" and reach for the pattern.