AI Research & Engineering · Multi-Agent Coordination

Diagnosing Agent Coordination Failures

A Vocabulary for What's Going Wrong
Diagnostic Patterns

Your multi-agent system is failing. But how do you describe what's wrong?

"The agents are... not working together" doesn't help your debugging session. You need a diagnostic vocabulary—precise terms that map symptoms to root causes to solutions.

The Problem with Ad-Hoc Diagnosis

Most teams describe agent failures vaguely:

  • "It's stuck"
  • "The responses conflict"
  • "It's too slow"
  • "Something's wrong with the handoffs"

These descriptions don't point to solutions. They're symptoms, not diagnoses.

Musical Diagnostics

We developed a vocabulary borrowed from music—where coordinating 100+ performers in real-time has been solved for centuries.

Here are the core diagnostic terms:

Timing Failures

SymptomDiagnosisWhat's Happening
Deadlock / HangingStuck FermataAgent is waiting for a signal that never comes
Skipping steps, hallucinatingRushingAgent proceeds without required context
Can't make decisionsDraggingOver-deliberation, no urgency signal
Acts too earlyFalse EntryMissing barrier synchronization

Coherence Failures

SymptomDiagnosisWhat's Happening
Contradictory outputsHarmonic ClashNo shared constraint on who "wins"
Repeating same errorDeaf AgentNot receiving feedback from verifier
Context mismatchDissonanceDifferent agents have different reference frames
Wanders off-taskImprovisation DriftMissing explicit constraints/boundaries

From Diagnosis to Fix

Each diagnosis maps to a pattern—a named solution you can implement.

Stuck Fermata → Escalation Pattern

Problem: Agent A sends a fermata (hold) to Agent B, but B never responds.

Fix: Add timeout with automatic escalation.

from hct_mcp_signals import fermata

hold = fermata(
    source="legal_agent",
    reason="Compliance review required",
    hold_type="human",
    timeout_ms=300000,  # 5 minutes
    on_timeout="escalate"  # Auto-route to manager
)

Harmonic Clash → Shared Progressions Pattern

Problem: Finance says "approve" while Legal says "reject".

Fix: Define a voice hierarchy in your reference frame.

# reference_frame.yaml
shared_progressions:
  compliance: "Legal has final voice"
  budget: "Finance has final voice"
  product: "Product has final voice"

Rushing → Tempo Control Pattern

Problem: Agent hallucinates because it feels "rushed".

Fix: Use explicit urgency signals and Chain-of-Verification.

# Instead of just sending content, send with tempo
signal = cue(
    source="orchestrator",
    targets=["analyst"],
    urgency=5,  # Not urgent - take your time
    tempo="andante"  # Walking pace
)

The Full Library

We've documented 15 diagnostic patterns covering:

  • 5 Timing failures
  • 5 Coherence failures
  • 5 Resource failures

Each includes:

  • How to spot it in your logs
  • The root cause
  • Code to fix it

Explore the Patterns Library →

Naming Things Matters

The act of naming a problem is half the battle. When you can say "we have a Stuck Fermata problem," your team immediately knows:

  1. What symptom to look for
  2. What the root cause is
  3. Which pattern to apply

That's the power of a shared diagnostic vocabulary.


Next time your agents misbehave, don't say "it's broken." Say "it's a Harmonic Clash" and reach for the pattern.