@dmahlow
Created June 4, 2026 12:38
Vonovia Testfallkatalog KI-Hotline - Gap Analysis

Testfallkatalog KI-Hotline vs. Voicebot - Gap Analysis

Analysis of the Fachbereich's 35 test cases against the current voicebot codebase (2026-06-04).

Summary

| Status | Count | % |
| --- | --- | --- |
| Works | 11 | 31% |
| Partially works | 5 | 14% |
| Not implemented / out of scope | 19 | 54% |

The Testfallkatalog was written for the entire IVR system (traditional IVR + KI-Hotline + Hermes queue), not just the KI-Hotline voicebot. Of the 19 "not implemented" cases, 13 belong to infrastructure layers outside the voicebot (DTMF, queue management, callback, after-hours routing, recording notices). The remaining 6 (repeat-caller detection, Mietvertragsnummer/ANI-based auth, queue transition) are the gaps worth discussing with the Fachbereich.
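
The summary figures can be re-derived from the per-section counts in this analysis. The following is a minimal sketch for checking the arithmetic; the dictionary and function names are illustrative and not part of the voicebot codebase, and the group sizes are copied from the section headings of this document.

```python
# Group sizes copied from the "NOT IMPLEMENTED / OUT OF SCOPE" section headings.
NOT_IMPLEMENTED = {
    "DTMF / Tastatureingabe": 2,
    "Warteschlange / Queue Management": 3,
    "Callback": 3,
    "Mietvertragsnummer / ANI-based auth": 4,
    "Nacht-/Notdienst": 3,
    "Wiederholer / Repeat Caller": 1,
    "Bandaufnahme / Recording": 2,
    "Warteschleife-Uebergang": 1,
}

COUNTS = {
    "Works": 11,
    "Partially works": 5,
    "Not implemented / out of scope": sum(NOT_IMPLEMENTED.values()),
}


def summarize(counts):
    """Map each status to (count, share of all test cases in whole percent)."""
    total = sum(counts.values())
    return {status: (n, round(100 * n / total)) for status, n in counts.items()}


SUMMARY = summarize(COUNTS)
for status, (n, pct) in SUMMARY.items():
    print(f"{status}: {n} ({pct}%)")
```

The rounded shares add up to 99% rather than 100% because each share is rounded independently.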


WORKS (11 test cases)

| ID | Thema | Why it works |
| --- | --- | --- |
| TC-001 | Standardbegruessung & Datenschutz | _BRAND_GREETINGS in driver.py includes "Herzlich willkommen..." + Datenschutzhinweis + "www.vonovia.de/Datenschutz". Brand-specific (VNA/DW/BIT). |
| TC-002 | Standardanliegen korrekt erkannt | Core flow: legitimation -> klassifikation -> campaign node. 9 campaigns with detailed priority rules. |
| TC-005 | Interessent von Mieter getrennt | mark_interessent tool + is_interessent flag + dedicated INT routing in both legitimation and klassifikation nodes. |
| TC-007 | Mimi-/Spezialanliegen (Mietminderung) | MW node has explicit Mietminderung legal disclaimer. BEW Schlagwort list includes "Mietminderung". Klassifikation routes correctly. |
| TC-008 | Nebenkostenanliegen erkannt | NEKO campaign exists. Priority rules explicitly handle "Nebenkosten", "Betriebskosten", "Jahresabrechnung". Routes to line 606 (VNA). |
| TC-022 | Heizungsausfall erkannt | Heizung campaign exists. Priority rules separate Heizung from REP. Routes to line 611. Non-acute Heizungsausfall has special disclaimer. |
| TC-030 | Energie / Spezialkanal | Energie campaign with dedicated line mapping (604 for VNA). Classification rules cover Strom/Gas/Fernwaerme. |
| TC-031 | Gueltige Service-PIN legitimiert | verify_service_pin tool, sets auth_complete=True, verified_tenant_id. Single contract auto-selected, multi-contract asks for address. |
| TC-032 | Service-PIN als Option angeboten | Legitimation prompt starts with PIN request. Address-based fallback (verify_by_address) also available. |
| TC-033 | Ungueltige PIN abgewiesen | pin_failures incremented on failure, LLM asks to retry. |
| TC-034 | Mehrfach falsche PIN -> Fallback | pin_failures >= 3 triggers edge to offramp with auth_failed line. No infinite loop. |
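
The PIN behavior described for TC-031, TC-033 and TC-034 can be sketched as a small state machine. This is a simplified illustration, not the actual voicebot code: the field names pin_failures, auth_complete and verified_tenant_id and the node names come from this analysis, while the CallState class, the dictionary-based PIN lookup and the next_node function are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PIN_FAILURES = 3  # threshold from TC-034: pin_failures >= 3 -> offramp


@dataclass
class CallState:
    # Field names follow the analysis; the class itself is hypothetical.
    pin_failures: int = 0
    auth_complete: bool = False
    verified_tenant_id: Optional[str] = None


def verify_service_pin(state: CallState, pin: str, known_pins: dict) -> bool:
    """Hypothetical stand-in for the real tool: looks the PIN up in a dict."""
    tenant_id = known_pins.get(pin)
    if tenant_id is None:
        state.pin_failures += 1  # TC-033: count the failed attempt
        return False
    state.auth_complete = True  # TC-031: caller is legitimated
    state.verified_tenant_id = tenant_id
    return True


def next_node(state: CallState) -> str:
    """Pick the next node after a PIN attempt."""
    if state.auth_complete:
        return "klassifikation"
    if state.pin_failures >= MAX_PIN_FAILURES:
        return "offramp"  # TC-034: bounded retries, no infinite loop
    return "legitimation"  # ask the caller to retry
```

With this shape, two wrong PINs keep the caller in legitimation, and the third moves the call to the offramp.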

PARTIALLY WORKS (5 test cases)

| ID | Thema | What works / what's missing |
| --- | --- | --- |
| TC-003 | Unverstaendliche Sprache | LLM handles unclear input conversationally. Azure Content Filter fallback exists ("Entschuldigung, ich konnte Sie leider nicht richtig verstehen..."). But there is no explicit STT-confidence threshold or structured retry flow; behavior depends on ElevenLabs STT quality. |
| TC-004 | Fremdsprachige Eingabe | The LLM may understand some English/Turkish, but there is no explicit language detection, no defined fallback route, and no "Bitte sprechen Sie Deutsch" prompt. Such calls may route correctly or be misclassified. |
| TC-009 | Abweisung per Infotext | classify_kampagne("OTHER") + out_of_scope_flag works. Generic text: "Fuer dieses Anliegen bin ich leider nicht zustaendig..." But there is no topic-specific Infotext. |
| TC-025 | Datenschutzhinweis vor Datenerhebung | Greeting includes the Datenschutz reference, so the notice is given upfront. There is no separate notice immediately before PIN collection; whether this passes depends on how the test case is interpreted. |
| TC-035 | Keine/abgebrochene PIN-Eingabe | LLM handles conversational timeout ("Sind Sie noch da?"). But there is no explicit timeout_seconds countdown or structured fallback for abandoned input. |
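
For discussion, the structured retry flow that TC-003 currently lacks could look roughly like the sketch below. Everything here is hypothetical: the threshold, the retry budget, the Transcript type and the action names are illustrative, and the sketch assumes the STT layer exposes a per-utterance confidence score between 0 and 1, which this analysis does not confirm for ElevenLabs.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_CONFIDENCE = 0.6  # assumed threshold, would need tuning on real calls
MAX_RETRIES = 2       # assumed retry budget before handing off


@dataclass
class Transcript:
    text: str
    confidence: float  # assumed 0.0-1.0 score from the STT layer


def handle_utterance(transcript: Transcript, retries_so_far: int) -> Tuple[str, int]:
    """Return (action, updated retry count) for one caller utterance."""
    understood = bool(transcript.text.strip()) and transcript.confidence >= MIN_CONFIDENCE
    if understood:
        return "continue", 0  # reset the counter after a clear utterance
    if retries_so_far < MAX_RETRIES:
        return "reprompt", retries_so_far + 1  # ask the caller to repeat
    return "offramp", retries_so_far  # budget exhausted, hand off to an agent
```

The same counter pattern would also give TC-035 a defined fallback if an empty transcript is produced after a silence timeout.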

NOT IMPLEMENTED / OUT OF SCOPE (19 test cases)

DTMF / Tastatureingabe (2 cases)

| ID | Thema | Status |
| --- | --- | --- |
| TC-010 | DTMF korrekt erkannt | KI-Hotline is speech-only. No DTMF/keypress processing. |
| TC-011 | Ungueltige DTMF behandelt | Same. |

Warteschlange / Queue Management (3 cases)

| ID | Thema | Status |
| --- | --- | --- |
| TC-012 | Warteplatz zugewiesen | Queue management lives in Hermes/ACD, not the voicebot. Voicebot does SIP REFER handoff. |
| TC-013 | Volle Queue -> Fallback | Hermes responsibility. |
| TC-014 | Wartezeitansage | Voicebot doesn't know queue depth. |

Callback (3 cases)

| ID | Thema | Status |
| --- | --- | --- |
| TC-015 | Callback-Angebot | No callback feature implemented. |
| TC-016 | Callback outside hours | Same. |
| TC-017 | Callback-Antwortlogik | Same. |

Mietvertragsnummer / ANI-based auth (4 cases)

| ID | Thema | Status |
| --- | --- | --- |
| TC-018 | MV-Erkennung per Rufnummer | Voicebot uses Service-PIN, not ANI/CLI-based tenant lookup. No phone-to-contract mapping. |
| TC-019 | MV-Nummer manuell erfasst | Uses PIN, not Mietvertragsnummer. Different auth paradigm. |
| TC-020 | Ungueltige MV abgewiesen | Same. |
| TC-021 | Mehrfach falsche MV -> Fallback | Same. |

Nacht-/Notdienst (3 cases)

| ID | Thema | Status |
| --- | --- | --- |
| TC-027 | Notdienstzeiten-Ansage | After-hours routing happens before the voicebot (IVR/SBC layer). Voicebot only handles live calls during service hours. |
| TC-028 | Notdienst-Routing | Same. (Note: voicebot's escalate_notfall handles in-call emergencies during business hours, which is different.) |
| TC-029 | Nicht-dringendes nachts != Notfall | Same - not the voicebot's layer. |

Wiederholer / Repeat Caller (1 case)

| ID | Thema | Status |
| --- | --- | --- |
| TC-006 | Wiederholer-Erkennung | No cross-session state. No CRM lookup. Voicebot doesn't know if someone called before. |

Bandaufnahme / Recording (2 cases)

| ID | Thema | Status |
| --- | --- | --- |
| TC-023 | Aufzeichnungshinweis wenn aktiv | No recording notice feature. Legitimation prompt explicitly says "Erwaehne KEINE Gespraechsaufzeichnung." |
| TC-024 | Kein Hinweis wenn deaktiviert | Same - no toggle. |

Warteschleife-Uebergang (1 case)

| ID | Thema | Status |
| --- | --- | --- |
| TC-026 | Uebergang in Warteschleife | Voicebot does handoff (SIP REFER), but doesn't manage queue transition. Queue vs. direct pickup is Hermes-side. |

Genuinely missing features worth discussing with Fachbereich

  1. TC-006 Wiederholer - potentially valuable, but requires a CRM lookup or cross-session persistence
  2. TC-018 to TC-021 Mietvertragsnummer - the voicebot uses Service-PIN instead; confirm whether PIN-based auth satisfies these cases or whether MV-Nummer is additionally required
  3. TC-026 Warteschleife - clarify the boundary between voicebot handoff and Hermes queue entry

Recommendation

Flag to the Fachbereich that the Testfallkatalog covers the full IVR stack. Ask them to mark which test cases they expect the KI-Hotline to own vs. Hermes/IVR infrastructure.
