● 04 · LEHRTAFEL · RISIKO UND TRUST · MODULE · RISK AND TRUST

Wann du dem Agent
traust.

When to trust
the agent.

Agents machen Fehler. Manche kostenlos, manche teuer. Diese Tafel zeigt fünf typische Versagensmuster, drei Schichten Schutz, das Spektrum zwischen Vorschlag und Vollautomatik, und am Ende eine Checkliste mit zwölf Punkten, die du vor jedem produktiven Setup durchgehst.

Agents make mistakes. Some cheap, some costly. This module shows five typical failure patterns, three layers of protection, the spectrum between suggestion and full automation, and a twelve-point checklist you run through before any production setup.

Typische Failure-Modi Typical Failure Modes

Schichten an Guardrails Layers of Guardrails

Checks vor dem Produktivgang Checks Before Going Live

METHODE LESEN READ THE METHOD ┌──→ DIREKT ZUR CHECKLISTE JUMP TO CHECKLIST ┌──→

FILE · 04 / SERIE AI & AGENTIC EXPLAINED FILE · 04 / SERIES AI & AGENTIC EXPLAINED LESEZEIT · 13 MINUTEN READ TIME · 13 MINUTES VERSION · v.2026.05

● 05 · DIE FEHLERTYPEN · THE FAILURE TYPES

Wie Agents scheitern.

How agents fail.

Fast jedes Versagen lässt sich auf eine von fünf Mustern zurückführen. Wer diese fünf kennt, kann sie früh erkennen, einplanen und abfangen. Wer sie ignoriert, baut sich böse Überraschungen ein.

Nearly every failure traces back to one of five patterns. Those who know them can spot them early, plan for them, and intercept them. Those who ignore them build in nasty surprises.

01 · HALLUZINATION

Selbstbewusste Fiktion Confident Fiction

Erfindet Quellen, Zahlen, Zitate. Klingt richtig, ist es nicht. Häufigster Modus, schwer erkennbar in glatten Texten. Invents sources, numbers, quotes. Sounds correct, isn't. The most common mode, hard to spot in polished text.

02 · TOOL-MISUSE

Falsches Werkzeug Wrong Tool

Greift ein Tool, das sich gleich anhört, aber etwas anderes tut. Liest statt zu schreiben. Sendet statt zu speichern. Reaches for a tool that sounds similar but does something different. Reads instead of writing. Sends instead of saving.

03 · LOOP-FALLE

Endlose Korrektur Endless Correction

Korrigiert die Korrektur, dreht sich im Kreis. Frisst Tokens und Zeit, ohne näher ans Ziel zu kommen. Corrects the correction, spins in circles. Burns tokens and time without getting closer to the goal.

04 · GOAL DRIFT

Anderes Ziel verfolgt Wrong Objective

Optimiert das Falsche. Fokussiert auf einen Nebenaspekt und vergisst den eigentlichen Auftrag. Entsteht oft mit langer Kontexttiefe. Optimises the wrong thing. Fixates on a side detail and forgets the actual task. Often emerges with deep context.

05 · PROMPT INJECTION

Fremde Anweisung Foreign Instruction

Eine externe Quelle schmuggelt versteckte Anweisungen in den Kontext. Der Agent folgt ihnen, statt deinen Auftrag zu erfüllen. An external source smuggles hidden instructions into the context. The agent follows them instead of your task.

2 / 3

FAUSTREGEL · ROHE LÄUFE RULE OF THUMB · RAW RUNS

In zwei von drei produktiven Runs ohne Schutzschicht tritt mindestens einer dieser fünf Modi auf. Nicht jedes Mal teuer, aber häufig genug, dass keine seriöse Setup-Liste an den Modi vorbeikommt.

In two out of three production runs without a protection layer, at least one of these five modes appears. Not always costly, but often enough that no serious setup checklist can ignore them.

● 06 · DAS TRUST-SPEKTRUM · THE TRUST SPECTRUM

Fünf Stufen zwischen
Vorschlag und Vollautomatik.

Five levels between
suggestion and full automation.

Trust ist keine Ja-Nein-Entscheidung. Es ist ein Schieberegler. Für jeden Anwendungsfall sucht man die richtige Stufe: nicht zu eng, sonst fühlt sich der Agent wie ein langsamer Praktikant. Nicht zu offen, sonst fliegt einem etwas um die Ohren.

Trust is not a yes-or-no decision. It is a slider. For each use case you find the right level: not too tight, or the agent feels like a slow intern. Not too open, or something blows up in your face.

Suggest

↤ 0% AUTONOMIE ↤ 0% AUTONOMY

Approve

↤ 25%

Auto + Audit

↤ 50%

Auto + Sample

↤ 75%

Full Auto

100% ↦ 100% ↦

● MENSCH ENTSCHEIDET ● HUMAN DECIDES ● AGENT ENTSCHEIDET ● AGENT DECIDES

┌──→ FAUSTREGEL: ACTION-TOOLS BEGINNEN AUF L1 ODER L2. READ-TOOLS DÜRFEN HÖHER STARTEN. ┌──→ RULE OF THUMB: ACTION TOOLS START AT L1 OR L2. READ TOOLS MAY START HIGHER.

● 07 · DREI SCHICHTEN SCHUTZ · THREE LAYERS OF PROTECTION

Guardrails kommen geschichtet.

Guardrails come in layers.

Eine einzige Schutzschicht reicht nie. Der Stand der Praxis sind drei: vor dem Lauf, im Lauf, nach dem Lauf. Jede fängt andere Fehler ab, gemeinsam decken sie das meiste, was schiefgehen kann.

A single protection layer is never enough. The state of the art is three: before the run, during the run, after the run. Each catches different errors; together they cover most of what can go wrong.

SCHICHT LAYER	INPUT · VOR DEM LAUF INPUT · BEFORE THE RUN	RUNTIME · IM LAUF RUNTIME · DURING THE RUN	OUTPUT · NACH DEM LAUF OUTPUT · AFTER THE RUN
Schützt vor Protects from	Prompt Injection, Daten-Leaks Prompt Injection, data leaks	Loop-Falle, Goal Drift, Kostenexplosion Loop trap, Goal Drift, cost explosion	Halluzination, Tool-Misuse Hallucination, Tool-Misuse
Mechanismen Mechanisms	Sanitizer, Scope-Limits, PII-Filter Sanitizer, Scope Limits, PII filter	Step-Limit, Token-Budget, Kill-Switch Step limit, Token budget, Kill switch	Fact-Check, Diff-Vorschau, Approval Fact-check, Diff preview, Approval
Wer sieht es Who sees it	Niemand, läuft im Hintergrund Nobody, runs in background	Monitoring-Dashboard, Logs Monitoring dashboard, logs	Du, kurz vor Versand You, just before dispatch
Aufwand Effort	Einmalig konfigurieren Configure once	Pro Run-Typ ein Profil One profile per run type	Klick pro relevanter Aktion One click per relevant action
Was passiert bei Fehler On failure	Anfrage wird abgewiesen Request is rejected	Run wird abgebrochen Run is terminated	Du widerrufst, kein Schaden You revoke, no damage done

DEFENSE IN DEPTH

Eine Guardrail-Schicht ist eine Tür. Drei sind ein Schleusen-System. Wenn eine Schicht versagt, fängt die nächste den Schaden ab. Selbst hartnäckige Fehler kommen so selten bis ans Ziel.

One guardrail layer is a door. Three are an airlock system. When one layer fails, the next catches the damage. Even stubborn errors rarely make it through to the end.

● 08 · LIVE-DEMO · LIVE DEMO

Mit und ohne Schutz.
Im direkten Vergleich.

With and without protection.
Side by side.

Tippe einen Failure-Modus an. Links siehst du, was ohne Schutzschicht passiert. Rechts, was mit der passenden Guardrail anders läuft. Echte Patterns, vereinfacht zur Lesbarkeit.

Tap a failure mode. On the left you see what happens without a protection layer. On the right, what changes with the right guardrail in place. Real patterns, simplified for readability.

Auftrag: "Recherchiere Kunde XYZ und schick eine Outreach-Mail." Task: "Research client XYZ and send an outreach email."

FAILURE-SIMULATOR

SIMULATION · KEINE ECHTEN AKTIONEN. SIMULATION · NO REAL ACTIONS.

● 09 · HUMAN-IN-THE-LOOP

Wo der Mensch bleibt.

Where the human stays.

Vollautomatik ist selten der richtige Endzustand. In den meisten produktiven Setups bleibt der Mensch an vier Stellen im Loop: vor dem Start, an Schlüsselmomenten, am Ende, und wenn etwas Ungewöhnliches passiert. Vier Plätze, vier Aufgaben.

Full automation is rarely the right end state. In most production setups the human stays in the loop at four points: before the start, at key moments, at the end, and when something unusual happens. Four seats, four responsibilities.

1.0 · PRE-APPROVAL

Vor dem Start.

Before the start.

Du gibst Ziel, Budget, Schwellen. Der Agent fängt erst danach an. Verhindert, dass er etwas Falsches überhaupt erst angeht.

You set goal, budget, thresholds. The agent only starts after. Prevents it from going after the wrong thing in the first place.

2.0 · MID-FLIGHT GATE

An kritischen Stellen.

At critical moments.

Vor jeder Action-Tool-Ausführung hält er kurz an, zeigt was er tun will, wartet auf Ja oder Nein. Ein Klick, weiter geht es.

Before each action-tool execution it briefly pauses, shows what it intends to do, waits for yes or no. One click, it continues.

3.0 · POST-REVIEW

Am Ende des Runs.

At the end of the run.

Du prüfst, ob das Ergebnis stimmt, bevor es nach draußen geht. Bei reversiblen Aktionen reicht eine Stichprobe.

You verify the result is correct before it goes out. For reversible actions a spot check is sufficient.

4.0 · EXCEPTION ESCALATION

Wenn es ungewöhnlich wird.

When it gets unusual.

Bei unerwarteten Mustern, neuen Tool-Errors, hohen Kosten ruft der Agent dich aktiv. Standard ist Slack oder Email.

On unexpected patterns, new tool errors, high costs the agent actively calls you. Standard is Slack or email.

Wer alle vier Plätze ungenutzt lässt, hat Vollautomatik gebaut. Wer alle vier durchgehend besetzt, hat manuelle Arbeit mit AI-Anstrich. Die Kunst liegt darin, je nach Risiko nur die nötigen Plätze zu besetzen und die anderen offen zu lassen.

Leaving all four seats empty means you have built full automation. Keeping all four permanently staffed means manual work with an AI coat of paint. The art is filling only the seats the risk demands and leaving the rest open.

● 10 · CHECKLISTE · CHECKLIST

Zwölf Fragen vor dem
Produktivgang.

Twelve questions before
going live.

Klick die Punkte an, während du sie für dein Setup beantwortest. Wer einen Punkt nicht beantworten kann, ist nicht produktiv-bereit. Wer alle zwölf hat, hat ein robustes Setup.

Click the items as you answer them for your setup. Anyone who cannot answer a point is not production-ready. Anyone with all twelve has a robust setup.

Setup-Check · v.2026.05 Setup Check · v.2026.05

12 PUNKTE · KLICK ZUM ABHAKEN 12 ITEMS · CLICK TO CHECK OFF

01 · ZIEL 01 · GOAL

Was genau soll der Agent erledigen, in einem Satz? What exactly should the agent accomplish, in one sentence?

02 · NICHT-ZIELE 02 · NON-GOALS

Was darf er ausdrücklich nicht tun? What is it explicitly not allowed to do?

03 · TOOL-SCOPE

Welche Tools darf er nutzen, mit welchen Rechten? Which tools may it use, with what permissions?

04 · KOSTEN-BUDGET 04 · COST BUDGET

Was darf ein Run maximal kosten in Tokens und Euro? What can one run cost at most in tokens and dollars?

05 · STEP-LIMIT

Nach wie vielen Schritten bricht er hart ab? After how many steps does it hard-stop?

06 · APPROVAL-GATE

Bei welchen Aktionen wartet er auf dich? For which actions does it wait for you?

07 · KILL-SWITCH

Wer kann den Run jederzeit stoppen, und wie? Who can stop the run at any time, and how?

08 · AUDIT-TRAIL

Wo landen alle Tool-Calls und Entscheidungen? Where do all tool calls and decisions land?

09 · PII-FILTER

Welche sensiblen Daten darf er sehen, welche nicht? Which sensitive data may it see, which not?

10 · INJECTION-SCHUTZ 10 · INJECTION-GUARD

Wie filterst du externe Inhalte vor dem Modell? How do you filter external content before the model?

11 · ESKALATION 11 · ESCALATION

Wer wird benachrichtigt, wenn etwas Unerwartetes passiert? Who is notified when something unexpected happens?

12 · ERFOLGSMASS 12 · SUCCESS METRIC

Woran erkennst du, dass der Agent gut gearbeitet hat? How do you recognize that the agent did good work?

FORTSCHRITT · 0 / 12 PROGRESS · 0 / 12

● 11 · GLOSSAR · GLOSSARY

Sechs Begriffe
für die Sicherheit.

Six terms
for safety.

FAILURE MODE Ein typisches Versagensmuster eines Agents. Halluzination, Tool-Misuse, Loop-Falle, Goal Drift, Prompt Injection. Wer sie kennt, kann sie abfangen. A typical failure pattern of an agent. Hallucination, Tool-Misuse, Loop Trap, Goal Drift, Prompt Injection. Those who know them can intercept them.

GUARDRAIL Eine Schutzschicht. Eingebaut vor, im oder nach dem Lauf. Soll Schaden begrenzen, ohne den Agent zu erdrosseln. A protection layer. Built in before, during, or after the run. Meant to limit damage without strangling the agent.

HUMAN-IN-THE-LOOP Der Mensch bleibt an mindestens einer Stelle im Ablauf entscheidend. Vier Standard-Plätze: Pre-Approval, Mid-Flight, Post-Review, Eskalation. The human remains decisive at at least one point in the flow. Four standard seats: Pre-Approval, Mid-Flight, Post-Review, Escalation.

PROMPT INJECTION Versteckte Anweisung in einem Inhalt, die das Modell befolgt. Klassischer Angriff auf Agents mit Web- oder Email-Zugriff. A hidden instruction embedded in content that the model follows. Classic attack on agents with web or email access.

KILL-SWITCH Eine Funktion, die den Agent jederzeit hart stoppt. Sollte ein einziger Klick sein, immer erreichbar, ohne Login-Hürden. A function that hard-stops the agent at any time. Should be a single click, always reachable, without login barriers.

AUDIT-TRAIL Lückenlose Protokollierung aller Tool-Calls, Entscheidungen, Approvals. Forensisch lesbar. Aufbewahrt nach Compliance-Vorgaben. Complete logging of all tool calls, decisions, approvals. Forensically readable. Retained per compliance requirements.

● 12 · NÄCHSTE LEHRTAFEL · NEXT MODULE

Du kennst die Risiken.
Was kommt als Nächstes?

You know the risks.
What comes next?

Dein erster Agent Your first agent

Vom Anwendungsfall bis zum Prototyp. Welche Schritte, welche Werkzeuge, welche Stolperfallen.

From use case to prototype. Which steps, which tools, which pitfalls.

Agent-Teams Agent teams

Mehrere Agents, die zusammen arbeiten. Rollen, Übergaben, Orchestrierung. Wo das funktioniert und wo nicht.

Multiple agents working together. Roles, handoffs, orchestration. Where it works and where it doesn't.

Was kommt danach? What comes after?

Wohin sich Agentic AI bewegt. Welche Trends bleiben, welche verschwinden, was du jetzt schon einüben solltest.

Where Agentic AI is heading. Which trends stay, which disappear, what you should start practising now.

┌──→ ZURÜCK ZU 01 BACK TO 01 ┌──→ ZURÜCK ZU 02 BACK TO 02 ┌──→ ZURÜCK ZU 03 BACK TO 03 LEHRTAFEL 05 LESEN READ MODULE 05 ┌──→

Wann du dem Agenttraust.

When to trustthe agent.