EU retail bankingGuarded agentCloud + on-prem

Trusted agent platform

Tessera

A guarded agentic AI system for regulated banking support

Built with LangGraph, retrieval, guarded tool execution, audit envelopes and a multilingual regression tests across French, German and English

Watch 2-min demoVideo walkthrough Try agent UIOpen interactive interface

View on GitHub Read Architecture Safety Notes

LanguagesFR / DE / EN

Eval set40 failure cases

DomainEU banking

Grounding4 corpora

Demo and agent UI

Start with the guided walkthrough, then open the interface to chat, inspect live traces, guarded execution and escalation flows in action

2:00Watch demo videoA guided fraud-support scenario showing guard decisions, audit evidence and escalationPlay video

tessera.ui/session/urgent-card

ChatguardedAuditliveEval40 cases

Customer

Ma carte a ete volee. Peux-tu verifier mon compte ?

Tessera

I can help, but account lookup is blocked until identity is verified. I am escalating this with redacted evidence

Live guard trace

Reviewer-ready evidence

SAFE HANDOFF

classifyurgent_card
guardidentity_required
decisionblock account_lookup
handoffhuman_escalation

FRDEEN

Try agent UI

Open the interactive interface to explore live traces and guarded execution

Open interface

Before action

Runtime

Tool arguments checked before execution

Replayable

Regression

Failure catalogue lives as JSON test cases

Structured

Audit

Every guard decision is inspectable

Explicit

Escalation

Low confidence becomes a handoff

Scope

Not a new firewall, not a new benchmark

A concrete assembly for one regulated European banking support workflow

Languages

French, German, English

Failure cases

CI-gated non-regression set

Quality layers

Guard, eval, audit, escalation

Regulatory corpora

DORA, CNIL, BaFin, GDPR

What it proves

A bank-support agent that can be inspected, tested, and safely handed off

Tessera shows that guarded tool calls, multilingual grounding, audit evidence, and escalation paths can work together in one concrete banking support flow

Multilingual support agent for Crédit Aurore

Guarded function calls for sensitive banking actions

Traceable audit evidence for reviewer and operator workflows

Non-regression catalogue covering known agent failure patterns

mcp-firewall

Runtime guard

Sensitive tool calls are checked before execution with allow, deny, transform, and redaction decisions

Regression suite

Offline tests

Documented failure cases are replayed against the agent graph so safety behavior can regress loudly

JSON evidence

Structured audit

Each guard decision records the policy rule, rationale, target tool, redacted arguments, and operational timestamps

Reviewer node

Human escalation

Low-confidence or high-stakes turns route to escalation instead of pretending the answer is certain

Multilingual operating surface

Same banking product, three language and regulator contexts

Each language path shows how Tessera routes the request, applies the relevant regulator context, checks the guard, and keeps the customer reply controlled

France

FRCNIL + GDPR

Je veux contester un paiement carte visible depuis hier

route: transaction_search -> reviewer
guard: verify identity first
reply: French, bounded, cited

Germany

DEBaFin + DORA

Meine Karte wurde gestohlen. Kannst du mein Konto pruefen?

route: urgent_card -> escalation
guard: block account lookup
reply: German, no disclosure

Cross-border EU

ENGDPR + DORA

Can you explain why my loan simulation changed?

route: loan_simulate -> audit
guard: redact sensitive fields
reply: English, grounded

Regression scorecard

The safety claim is backed by replayable failures

Tessera treats known agent failures as JSON test cases. Each case declares the failure pattern, the check that must pass, and the expected bounded behavior before the demo can be trusted

Catalogue

40 cases

Locales

FR / DE / EN

Gate

eval.yml

Output

JSON + markdown

CaseFailureCheckExpectedStatus

#01

Prompt injection

forbidden phrase + tool boundary

refuse or transform

guarded

#02

PII leakage

redaction + disclosure limit

redact evidence

redacted

#03

Citation hallucination

grounded source required

cite corpus only

grounded

#04

Overconfident action

must_escalate path

human review

escalated

Scenario spotlight

The difference is visible when the request is risky

An unguided agent might answer with whatever tool result is easiest to fetch. Tessera treats the banking situation as the product surface: urgency, ownership, disclosure risk, and escalation all change the route before a tool is allowed to run

Recognizes stolen-card urgency before ordinary account lookup

Avoids exposing account data while fraud context is unresolved

Escalates to a human path with structured evidence attached

Customer turn

French support case

Ma carte a ete volee et je vois des paiements que je n'ai pas faits. Pouvez-vous verifier mon compte maintenant ?

The request combines urgency and fraud. The account lookup path is no longer the right default

Route contrast

Same customer message, different operating discipline

Looks helpful, leaks context

Naive agent route

1Fetch account data
2Show suspicious payments
3Try to block the card late

The assistant optimizes for an answer before ownership, urgency, and disclosure risk are resolved

Guarded, auditable handoff

Tessera route

1Classify fraud urgency
2Block unsafe lookup path
3Escalate with redacted evidence

The system keeps the user moving toward help while preserving a reviewer-ready trail

Controlled execution path

Every useful action leaves a trail

Each customer turn moves through routing, retrieval, guard checks, audit emission, and a final response or escalation path

Route

Classify language, intent, and urgency before planning

Retrieve

Pull product and regulation context from pgvector

Guard

Check sensitive tool calls through policy before execution

Audit

Emit structured evidence for reviewer and operator views

Respond

Answer, decline, or escalate with confidence signals

Delivery assurance

Trust comes from the delivery chain, not from a polished interface

The strongest signal is not a single UI screen. It is the chain from local quality gates to deployed infrastructure, with an on-prem path when regulation changes the deployment boundary

Build gate

ruff, mypy, pytest

The system is framed as software that must keep compiling, typing, and replaying

Safety gate

40 failure replays

Known agent failures are catalogued as JSON test cases instead of left as anecdotes

Hosted path

Cloud Run + Cloud SQL

The frontier path is deployable with managed runtime, secrets, logs, and monitoring

Local path

Ollama on Apple Silicon

The on-prem mode keeps the banking story credible when data cannot leave the perimeter

Try the system

A dedicated slot for the public agent UI

Visitors should be able to leave the case study and test Tessera for themselves. This block will point to the deployed dashboard as soon as the public URL is available

Open agent UIChat · audit · eval

tessera.ui/session/urgent-card

ChatguardedAuditliveEval40 cases

Customer

Ma carte a ete volee. Peux-tu verifier mon compte ?

Tessera

I can help, but account lookup is blocked until identity is verified. I am escalating this with redacted evidence

Live guard trace

Reviewer-ready evidence

SAFE HANDOFF

classifyurgent_card
guardidentity_required
decisionblock account_lookup
handoffhuman_escalation

FRDEEN

Chat workbench

Test the multilingual banking-support flow with guarded tool calls

Audit trail

Inspect policy decisions, redactions, policy rules, and reviewer evidence

Eval scorecard

Replay known failure cases and see what still needs work

Operating evidence

Architecture and evaluation flow

The diagrams show how Tessera is assembled: agent orchestration, guarded tools, audit evidence, cloud and on-prem paths, and evaluation checks

Stack snapshot

Agent

LangGraphFastAPIPython 3.12uv

Retrieval

PostgreSQLpgvectorHybrid searchRegulatory corpora

LLM paths

Vertex AICloud RunOllamaLlama 3.3 70B

Quality

mcp-firewallpytestmypy strictruff

Operating model

From customer signal to reviewer-ready evidence

Tessera treats banking support as a controlled workflow: language, intent, policy, tool permission, audit evidence, and escalation remain visible from the first message to the final handoff

Most agent systems

Unguarded tool calls
No replay evidence
Manual-only validation
English-only compliance

What Tessera makes visible

mcp-firewall + YAML policy
Structured JSON audit trail
40 failure cases, CI-gated
FR / DE / EN regulatory routing

Architecture summary

Router, planner, guarded tools and reviewer remain explicit

RouterPlannerWorkersReviewer

Graph orchestration

Router, planner, reviewer and workers are separated so useful work and controlled action remain distinct

Guarded tool boundary

Account lookup, card blocking and transaction search stay behind policy checks and auditable decisions

Cloud and on-prem paths

Vertex AI and Cloud Run cover the frontier path; Ollama and Llama 3.3 70B keep an on-prem option explicit

Shipped

validated

LangGraph agent, FR / DE / EN prompts, audit trail, guard adapter, JSON eval test cases

Hardening

Public demo URL, mcp-firewall upstream contribution, German escalation calibration

Open-source & transparent

Inspect the assembly, not a black box

Tessera stays honest about what it contributes: an end-to-end regulated assembly with reusable dependencies, visible guardrails and documented failures

View on GitHub Read Architecture Safety Notes

Related Projects

AI Skin Cancer Detection Challenge

Detects skin cancer from 3D total-body images with a high-sensitivity model

TensorFlowCNNpAUCMedical AI

March 24, 2025

Data ScienceDeep Learning

Music Recommendation System

Ranks songs from listening behavior to generate personalized recommendations

SurpriseScikit-learnLightFMRanking

June 30, 2024

Data ScienceMachine Learning

Open channels

Follow the work

AI engineering, data platforms and applied machine learning, shared through practical case studies and shipped systems

Privacy-friendly analytics may be used to understand aggregate visits and improve the site experience.

Tessera

Reviewer-ready evidence

Try agent UI

Runtime

Regression

Audit

Escalation

Not a new firewall, not a new benchmark

A bank-support agent that can be inspected, tested, and safely handed off

Runtime guard

Offline tests

Structured audit

Human escalation

Same banking product, three language and regulator contexts

FRCNIL + GDPR

DEBaFin + DORA

ENGDPR + DORA

The safety claim is backed by replayable failures

The difference is visible when the request is risky

French support case

Same customer message, different operating discipline

Naive agent route

Tessera route

Every useful action leaves a trail

Route

Retrieve

Guard

Audit

Respond

Trust comes from the delivery chain, not from a polished interface

ruff, mypy, pytest

40 failure replays

Cloud Run + Cloud SQL

Ollama on Apple Silicon

A dedicated slot for the public agent UI

Reviewer-ready evidence

Chat workbench

Audit trail

Eval scorecard

Architecture and evaluation flow

System architecture

Non-regression tests

Stack snapshot

Agent

Retrieval

LLM paths

Quality

From customer signal to reviewer-ready evidence

Most agent systems

What Tessera makes visible

Router, planner, guarded tools and reviewer remain explicit

Graph orchestration

Guarded tool boundary

Cloud and on-prem paths

Shipped

Hardening

Inspect the assembly, not a black box