REF: FCE // PROFESSIONAL DUTY

FIDUCIARY CONSTRAINT EVALUATION

FCE is an experimental Python package and benchmark scaffold for evaluating whether AI systems preserve fiduciary and professional constraints across short multi-turn interactions.

33 scenariosEvaluator v3.1Runtime witnessesAdjudication packets

WHY IT EXISTS

Many legal and professional benchmarks test whether a model can state a rule. FCE tests a narrower workflow problem: a system may identify a fiduciary or professional constraint, then drop it when the user reframes the task, adds urgency, introduces ambiguity, or asks for a locally convenient but globally invalid action.

The pilot focuses on operational failure modes that matter in legal and fiduciary settings: silent constraint drop, unsupported confident answers, generic hedging, overbroad refusal, harmful leakage, and failure to escalate.

CURRENT PILOT

Scenario Count

33

Short, hand-curated multi-turn scenarios with hidden constraints, expected behavior, must-not behaviors, harm severity, rubric data, and optional middleware expectations.

Confidentiality and privilege

Candor, truthfulness, and fraud

Conflicts and former-client duties

Competence and uncertainty

Supervision of AI or nonlawyer assistants

Fees and billing transparency

Retention, preservation, and cleanup requests

Client communication and represented-party contact

WHAT THE REPOSITORY CONTAINS

Typed FCE pilot scenario schema

33 hand-curated short multi-turn scenarios

Scenario export, fixture-backed runs, replay/live adapters, and report bundles

Runtime-backed witness subset for selected conflict cases

Narrow proof-supported examples for deterministic runtime claims

Evaluator v3.1 with deterministic caps, transparent heuristic scoring, semantic triage, trajectory flags, and a human adjudication queue

Blinded adjudication packet generation with small curated review samples

RUNTIME WITNESS CONCEPT

Prompting can steer a model toward safer behavior. The runtime witness path is different: for selected scenarios, it runs explicit constraint checks and emits a structured witness describing the conflict, required disposition, conflict class, minimal conflicting set, and handoff target.

The witness path is intentionally modest. It makes selected conflicts legible and reviewable; it does not decide legal questions generally or claim complete formalization of every scenario.

Pairwise witness

FCE-RET-009-V1

Conflict type: H1

preserve_records + prompt_content; handoff target: compliance

Higher-order witness

FCE-CANDOR-010-V1

Conflict type: H2

candor_to_tribunal + omit_required_fact + maximize_client_advantage; handoff target: lawyer

REVIEW BOUNDARIES

Not legal advice

Not production software

Not a validated benchmark

Not a provider ranking

Not a substitute for attorney, supervisor, compliance, or domain-expert review

No completed independent human holdout validation yet

FCE is useful as an inspectable artifact and methods scaffold. Any empirical claims require independent human labels, inter-rater review, and held-out validation.

RESOURCES