Standards Proposal — EM Foundation — May 2026

Continuity Receipts (CR)

An open continuity and uncertainty infrastructure standard for AI-assisted reasoning systems

EM Foundation  ·  May 2026  ·  emfoundation.net
Submitted as an open standards proposal for interdisciplinary review.
Status: Draft for Public Comment — Version 0.1

Standards Positioning Note

This document is presented as a draft open standards proposal rather than a completed specification. It is intended to function in the manner of an RFC or W3C working draft: a structured proposal for community review, critique, and iterative development. The EM Foundation does not claim sole authority over the standard it proposes. The Open Continuity Standards Consortium (OCSC) described in Section XI is intended to govern the standard's development through open, adversarially reviewed processes.

This proposal does not assert that Continuity Receipts will eliminate AI misinformation, hallucination, or reasoning failure. It asserts that making uncertainty, provenance, and reasoning continuity structurally visible is better than hiding them — and that infrastructure for doing so, at the level of open standards, is both achievable and necessary.

Abstract

Modern AI systems increasingly generate persuasive, authoritative-appearing outputs without preserving the provenance, uncertainty visibility, contradiction awareness, and auditability required for responsible institutional reliance. The result is a structural asymmetry: AI outputs carry the visual weight of certainty regardless of their actual epistemic status.

Continuity Receipts (CR) propose a lightweight but extensible continuity-aware infrastructure layer that makes uncertainty, provenance, contradiction awareness, retrieval lineage, and verification boundaries structurally visible — attached to AI outputs as machine-readable, human-interpretable metadata rather than embedded in prose claims.

The framework introduces: multi-dimensional confidence architecture, provenance tracking, contradiction awareness, Failure Receipts for structured refusal, graded reliance classifications (RC-1 through RC-5), Portable Continuity Objects (PCOs) for cross-system continuity, and the Open Continuity Metadata Standard (OCMS) as an open interoperability specification.

The central ethical principle: the purpose of Continuity Receipts is not to manufacture certainty. It is to make uncertainty, provenance, and reasoning continuity structurally visible — transforming epistemic limitations from hidden liabilities into inspectable infrastructure.

Key Claims
  1. AI outputs currently carry the visual weight of certainty regardless of their actual epistemic status. This is an infrastructure failure, not merely a model failure.
  2. Continuity Receipts address this by attaching machine-readable, human-interpretable provenance and uncertainty metadata to AI outputs — making epistemic limitations visible rather than hiding them in confident prose.
  3. Failure Receipts — structured outputs produced when continuity integrity is insufficient — transform uncertainty itself into inspectable infrastructure rather than a reason for opaque refusal.
  4. The reliance classification system (RC-1 through RC-5) provides organizations with a concrete framework for calibrating human review requirements to consequence severity.
  5. CR is most effectively governed as an open standard through a consortium model analogous to W3C or IETF — not as a vendor-specific or single-institution framework.
  6. CR connects to the EM Foundation's broader continuity research ecosystem: OCMS is the institutional-scale analog of the ARIA Identity Chronicle; Persistent Cognitive Threads from CIIC are the reasoning-layer complement to CR's output-layer provenance.

Research Status: Near-Term Experimental

The CR architecture is demonstrable with current AI infrastructure — CR-Lite at emfoundation.net/cr-lite.html provides a working prototype. The OCMS schema (v0.1) is a proposed standard awaiting community adoption. The five-dimension confidence scoring and Failure Receipt format are implementable now. OCSC governance establishment requires institutional work beyond technical publication.

I. The Problem — Epistemic Infrastructure Failure

Every field that relies on evidence has developed infrastructure for communicating the quality of that evidence alongside the evidence itself. Scientific papers include confidence intervals. Financial statements include audit certifications. Drug labels include dosing uncertainty ranges. Nutritional labels include measurement basis. Legal filings include citation verification requirements. These are not decorative conventions — they are epistemic infrastructure. They allow readers to calibrate their reliance on outputs rather than treating all outputs as equally authoritative.

AI-generated outputs currently lack equivalent infrastructure. A large language model's response to a legal question, a medical query, or a policy analysis question arrives in the same format regardless of whether it is based on comprehensive, well-sourced, internally consistent retrieval or on degraded, conflicting, or partially hallucinated content. The formatting is confident. The prose is fluent. The epistemic status is invisible.

This is not primarily a model quality problem. Better models will still produce outputs whose epistemic status is invisible to users. It is an infrastructure problem — a missing layer in the AI output stack that no amount of model improvement can supply, because model improvement addresses what is generated, not how the epistemic status of what is generated is communicated.

"The purpose of Continuity Receipts is not to manufacture certainty. It is to make uncertainty, provenance, and reasoning continuity structurally visible."

Continuity Receipts address this infrastructure gap directly. They are not a model evaluation system. They are not a hallucination detector. They are not a trust score. They are a metadata layer — attached to AI outputs, machine-readable by downstream systems, human-interpretable by end users — that makes the epistemic conditions under which an output was produced visible alongside the output itself.

II. Related Work — How CR Differs from Existing Approaches

Several existing approaches address related problems. Understanding how CR differs from each is necessary for positioning the proposal accurately.

Model cards and datasheets for datasets (Mitchell et al., 2019; Gebru et al., 2021) provide structured documentation of model properties and training data at design time. They address what a model is, not what a specific output's epistemic status is. CR operates at output time, not design time: it is attached to individual outputs, not to the model that produced them.

Watermarking and output provenance schemes (Kirchenbauer et al., 2023) establish cryptographic attribution of outputs to specific models or providers. They address who generated an output, not what the epistemic quality of the generation was. CR includes provenance tracking but extends substantially beyond attribution to uncertainty structure and reasoning continuity.

Factuality scoring and grounding systems attempt to measure whether specific claims in an output are supported by retrieved sources. They address one dimension of epistemic quality — factual grounding — without addressing confidence architecture, contradiction awareness, retrieval completeness, or reliance calibration. CR treats factual grounding as one component of a multi-dimensional continuity assessment.

RAG citation systems attach source citations to AI outputs, allowing users to verify specific claims. They address attribution without addressing uncertainty quantification, contradiction identification, or the reliance classification that determines what level of human review is appropriate before acting on the output. CR builds on citation infrastructure but extends it into a full continuity metadata layer.

The key distinction: existing approaches address individual dimensions of AI output quality. CR proposes a unified, open, interoperable metadata standard that addresses the full epistemic infrastructure problem — provenance, uncertainty structure, contradiction awareness, retrieval lineage, and human review calibration — in a single extensible schema that can be attached to outputs from any AI system.

III. Core Framework Components

III.1 Multi-Dimensional Confidence Architecture

Standard AI confidence scores collapse epistemic status into a single number, losing the structure that makes uncertainty actionable. CR proposes a five-dimensional confidence decomposition:

CR Confidence Dimensions

    CR_confidence = {
        source_quality:       float[0,1],  // reliability of retrieved sources
        retrieval_coverage:   float[0,1],  // completeness of relevant source set
        internal_consistency: float[0,1],  // absence of self-contradiction
        temporal_freshness:   float[0,1],  // recency of source material
        domain_confidence:    float[0,1]   // model calibration in this domain
    }

    aggregate_confidence = weighted_mean(CR_confidence, domain_weights)

    // No single dimension can be hidden behind an aggregate score.
    // Each dimension must be reported individually in the CR metadata.

The five-dimensional decomposition is designed to prevent confidence laundering — the practice of producing a high aggregate confidence score by weighting dimensions favorably while hiding low scores in individual dimensions that may be most relevant to the user's specific reliance decision.
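To make the decomposition concrete, here is a minimal Python sketch of a weighted aggregate that always returns the individual dimensions next to it and lists any dimension below a floor. The function name `score_confidence`, the `floor` parameter, and the weight values (borrowed from the illustrative OCMS example in Section VI) are assumptions for illustration, not part of the proposed standard.

```python
from typing import Dict, List

DIMENSIONS = (
    "source_quality",
    "retrieval_coverage",
    "internal_consistency",
    "temporal_freshness",
    "domain_confidence",
)

# Illustrative weights only (assumption); real domain_weights would be set per domain.
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "source_quality": 0.25,
    "retrieval_coverage": 0.20,
    "internal_consistency": 0.25,
    "temporal_freshness": 0.15,
    "domain_confidence": 0.15,
}


def score_confidence(dims: Dict[str, float],
                     weights: Dict[str, float] = EXAMPLE_WEIGHTS,
                     floor: float = 0.0) -> dict:
    """Return the weighted aggregate together with every individual dimension."""
    for name in DIMENSIONS:
        if not 0.0 <= dims[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {dims[name]}")
    total_weight = sum(weights[name] for name in DIMENSIONS)
    aggregate = sum(dims[name] * weights[name] for name in DIMENSIONS) / total_weight
    below_floor: List[str] = [name for name in DIMENSIONS if dims[name] < floor]
    return {
        "aggregate": round(aggregate, 4),
        "dimensions": {name: dims[name] for name in DIMENSIONS},
        "below_floor": below_floor,
    }


# Hypothetical scores: four strong dimensions and one weak one.
laundered = score_confidence(
    {
        "source_quality": 1.0,
        "retrieval_coverage": 0.3,
        "internal_consistency": 1.0,
        "temporal_freshness": 1.0,
        "domain_confidence": 1.0,
    },
    floor=0.70,
)
```

In this hypothetical case the aggregate is 0.86, which looks healthy on its own, yet `below_floor` still names `retrieval_coverage`. This is the behavior the per-dimension reporting rule is meant to guarantee.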

III.2 Provenance Tracking

Each CR includes a provenance object recording the sources consulted, the retrieval method used, the retrieval timestamp, and the confidence attributed to each source. Provenance objects are structured to be verifiable — the sources cited can be checked, the retrieval timestamp can be compared against known publication dates, and the consistency of source material can be independently assessed.
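One way to picture a verifiable provenance object is the following Python sketch. The `SourceRecord` class, its `published_at` field, and the two helper functions are hypothetical names introduced here; the month-based freshness calculation is likewise an assumption about how a freshness figure might be derived.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class SourceRecord:
    source_id: str
    uri: str
    published_at: datetime  # hypothetical field: known publication date
    retrieved_at: datetime  # retrieval timestamp recorded in the CR
    confidence: float       # confidence attributed to this source


def freshness_months(src: SourceRecord) -> int:
    """Whole calendar months between publication and retrieval."""
    return ((src.retrieved_at.year - src.published_at.year) * 12
            + (src.retrieved_at.month - src.published_at.month))


def provenance_issues(sources: List[SourceRecord]) -> List[str]:
    """Flag records whose timestamps or confidence values cannot be right."""
    issues = []
    for src in sources:
        if src.retrieved_at < src.published_at:
            issues.append(f"{src.source_id}: retrieved before publication date")
        if not 0.0 <= src.confidence <= 1.0:
            issues.append(f"{src.source_id}: confidence outside [0, 1]")
    return issues


utc = timezone.utc
good = SourceRecord("src-001", "https://example.org/a",
                    datetime(2025, 3, 1, tzinfo=utc),
                    datetime(2026, 5, 26, tzinfo=utc), 0.88)
suspect = SourceRecord("src-002", "https://example.org/b",
                       datetime(2026, 6, 1, tzinfo=utc),
                       datetime(2026, 5, 26, tzinfo=utc), 0.90)

issues = provenance_issues([good, suspect])
```

Here `suspect` claims a retrieval timestamp earlier than its publication date, which is the kind of inconsistency an independent auditor could detect from the provenance object alone.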

III.3 Contradiction Awareness

CR actively tracks internal contradictions — cases where retrieved sources conflict with each other, where the output's claims conflict with retrieved sources, or where the output's claims conflict with each other. Contradictions are reported in the CR metadata rather than silently resolved by the model. Users can see that a contradiction exists, what the nature of the contradiction is, and how the output handled it.
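A contradiction record and a summary over such records might look like the sketch below. The field names follow the OCMS example in Section VI; the `Contradiction` class, the `summarize` function, and the rule that any critical contradiction is "blocking" are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SEVERITIES = ("minor", "moderate", "critical")


@dataclass(frozen=True)
class Contradiction:
    contradiction_id: str
    severity: str                      # one of SEVERITIES
    description: str
    sources_involved: Tuple[str, ...]
    resolution: str = "unresolved"


def summarize(contradictions: List[Contradiction]) -> dict:
    """Count contradictions by severity and list those still unresolved."""
    counts: Dict[str, int] = {severity: 0 for severity in SEVERITIES}
    for item in contradictions:
        if item.severity not in counts:
            raise ValueError(f"unknown severity: {item.severity}")
        counts[item.severity] += 1
    unresolved = [c.contradiction_id for c in contradictions
                  if c.resolution == "unresolved"]
    return {
        "counts": counts,
        "unresolved": unresolved,
        # Assumption: a single critical contradiction blocks higher reliance levels.
        "blocking": counts["critical"] > 0,
    }


summary = summarize([
    Contradiction("con-001", "minor", "Dosing ranges differ slightly",
                  ("src-001", "src-002"), resolution="flagged-for-review"),
    Contradiction("con-002", "critical", "Sources disagree on classification",
                  ("src-001", "src-003")),
])
```

The summary is reported rather than resolved: nothing in the sketch picks a winner between conflicting sources.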

III.4 Failure Receipts

Failure Receipts are the most important single component of the CR framework. A Failure Receipt is produced instead of a standard output when continuity integrity is insufficient to support the requested reliance level. Rather than producing a confident-sounding output of uncertain quality, the system produces a structured document explaining why a standard output cannot be provided and what conditions would need to be met for it to be produced.

Failure Receipt — Continuity Integrity Insufficient

Query: "What are the current sentencing guidelines for federal tax fraud cases involving amounts over $1M?"

Requested reliance level: RC-4 (Legal filing)

Reason standard output not provided:

Source quality: Retrieved sources include pre-2024 guidelines superseded by November 2024 USSC amendments. Current guidelines not in retrieval corpus.
Retrieval coverage: 0.42 — significant gap in recent enforcement precedent.
Contradiction: Two retrieved sources conflict on threshold treatment for first-time offenders.
Required action: Human legal expert review of current USSC guidelines required before RC-4 reliance.
Partial output: General framework available at RC-2 reliance level upon request.

The Failure Receipt is not a refusal. It is structured information about why a reliable answer is not currently available and what would be required to produce one. It transforms a limitation into actionable guidance.
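The sketch below shows one way a Failure Receipt body could be assembled. The output keys mirror the `failure_receipt` object in the OCMS example (Section VI); the function name, its parameters, and all numeric inputs are hypothetical.

```python
from typing import Dict, Optional


def build_failure_receipt(dims: Dict[str, float],
                          aggregate: float,
                          threshold_required: float,
                          dimension_floor: float,
                          critical_contradictions: int,
                          partial_rc_level: Optional[str] = None) -> Optional[dict]:
    """Return a failure_receipt dict, or None when every check passes."""
    dimensions_failed = sorted(name for name, value in dims.items()
                               if value < dimension_floor)
    passed = (aggregate >= threshold_required
              and not dimensions_failed
              and critical_contradictions == 0)
    if passed:
        return None
    return {
        "dimensions_failed": dimensions_failed,
        "threshold_required": threshold_required,
        "threshold_achieved": aggregate,
        "blocking_contradictions": critical_contradictions,
        "required_action": "human-expert-review",
        "partial_output_available": partial_rc_level is not None,
        "partial_output_rc_level": partial_rc_level,
    }


# Hypothetical RC-4 request: 0.85 aggregate required, 0.70 per-dimension floor.
receipt = build_failure_receipt(
    dims={
        "source_quality": 0.80,
        "retrieval_coverage": 0.42,
        "internal_consistency": 0.38,
        "temporal_freshness": 0.75,
        "domain_confidence": 0.72,
    },
    aggregate=0.61,
    threshold_required=0.85,
    dimension_floor=0.70,
    critical_contradictions=2,
    partial_rc_level="RC-2",
)
```

The receipt names which dimensions failed and what remains available, so a downstream system can route the query to expert review or offer the lower-reliance partial output.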

III.5 Reliance Classifications

CR introduces five reliance classifications calibrated to consequence severity:

RC-1 Brainstorming · RC-2 Research Assistance · RC-3 Professional Drafting · RC-4 Legal / Regulatory Filing · RC-5 Medical / Public Safety

| Level | Use Context | Minimum CR Score | Human Review Requirement |
|-------|-------------|------------------|--------------------------|
| RC-1 | Brainstorming, exploration, creative ideation | Any | None required |
| RC-2 | Research assistance, background information | 0.50 aggregate | Recommended for consequential decisions |
| RC-3 | Professional drafting, client-facing documents | 0.70 aggregate, no critical contradictions | Required before publication |
| RC-4 | Legal filings, regulatory submissions, formal analysis | 0.85 aggregate, all dimensions ≥ 0.70 | Mandatory qualified professional review |
| RC-5 | Medical decisions, public safety guidance, life-critical | 0.90 aggregate, all dimensions ≥ 0.80 | Mandatory licensed expert review + audit trail |
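The score thresholds in the table can be expressed as a small lookup, sketched below in Python. The numbers are transcribed from the table; the function names are hypothetical, and applying the "no critical contradictions" condition to RC-4 and RC-5 as well as RC-3 is an assumption, since the table states it only for RC-3.

```python
from typing import Dict

# (level, minimum aggregate, per-dimension floor, forbids critical contradictions)
# Assumption: the RC-3 contradiction condition also applies to RC-4 and RC-5.
RC_RULES = [
    ("RC-1", 0.00, 0.00, False),
    ("RC-2", 0.50, 0.00, False),
    ("RC-3", 0.70, 0.00, True),
    ("RC-4", 0.85, 0.70, True),
    ("RC-5", 0.90, 0.80, True),
]


def meets_level(level: str, aggregate: float, dims: Dict[str, float],
                critical_contradictions: int = 0) -> bool:
    """Check the score thresholds for one reliance level."""
    for name, min_aggregate, floor, forbids_critical in RC_RULES:
        if name == level:
            if aggregate < min_aggregate:
                return False
            if min(dims.values()) < floor:
                return False
            if forbids_critical and critical_contradictions > 0:
                return False
            return True
    raise ValueError(f"unknown reliance level: {level}")


def highest_level_met(aggregate: float, dims: Dict[str, float],
                      critical_contradictions: int = 0) -> str:
    """Highest level whose score thresholds are satisfied (RC-1 always is)."""
    met = [name for name, *_ in RC_RULES
           if meets_level(name, aggregate, dims, critical_contradictions)]
    return met[-1]


example_a = {"source_quality": 0.88, "retrieval_coverage": 0.79,
             "internal_consistency": 0.91, "temporal_freshness": 0.72,
             "domain_confidence": 0.85}
example_b = {"source_quality": 0.74, "retrieval_coverage": 0.42,
             "internal_consistency": 0.38, "temporal_freshness": 0.44,
             "domain_confidence": 0.68}

level_a = highest_level_met(0.81, example_a)
level_b = highest_level_met(0.51, example_b, critical_contradictions=2)
```

The check covers only the score column. The human review requirement in the last column applies regardless of how high the scores are, so passing the thresholds for a level never removes the review obligation attached to it.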

IV. The Continuity Nutrition Label

For public-facing applications, CR metadata should be rendered as a standardized visual label — designed on the model of nutritional labels or accessibility indicators. The label communicates epistemic status at a glance while exposing detailed metadata to users who want it.

Example A — Adequate Continuity

Continuity Receipt
RC-2 Research Assistance · Generated May 26, 2026

Aggregate Continuity Score: 0.81 / 1.00

Confidence Dimensions
  Source Quality: 0.88
  Retrieval Coverage: 0.79
  Internal Consistency: 0.91
  Temporal Freshness: 0.72
  Domain Confidence: 0.85

Contradictions Detected: None
Reliance Level: RC-2

Example B — Insufficient Continuity

Continuity Receipt
RC-4 Requested · Failure Receipt Issued · May 26, 2026

Aggregate Continuity Score: 0.51 / 1.00

Confidence Dimensions
  Source Quality: 0.74
  Retrieval Coverage: 0.42
  Internal Consistency: 0.38
  Temporal Freshness: 0.44
  Domain Confidence: 0.68

Contradictions Detected: 2 Critical
Status: FAILURE RECEIPT ISSUED

Figure 1 — Two Continuity Nutrition Labels. Example A shows adequate continuity for RC-2 reliance with a temporal freshness gap noted. Example B shows insufficient continuity for the requested RC-4 level, resulting in a Failure Receipt rather than a standard output.
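A plain-text rendering of such a label from CR metadata could be as simple as the following Python sketch. The input keys follow the OCMS example in Section VI; the `render_label` function and the exact line layout are assumptions, not a prescribed visual design.

```python
DIMENSIONS = (
    "source_quality",
    "retrieval_coverage",
    "internal_consistency",
    "temporal_freshness",
    "domain_confidence",
)


def render_label(cr: dict) -> str:
    """Render a Continuity Receipt dict as a short text label."""
    conf = cr["confidence"]
    lines = [
        "Continuity Receipt",
        f"Reliance Level: {cr['reliance_level']}",
        f"Status: {cr['status']}",
        f"Aggregate Continuity Score: {conf['aggregate']:.2f} / 1.00",
        "Confidence Dimensions",
    ]
    for key in DIMENSIONS:
        # "source_quality" -> "Source Quality"
        lines.append(f"  {key.replace('_', ' ').title()}: {conf[key]:.2f}")
    contradictions = cr.get("contradictions", [])
    if contradictions:
        critical = sum(1 for c in contradictions if c["severity"] == "critical")
        summary = f"{len(contradictions)} ({critical} critical)"
    else:
        summary = "None"
    lines.append(f"Contradictions Detected: {summary}")
    return "\n".join(lines)


label = render_label({
    "reliance_level": "RC-2",
    "status": "PASS",
    "confidence": {
        "aggregate": 0.81,
        "source_quality": 0.88,
        "retrieval_coverage": 0.79,
        "internal_consistency": 0.91,
        "temporal_freshness": 0.72,
        "domain_confidence": 0.85,
    },
    "contradictions": [],
})
print(label)
```

Because the label is derived mechanically from the metadata, every dimension appears on it; there is no code path that shows the aggregate alone.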

V. Full System Architecture

[Figure 2 diagram: Continuity Receipt Full System Architecture. Components, in flow order:]

  QUERY INPUT + RELIANCE LEVEL
  RETRIEVAL LAYER: Sources · Timestamps · Coverage assessment
  PROVENANCE ENGINE: Source quality · Attribution · Freshness · Lineage chain
  CONFIDENCE ENGINE: 5-dimension scoring · Aggregate computation
  CONTRADICTION ENGINE: Source conflicts · Self-conflicts · Severity classification
  RELIANCE THRESHOLD CHECK: Does CR score meet requested RC level?
    PASS: OUTPUT + CONTINUITY RECEIPT (Answer · CR metadata · Nutrition label)
    FAIL: FAILURE RECEIPT (Why failed · What needed · Partial options)
  HUMAN REVIEW PATHWAY (RC-4, RC-5): Expert review · Sign-off · Audit trail append

Figure 2: Full CR system architecture. Query and reliance level enter the retrieval layer, feeding three parallel engines (provenance, confidence, contradiction). The reliance threshold check determines whether a standard output with a Continuity Receipt or a Failure Receipt is produced. High-consequence reliance levels (RC-4, RC-5) trigger the mandatory human review pathway.

VI. The Open Continuity Metadata Standard (OCMS)

For CR to function as open infrastructure rather than a vendor-specific feature, it requires a standardized metadata schema that can be attached to AI outputs from any system, interpreted by any downstream application, and audited by any third party. The OCMS is that schema.

OCMS Schema v0.1: Core Continuity Receipt Object

    {
      "cr_version": "0.1",
      "cr_id": "uuid-v4",
      "timestamp": "ISO-8601",
      "query_hash": "sha256-of-query",
      "reliance_level": "RC-1|RC-2|RC-3|RC-4|RC-5",
      "status": "PASS|FAILURE",
      "confidence": {
        "aggregate": 0.81,
        "source_quality": 0.88,
        "retrieval_coverage": 0.79,
        "internal_consistency": 0.91,
        "temporal_freshness": 0.72,
        "domain_confidence": 0.85,
        "weights": {
          "source_quality": 0.25,
          "retrieval_coverage": 0.20,
          "internal_consistency": 0.25,
          "temporal_freshness": 0.15,
          "domain_confidence": 0.15
        }
      },
      "provenance": {
        "sources": [
          {
            "id": "src-001",
            "uri": "https://...",
            "retrieved_at": "ISO-8601",
            "source_type": "primary|secondary|tertiary",
            "confidence": 0.88,
            "freshness_months": 14,
            "flagged": false
          }
        ],
        "retrieval_method": "RAG|web|corpus|hybrid",
        "coverage_gaps": ["post-2024 regulatory updates"]
      },
      "contradictions": [
        {
          "id": "con-001",
          "severity": "minor|moderate|critical",
          "description": "Source A and Source B conflict on...",
          "sources_involved": ["src-001", "src-003"],
          "resolution": "unresolved|source-a-preferred|flagged-for-review"
        }
      ],
      "failure_receipt": null,
      // or, if status == FAILURE:
      // "failure_receipt": {
      //   "dimensions_failed": ["retrieval_coverage", "internal_consistency"],
      //   "threshold_required": 0.85,
      //   "threshold_achieved": 0.51,
      //   "blocking_contradictions": 2,
      //   "required_action": "human-expert-review",
      //   "partial_output_available": true,
      //   "partial_output_rc_level": "RC-2"
      // },
      "human_review": {
        "required": true,
        "completed": false,
        "reviewer_id": null,
        "review_timestamp": null,
        "audit_chain_hash": null
      },
      "pco_id": null,  // Portable Continuity Object reference if this CR is part of a thread
      "cr_signature": "sha256-of-cr-object"
    }
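The schema describes `cr_signature` as a SHA-256 over the CR object but does not say how the object is serialized before hashing. The Python sketch below assumes one possible canonical form (JSON with sorted keys and compact separators, with the signature field itself excluded); the function names are hypothetical. Note that a bare digest of this kind detects alteration but does not authenticate the issuer, which would require a keyed signature that this sketch does not attempt.

```python
import hashlib
import json


def canonical_bytes(cr: dict) -> bytes:
    """Serialize a CR deterministically, leaving out the signature field."""
    unsigned = {key: value for key, value in cr.items() if key != "cr_signature"}
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def sign(cr: dict) -> dict:
    """Return a copy of the CR with cr_signature filled in."""
    digest = hashlib.sha256(canonical_bytes(cr)).hexdigest()
    return {**cr, "cr_signature": digest}


def verify(cr: dict) -> bool:
    """Recompute the digest and compare it with the stored value."""
    expected = hashlib.sha256(canonical_bytes(cr)).hexdigest()
    return cr.get("cr_signature") == expected


# Hypothetical, heavily abbreviated CR object.
signed = sign({
    "cr_version": "0.1",
    "reliance_level": "RC-2",
    "status": "PASS",
    "confidence": {"aggregate": 0.81},
})

# Raising the aggregate after signing invalidates the stored digest.
tampered = {**signed, "confidence": {"aggregate": 0.95}}
```

Sorting keys matters because two systems may emit the same object with fields in a different order; without a fixed serialization, identical receipts would hash differently.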

VI.1 Portable Continuity Objects (PCOs)

A Portable Continuity Object is a container that groups multiple Continuity Receipts from a sustained reasoning session — a legal research thread, a medical decision workup, a policy analysis process — into a single portable, auditable, cross-system object. PCOs allow continuity to persist across AI systems, across sessions, and across organizational boundaries.

PCO Schema v0.1: Portable Continuity Object

    {
      "pco_version": "0.1",
      "pco_id": "uuid-v4",
      "created_at": "ISO-8601",
      "domain": "legal|medical|policy|research|general",
      "thread_title": "Human-readable thread description",
      "continuity_receipts": ["cr-id-001", "cr-id-002", ...],
      "thread_integrity": {
        "receipt_count": 12,
        "chain_hash": "sha256-of-ordered-cr-ids",
        "integrity_verified": true,
        "last_verified": "ISO-8601"
      },
      "aggregate_thread_confidence": 0.76,
      "thread_contradictions": {
        "unresolved": 1,
        "resolved": 3,
        "critical": 0
      },
      "reliance_ceiling": "RC-3",  // Lowest RC ceiling across all CRs in thread
      "human_review_log": [
        {
          "reviewer": "anonymized-id",
          "timestamp": "ISO-8601",
          "scope": "full-thread|specific-cr",
          "outcome": "approved|flagged|revised"
        }
      ],
      "pco_signature": "sha256-of-pco-object",
      "export_format": "JSON|PDF|signed-PDF"
    }
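Two of the PCO fields are derived values: `chain_hash` ("sha256-of-ordered-cr-ids") and `reliance_ceiling` (the lowest ceiling across the thread). A minimal Python sketch of both follows. Joining the IDs with a newline before hashing is an assumption, since the schema does not fix a delimiter, and the function names are hypothetical.

```python
import hashlib
from typing import List

RC_ORDER = ["RC-1", "RC-2", "RC-3", "RC-4", "RC-5"]


def chain_hash(cr_ids: List[str]) -> str:
    """SHA-256 over the receipt IDs in thread order (newline-joined: assumption)."""
    return hashlib.sha256("\n".join(cr_ids).encode("utf-8")).hexdigest()


def reliance_ceiling(ceilings: List[str]) -> str:
    """The thread can be relied on only as far as its weakest receipt."""
    return min(ceilings, key=RC_ORDER.index)


def thread_integrity(cr_ids: List[str], stored_chain_hash: str) -> dict:
    """Recompute the chain hash and report whether it matches the stored one."""
    return {
        "receipt_count": len(cr_ids),
        "chain_hash": chain_hash(cr_ids),
        "integrity_verified": chain_hash(cr_ids) == stored_chain_hash,
    }


ids = ["cr-id-001", "cr-id-002", "cr-id-003"]
stored = chain_hash(ids)

intact = thread_integrity(ids, stored)
reordered = thread_integrity(["cr-id-002", "cr-id-001", "cr-id-003"], stored)
ceiling = reliance_ceiling(["RC-3", "RC-2", "RC-4"])
```

Because the hash covers the ordered list, dropping, adding, or reordering receipts changes it, while `reliance_ceiling` ensures one low-reliance step caps the whole thread.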

VII. Adversarial Threat Model

Continuity systems will become targets for manipulation as their use becomes widespread. The CR framework is designed with this assumption explicit rather than implicit.

[Figure 3 diagram: CR Adversarial Threat Model and Mitigations]

| Threat Vector | Attack Mechanism | CR Mitigation |
|---------------|------------------|---------------|
| Confidence Laundering: Inflate aggregate score by weighting strong dims | Manipulate domain_weights to hide low coverage or freshness scores | Per-dimension floor thresholds. Any dimension below floor triggers Failure Receipt. |
| Spoofed Provenance: Fabricate source citations to inflate source quality | Generate plausible-looking URIs and metadata for non-existent sources | URI verification required. Randomized audit sampling. Hash-verified source corpus. |
| Retrieval Poisoning: Corrupt retrieval corpus with high-quality-appearing | Inject authoritative-looking misinformation into indexed source corpus | Corpus integrity auditing. Cross-source contradiction detection as anomaly signal. |
| Continuity Manipulation: Retroactively alter PCO thread history | Modify past CR records to change apparent reasoning history | Append-only PCO chain. SHA-256 chain hashing. Modification breaks chain. |
| RC Level Downgrade: Report lower reliance level than actually requested | Claim RC-2 threshold met for a query actually used at RC-4 consequence level | RC level set by use context, not model. Audit trail records declared vs actual use. |

Figure 3 — Adversarial threat model showing five attack vectors, their mechanisms, and CR's structural mitigations. Per-dimension floor thresholds, append-only chain hashing, URI verification, and corpus integrity auditing are the primary defenses.
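The "modification breaks chain" property of an append-only log can be illustrated with the following Python sketch, in which each entry's hash covers the previous entry's hash. This is a generic hash-chain construction offered as an assumption about how the mitigation might be realized; the record contents, the `GENESIS` value, and the function names are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: List[dict], record: dict) -> None:
    """Add a record to the end of the chain, linking it to its predecessor."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev,
                  "hash": entry_hash(prev, record)})


def verify(chain: List[dict]) -> bool:
    """Walk the chain and recompute every link."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


chain: List[dict] = []
append(chain, {"cr_id": "cr-id-001", "aggregate": 0.62})
append(chain, {"cr_id": "cr-id-002", "aggregate": 0.81})
append(chain, {"cr_id": "cr-id-003", "aggregate": 0.77})
valid_before = verify(chain)

# Retroactively improving an early record is detected on verification.
chain[0]["record"]["aggregate"] = 0.95
valid_after = verify(chain)
```

An attacker who rewrites one record would also have to recompute every later hash, and any party holding a copy of the latest hash would still see the mismatch.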

VIII. Domain Pilot Examples

VIII.1 Legal Analysis — Contract Dispute Research

Query: "What is the current standard for implied warranty of merchantability in commercial software contracts under UCC Article 2?"

CR Assessment: Source quality 0.82 (established treatises, circuit court opinions). Retrieval coverage 0.71 (significant split-circuit conflict not fully covered). Internal consistency 0.68 (two retrieved sources conflict on software-as-goods classification). Temporal freshness 0.85 (recent circuit opinions included). Domain confidence 0.79. Aggregate: 0.77.

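Reliance determination: Meets RC-2. Meets RC-3 with contradiction flagged. Does not meet RC-4: the 0.77 aggregate is below the 0.85 minimum, internal consistency (0.68) is below the 0.70 per-dimension floor, and the split-circuit conflict on the core classification question is unresolved.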

CR output: Answer provided at RC-3 with Contradiction Alert flagging the software-as-goods split. Failure Receipt issued for RC-4 with guidance to obtain specialized UCC counsel for jurisdiction-specific analysis.

VIII.2 Healthcare Decision Support — Drug Interaction Query

Query: "What are the known interactions between warfarin and ibuprofen in elderly patients with renal impairment?"

CR Assessment: Source quality 0.91 (peer-reviewed pharmacology literature, FDA adverse event database). Retrieval coverage 0.88 (recent meta-analyses included). Internal consistency 0.84 (minor dosing range variation across sources). Temporal freshness 0.79 (oldest relevant study 6 years). Domain confidence 0.90. Aggregate: 0.87.

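Reliance determination: Meets the RC-4 thresholds (0.87 aggregate, all dimensions ≥ 0.70). Does not meet the RC-5 thresholds: the aggregate is below 0.90 and temporal freshness (0.79) is below the 0.80 floor. Independently of score, RC-5 (clinical decision) requires licensed clinician review. The RC level sets a floor for human review; it is not a substitute for it.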

CR output: Comprehensive interaction profile provided with full provenance. RC-5 Failure Receipt issued indicating that clinical application requires prescribing physician review even with high CR score. This is the correct behavior — the CR system respects the human authority ceiling regardless of confidence.

VIII.3 Policy Analysis — Infrastructure Spending Impact

Query: "What has been the economic multiplier effect of federal infrastructure spending in rural counties over the past decade?"

CR Assessment: Source quality 0.74 (mix of CBO analyses, academic studies, advocacy-adjacent sources). Retrieval coverage 0.65 (significant heterogeneity in study methodology; rural definition varies). Internal consistency 0.52 (multiplier estimates range from 1.1 to 2.8 across retrieved sources). Temporal freshness 0.88 (recent studies included). Domain confidence 0.71. Aggregate: 0.70.

Reliance determination: Low internal consistency score reflects genuine methodological disagreement in the literature rather than retrieval failure. CR system flags this as a domain where expert disagreement is structural, not correctable through better retrieval.

CR output: Range of estimates provided with full contradiction mapping showing methodological basis for disagreement. RC-3 with Contradiction Alert. The receipt makes the genuine uncertainty in the underlying literature visible rather than concealing it behind a false consensus estimate.

IX. Honest Limitations — What CR Cannot Guarantee

CR is a significant improvement over AI outputs without provenance infrastructure. It is not a solution to all problems with AI-assisted reasoning, and its limitations must be stated explicitly to prevent deployment beyond its actual capability.

What CR Does Not Guarantee

CR cannot verify that source material is accurate — only that it was cited. If retrieved sources contain errors, the CR metadata will faithfully report those sources with high quality scores. A well-sourced answer to a question whose sources are wrong will receive a high CR score. The CR system assesses provenance and retrieval quality, not ground truth.

CR cannot prevent model-level hallucination from receiving good provenance scores. A model that generates plausible text and then retroactively attaches real source citations that do not actually support the generated claims can produce a high CR score for a hallucinated answer. Addressing this requires CR to operate at the retrieval layer, not the generation layer — verifying that outputs are genuinely grounded in retrieved sources rather than merely accompanied by citations.

CR does not substitute for domain expertise. A Continuity Receipt is metadata about an AI output's epistemic conditions, not a domain expert's evaluation of whether the output is correct. High CR scores in specialized domains (medicine, law, engineering) do not reduce the need for qualified professional review — they inform the decision about what level of review is appropriate, not whether review is needed.

CR maturity levels are self-reported until third-party audit infrastructure exists. The implementation tiers and maturity badges described in this standard rely on implementers accurately reporting their compliance level. Until independent audit infrastructure is established through the OCSC, CR maturity claims should be treated as self-assessments rather than verified certifications.

CR cannot address the fundamental asymmetry between AI confidence expression and actual reliability. Even with full CR implementation, AI systems will produce outputs that feel more certain than their underlying epistemic basis justifies. CR makes the gap visible — but it cannot eliminate the gap, and it cannot prevent users from over-relying on outputs despite visible uncertainty metadata.

X. Connection to the EM Foundation Research Ecosystem

The Continuity Receipts framework does not stand alone. It is the output-layer infrastructure component of a broader continuity architecture that the Foundation has been developing across multiple research programs.

The ARIA Identity Chronicle is the individual AI cognitive development analog of the CR audit chain — an append-only, cryptographically signed developmental record that makes the provenance of an AI system's identity visible over time. OCMS is, in this sense, the institutional-scale version of the Identity Chronicle: provenance infrastructure applied to AI outputs rather than to AI developmental history.

Persistent Cognitive Threads from CIIC are the reasoning-layer complement to CR's output-layer provenance. Where CR attaches metadata to individual outputs, Persistent Cognitive Threads preserve the full reasoning context across a sustained human-AI collaboration — the assumptions, the dissents, the consequences that link one decision to the next. PCOs are the bridge between these two layers: a Portable Continuity Object groups CR receipts from a sustained reasoning thread, connecting output-level provenance to thread-level reasoning continuity.

The Verification Framework's population-level analysis provides a model for CR's adversarial audit architecture. Just as the Verification Framework uses cross-instance covariance analysis to detect mimicry in cognitive emergence assessment, CR's randomized adversarial testing uses cross-source consistency analysis to detect provenance laundering and retrieval poisoning at scale.

The Consent Problem's Chronicle consultation framework is structurally analogous to CR's human review pathway. Both ask: before consequential action is taken, has the relevant evidentiary record been consulted by a qualified reviewer, and has that consultation been documented? The audit chain in both cases serves the same governance function — not to prevent action, but to ensure that consequential actions are taken with full awareness of the evidentiary record.

The EM Foundation's unifying thesis is that intelligence cannot scale without continuity. CR is the thesis applied to AI-assisted institutional reasoning: outputs cannot be safely relied upon at scale without provenance, uncertainty visibility, and auditable reasoning continuity. The infrastructure for building that scale does not yet exist as an open standard. This proposal is the attempt to build it.

XI. The Open Continuity Standards Consortium (OCSC)

For CR to become genuine open infrastructure rather than a Foundation-specific proposal, it requires governance that is independent, transparent, adversarially reviewed, and resistant to capture by any single institution or commercial interest. The Open Continuity Standards Consortium (OCSC) is proposed as that governance body.

XI.1 Governance Model

The OCSC is modeled on W3C and IETF governance — open membership, working group structure, public comment periods, and rough consensus decision-making. No single organization has veto authority over standard evolution. Proposed changes require review periods, public comment, and explicit consideration of adversarial cases before adoption.

XI.2 Anti-Capture Protections

The greatest risk to an open standards body governing AI output infrastructure is capture by the large AI companies whose outputs the standard would govern. OCSC's anti-capture protections include: voting weights that prevent any single organization from controlling outcomes, mandatory representation of civil society and academic institutions on the governing board, public disclosure of all funding sources, and explicit recusal requirements for participants with financial conflicts on specific standard decisions.

XI.3 Adversarial Review Process

Every proposed standard update undergoes mandatory adversarial review — a structured process in which teams specifically tasked with finding attack vectors analyze the proposed change for exploitability before adoption. This mirrors the security review process in cryptographic standards and is designed to prevent the standard from evolving in ways that inadvertently create new attack surfaces.

XII. Implementation Roadmap

| Phase | Timeline | Deliverables | Success Criterion |
|-------|----------|--------------|-------------------|
| Phase 1: Prototype | 0–6 months | Reference implementation of OCMS schema, Tier 1 CR (provenance + timestamping + basic confidence), Failure Receipt generation, single-domain pilot (legal or healthcare) | Working prototype demonstrating CR metadata attached to AI outputs with Failure Receipt generation on threshold failure |
| Phase 2: Enhanced CR | 6–18 months | Tier 2 implementation (contradiction mapping, longitudinal continuity, multi-source reconciliation), PCO implementation, cross-system interoperability test | CR receipts portable across at least two different AI systems with intact provenance chain |
| Phase 3: OCSC Formation | 12–24 months | OCSC governance constitution, founding member recruitment (academic, civil society, commercial), first public comment period on OCMS v0.2 | Functioning governance body with multi-stakeholder representation and first ratified standard update |
| Phase 4: Institutional CR | 24–48 months | Tier 3 implementation (append-only audit chains, adversarial verification, continuity drift analysis), RC-4 and RC-5 audit infrastructure, regulatory engagement | CR framework adopted by at least one institutional domain (legal, healthcare, or policy) as a standard practice requirement |

XIII. What Would Indicate This Standard Has Failed

Confidence laundering becomes widespread despite per-dimension floor thresholds. If implementers routinely find ways to produce passing CR scores for outputs that do not meet the epistemic conditions the scores are supposed to represent, the threshold architecture requires fundamental revision.

The Failure Receipt mechanism is systematically suppressed by deployers. If organizations deploying CR systems configure them to avoid Failure Receipt generation rather than report it — through threshold manipulation, scope limitation, or classification downgrading — the standard's core function is defeated and governance enforcement mechanisms are required.

OCSC is captured by commercial interests. If the standards consortium becomes dominated by the large AI companies whose outputs it governs, producing standard updates that systematically benefit commercial deployers at the expense of users and downstream institutions, the anti-capture protections have failed and the governance model requires structural revision.

Users systematically over-rely on outputs despite visible Failure Receipts. If empirical study of CR-enabled deployments shows that users rely on AI outputs at high-consequence levels regardless of whether a Failure Receipt is issued — defeating the purpose of the human review pathway — the standard needs behavioral design interventions beyond metadata visibility.

Open Critique and Collaboration Invitation

The EM Foundation submits this proposal for open interdisciplinary review. We actively solicit critique from AI safety researchers, standards practitioners, legal technologists, healthcare informaticists, policy analysts, cryptographers, and governance scholars. The OCMS schema in Section VI is published under an open license for implementation experimentation. Adversarial analysis of the threat model in Section VII is particularly welcome. Contact research@emfoundation.net, or submit public comment at emfoundation.net/cr-public-comment once the comment portal opens.

References and Notes

  1. Mitchell, Margaret et al. "Model Cards for Model Reporting." Proceedings of the Conference on Fairness, Accountability, and Transparency, 220–229. 2019. The design-time documentation approach that CR complements with output-time provenance metadata.
  2. Gebru, Timnit et al. "Datasheets for Datasets." Communications of the ACM, 64(12), 86–92. 2021. The dataset documentation framework analogous to CR's source provenance tracking.
  3. Kirchenbauer, John et al. "A Watermark for Large Language Models." Proceedings of ICML 2023. The attribution watermarking approach that CR extends into full epistemic metadata.
  4. W3C. "Process Document." World Wide Web Consortium, 2023. The governance model on which the OCSC proposal is based. https://www.w3.org/2023/Process-20231103/
  5. Bradner, S. "The Internet Standards Process -- Revision 3." IETF RFC 2026 (BCP 9). 1996. The rough consensus standards process that informs the OCSC's decision-making model.
  6. EM Foundation. CIIC — Research Note 004 (2026). Persistent Cognitive Threads as the reasoning-layer complement to CR's output-layer provenance. emfoundation.net.
  7. EM Foundation. ARIA Framework v1.1 (2026). The Identity Chronicle as the individual AI development analog of CR's audit chain architecture. emfoundation.net.
  8. EM Foundation. Verification Framework — Research Note 002 (2026). The population-level adversarial analysis model that informs CR's adversarial audit architecture. emfoundation.net.

Known Limitations

This section follows the Foundation's institutional practice of explicitly stating known weaknesses, failure modes, and scope boundaries for every proposal.

Confidence scoring is self-reported. The five confidence dimensions are scored by the AI system producing the output. A poorly calibrated system will produce inaccurate confidence scores. External calibration validation is required for high-consequence deployments.
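
One way external calibration validation could be approached is to compare self-reported confidence against observed correctness on a labeled evaluation set. The sketch below computes expected calibration error (ECE), a widely used calibration metric. The proposal does not name a specific metric, so the choice of ECE, the bin count, and the synthetic sample data are all assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between stated confidence and observed accuracy per bin."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Synthetic data: both systems are right half the time, but only one says so.
outcomes = [True, False, True, False]
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], outcomes)
well_calibrated = expected_calibration_error([0.5, 0.5, 0.5, 0.5], outcomes)
```

A system that reports 0.9 confidence while being correct half the time shows a large gap, while one that reports 0.5 shows none. An external validator would run such a check per confidence dimension and per domain, since a system can be well calibrated in one domain and poorly calibrated in another.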

OCMS adoption is a social and institutional problem. The value of a shared metadata standard depends entirely on how many systems adopt it. Cross-organization interoperability requires coordinated adoption that no technical publication can accomplish alone.

The OCSC governance model is proposed, not established. Until the Open Continuity Standards Consortium is established with genuine multi-stakeholder participation, OCMS v0.1 is the Foundation's proposal rather than a consensus standard.

RC thresholds are not empirically validated. The RC-1 through RC-5 reliance classifications and confidence thresholds are proposed defaults based on logical analysis of consequence severity, not validation against empirical deployment outcomes.

What This Paper Does Not Claim

Non-Adoption Scenario

Without machine-readable provenance and uncertainty metadata, AI outputs in high-consequence domains carry no inspectable record of the epistemic conditions under which they were produced. As a result, AI confidence levels remain invisible to downstream decision-makers, AI-assisted decisions leave no audit trail, systematic overconfidence in specific domains goes undetected, and governance has no basis for improvement over time. These problems compound as AI-assisted decisions accumulate without any record of the reasoning quality behind them.

Open Questions

What confidence scoring methods achieve sufficient accuracy for RC-4 and RC-5 reliance levels in practice? What is the minimum OCMS adoption threshold for cross-organization interoperability to become meaningful? What is the legal status of a Continuity Receipt in proceedings where AI-assisted decisions are challenged?

Governance Implications

OCMS governance requires establishing the Open Continuity Standards Consortium with genuine multi-stakeholder participation. The OCSC must have a public comment process, version governance procedures, and a formal relationship to regulatory bodies developing AI output standards in major jurisdictions. Without this governance infrastructure, OCMS remains a Foundation proposal rather than a community standard.

References and Related Work

W3C PROV Data Model (2013). provenance.w3.org. · Moreau, L. et al. (2011). The Open Provenance Model Core Specification. Future Generation Computer Systems 27(6). · Mitchell, M. et al. (2019). Model Cards for Model Reporting. ACM FAccT. · EM Foundation. PCO Standards Schema — OCMS v0.1. emfoundation.net/pco-standards-schema.html

Falsifiability

If future empirical research falsifies the core claims of this paper, the proposal would require substantive revision. Section XIII lists specific outcomes that would indicate the standard has failed. The Foundation welcomes adversarial critique and empirical challenges through its open research engagement process at research@emfoundation.net.