Open Source Proposal 5 — EM Foundation — May 2026

Probabilistic Continuity Delta Protocol

Reducing communication overhead by transmitting only the changes most likely to affect continuity of meaning — semantic deltas rather than full-state synchronization

EM Foundation  ·  May 2026  ·  emfoundation.net
Open source proposal. Implementation contributions welcomed at github.com/emfoundation.
Connects to: Continuity Receipts (CR), CIIC, PCO Standards Schema.

Abstract

Modern distributed systems face a synchronization dilemma. Full-state synchronization preserves perfect fidelity but wastes bandwidth transmitting information the receiver already has. Sparse synchronization reduces bandwidth but risks drift when deltas accumulate incorrectly or incompletely. Existing delta synchronization approaches — file diff, rsync, CRDT-based systems — optimize for structural fidelity: they accurately transmit what changed syntactically. They do not optimize for semantic continuity: they have no mechanism for distinguishing a change that merely modifies formatting from a change that alters the meaning, provenance, or reliability of a document.

The Probabilistic Continuity Delta Protocol proposes a middle path: transmit the changes most likely to affect continuity of meaning. A continuity delta is a state change that affects future interpretation — new evidence, changed confidence, resolved contradiction, added source, altered preference, changed legal status, or modified decision rationale. Structural changes that do not affect any of these dimensions are candidates for deferred or batched transmission.

This paper presents the continuity delta taxonomy, the JSON schema for delta objects, a simulation comparing full synchronization, text diff, and continuity-delta synchronization across bandwidth and accuracy metrics, and drift-detection safeguards. The proposal is most immediately applicable to collaborative legal documents, scientific case records, policy analysis threads, and AI agent memory logs.

II. The Core Formula

Continuity Delta Priority Score (CDP): Conceptual Definition

CDP = (FactChange × w1) + (ConfidenceChange × w2) + (ContradictionChange × w3) + (ProvenanceChange × w4) + (DecisionChange × w5) + (PreferenceChange × w6)

Where each term is the magnitude of change, normalized to the range 0 to 1:

FactChange = semantic distance between old and new factual claims
ConfidenceChange = |new_confidence - old_confidence|
ContradictionChange = 1 if contradiction added/resolved, else 0
ProvenanceChange = 1 if source added/removed/changed, else 0
DecisionChange = 1 if decision rationale modified, else 0
PreferenceChange = 1 if stated preference altered, else 0

Weights: w1 = 0.30, w2 = 0.20, w3 = 0.20, w4 = 0.15, w5 = 0.10, w6 = 0.05

Transmission rule: transmit deltas with CDP above threshold immediately; batch deltas below threshold for periodic sync.

Drift detection: if accumulated unbatched CDP exceeds session_drift_limit, trigger immediate full-state sync.
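The weighted sum above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `DeltaSignals`, `WEIGHTS`, and `cdp_score` are hypothetical, and the sketch assumes every change magnitude has already been normalized to the range 0 to 1. How the semantic distance behind FactChange is computed is left open.

```python
from __future__ import annotations

from dataclasses import dataclass

# Default weights w1..w6 from Section II.
WEIGHTS = {
    "fact": 0.30,
    "confidence": 0.20,
    "contradiction": 0.20,
    "provenance": 0.15,
    "decision": 0.10,
    "preference": 0.05,
}


@dataclass(frozen=True)
class DeltaSignals:
    """Per-dimension change magnitudes (hypothetical container), each in [0, 1]."""
    fact: float = 0.0
    confidence: float = 0.0
    contradiction: float = 0.0
    provenance: float = 0.0
    decision: float = 0.0
    preference: float = 0.0


def cdp_score(signals: DeltaSignals, weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of the six change magnitudes."""
    total = 0.0
    for name, weight in weights.items():
        value = getattr(signals, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total


# Example: confidence moves by 0.12 and one source is added.
example = cdp_score(DeltaSignals(confidence=0.12, provenance=1.0))
print(f"CDP = {example:.3f}")
```

Because the default weights sum to 1.0, a delta that maximizes every dimension scores 1.0, which keeps thresholds comparable across domains if the weights are recalibrated to the same total.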

III. Experiment Design

The simulation uses evolving document datasets that any contributor can generate or obtain from public sources — legal case records, policy documents, scientific review threads, or synthetic agent memory logs.

Test Documents

20+ evolving documents spanning 50+ revision cycles each. At least one third of revisions should contain genuine continuity-significant changes (new evidence, changed conclusions, resolved contradictions) to provide meaningful test cases for CDP prioritization.

Three Synchronization Protocols

Full-state transfer: Complete document transmitted on every change. Perfect fidelity, maximum bandwidth.

Text diff (rsync-style): Only changed lines transmitted. Efficient for structural changes; blind to semantic significance.

Continuity-delta: CDP-scored changes transmitted by priority. High-CDP changes transmitted immediately; low-CDP changes batched. Drift detection triggers full sync if accumulated unbatched CDP exceeds threshold.

Measurement Dimensions

Bandwidth consumed per revision cycle.
Reconstruction accuracy after simulated packet loss.
Contradiction retention (are unresolved contradictions preserved across sync?).
Decision traceability (can the rationale for a decision be reconstructed from the delta log?).
Drift rate (how quickly does the receiver's state diverge from the sender's state under each protocol?).

IV. Comparison to Git Delta Systems and Semantic Drift

The Git comparison. Every engineer reading this proposal will ask: "isn't this just git diff with semantic awareness?" The answer is yes and no, and the distinction matters. Git diff operates on syntactic structure — it identifies which lines changed. The Continuity Delta Protocol operates on semantic significance — it identifies which changes affect continuity of meaning. A change that rewrites a paragraph for clarity without altering its claims is a zero-CDP event. The same change in git diff produces a large diff. Conversely, a single-word change that replaces "not liable" with "liable" in a legal document is a maximum-CDP event. Git diff treats it as a minor change.
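The contrast can be made concrete with a small experiment using Python's standard `difflib` module. The sketch below only measures surface (character-level) similarity, which is roughly the signal a structural diff works from; it does not compute a CDP score. The example sentences and the helper `surface_similarity` are illustrative assumptions.

```python
import difflib


def surface_similarity(old: str, new: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical text."""
    return difflib.SequenceMatcher(None, old, new).ratio()


# A one-word edit that reverses the legal meaning.
liability_old = "The defendant is not liable for damages."
liability_new = "The defendant is liable for damages."

# A rewrite for clarity that keeps the same claim.
rewrite_old = "The court found that the defendant bears no responsibility."
rewrite_new = "Responsibility was not attributed to the defendant by the court."

small_edit = surface_similarity(liability_old, liability_new)
rewrite = surface_similarity(rewrite_old, rewrite_new)

print(f"meaning-reversing edit: {small_edit:.2f} similar on the surface")
print(f"clarity rewrite:        {rewrite:.2f} similar on the surface")
```

The meaning-reversing edit leaves the text almost unchanged on the surface, while the meaning-preserving rewrite changes most of it. A structural diff therefore ranks the two changes in the opposite order from the one the protocol is meant to produce.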

Rollback and recovery mechanisms. When accumulated deltas produce a state that diverges unacceptably from the sender's state, the protocol requires a rollback mechanism. The Foundation proposes a checkpoint-based rollback: every N deltas (configurable), a full-state snapshot is appended to the delta log. If drift detection triggers a rollback, the receiver restores from the most recent checkpoint and replays only the deltas since that checkpoint. This bounds the maximum recovery cost to O(checkpoint_interval) rather than O(full_history).
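A minimal sketch of the checkpoint-and-replay idea follows. It assumes, purely for illustration, that a state is a flat dictionary and that a delta is a dictionary of key updates; `CheckpointedLog` and `apply_delta` are hypothetical names, and the log is assumed to start from an empty state.

```python
from dataclasses import dataclass, field


def apply_delta(state: dict, delta: dict) -> dict:
    """Return a new state with the delta's key/value updates applied."""
    merged = dict(state)
    merged.update(delta)
    return merged


@dataclass
class CheckpointedLog:
    checkpoint_interval: int = 3                 # the configurable N
    entries: list = field(default_factory=list)  # ("delta", d) or ("checkpoint", s)
    _state: dict = field(default_factory=dict)
    _since_checkpoint: int = 0

    def append(self, delta: dict) -> None:
        """Record a delta; every N deltas, also append a full-state snapshot."""
        self._state = apply_delta(self._state, delta)
        self.entries.append(("delta", delta))
        self._since_checkpoint += 1
        if self._since_checkpoint >= self.checkpoint_interval:
            self.entries.append(("checkpoint", dict(self._state)))
            self._since_checkpoint = 0

    def restore(self) -> tuple:
        """Rebuild state from the latest checkpoint plus the deltas after it.

        Returns (state, number_of_deltas_replayed).
        """
        state, start = {}, 0
        for index in range(len(self.entries) - 1, -1, -1):
            kind, payload = self.entries[index]
            if kind == "checkpoint":
                state, start = dict(payload), index + 1
                break
        replayed = 0
        for kind, payload in self.entries[start:]:
            if kind == "delta":
                state = apply_delta(state, payload)
                replayed += 1
        return state, replayed


log = CheckpointedLog(checkpoint_interval=3)
for i in range(5):
    log.append({f"field_{i}": i})
state, replayed = log.restore()
print(f"replayed {replayed} deltas after the last checkpoint")
```

With five deltas and a checkpoint every three, recovery replays only the two deltas recorded after the snapshot, which is the bounded cost the paragraph above describes.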

Probabilistic error bounds. The CDP scoring function introduces probabilistic uncertainty — a change scored as low-CDP may in fact be high-significance. The protocol should maintain a confidence interval on cumulative CDP, triggering a full synchronization when the interval's upper bound exceeds the session drift limit. This prevents silent divergence when individual delta scores are uncertain.
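One way to realize such an interval is sketched below. It rests on two assumptions that the proposal does not specify: each delta's CDP estimate comes with a standard error, and the scoring errors are independent and approximately normal, so that variances add. The function names and the default `z = 1.96` are illustrative.

```python
import math


def cumulative_upper_bound(scores, std_errors, z=1.96):
    """Upper bound of a normal-approximation interval on summed CDP.

    Assumes independent scoring errors, so variances add (an assumption).
    """
    total = sum(scores)
    variance = sum(se ** 2 for se in std_errors)
    return total + z * math.sqrt(variance)


def needs_full_sync(scores, std_errors, session_drift_limit, z=1.96):
    """True when the interval's upper bound exceeds the session drift limit."""
    return cumulative_upper_bound(scores, std_errors, z) > session_drift_limit


# Three batched deltas that each look minor but are scored with uncertainty.
batched_scores = [0.05, 0.05, 0.05]
batched_errors = [0.10, 0.10, 0.10]

upper = cumulative_upper_bound(batched_scores, batched_errors)
print(f"point estimate = {sum(batched_scores):.2f}, upper bound = {upper:.2f}")
print("full sync needed:", needs_full_sync(batched_scores, batched_errors, 0.40))
```

In this example the point estimate alone stays under a drift limit of 0.40, but the upper bound does not, so the uncertain scores trigger a full synchronization that a plain running total would have missed.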

Semantic drift detection. Semantic drift occurs when accumulated low-CDP deltas progressively shift the meaning of a document in ways that no individual delta's score reflects. Detection requires periodic full-state comparison between sender and receiver, not just delta log verification. The benchmark should include a semantic drift test: 50+ low-CDP changes applied in sequence, measuring when a full-state comparison first detects meaningful semantic divergence.
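The drift test can be prototyped with a stand-in distance measure. The sketch below uses Jaccard distance over word sets purely as a placeholder for a real semantic distance, and it models the worst case in which the receiver applies none of the 50 batched changes. `jaccard_distance`, `first_detection`, `d_max`, and `compare_every` are hypothetical names.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Placeholder for semantic distance: 1 - |A ∩ B| / |A ∪ B| over word sets."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(words_a | words_b)


def first_detection(sender_states, receiver_state, d_max, compare_every):
    """Return the first revision step at which a periodic full-state
    comparison reports distance >= d_max, or None if it never does."""
    for step, sender_state in enumerate(sender_states, start=1):
        if step % compare_every != 0:
            continue  # no full-state comparison scheduled at this step
        if jaccard_distance(sender_state, receiver_state) >= d_max:
            return step
    return None


# Synthetic document of 50 distinct words; each revision swaps one word.
base_words = [f"w{i}" for i in range(50)]
receiver_state = " ".join(base_words)  # receiver never applies the changes

sender_states = []
words = list(base_words)
for i in range(50):
    words[i] = f"x{i}"
    sender_states.append(" ".join(words))

every_step = first_detection(sender_states, receiver_state, d_max=0.3, compare_every=1)
every_fifth = first_detection(sender_states, receiver_state, d_max=0.3, compare_every=5)
print(f"detected at step {every_step} when comparing every revision")
print(f"detected at step {every_fifth} when comparing every 5 revisions")
```

The gap between the two results is the detection latency introduced by the comparison interval, which is the quantity the benchmark is meant to measure.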

IV.5 End-to-End Worked Example

The Delta Protocol's operation is easiest to understand through a concrete example. The following traces a legal matter record through a complete delta cycle.

Initial state (Day 1). A legal matter record contains: one claim ("the defendant was present at the location"), one supporting source (witness testimony, confidence 0.82), and one open question (jurisdiction for the claim). The full state is synchronized to all team members at initial creation.

Delta 1 — New evidence (Day 3). A second witness provides corroborating testimony. CDP score: FactChange = 0 (same claim), ConfidenceChange = 0.12 (confidence rises from 0.82 to 0.94), ProvenanceChange = 1 (new source added). CDP = (0 × 0.30) + (0.12 × 0.20) + (0 × 0.20) + (1 × 0.15) + (0 × 0.10) + (0 × 0.05) = 0.174. CDP above threshold (0.15): transmitted immediately. Receiver updates confidence and adds source to local record.

Delta 2 — Contradiction introduced (Day 7). A third party disputes the witness location, creating a direct contradiction of the primary claim. CDP score: ContradictionChange = 1 (new contradiction), FactChange = 0.6 (significant semantic shift in claim confidence). CDP = (0.6 × 0.30) + (0 × 0.20) + (1 × 0.20) + (0 × 0.15) + (0 × 0.10) + (0 × 0.05) = 0.38. High CDP: transmitted immediately with FR-3 severity flag. Receiver records the contradiction as unresolved and updates the reliance ceiling of the matter's PCO from RC-4 to RC-3.

Delta 3 — Formatting edit (Day 8). A team member reformats the summary for readability. FactChange = 0 (same claims), all other dimensions = 0. CDP = 0. CDP below threshold: batched for next periodic sync. This structural change does not affect meaning and does not require immediate transmission.

Delta 4 — Contradiction resolved (Day 12). Investigation establishes that the third party's account referred to a different date. ContradictionChange = 1 (contradiction resolved), ConfidenceChange = 0.10 (confidence partially recovers). CDP = (0 × 0.30) + (0.10 × 0.20) + (1 × 0.20) + (0 × 0.15) + (0 × 0.10) + (0 × 0.05) = 0.22. Above threshold: transmitted immediately. Receiver updates contradiction status to resolved and the PCO's reliance ceiling returns to RC-4.

This example demonstrates the core properties: high-CDP changes (new evidence, contradictions, resolutions) transmit immediately; low-CDP changes (formatting, minor updates) batch for periodic sync. The PCO's reliance ceiling responds dynamically to the current contradiction state.
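The arithmetic of the four deltas can be replayed with a few lines of Python. The signal values and the 0.15 threshold are taken as given from the example above; the tuple layout and the names `DELTAS` and `cdp` are illustrative.

```python
# Weights w1..w6 in the order: fact, confidence, contradiction,
# provenance, decision, preference.
WEIGHTS = (0.30, 0.20, 0.20, 0.15, 0.10, 0.05)
THRESHOLD = 0.15  # immediate-transmission threshold used in the example

DELTAS = {
    "Delta 1 (new evidence)":            (0.0, 0.12, 0, 1, 0, 0),
    "Delta 2 (contradiction introduced)": (0.6, 0.00, 1, 0, 0, 0),
    "Delta 3 (formatting edit)":         (0.0, 0.00, 0, 0, 0, 0),
    "Delta 4 (contradiction resolved)":  (0.0, 0.10, 1, 0, 0, 0),
}


def cdp(signals):
    """Weighted sum of the six change magnitudes."""
    return sum(w * s for w, s in zip(WEIGHTS, signals))


results = {name: round(cdp(signals), 3) for name, signals in DELTAS.items()}
for name, score in results.items():
    action = "transmit immediately" if score >= THRESHOLD else "batch"
    print(f"{name}: CDP = {score:.3f} -> {action}")
```

Running the sketch reproduces the scores 0.174, 0.38, 0, and 0.22 from the text, with only the formatting edit falling below the threshold.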

IV.6 Mathematical Formalization

The following notation is deliberately lightweight: it fixes the protocol's main objects and decision rules but is not a full mathematical specification. Full formalization is deferred to the empirical validation phase.

Let S(t) be the state of a document at time t, represented as a structured object containing claims C, sources Q, confidence values K, contradictions X, and decisions D.

A continuity delta δ(t, t+1) = S(t+1) − S(t) is the set of changes between two consecutive states. For each element e of δ(t, t+1), the CDP score CDP(e) is computed as defined in Section II.
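As a rough illustration of the state difference, the sketch below represents a state as a dictionary of components (claims, sources, confidence, and so on), each mapping an identifier to a value, and returns the elements that differ. This representation and the name `continuity_delta` are assumptions, not part of the proposal. Absent entries are reported as `None`, so the sketch cannot distinguish a missing entry from one explicitly set to `None`.

```python
def continuity_delta(old: dict, new: dict) -> list:
    """Return the changed elements between two states as
    (component, key, old_value, new_value) tuples, in sorted order."""
    changes = []
    for component in sorted(set(old) | set(new)):
        before = old.get(component, {})
        after = new.get(component, {})
        for key in sorted(set(before) | set(after)):
            if before.get(key) != after.get(key):
                changes.append((component, key, before.get(key), after.get(key)))
    return changes


# Day 1 and Day 3 of the legal matter record from the worked example.
day_1 = {
    "claims": {"claim-1": "the defendant was present at the location"},
    "sources": {"witness-1": "witness testimony"},
    "confidence": {"claim-1": 0.82},
}
day_3 = {
    "claims": {"claim-1": "the defendant was present at the location"},
    "sources": {"witness-1": "witness testimony", "witness-2": "witness testimony"},
    "confidence": {"claim-1": 0.94},
}

delta = continuity_delta(day_1, day_3)
for element in delta:
    print(element)
```

Each tuple in the result is one element e of the delta, and is the unit that would be scored individually.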

The transmission decision function τ(e) equals 1 (transmit now) if CDP(e) ≥ θ_immediate, where θ_immediate is the immediate transmission threshold. Otherwise e joins the current batch, and τ(e) = 1 once accumulated_CDP(batch) ≥ θ_batch or elapsed_time ≥ T_max; until then τ(e) = 0.
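The decision rule can be sketched as a small scheduler. The class `DeltaScheduler`, its field names, and the default values for `theta_batch` and `t_max` are hypothetical; only the 0.15 immediate threshold appears in the worked example. For simplicity, the elapsed-time condition is checked only when a new delta arrives, whereas a real implementation would also need a timer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeltaScheduler:
    theta_immediate: float = 0.15   # immediate transmission threshold
    theta_batch: float = 0.30       # accumulated-CDP flush threshold (assumed value)
    t_max: float = 3600.0           # maximum batch age in seconds (assumed value)
    batch: List[str] = field(default_factory=list)
    batch_cdp: float = 0.0
    batch_started: Optional[float] = None

    def submit(self, delta_id: str, cdp: float, now: float) -> List[str]:
        """Return the delta ids to transmit now (possibly an empty list)."""
        if cdp >= self.theta_immediate:
            return [delta_id]                     # high priority: send at once
        if self.batch_started is None:
            self.batch_started = now              # first delta opens the batch
        self.batch.append(delta_id)
        self.batch_cdp += cdp
        elapsed = now - self.batch_started
        if self.batch_cdp >= self.theta_batch or elapsed >= self.t_max:
            return self.flush()
        return []

    def flush(self) -> List[str]:
        """Empty the batch and return its delta ids for transmission."""
        due, self.batch = self.batch, []
        self.batch_cdp = 0.0
        self.batch_started = None
        return due


scheduler = DeltaScheduler(theta_immediate=0.15, theta_batch=0.10, t_max=100.0)
print(scheduler.submit("delta-1", cdp=0.174, now=0.0))   # sent immediately
print(scheduler.submit("delta-2", cdp=0.04, now=1.0))    # held in the batch
```

Note that an immediately transmitted delta can overtake older batched ones, so a receiver would need sequence numbers or a similar ordering mechanism, which this sketch does not model.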

Drift detection: drift(t) = ||S_sender(t) − S_receiver(t)||_semantic. If drift(t) ≥ D_max, a full-state synchronization is triggered regardless of individual CDP scores.

The key invariant the protocol must maintain: for all decisions d ∈ D, the causal chain of evidence and contradictions leading to d must be reconstructible from the receiver's delta log.
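A receiver-side check of this invariant might look like the sketch below. It assumes, hypothetically, that every log entry has an identifier and that each entry records the identifiers of its direct antecedents in a `depends_on` mapping; neither detail is specified by the proposal.

```python
def causal_closure(root: str, depends_on: dict) -> set:
    """Return every entry that `root` depends on, directly or transitively."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for parent in depends_on.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unreconstructible(decisions, depends_on: dict, receiver_log: set) -> dict:
    """Map each decision to the entries missing from the receiver's log."""
    gaps = {}
    for decision in decisions:
        needed = causal_closure(decision, depends_on) | {decision}
        missing = needed - receiver_log
        if missing:
            gaps[decision] = missing
    return gaps


# A decision resting on a resolution, which rests on a contradiction,
# which rests on the original evidence.
depends_on = {
    "decision-1": {"delta-4"},
    "delta-4": {"delta-2"},
    "delta-2": {"delta-1"},
}
receiver_log = {"decision-1", "delta-4", "delta-1"}  # delta-2 was lost

gaps = unreconstructible(["decision-1"], depends_on, receiver_log)
print(gaps)
```

The fraction of decisions that appear in the returned mapping is one possible way to compute a decision traceability failure rate.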

Known Limitations

This section follows the Foundation's institutional practice of explicitly stating known weaknesses, failure modes, and scope boundaries for every proposal. Its presence indicates analytical maturity, not weakness in the underlying proposal.

Semantic equivalence cannot be guaranteed. The CDP scoring function estimates semantic significance using weighted structural signals. Accumulated low-CDP changes may collectively shift interpretation without triggering immediate transmission or drift detection. Drift detection reduces divergence but cannot guarantee zero semantic divergence between checkpoints.

CDP calibration domain-dependence. The proposed weights are domain-general defaults. Legal documents, scientific datasets, and policy records have different continuity profiles requiring domain-specific calibration for high-consequence deployments.

Rollback cost at high checkpoint intervals. At high checkpoint intervals, rollback and replay may be computationally expensive for large documents with many deltas since the last checkpoint.

Adversarial delta injection. A participant with write access could inject low-CDP deltas designed to accumulate into high-significance semantic shifts without triggering detection. The protocol does not include mechanisms for validating delta authorship.

What This Paper Does Not Claim

Non-Adoption Scenario

Without semantic delta prioritization, distributed systems synchronizing evolving documents face a binary choice between full-state synchronization (bandwidth-intensive, lossless) and structural diff (bandwidth-efficient, blind to semantic significance). Organizations managing distributed legal matters, scientific datasets, or policy records under bandwidth constraints default to infrequent full synchronization — creating windows during which different participants operate on diverged versions without visibility into the divergence.

Open Questions

What is the correct checkpoint interval for different document types and divergence tolerances? How should CDP scoring adapt for real-time collaborative editing where delta frequency is orders of magnitude higher? Can CDP scoring be made explainable to participants? What is the minimum semantic drift threshold that produces measurable decision-quality differences?

Governance Implications

Delta protocol deployments in legal and medical contexts create evidentiary questions: what is the legal status of a delta log as a record of document evolution; how long should delta logs be retained; who has authority to roll back to a prior checkpoint. These governance questions require institutional and potentially regulatory frameworks before high-consequence deployment.

References and Related Work

Tridgell, A. and Mackerras, P. (1996). The rsync Algorithm. Technical Report, ANU.
Oster, G. et al. (2006). Data Consistency for P2P Collaborative Editing. CSCW.
Shapiro, M. et al. (2011). Conflict-Free Replicated Data Types. SSS.
EM Foundation. PCO Standards Schema, OCMS v0.1. emfoundation.net/pco-standards-schema.html

V. Falsifiability

The proposal should be considered falsified if benchmark results show any of the following:

Bandwidth reduction below 25% versus full-state transfer for real-world legal and scientific document datasets. This would mean the overhead of CDP scoring negates the savings.
Semantic drift detection latency exceeding 10 minutes for a 50-delta sequence. This would mean the system cannot catch meaningful divergence quickly enough for legal or medical document synchronization.
Decision traceability failure rate exceeding 5%. This would mean the delta log cannot reconstruct the rationale for decisions with acceptable reliability.

Open Source Contribution Invitation

Create a JSON schema for continuity delta objects compatible with the OCMS schema at emfoundation.net/pco-standards-schema.html. Build a Python simulator implementing all three synchronization protocols with configurable document set, revision rate, and bandwidth constraint. Implement drift detection with configurable thresholds. Design visual delta timelines showing which changes were transmitted immediately, batched, or deferred. Package as github.com/emfoundation/continuity-delta-protocol. The JSON schema should be submitted as a proposed extension to the OCMS standard through the Foundation's public comment process.

Contact: research@emfoundation.net