Reducing communication overhead by transmitting only the changes most likely to affect continuity of meaning — semantic deltas rather than full-state synchronization
Modern distributed systems face a synchronization dilemma. Full-state synchronization preserves perfect fidelity but wastes bandwidth transmitting information the receiver already has. Sparse synchronization reduces bandwidth but risks drift when deltas accumulate incorrectly or incompletely. Existing delta synchronization approaches — file diff, rsync, CRDT-based systems — optimize for structural fidelity: they accurately transmit what changed syntactically. They do not optimize for semantic continuity: they have no mechanism for distinguishing a change that merely modifies formatting from a change that alters the meaning, provenance, or reliability of a document.
The Probabilistic Continuity Delta Protocol proposes a middle path: transmit the changes most likely to affect continuity of meaning. A continuity delta is a state change that affects future interpretation — new evidence, changed confidence, resolved contradiction, added source, altered preference, changed legal status, or modified decision rationale. Structural changes that do not affect any of these dimensions are candidates for deferred or batched transmission.
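As a minimal sketch of what a delta object might carry, the Python below models the change kinds listed above as an enumeration and serializes one delta to JSON. The class name `ContinuityDelta`, its field names, and the extra `structural` kind are illustrative assumptions; this is not the schema the paper proposes.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class DeltaKind(Enum):
    # Kinds taken from the definition of a continuity delta in the text.
    NEW_EVIDENCE = "new_evidence"
    CHANGED_CONFIDENCE = "changed_confidence"
    RESOLVED_CONTRADICTION = "resolved_contradiction"
    ADDED_SOURCE = "added_source"
    ALTERED_PREFERENCE = "altered_preference"
    CHANGED_LEGAL_STATUS = "changed_legal_status"
    MODIFIED_DECISION_RATIONALE = "modified_decision_rationale"
    # Assumed extra kind for changes that touch none of the dimensions above.
    STRUCTURAL = "structural"


@dataclass
class ContinuityDelta:
    delta_id: str    # hypothetical identifier field
    kind: DeltaKind
    cdp: float       # score assigned by the CDP scoring function
    payload: dict    # hypothetical free-form description of the change

    @property
    def deferrable(self) -> bool:
        """Structural-only changes are candidates for deferral or batching."""
        return self.kind is DeltaKind.STRUCTURAL

    def to_json(self) -> str:
        record = asdict(self)
        record["kind"] = self.kind.value  # store the enum as its string value
        return json.dumps(record, sort_keys=True)
```

A delta built as `ContinuityDelta("d1", DeltaKind.ADDED_SOURCE, 0.174, {"source": "second witness"})` serializes to a flat JSON object whose `kind` field is the string `added_source`.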
This paper presents the continuity delta taxonomy, the JSON schema for delta objects, a simulation comparing full synchronization, text diff, and continuity-delta synchronization across bandwidth and accuracy metrics, and drift-detection safeguards. The proposal is most immediately applicable to collaborative legal documents, scientific case records, policy analysis threads, and AI agent memory logs.
The simulation uses evolving document datasets that any contributor can generate or obtain from public sources — legal case records, policy documents, scientific review threads, or synthetic agent memory logs.
20+ evolving documents spanning 50+ revision cycles each. At least one third of revisions should contain genuine continuity-significant changes (new evidence, changed conclusions, resolved contradictions) to provide meaningful test cases for CDP prioritization.
Full-state transfer: Complete document transmitted on every change. Perfect fidelity, maximum bandwidth.
Text diff (rsync-style): Only changed lines transmitted. Efficient for structural changes; blind to semantic significance.
Continuity-delta: CDP-scored changes transmitted by priority. High-CDP changes transmitted immediately; low-CDP changes batched. Drift detection triggers a full sync if the accumulated CDP of batched, not-yet-transmitted changes exceeds a threshold.
Bandwidth consumed per revision cycle. Reconstruction accuracy after simulated packet loss. Contradiction retention (are unresolved contradictions preserved across sync?). Decision traceability (can the rationale for a decision be reconstructed from the delta log?). Drift rate (how quickly does the receiver's state diverge from sender's state under each protocol?).
The Git comparison. Every engineer reading this proposal will ask: "isn't this just git diff with semantic awareness?" The answer is yes and no, and the distinction matters. Git diff operates on syntactic structure — it identifies which lines changed. The Continuity Delta Protocol operates on semantic significance — it identifies which changes affect continuity of meaning. A change that rewrites a paragraph for clarity without altering its claims is a zero-CDP event. The same change in git diff produces a large diff. Conversely, a single-word change that replaces "not liable" with "liable" in a legal document is a maximum-CDP event. Git diff treats it as a minor change.
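The contrast can be illustrated with a small sketch that uses Python's standard `difflib` as a stand-in for a structural diff (an assumption; it is not git's diff algorithm). The helper `structural_change` and the three sentences are hypothetical. The point is only that a meaning-preserving rewording registers as a larger structural change than a one-word reversal of the claim.

```python
import difflib


def structural_change(before: str, after: str) -> float:
    """1 minus difflib's similarity ratio over word tokens (0.0 = identical)."""
    matcher = difflib.SequenceMatcher(
        None, before.split(), after.split(), autojunk=False
    )
    return 1.0 - matcher.ratio()


original = "The defendant is not liable for the damage to the property."
# Same claim, different wording: should be a zero-CDP event.
reworded = "For the property damage, liability does not rest with the defendant."
# One word removed, claim reversed: should be a maximum-CDP event.
reversed_claim = "The defendant is liable for the damage to the property."

print(f"reworded: {structural_change(original, reworded):.2f}")
print(f"reversed: {structural_change(original, reversed_claim):.2f}")
```

A purely structural measure ranks these two edits in the opposite order from the one a continuity score is meant to produce.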
Rollback and recovery mechanisms. When accumulated deltas produce a state that diverges unacceptably from the sender's state, the protocol requires a rollback mechanism. The Foundation proposes a checkpoint-based rollback: every N deltas (configurable), a full-state snapshot is appended to the delta log. If drift detection triggers a rollback, the receiver restores from the most recent checkpoint and replays only the deltas since that checkpoint. This bounds the maximum recovery cost to O(checkpoint_interval) rather than O(full_history).
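A minimal sketch of the checkpoint scheme follows, assuming the document state is a flat dictionary and each delta is a dictionary of key updates. The class `DeltaLog` and its method names are hypothetical; a real implementation would need richer state and delta types.

```python
import copy


class DeltaLog:
    def __init__(self, initial_state: dict, checkpoint_interval: int = 3):
        self.interval = checkpoint_interval
        # The log starts with a full-state snapshot of the initial state.
        self.entries = [("snapshot", copy.deepcopy(initial_state))]
        self._state = copy.deepcopy(initial_state)
        self._since_checkpoint = 0

    def append(self, delta: dict) -> None:
        """Record a delta; every `interval` deltas, also append a snapshot."""
        self._state.update(delta)
        self.entries.append(("delta", dict(delta)))
        self._since_checkpoint += 1
        if self._since_checkpoint == self.interval:
            self.entries.append(("snapshot", copy.deepcopy(self._state)))
            self._since_checkpoint = 0

    def restore(self):
        """Rebuild state from the latest snapshot plus the deltas after it.

        Returns (state, number_of_deltas_replayed).
        """
        last = max(i for i, (kind, _) in enumerate(self.entries)
                   if kind == "snapshot")
        state = copy.deepcopy(self.entries[last][1])
        tail = self.entries[last + 1:]  # only deltas can follow the last snapshot
        for _, delta in tail:
            state.update(delta)
        return state, len(tail)
```

In this sketch the number of replayed deltas never exceeds `checkpoint_interval - 1`, whatever the length of the full history.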
Probabilistic error bounds. The CDP scoring function introduces probabilistic uncertainty — a change scored as low-CDP may in fact be high-significance. The protocol should maintain a confidence interval on cumulative CDP, triggering a full synchronization when the interval's upper bound exceeds the session drift limit. This prevents silent divergence when individual delta scores are uncertain.
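One way to sketch such a bound is a normal approximation, assuming each delta score comes with an independent standard error. That assumption, the default `z = 1.96`, and the function names are illustrative and do not come from the protocol itself.

```python
import math


def cumulative_upper_bound(scores, std_errors, z=1.96):
    """Upper end of an approximate interval on the summed CDP.

    Assumes independent errors, so the variances add.
    """
    total = sum(scores)
    spread = math.sqrt(sum(se * se for se in std_errors))
    return total + z * spread


def needs_full_sync(scores, std_errors, drift_limit, z=1.96):
    """True when the interval's upper bound exceeds the session drift limit."""
    return cumulative_upper_bound(scores, std_errors, z) > drift_limit


# Point estimate is 0.10, below a limit of 0.15, but the upper bound is not.
scores = [0.02, 0.03, 0.05]
std_errors = [0.03, 0.04, 0.0]
print(needs_full_sync(scores, std_errors, drift_limit=0.15))
```

With these numbers the summed score alone would not trigger a sync, while the upper bound of roughly 0.198 does, which is the silent-divergence case the paragraph describes.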
Semantic drift detection. Semantic drift occurs when accumulated low-CDP deltas progressively shift the meaning of a document in ways that no individual delta's score reflects. Detection requires periodic full-state comparison between sender and receiver, not just delta log verification. The benchmark should include a semantic drift test: 50+ low-CDP changes applied in sequence, measuring when a full-state comparison first detects meaningful semantic divergence.
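The drift test can be sketched with a deliberately simple model: the sender applies a run of small confidence edits, the receiver still holds the last synchronized value, and a full-state comparison runs after every edit. The step size, the tolerance `D_MAX`, and the use of a single confidence value as the whole state are assumptions for illustration; the 0.20 weight and the 0.15 threshold are the figures used in this paper's worked example.

```python
D_MAX = 0.10              # assumed divergence tolerance on confidence
STEP = 0.003              # assumed confidence shift per low-CDP edit
CONFIDENCE_WEIGHT = 0.20  # ConfidenceChange weight from the worked example
THETA_IMMEDIATE = 0.15    # immediate-transmission threshold from the example


def first_detection(n_changes=50, start=0.82):
    """Return the edit number at which full-state comparison first flags drift."""
    receiver = start  # last synchronized value; batched edits not yet applied
    sender = start
    for step in range(1, n_changes + 1):
        sender -= STEP
        if abs(sender - receiver) >= D_MAX:
            return step
    return None


# Each edit scores far below the immediate threshold on its own.
print(CONFIDENCE_WEIGHT * STEP < THETA_IMMEDIATE)
print(first_detection())
```

Each edit scores 0.0006, so none is transmitted immediately, yet the sender's confidence value moves past the tolerance well within 50 edits.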
The Delta Protocol's operation is easiest to understand through a concrete example. The following traces a legal matter record through a complete delta cycle.
Initial state (Day 1). A legal matter record contains: one claim ("the defendant was present at the location"), one supporting source (witness testimony, confidence 0.82), and one open question (jurisdiction for the claim). The full state is synchronized to all team members at initial creation.
Delta 1 — New evidence (Day 3). A second witness provides corroborating testimony. CDP score: FactChange = 0 (same claim), ConfidenceChange = 0.12 (confidence rises from 0.82 to 0.94), ProvenanceChange = 1 (new source added). CDP = (0 × 0.30) + (0.12 × 0.20) + (0 × 0.20) + (1 × 0.15) + (0 × 0.10) + (0 × 0.05) = 0.174. CDP above threshold (0.15): transmitted immediately. Receiver updates confidence and adds source to local record.
Delta 2 — Contradiction introduced (Day 7). A third party disputes the witness location, creating a direct contradiction of the primary claim. CDP score: ContradictionChange = 1 (new contradiction), FactChange = 0.6 (significant semantic shift in claim confidence). CDP = (0.6 × 0.30) + (0 × 0.20) + (1 × 0.20) + (0 × 0.15) + (0 × 0.10) + (0 × 0.05) = 0.38. High CDP: transmitted immediately with FR-3 severity flag. Receiver records the contradiction as unresolved and updates the reliance ceiling of the matter's PCO from RC-4 to RC-3.
Delta 3 — Formatting edit (Day 8). A team member reformats the summary for readability. FactChange = 0 (same claims), all other dimensions = 0. CDP = 0. CDP below threshold: batched for next periodic sync. This structural change does not affect meaning and does not require immediate transmission.
Delta 4 — Contradiction resolved (Day 12). Investigation establishes that the third party's account referred to a different date. ContradictionChange = 1 (contradiction resolved), ConfidenceChange = 0.10 (confidence partially recovers). CDP = (0 × 0.30) + (0.10 × 0.20) + (1 × 0.20) + (0 × 0.15) + (0 × 0.10) + (0 × 0.05) = 0.22. Above threshold: transmitted immediately. Receiver updates contradiction status to resolved and the PCO's reliance ceiling returns to RC-4.
This example demonstrates the core properties: high-CDP changes (new evidence, contradictions, resolutions) transmit immediately; low-CDP changes (formatting, minor updates) batch for periodic sync. The PCO's reliance ceiling responds dynamically to the current contradiction state.
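The arithmetic in the four deltas can be checked with a short script. The weights and the 0.15 threshold are those used in the example; the names of the last two dimensions are placeholders, since the example gives their weights (0.10 and 0.05) but not their names.

```python
# Weights in the order they appear in the worked example.
WEIGHTS = {
    "fact": 0.30,
    "confidence": 0.20,
    "contradiction": 0.20,
    "provenance": 0.15,
    "unnamed_5": 0.10,  # placeholder name; not given in this section
    "unnamed_6": 0.05,  # placeholder name; not given in this section
}
THETA_IMMEDIATE = 0.15


def cdp_score(**changes: float) -> float:
    """Weighted sum of per-dimension change values; missing dimensions are 0."""
    unknown = set(changes) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return sum(w * changes.get(name, 0.0) for name, w in WEIGHTS.items())


example = {
    "Delta 1": dict(confidence=0.12, provenance=1),
    "Delta 2": dict(fact=0.6, contradiction=1),
    "Delta 3": dict(),
    "Delta 4": dict(confidence=0.10, contradiction=1),
}
for name, changes in example.items():
    score = cdp_score(**changes)
    action = "immediate" if score >= THETA_IMMEDIATE else "batched"
    print(f"{name}: CDP = {score:.3f} ({action})")
```

Running it reproduces the scores in the text (0.174, 0.38, 0, and 0.22) and the same immediate-versus-batched decisions.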
The following lightweight formal notation is intended to make the protocol's terms precise, not to constitute a full mathematical specification. Full formalization is deferred to the empirical validation phase.
Let S(t) be the state of a document at time t, represented as a structured object containing claims C, sources Q, confidence values K, contradictions X, and decisions D.
A continuity delta δ(t, t+1) = S(t+1) − S(t) is the set of changes between two consecutive states. For each element e of δ(t, t+1), the CDP score CDP(e) is computed as defined in Section II.
The transmission decision function is τ(e) = 1 if CDP(e) ≥ θ_immediate, where θ_immediate is the immediate transmission threshold. Otherwise e joins the current batch, and τ(e) = 1 once accumulated_CDP(batch) ≥ θ_batch or elapsed_time ≥ T_max; until then τ(e) = 0 and e remains queued.
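The decision function can be sketched as a small batching queue. The class `Batcher`, the values chosen for θ_batch and T_max, and the choice not to flush the batch when an immediate delta is sent are all assumptions; only the 0.15 immediate threshold comes from the worked example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Batcher:
    theta_immediate: float = 0.15  # threshold used in the worked example
    theta_batch: float = 0.30      # assumed value
    t_max: float = 3600.0          # assumed value, in seconds
    pending: List[Tuple[str, float]] = field(default_factory=list)
    batch_started: Optional[float] = None

    def offer(self, delta_id: str, cdp: float, now: float) -> List[str]:
        """Return the ids to transmit now (tau = 1); the rest stay queued."""
        if cdp >= self.theta_immediate:
            return [delta_id]
        if not self.pending:
            self.batch_started = now  # the batch clock starts with its first delta
        self.pending.append((delta_id, cdp))
        return self.tick(now)

    def tick(self, now: float) -> List[str]:
        """Flush the batch if accumulated CDP or elapsed time reaches its limit."""
        if not self.pending:
            return []
        accumulated = sum(cdp for _, cdp in self.pending)
        expired = now - self.batch_started >= self.t_max
        if accumulated >= self.theta_batch or expired:
            out = [delta_id for delta_id, _ in self.pending]
            self.pending.clear()
            self.batch_started = None
            return out
        return []
```

Calling `tick` on a timer covers the case where no new delta arrives before T_max elapses.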
Drift detection: drift(t) = ||S_sender(t) − S_receiver(t)||_semantic. If drift(t) ≥ D_max, a full-state synchronization is triggered regardless of individual CDP scores.
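The semantic norm is left undefined here, so any concrete measure is an assumption. As one hedged possibility, the sketch below takes the largest per-component distance between the two states: Jaccard distance for the set-valued components and the largest absolute gap for confidence values. The dictionary layout, the function names, and the value of `D_MAX` are hypothetical.

```python
D_MAX = 0.25  # assumed drift tolerance


def jaccard_distance(a, b) -> float:
    """0.0 for identical sets, 1.0 for disjoint ones."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def semantic_drift(sender: dict, receiver: dict) -> float:
    """Largest component-wise distance between two document states."""
    set_fields = ("claims", "sources", "contradictions", "decisions")
    parts = [jaccard_distance(sender[f], receiver[f]) for f in set_fields]
    keys = set(sender["confidence"]) | set(receiver["confidence"])
    parts.append(max(
        (abs(sender["confidence"].get(k, 0.0) - receiver["confidence"].get(k, 0.0))
         for k in keys),
        default=0.0,
    ))
    return max(parts)


sender = {
    "claims": {"c1"}, "sources": {"w1", "w2"},
    "contradictions": {"x1"}, "decisions": set(),
    "confidence": {"c1": 0.94},
}
receiver = dict(sender, contradictions=set())  # receiver never saw x1
print(semantic_drift(sender, receiver) >= D_MAX)
```

Taking the maximum rather than an average means a single fully diverged component, such as a missing contradiction, is enough to trigger a full synchronization.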
The key invariant the protocol must maintain: for all decisions d ∈ D, the causal chain of evidence and contradictions leading to d must be reconstructible from the receiver's delta log.
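A check for this invariant can be sketched as a graph walk over the receiver's delta log, assuming each log entry records the ids of the entries it depends on. The `id` and `depends_on` field names are hypothetical and are not part of any schema defined here.

```python
def missing_ancestors(decision_id: str, log: list) -> set:
    """Ids in the decision's causal chain that are absent from the log.

    An empty result means the chain is fully reconstructible.
    """
    by_id = {entry["id"]: entry for entry in log}
    missing, seen, stack = set(), set(), [decision_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        entry = by_id.get(current)
        if entry is None:
            missing.add(current)  # referenced but never received
            continue
        stack.extend(entry.get("depends_on", []))
    return missing


receiver_log = [
    {"id": "e1", "kind": "evidence"},
    {"id": "x1", "kind": "contradiction", "depends_on": ["e1"]},
    {"id": "d1", "kind": "decision", "depends_on": ["e1", "x1"]},
]
print(missing_ancestors("d1", receiver_log))
```

A receiver could run this check for every decision after each sync and request a full synchronization whenever the result is non-empty.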
This section follows the Foundation's institutional practice of explicitly stating known weaknesses, failure modes, and scope boundaries for every proposal. Its presence indicates analytical maturity, not weakness in the underlying proposal.
Semantic equivalence cannot be guaranteed. The CDP scoring function estimates semantic significance using weighted structural signals. Accumulated low-CDP changes may collectively shift interpretation without triggering immediate transmission or drift detection. Drift detection reduces divergence but cannot guarantee zero semantic divergence between checkpoints.
CDP calibration domain-dependence. The proposed weights are domain-general defaults. Legal documents, scientific datasets, and policy records have different continuity profiles requiring domain-specific calibration for high-consequence deployments.
Rollback cost at high checkpoint intervals. When checkpoints are far apart, a rollback must replay every delta recorded since the last checkpoint, which may be computationally expensive for large documents.
Adversarial delta injection. A participant with write access could inject low-CDP deltas designed to accumulate into high-significance semantic shifts without triggering detection. The protocol does not include mechanisms for validating delta authorship.
Without semantic delta prioritization, distributed systems synchronizing evolving documents face a binary choice between full-state synchronization (bandwidth-intensive, lossless) and structural diff (bandwidth-efficient, blind to semantic significance). Organizations managing distributed legal matters, scientific datasets, or policy records under bandwidth constraints default to infrequent full synchronization — creating windows during which different participants operate on diverged versions without visibility into the divergence.
What is the correct checkpoint interval for different document types and divergence tolerances? How should CDP scoring adapt for real-time collaborative editing where delta frequency is orders of magnitude higher? Can CDP scoring be made explainable to participants? What is the minimum semantic drift threshold that produces measurable decision-quality differences?
Delta protocol deployments in legal and medical contexts create evidentiary questions: what is the legal status of a delta log as a record of document evolution; how long should delta logs be retained; who has authority to roll back to a prior checkpoint. These governance questions require institutional and potentially regulatory frameworks before high-consequence deployment.
Tridgell, A. and Mackerras, P. (1996). The rsync Algorithm. Technical Report, ANU. · Oster, G. et al. (2006). Data Consistency for P2P Collaborative Editing. CSCW. · Shapiro, M. et al. (2011). Conflict-Free Replicated Data Types. SSS. · EM Foundation. PCO Standards Schema — OCMS v0.1. emfoundation.net/pco-standards-schema.html
Create a JSON schema for continuity delta objects compatible with the OCMS schema at emfoundation.net/pco-standards-schema.html. Build a Python simulator implementing all three synchronization protocols with configurable document set, revision rate, and bandwidth constraint. Implement drift detection with configurable thresholds. Design visual delta timelines showing which changes were transmitted immediately, batched, or deferred. Package as github.com/emfoundation/continuity-delta-protocol. The JSON schema should be submitted as a proposed extension to the OCMS standard through the Foundation's public comment process.
Contact: research@emfoundation.net