Operational Table Maintenance Contract

This is a mechanics/reference page for bounded operational tables: what prunes them, what safety rule applies, and what operators can rely on during recovery.

Quick orientation

  • Read this if: you are debugging prune behavior, sizing retention, or validating recovery assumptions.
  • Skip this if: you only need the high-level cluster model.
  • Go deeper: pair this with Data lifecycle and retention, Backplane, and Presence.

Maintenance loops

Maintenance matrix

| Table | Bounded by | Prune path | Safety rule |
|---|---|---|---|
| presence_entries | expires_at_ms TTL; heartbeat caps | scheduler sweep plus heartbeat cleanup | Presence is derived inventory; delete only expired rows. |
| connections | expires_at_ms TTL | scheduler sweep plus edge heartbeat cleanup | Active owners must keep refreshing TTL. |
| channel_inbound_dedupe | expires_at_ms TTL | inline best-effort cleanup plus scheduler sweep | Replay protection remains valid inside the configured dedupe window. |
| channel_inbox | terminal retention window | scheduler sweep | Completed rows wait for dependent channel_outbox cleanup; failed rows age out after the window. |
| channel_outbox | inline success deletion; terminal retention for failures | delivery path plus scheduler sweep | Canonical transcript recovery must not depend on retained outbox rows. |
| conversation_leases | lease_expires_at_ms TTL | scheduler sweep | Only expired leases are removed. |
| workspace_leases | lease_expires_at_ms TTL | scheduler sweep | Active owners must renew before expiry. |
| oauth_pending | expires_at | scheduler sweep plus callback consumption | Live auth handshakes must survive until their advertised expiry. |
| oauth_refresh_leases | lease_expires_at_ms TTL | scheduler sweep | Delete only expired refresh ownership. |
| models_dev_refresh_leases | lease_expires_at_ms TTL | scheduler sweep | Refresh lease cleanup must not delete the separate cache row. |
| outbox | time-based retention window (default 24h) | outbox scheduler sweep | Replay history is bounded; durable state remains authoritative after prune. |
| outbox_consumers | stale updated_at, measured against the same retention window | outbox scheduler sweep | Remove only stale cursors that have stopped advancing. |
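Most TTL-bounded rows in the matrix share one safety rule: delete only rows whose expiry has already passed. The sketch below shows that rule as a single sweep, assuming a SQLite store with integer millisecond expiry columns. The `TTL_TABLES` mapping and `sweep_expired` function are hypothetical, and the real scheduler may batch or split the work differently. The sketch deliberately leaves out `oauth_pending` (the unit of `expires_at` is not stated here), the terminal-retention tables, and the `outbox` window.

```python
import sqlite3

# Hypothetical subset of the maintenance matrix: table name -> expiry column.
# Real schemas, column types, and batching are assumptions, not the actual store.
TTL_TABLES = {
    "presence_entries": "expires_at_ms",
    "connections": "expires_at_ms",
    "channel_inbound_dedupe": "expires_at_ms",
    "conversation_leases": "lease_expires_at_ms",
    "workspace_leases": "lease_expires_at_ms",
}


def sweep_expired(conn: sqlite3.Connection, now_ms: int) -> dict[str, int]:
    """Delete only rows whose expiry is strictly before now_ms.

    Returns the number of rows pruned per table. Rows that expire exactly
    at now_ms survive until the next tick, which errs on the side of keeping
    live owners.
    """
    pruned: dict[str, int] = {}
    with conn:  # commit all deletes of this sweep together
        for table, column in TTL_TABLES.items():
            # Identifiers come from the fixed mapping above, never from input.
            cur = conn.execute(f"DELETE FROM {table} WHERE {column} < ?", (now_ms,))
            pruned[table] = cur.rowcount
    return pruned
```

Rows that are still inside their TTL are untouched, so owners that keep refreshing `expires_at_ms` or `lease_expires_at_ms` never lose their row to the sweep.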

What operators should expect

| Situation | Expected behavior |
|---|---|
| Normal operation | Background jobs keep bounded tables from growing without limit. |
| Clustered deployment | Prune loops run under a single-writer lock or lease so replicas do not race cleanup. |
| Tick failure | The system logs statestore.lifecycle_tick_failed or outbox.lifecycle_tick_failed and increments error counters. |
| Recovery after prune | Critical state must remain reconstructible from durable tables and APIs; it must not depend on operational buffers alone. |
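For clustered deployments, one way to keep replicas from racing cleanup is a compare-and-set lease: a replica runs a prune tick only if it already holds an unexpired lease or can claim an expired one. This is a minimal sketch assuming a SQLite table; the `prune_leases` table, the `try_acquire` function, and the holder names are hypothetical and do not describe the project's actual lock.

```python
import sqlite3


def ensure_lease_table(conn: sqlite3.Connection) -> None:
    # Hypothetical lease table: one row per prune loop (e.g. "statestore", "outbox").
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prune_leases ("
        " name TEXT PRIMARY KEY,"
        " holder TEXT NOT NULL,"
        " lease_expires_at_ms INTEGER NOT NULL)"
    )


def try_acquire(conn: sqlite3.Connection, name: str, holder: str,
                now_ms: int, ttl_ms: int) -> bool:
    """Claim or renew the lease if it is ours or has expired; return True on success."""
    with conn:
        # Seed an already-expired row the first time so the UPDATE can claim it.
        conn.execute(
            "INSERT OR IGNORE INTO prune_leases (name, holder, lease_expires_at_ms) "
            "VALUES (?, ?, 0)",
            (name, holder),
        )
        # Single conditional UPDATE: succeeds only for the current holder
        # (renewal) or when the existing lease has already expired (takeover).
        cur = conn.execute(
            "UPDATE prune_leases SET holder = ?, lease_expires_at_ms = ? "
            "WHERE name = ? AND (holder = ? OR lease_expires_at_ms < ?)",
            (holder, now_ms + ttl_ms, name, holder, now_ms),
        )
        return cur.rowcount == 1
```

Because the check and the claim happen in one `UPDATE`, two replicas cannot both see the lease as free. A holder that stops renewing loses the lease once its expiry passes, the same TTL rule the matrix applies to the other lease tables.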

Explicit non-goals

  • models_dev_cache does not need a prune loop because it is bounded by a singleton primary key.
  • Successful channel_outbox rows are already removed inline; a second background success-prune job would duplicate that lifecycle.
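The first non-goal relies on the schema, rather than a background job, to bound the table. A minimal SQLite sketch of a singleton-keyed cache, assuming a `CHECK` constraint pins the primary key to one value; the column names and the `store_cache` helper are hypothetical:

```python
import sqlite3

# Hypothetical schema: the CHECK constraint allows only id = 1, so the
# table can never hold more than one row and needs no prune loop.
SCHEMA = """
CREATE TABLE models_dev_cache (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    payload TEXT NOT NULL,
    fetched_at_ms INTEGER NOT NULL
)
"""


def store_cache(conn: sqlite3.Connection, payload: str, fetched_at_ms: int) -> None:
    """Overwrite the single cache row instead of appending a new one."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO models_dev_cache (id, payload, fetched_at_ms) "
            "VALUES (1, ?, ?)",
            (payload, fetched_at_ms),
        )
```

Each refresh replaces the row in place, and any attempt to insert a second key fails the constraint, so the row count stays at one regardless of how often the cache is refreshed.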

Observability hooks

Use these as the first stop when checking maintenance health:

  • lifecycle_prune_rows_total{scheduler="statestore",table="..."}
  • lifecycle_prune_rows_total{scheduler="outbox",table="..."}
  • statestore.lifecycle_pruned
  • outbox.lifecycle_pruned
  • lifecycle_tick_errors_total{scheduler=...}
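These hooks map onto a tick wrapper that records per-table prune counts on success, and an error count plus a `*.lifecycle_tick_failed` event on failure. The sketch below uses in-process `Counter` objects as stand-ins for the documented metrics and Python's `logging` module for the events; `run_tick` and the counter variables are hypothetical, and the real schedulers' emission code may differ.

```python
import logging
from collections import Counter
from typing import Callable

log = logging.getLogger("lifecycle")

# In-process stand-ins for the documented metrics (assumption: a real
# deployment exports these through its own metrics library).
lifecycle_prune_rows_total: Counter = Counter()   # key: (scheduler, table)
lifecycle_tick_errors_total: Counter = Counter()  # key: scheduler


def run_tick(scheduler: str, prune: Callable[[], dict[str, int]]) -> bool:
    """Run one prune tick for a scheduler ("statestore" or "outbox")."""
    try:
        pruned = prune()
    except Exception:
        lifecycle_tick_errors_total[scheduler] += 1
        # Mirrors statestore.lifecycle_tick_failed / outbox.lifecycle_tick_failed.
        log.exception("%s.lifecycle_tick_failed", scheduler)
        return False
    for table, rows in pruned.items():
        lifecycle_prune_rows_total[(scheduler, table)] += rows
    # Mirrors statestore.lifecycle_pruned / outbox.lifecycle_pruned.
    log.info("%s.lifecycle_pruned %s", scheduler, pruned)
    return True
```

A failed tick leaves the prune counters untouched and bumps only the error counter for that scheduler, so a rising error count with flat prune counts points at the loop itself rather than at an empty table.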