Skip to main content

Gateway data model map

This is a schema reference page for the Gateway StateStore under the clean-break conversation/turn architecture. It is intentionally table- and audit-heavy and should be used as a mechanics map, not a system overview.

Quick orientation

This page is a lightweight, human-readable map of the target Gateway StateStore schema: table groups (“bounded contexts”), retention expectations, and sensitivity/PII notes for operators.

For the audited foreign-key vs soft-reference decisions on approval/policy linkage columns, see Gateway FK audit.

This document complements (and does not replace) the broader retention guidance in Data lifecycle and retention.

Bounded-context cluster map

Bounded contexts (table groups)

Identity + scope

Tables:

  • tenants, agents, workspaces, agent_workspaces

Purpose: durable identity and workspace scoping (multi-tenant boundary).

Channels + conversations

Tables:

  • channel_accounts, channel_threads
  • conversations, conversation_state, transcript_events
  • conversation_model_overrides, conversation_provider_pins
  • conversation_send_policy_overrides, conversation_queue_overrides, conversation_queue_signals
  • channel_inbound_dedupe, channel_inbox, channel_outbox

Purpose: connector accounts/threads, durable conversations/transcript retention, per-conversation overrides, and inbound/outbound queueing.

Secrets + auth

Tables:

  • secrets, secret_versions
  • auth_profiles, auth_profile_secrets

Purpose: DB-backed secret handles + encrypted versions, and provider auth profile metadata.

Policy + approvals

Tables:

  • policy_snapshots, policy_overrides
  • approvals, review_entries
  • plans, planner_events

Purpose: durable policy bundles/overrides plus approval, review, and plan/audit surfaces.

Turn processing + durable coordination

Tables:

  • turns, turn_items, turn_pauses, turn_artifacts
  • resume_tokens
  • conversation_leases, workspace_leases
  • idempotency_records, concurrency_slots

Purpose: durable turn orchestration state, immutable per-turn trace items, pause/resume metadata, evidence linkage, and the leases/idempotency primitives that keep conversation progress safe under retries or restarts.

Automation (watchers)

Tables:

  • watchers, watcher_firings

Purpose: durable watcher definitions and their firing/queue history.

Location + location automation

Tables:

  • location_profiles, location_places
  • location_samples, location_subject_states, location_events
  • automation_triggers

Purpose: durable location configuration plus accepted/rejected samples, geofence/category state, emitted events, and location-trigger definitions.

Canvas artifacts

Tables:

  • canvas_artifacts, canvas_artifact_links

Purpose: operator/agent-authored artifacts that can be linked to plans, conversations, work items, or turns.

Context reports

Tables:

  • context_reports

Purpose: structured reports emitted by runtime components to support debugging/observability.

Secret resolution audit

Tables:

  • secret_resolutions

Purpose: audit trail of secret-handle resolutions (success/failure + minimal context).

Presence + backplane

Tables:

  • principals, connections
  • outbox, outbox_consumers
  • presence_entries

Purpose: node/client identity, live connections/capabilities, durable outbox delivery, and TTL presence inventory.

Pairing + OAuth

Tables:

  • node_pairings
  • oauth_pending, oauth_refresh_leases
  • peer_identity_links

Purpose: node pairing workflow and OAuth onboarding/book-keeping.

Managed desktop environments

Tables:

  • desktop_environment_hosts, desktop_environments

Purpose: gateway-managed desktop sandbox inventory, desired state, host health, and runtime status for managed desktop nodes.

Routing + model cache (operator/dev)

Tables:

  • routing_configs
  • models_dev_cache, models_dev_refresh_leases

Purpose: operator-managed routing configuration and a bounded “models list” dev cache/lease.

WorkBoard

Tables:

  • work_items, work_item_tasks, subagents
  • work_item_events, work_item_links
  • work_artifacts, work_decisions
  • work_signals, work_signal_firings
  • work_item_state_kv, agent_state_kv, work_scope_activity

Purpose: durable work tracking + drilldown surfaces for evidence, decisions, signals, and scoped state.

Memory

Tables:

  • memory_items, memory_item_provenance, memory_item_tags, memory_tombstones
  • memory_item_embeddings, vector_metadata

Purpose: durable agent memory (canonical content) plus derived indexes (embeddings/vectors).

Timestamp audit

Tables below were reviewed because they either had mutable-looking fields without updated_at or intentionally model state as append-only/domain-clocked rows.

Table / columnsDecisionReasoning
channel_accounts.statusMutable, add updated_atAccount status is semantic state and previously had no authoritative "last changed" clock. Migration 116_timestamp_audit.sql adds updated_at; DAL status changes must write it.
channel_threadsImmutable row; keep created_at onlyA thread row is the durable mapping from (channel_account_id, provider_thread_id) to Tyrum scope. If provider identity changes, create a new row instead of mutating the existing mapping.
work_signals.status, work_signals.trigger_spec_json, work_signals.payload_jsonMutable, add updated_atSignal definitions are edited and paused/resumed after creation. last_fired_at only captures deliveries, not config/status edits, so updated_at is the missing operational clock.
approvals.status, approvals.resume_token, approvals.resolution_jsonMutable, but keep lifecycle clocksApproval rows are mostly single-transition records. created_at, expires_at, and resolved_at already answer when the request was pending, expired, or resolved.
conversation_state.summary_json, conversation_state.pending_jsonMutable projection; keep updated_atConversation state is the mutable continuity layer. Operators need one authoritative freshness clock plus any checkpoint-specific clocks.
transcript_eventsAppend-only; keep created_at onlyTranscript history is immutable event data. New history appends rather than mutating old rows.
turns.status, turns.blocked_reason, turns.outcome_jsonMutable, but keep lifecycle clocksTurns should expose queued_at, started_at, finished_at, and any pause/cancel clocks that answer operator questions directly.
turn_pauses.status, turn_pauses.resolution_jsonMutable, but keep lifecycle clocksPause rows are single-transition records. created_at and resolved_at answer when work blocked and when it resumed or was denied.
turn_artifacts.retention_expires_at, turn_artifacts.bytes_deleted_at, turn_artifacts.metadata_jsonMutable, but keep retention clocksArtifact lifecycle work should use explicit retention clocks rather than a generic catch-all mutation timestamp.
routing_configsImmutable by revision; keep created_at onlyRouting config updates already create a new revision row; operators sort by revision/created_at rather than in-place mutation time.
policy_snapshotsImmutable snapshot; keep created_at onlySnapshots are content-addressed audit records. New policy state is modeled as a new snapshot row.
channel_inboxMutable, but keep domain clocksQueue state already has purpose-built timestamps: received_at_ms, processed_at, and lease_expires_at_ms. Adding a generic updated_at would duplicate those semantics.
channel_outboxMutable, but keep domain clocksDelivery lifecycle is already represented by created_at, sent_at, and lease_expires_at_ms, which answer the operator questions that matter here.
connectionsMutable, but keep heartbeat clocksConnection freshness is defined by last_seen_at_ms/expires_at_ms; a generic updated_at would be less precise than the existing heartbeat fields.
conversation_leases, workspace_leases, oauth_refresh_leases, models_dev_refresh_leasesMutable, but keep expiry clock onlyLease ownership changes are operationally meaningful only through the current expiry window, so lease_expires_at_ms is the authoritative freshness signal.

This audit therefore adds generic updated_at only to channel_accounts, conversation_state, and work_signals. The remaining reviewed mutable tables are intentionally modeled with either append-only rows or more specific lifecycle clocks than a generic "last changed" timestamp.

Retention + sensitivity (PII map)

The table below is intentionally coarse-grained: it’s meant to help operators decide what needs strict retention/redaction controls and what can be safely pruned when a deployment is under storage pressure.

Area (bounded context)Typical retentionPII/sensitive content?Primary “hot” tables to watchNotes / pruning posture
Identity + scopeDurableLowtenants, agents, workspacesKeep; small cardinality.
Channels + conversationsDurable + bounded queuesHigh (user content)conversations, channel_inbox, channel_outboxConversation transcripts are bounded by compaction; terminal channel rows follow explicit lifecycle windows.
Secrets + authDurableSecretssecret_versions, auth_profile_secretsTreat as highly sensitive even if encrypted-at-rest.
Policy + approvalsDurableMedium–Highapprovals, review_entries, policy_snapshotsApprovals/prompts and review evidence can contain user content; policy bundles are security-critical.
Turn processingDurable + bounded logsMedium–Highturns, turn_artifacts, idempotency_recordsOutputs/errors may contain user data; prune/compact under explicit policy.
Automation (watchers)Durable + bounded historyLow–Mediumwatcher_firingsHistory can grow; safe to prune old firings.
Location + location automationDurable config + bounded historyHigh (location)location_samples, location_events, automation_triggersKeep profiles/places/triggers durable; prune raw samples and old event history under explicit retention policy.
Canvas artifactsDurable (budgeted)Medium–Highcanvas_artifactsHuman-authored content; bound by budgets/retention.
Context reportsBoundedMedium–Highcontext_reportsDebug payloads can contain user content; keep short retention by default.
Secret resolution auditBoundedMediumsecret_resolutionsKeep for audit/security; prune with policy if needed.
Presence + backplaneTTL + bounded replayMediumpresence_entries, connections, outboxPresence/connections are TTL; outbox is durable but must be compacted.
Pairing + OAuthTTL + boundedHigh (tokens/identifiers)oauth_pending, node_pairingsOAuth pending rows are short-lived; pairing contains device metadata.
Managed desktop environmentsDurable + bounded logsMedium–Highdesktop_environments, desktop_environment_hostsControl-plane state is durable; attached logs and takeover metadata are operationally sensitive.
Routing + model cacheDurable + TTL leasesLow–Mediumrouting_configs, models_dev_cacheConfig may contain endpoints/keys-by-handle; treat as sensitive metadata.
WorkBoardDurable (budgeted)Medium–Highwork_artifacts, work_decisions, work_item_eventsEvidence/decisions can include user content; history can grow without budgets.
MemoryDurable + derived indexesHigh (user content)memory_items, memory_item_embeddingsCanonical memory is durable; embeddings/vectors are derived and can be rebuilt.

Pruning checklist

When storage pressure increases, prune in this order (least risky → most sensitive). Always align with your deployment’s retention policy and audit requirements.

  1. TTL / derived surfaces (safe first)

    • presence_entries, connections (remove expired rows)
    • channel_inbound_dedupe (remove expired rows)
    • oauth_pending (remove expired rows)
    • conversation_leases, workspace_leases, models_dev_refresh_leases, oauth_refresh_leases (remove expired/abandoned leases)
    • memory_item_embeddings, vector_metadata (derived; safe to rebuild)
  2. Queue history + high-volume operational logs

    • channel_inbox, channel_outbox (retain only what’s needed for delivery/audit; prune terminal rows under the configured lifecycle windows)
    • watcher_firings (prune old, terminal firings)
    • context_reports (short retention; prune aggressively)
    • location_samples (high-volume sensor history; prune under explicit location retention windows)
    • idempotency_records, concurrency_slots (bounded retention; prune old terminal rows)
  3. Durable “why did it do that?” surfaces (prune only with policy)

    • turns / turn_artifacts metadata (coordinate with artifact-byte retention)
    • location_events and review_entries (retain long enough to answer trigger and safety/audit questions)
    • WorkBoard history tables: work_item_events, work_signal_firings, and oversized work_artifacts / work_decisions bodies
  4. Canonical truth (avoid ad-hoc deletion)

    • conversations, transcript_events, conversation_state, and memory_items are the durable system-of-record for continuity and memory. Only prune via explicit operator policy (and consider tombstones/export requirements).
    • approvals / policy_* / routing_configs are security/audit sensitive; delete only with explicit policy and change-control.

Migration notes (SQLite vs Postgres)

  • Canonical reference: treat this page plus ARCH-20 conversation and turn clean-break decision as the target-state schema contract until the persistence cutover in issue #1823 lands.
  • SQLite: altering tables is constrained; prefer additive migrations and avoid editing already-applied migrations. When a rebuild is unavoidable, prefer explicit copy + rebuild patterns and document the operational impact (downtime, backup/restore).
  • Postgres: uses typed columns (UUID, TIMESTAMPTZ, JSONB, …) and CASCADE drops for rebuilds. Keep JSON fields as JSONB for queryability and consider GIN indexes where warranted by measured query patterns.