Replay Jobs

Persist replay work as local jobs instead of one-off terminal commands.

The persisted replay workflow is backed by SQLite and exposed through both CLI commands and HTTP endpoints. Jobs survive process restarts and track their status and replay progress over time. Timestamp jobs resolve their time window to concrete offsets before they are saved. Any job can store an optional messages-per-second cap, and job reads include derived progress metrics. Running jobs can be cancelled cooperatively through the HTTP API or the jobs CLI.
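Resolving the window up front means the saved job refers to a fixed offset range rather than a time range. The following is a minimal TypeScript sketch of that resolution. It assumes the partition's records are available as (offset, timestamp) pairs sorted by offset and that both ends of the window are inclusive. RecordMeta and resolveTimestampWindow are hypothetical names, not the project's API.

```typescript
// Hypothetical: offset and timestamp of one record in the source partition.
interface RecordMeta {
  offset: number;
  timestampMs: number;
}

interface OffsetWindow {
  startOffset: number;
  endOffset: number; // treated as inclusive in this sketch
}

// Resolves an ISO-8601 timestamp window to the first and last offsets whose
// record timestamps fall inside it (both bounds inclusive, an assumption).
// Expects records sorted by offset. Returns null when no record is in the window.
function resolveTimestampWindow(
  records: readonly RecordMeta[],
  startIso: string,
  endIso: string,
): OffsetWindow | null {
  const startMs = Date.parse(startIso);
  const endMs = Date.parse(endIso);
  if (Number.isNaN(startMs) || Number.isNaN(endMs) || startMs > endMs) {
    throw new Error(`invalid timestamp window: ${startIso} .. ${endIso}`);
  }
  const inWindow = records.filter(
    (r) => r.timestampMs >= startMs && r.timestampMs <= endMs,
  );
  if (inWindow.length === 0) {
    return null;
  }
  return {
    startOffset: inWindow[0].offset,
    endOffset: inWindow[inWindow.length - 1].offset,
  };
}
```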

Default Storage

data/lighthouse.sqlite

Set the LIGHTHOUSE_DB_PATH environment variable to store the database in a different location.
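The lookup order can be sketched as follows. This assumes LIGHTHOUSE_DB_PATH is read from the process environment and that a blank value counts as unset. resolveDbPath is a hypothetical helper; in a Node process you would call it as resolveDbPath(process.env).

```typescript
// Default location documented above.
const DEFAULT_DB_PATH = "data/lighthouse.sqlite";

// Hypothetical helper: returns LIGHTHOUSE_DB_PATH when it is set to a non-blank
// value, otherwise the default path. Blank values count as unset (an assumption).
function resolveDbPath(env: Record<string, string | undefined>): string {
  const override = env["LIGHTHOUSE_DB_PATH"]?.trim();
  return override ? override : DEFAULT_DB_PATH;
}
```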

Job Lifecycle

draft: created but not started
running: replay is in progress
completed: replay or dry-run finished successfully
failed: replay exited with an error
cancelled: job was cancelled before completion
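These statuses form a small state machine. The TypeScript sketch below models it. The transition table is an assumption inferred from the descriptions above. In particular, whether a draft can be cancelled and whether a failed job can be restarted are not specified here. canTransition and isTerminal are hypothetical helpers.

```typescript
// The five statuses listed above.
type JobStatus = "draft" | "running" | "completed" | "failed" | "cancelled";

// Assumed transitions. draft -> cancelled is a guess; completed, failed, and
// cancelled are treated as terminal in this sketch.
const TRANSITIONS: Record<JobStatus, readonly JobStatus[]> = {
  draft: ["running", "cancelled"],
  running: ["completed", "failed", "cancelled"],
  completed: [],
  failed: [],
  cancelled: [],
};

// True when a job may move directly from `from` to `to`.
function canTransition(from: JobStatus, to: JobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Terminal statuses have no outgoing transitions.
function isTerminal(status: JobStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```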

Commands

npm run replay:jobs -- create --source orders --destination orders-replay --partition 0 --start 10 --end 25 --job-id incident-2026-04-28
npm run replay:jobs -- create --source orders --destination orders-replay --partition 0 --start-timestamp 2026-04-28T14:03:00.000Z --end-timestamp 2026-04-28T14:08:00.000Z --job-id incident-window-2026-04-28
npm run replay:jobs -- create --source orders --destination orders-replay --partition 0 --start 10 --end 25 --messages-per-second 10 --job-id incident-throttled-2026-04-28
npm run replay:jobs -- start --job-id incident-2026-04-28
npm run replay:jobs -- list
npm run replay:jobs -- show --job-id incident-2026-04-28
npm run replay:jobs -- cancel --job-id incident-2026-04-28
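The --messages-per-second option in the throttled example caps the send rate. A common way to enforce such a cap is to space sends evenly, 1000 / messagesPerSecond milliseconds apart, measured from the start of the replay. The sketch below illustrates that approach and does not necessarily reflect how the replay runner paces messages. The function names are hypothetical.

```typescript
// Earliest time (ms since epoch) at which message `index` (0-based) may be sent.
// Sends are spaced 1000 / messagesPerSecond ms apart, measured from startMs.
// Without a cap, every message may be sent immediately.
function earliestSendTime(startMs: number, index: number, messagesPerSecond?: number): number {
  if (messagesPerSecond === undefined) {
    return startMs;
  }
  if (!(messagesPerSecond > 0)) {
    throw new Error("messagesPerSecond must be a positive number");
  }
  return startMs + (index * 1000) / messagesPerSecond;
}

// Milliseconds to wait before sending message `index`; 0 if it is already due.
function delayBeforeSend(nowMs: number, startMs: number, index: number, messagesPerSecond?: number): number {
  return Math.max(0, earliestSendTime(startMs, index, messagesPerSecond) - nowMs);
}
```

A replay loop would wait delayBeforeSend(Date.now(), startMs, i, cap) milliseconds before producing message i. Scheduling against the start time, rather than sleeping a fixed interval after each send, keeps slow sends from pulling the overall rate below the cap.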

Dry-Run Jobs

A dry-run job uses the same persisted workflow. It reads and previews the requested records but does not connect a producer or write to the destination topic.
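The dry-run branch can be pictured as an early return that happens before any producer is created. The following is a minimal sketch. It assumes the runner receives a producer factory, so a dry run never opens a connection. ReplayRecord, Producer, and runReplay are hypothetical and do not correspond to a specific Kafka client API.

```typescript
// Hypothetical shapes for illustration; not a specific Kafka client's API.
interface ReplayRecord {
  offset: number;
  key: string | null;
  value: string;
}

interface Producer {
  connect(): Promise<void>;
  send(topic: string, record: ReplayRecord): Promise<void>;
  disconnect(): Promise<void>;
}

interface ReplayOutcome {
  sent: number; // records written to the destination topic
  previewed: number; // records passed to the preview callback
}

// In dry-run mode the records are only previewed: the producer factory is never
// called, so no producer connects and nothing is written to the destination topic.
async function runReplay(
  records: readonly ReplayRecord[],
  destinationTopic: string,
  dryRun: boolean,
  createProducer: () => Producer,
  preview: (record: ReplayRecord) => void,
): Promise<ReplayOutcome> {
  if (dryRun) {
    records.forEach((record) => preview(record));
    return { sent: 0, previewed: records.length };
  }
  const producer = createProducer();
  await producer.connect();
  try {
    for (const record of records) {
      await producer.send(destinationTopic, record);
    }
  } finally {
    // Always release the connection, even if a send fails.
    await producer.disconnect();
  }
  return { sent: records.length, previewed: 0 };
}
```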

What the Job Record Stores

job id
source topic and destination topic
partition, replay mode, resolved start offset, and resolved end offset
original timestamp window for timestamp jobs
status
dry-run flag
optional messages-per-second cap
replayed count and total message count
derived progress snapshot (computed at read time, not stored): percent complete, throughput, elapsed time, and ETA
created, started, and completed timestamps
error message and last replayed offset
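The following TypeScript sketch gives a possible shape for these fields, plus a read-time progress calculation. The field names, the mode values, and the formulas are assumptions for illustration, not the actual SQLite schema or API response. Throughput is taken as replayed messages divided by elapsed seconds, and ETA as remaining messages divided by throughput.

```typescript
type JobStatus = "draft" | "running" | "completed" | "failed" | "cancelled";

// Hypothetical field names mirroring the list above.
interface ReplayJob {
  jobId: string;
  sourceTopic: string;
  destinationTopic: string;
  partition: number;
  mode: "offset" | "timestamp"; // assumed values for "replay mode"
  startOffset: number; // resolved
  endOffset: number; // resolved
  startTimestamp?: string; // original window, timestamp jobs only
  endTimestamp?: string;
  status: JobStatus;
  dryRun: boolean;
  messagesPerSecond?: number; // optional cap
  replayedCount: number;
  totalMessages: number;
  createdAt: string;
  startedAt?: string;
  completedAt?: string;
  errorMessage?: string;
  lastReplayedOffset?: number;
}

interface ProgressSnapshot {
  percent: number;
  throughputPerSecond: number; // observed, not the cap
  elapsedMs: number;
  etaMs: number | null; // null while throughput cannot be measured yet
}

// Derives progress when a job is read; nothing here is persisted.
function deriveProgress(job: ReplayJob, nowMs: number): ProgressSnapshot {
  const startedMs = job.startedAt ? Date.parse(job.startedAt) : NaN;
  const endMs = job.completedAt ? Date.parse(job.completedAt) : nowMs;
  const elapsedMs = Number.isNaN(startedMs) ? 0 : Math.max(0, endMs - startedMs);
  const percent = job.totalMessages > 0 ? (job.replayedCount / job.totalMessages) * 100 : 0;
  const throughputPerSecond = elapsedMs > 0 ? job.replayedCount / (elapsedMs / 1000) : 0;
  const remaining = Math.max(0, job.totalMessages - job.replayedCount);
  let etaMs: number | null = null;
  if (remaining === 0) {
    etaMs = 0;
  } else if (throughputPerSecond > 0) {
    etaMs = (remaining / throughputPerSecond) * 1000;
  }
  return { percent, throughputPerSecond, elapsedMs, etaMs };
}
```

With these formulas, a job that has replayed 5 of 20 messages 10 seconds after starting reports 25 percent, 0.5 messages per second, and an ETA of 30 seconds.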

Cancellation is cooperative. API-started jobs are aborted in-process; CLI-started jobs stop after the running process observes the persisted cancelled state.
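The two cancellation paths can be modelled as a loop that checks for cancellation between messages. API-started jobs use an in-memory flag, and CLI-started jobs poll the persisted status. The sketch below is synchronous to keep it short. StatusReader, CancelSignal, replayLoop, and the polling interval are hypothetical. Because CancelSignal only requires an aborted property, an AbortController's signal satisfies it.

```typescript
type JobStatus = "draft" | "running" | "completed" | "failed" | "cancelled";

// Hypothetical read access to the persisted job status (SQLite in practice).
interface StatusReader {
  getStatus(jobId: string): JobStatus;
}

// Structurally compatible with AbortSignal.
interface CancelSignal {
  readonly aborted: boolean;
}

interface LoopResult {
  outcome: "completed" | "cancelled";
  lastReplayedOffset: number | null;
}

// Replays offsets one at a time and stops between messages once cancellation
// is observed. The in-process signal is checked before every message; the
// persisted status is polled before the first message and then every
// `pollEvery` messages, to bound database reads (the interval is an assumption).
function replayLoop(
  jobId: string,
  offsets: readonly number[],
  send: (offset: number) => void,
  store: StatusReader,
  signal?: CancelSignal,
  pollEvery = 10,
): LoopResult {
  const interval = Math.max(1, Math.floor(pollEvery));
  let lastReplayedOffset: number | null = null;
  for (let i = 0; i < offsets.length; i++) {
    const cancelledInProcess = signal?.aborted === true;
    const cancelledInStore = i % interval === 0 && store.getStatus(jobId) === "cancelled";
    if (cancelledInProcess || cancelledInStore) {
      return { outcome: "cancelled", lastReplayedOffset };
    }
    send(offsets[i]);
    lastReplayedOffset = offsets[i];
  }
  return { outcome: "completed", lastReplayedOffset };
}
```

Because the check runs between messages, a cancelled job finishes the message in flight before stopping, and lastReplayedOffset records where it stopped.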

For repository documentation, see docs/REPLAY_JOBS.md and docs/REPLAY_API.md.