Backup and scaling
Choose the right data layout for one node or many, and make sure the review state can be restored.
Default: SQLite single node
SQLite is the default because it is operationally simple and good enough for a single maintainer instance. The tradeoff is obvious: if the volume is lost, review state is lost.
Continuous backup with Litestream
BACKUP_ACKNOWLEDGED=true
LITESTREAM_ACCESS_KEY_ID=<key>
LITESTREAM_SECRET_ACCESS_KEY=<secret>
LITESTREAM_ENDPOINT=s3.example.com
LITESTREAM_REGION=us-east-1

docker compose --profile litestream up -d

Scheduled backups
The bundled backup profile writes the active app database to the gittensory-backups volume. SQLite installs use an online backup; Postgres installs use pg_dump. The same run also snapshots Qdrant when it is enabled.
docker compose --profile backup up -d

Retention: how many backups are kept
Each run keeps the newest BACKUP_RETAIN backups (default 7) — applied independently per target: postgres/, sqlite/, and qdrant/ in the gittensory-backups volume each retain their own newest 7, not 7 combined across all three. Set it in .env to change the window:
BACKUP_RETAIN=14

scripts/backup.sh's normalize_backup_retain guards against misconfiguration rather than failing the run: a non-numeric or empty value falls back to 7 with a logged warning, and BACKUP_RETAIN=0 is coerced up to 1 (a retention window of zero would delete the backup the script just took, so the script refuses that rather than leaving you with nothing).
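The retention rules above can be sketched in a few lines of POSIX shell. This is an illustration only, not the code shipped in scripts/backup.sh: the helper names (normalize_retain_sketch, prune_target_sketch, prune_all_sketch) are hypothetical, and the sketch assumes each backup is a single file whose name embeds a sortable timestamp, so that name order matches age.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the contents of scripts/backup.sh.

# Normalise a BACKUP_RETAIN value:
#   empty or non-numeric -> 7 (with a warning on stderr), 0 -> 1.
normalize_retain_sketch() {
  case "$1" in
    ''|*[!0-9]*)
      echo "[backup] invalid BACKUP_RETAIN='$1', falling back to 7" >&2
      echo 7
      ;;
    *)
      if [ "$1" -eq 0 ]; then echo 1; else echo "$1"; fi
      ;;
  esac
}

# Keep the newest $2 files in directory $1 and delete the rest.
# Assumption: file names sort chronologically (timestamp in the name).
prune_target_sketch() {
  dir="$1"; keep="$2"
  [ -d "$dir" ] || return 0
  ls -1 "$dir" | sort -r | tail -n "+$((keep + 1))" | while IFS= read -r name; do
    rm -f -- "$dir/$name"
  done
}

# Apply the same window to each target independently, never combined.
prune_all_sketch() {
  root="$1"; keep="$(normalize_retain_sketch "$2")"
  for target in postgres sqlite qdrant; do
    prune_target_sketch "$root/$target" "$keep"
  done
}
```

Calling prune_all_sketch /backups 14 would keep up to 14 files in each of the three directories, which is the per-target behaviour described above rather than 14 files in total.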
If a SQLite backup fails (the .backup command itself fails, the output file is empty, or its PRAGMA integrity_check doesn't come back ok), the script deletes the bad output, logs the failure, and, critically, skips the retention prune for the sqlite target on that run, so a broken backup can never push a known-good one out of the retained window. Postgres and Qdrant retention still run normally on that same pass, since only the SQLite leg failed. The run still exits non-zero so the failure is loud.

Multi-instance: Postgres and Redis
- Postgres: use DATABASE_URL for a shared database and queue claiming with SKIP LOCKED semantics.
- Redis: use REDIS_URL for distributed rate limiting, webhook deduplication, and shared short-lived caches.
- PgBouncer: use the pgbouncer profile when many replicas need pooled database connections.
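For readers unfamiliar with SKIP LOCKED queue claiming, the general Postgres pattern looks roughly like the statement below. This is a generic sketch, not the application's actual query: the jobs table and its id, status, and created_at columns are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: prints a generic "claim one job" statement.
# Table and column names (jobs, id, status, created_at) are hypothetical.
claim_next_job_sql_sketch() {
  cat <<'SQL'
UPDATE jobs
SET status = 'running'
WHERE id = (
  SELECT id
  FROM jobs
  WHERE status = 'queued'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED  -- rows locked by another worker are skipped, not waited on
)
RETURNING id;
SQL
}

# Possible usage against a scratch database:
#   claim_next_job_sql_sketch | psql -At "$DATABASE_URL"
```

Because locked rows are skipped instead of blocking, several instances can poll the same table concurrently and each one claims a different row.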
POSTGRES_PASSWORD=<password>
DATABASE_URL=postgres://gittensory:<password>@pgbouncer:5432/gittensory
REDIS_URL=redis://redis:6379
QDRANT_URL=http://qdrant:6333

docker compose --profile pgbouncer --profile qdrant up -d

PgBouncer pools connections between instances and Postgres. Each app instance still opens its own connection pool to whatever it's pointed at (PgBouncer or Postgres directly), shared by every HTTP handler and queue worker in that instance. Set PGPOOL_MAX (default 10) if a single instance needs more headroom than that under real concurrency (many registered repos, higher QUEUE_CONCURRENCY). Raise it gradually and watch for GittensoryPostgresConnectionPressure: that alert means you're approaching Postgres's own max_connections, a different ceiling than this per-instance pool size.
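A rough way to reason about the two ceilings is to multiply them out. The sketch below assumes instances connect to Postgres directly; behind PgBouncer the server-side count is bounded by PgBouncer's own pool settings instead. The function names and the default reserve of 3 connections are assumptions for illustration, not values taken from the application.

```shell
#!/bin/sh
# Sketch only: back-of-the-envelope connection budgeting.

# Worst-case server connections if every instance fills its pool.
direct_connection_demand_sketch() {
  replicas="$1"; pool_max="${2:-10}"
  echo $((replicas * pool_max))
}

# Succeeds when the demand fits under max_connections minus a reserve kept
# back for maintenance sessions (reserve defaults to 3 here, an assumption).
fits_connection_budget_sketch() {
  replicas="$1"; pool_max="$2"; max_connections="$3"; reserve="${4:-3}"
  demand="$(direct_connection_demand_sketch "$replicas" "$pool_max")"
  [ "$demand" -le $((max_connections - reserve)) ]
}
```

For example, 4 instances at PGPOOL_MAX=10 can open at most 40 connections, which fits under a max_connections of 100, while 12 instances at the same pool size would not.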
One-time SQLite to Postgres copy
Existing SQLite installs can copy state into a fresh Postgres database with the bundled migrator. It dry-runs by default and only commits when --execute is present.
export DATABASE_URL=postgres://gittensory:<password>@pgbouncer:5432/gittensory
npm run selfhost:postgres:migrate -- --sqlite /data/gittensory.sqlite
npm run selfhost:postgres:migrate -- --sqlite /data/gittensory.sqlite --execute

Restore checks
- Restore to a separate host or volume, never over the live instance first.
- Boot the app and confirm /ready returns 200.
- Confirm migrations do not fail or reapply incorrectly.
- Confirm recent review rows and job state are present.
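The /ready check in the list above can be scripted so a restore rehearsal fails loudly instead of relying on a manual look. This is a minimal sketch, assuming curl is available on the host running the check; the function names, log prefix, retry count, and delay are illustrative choices, not part of the bundled tooling.

```shell
#!/bin/sh
# Sketch only: poll a restored instance until /ready answers 200.

# Print the HTTP status code of GET <base-url>/ready ("000" if unreachable).
ready_status_sketch() {
  curl -s -o /dev/null -w '%{http_code}' "$1/ready"
}

# Usage: wait_until_ready_sketch <base-url> [attempts] [delay-seconds]
wait_until_ready_sketch() {
  base_url="$1"; attempts="${2:-30}"; delay="${3:-2}"
  i=0
  status=""
  while [ "$i" -lt "$attempts" ]; do
    status="$(ready_status_sketch "$base_url" || true)"
    if [ "$status" = "200" ]; then
      echo "[restore-check] /ready returned 200"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "[restore-check] /ready never returned 200 (last status: ${status:-none})" >&2
  return 1
}
```

Point it at the restored host, never the live one, for example wait_until_ready_sketch http://<restore-host>:<port>, and then inspect recent review rows and job state separately.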
Verify a backup is restorable
The backup profile ships verify-backup.sh, which checks the newest backup without touching the live database: Postgres .dump archives with pg_restore --list, and SQLite .sqlite.gz backups with a gzip and integrity_check pass. Run it against the newest backup, or a specific file:
docker compose --profile backup run --rm backup sh /verify-backup.sh
docker compose --profile backup run --rm backup sh /verify-backup.sh /backups/postgres/gittensory-<timestamp>.dump

A healthy run ends with [verify] postgres archive OK: … (N TOC entries) (or [verify] sqlite backup OK), then [verify] complete, and exits 0. Corruption, a missing backup, or an empty archive exits non-zero with a [verify] reason.
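To make the checks concrete, the following sketch shows roughly equivalent manual steps using standard tools (gzip, sqlite3, pg_restore). It is not the contents of verify-backup.sh: the helper names and [check] log lines are hypothetical, and the real script may check more than this.

```shell
#!/bin/sh
# Sketch only: manual equivalents of the checks described above.

# SQLite: the gzip stream must be intact and integrity_check must print "ok".
check_sqlite_backup_sketch() {
  archive="$1"
  if ! gzip -t "$archive"; then
    echo "[check] gzip stream is corrupt: $archive" >&2
    return 1
  fi
  scratch="$(mktemp)"
  gzip -dc "$archive" > "$scratch"   # decompress to a scratch copy only
  result="$(sqlite3 "$scratch" 'PRAGMA integrity_check;' 2>&1 || true)"
  rm -f "$scratch"
  if [ "$result" != "ok" ]; then
    echo "[check] integrity_check reported: $result" >&2
    return 1
  fi
  echo "[check] sqlite backup looks intact: $archive"
}

# Postgres: the archive must list at least one table-of-contents entry.
# In pg_restore --list output, comment lines start with ';' and entries
# start with a numeric dump ID.
check_postgres_dump_sketch() {
  dump="$1"
  entries="$(pg_restore --list "$dump" 2>/dev/null | grep -c '^[0-9]' || true)"
  if [ "${entries:-0}" -eq 0 ]; then
    echo "[check] no TOC entries found in: $dump" >&2
    return 1
  fi
  echo "[check] postgres archive lists $entries TOC entries: $dump"
}
```

Neither helper connects to a running database, so both are safe to run next to a live instance. Listing an archive only shows that it is readable; it does not prove the dump restores.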
To prove a dump actually restores, opt into a scratch restore into a throwaway database — never the live one:
docker compose --profile backup run --rm \
-e VERIFY_RESTORE_SCRATCH=1 \
-e GITTENSORY_VERIFY_SCRATCH_DATABASE_URL=postgres://user:pass@host:5432/gittensory_verify \
backup sh /verify-backup.sh

The scratch restore runs pg_restore --clean against GITTENSORY_VERIFY_SCRATCH_DATABASE_URL, so point it at a dedicated database you can afford to drop. The script refuses to run when that URL equals the live backup source.

Restore drill: what "restore-tested" actually verifies
This exact flow was run against a real production backup on a live instance on 2026-07-04 (backup gittensory-20260704T090939Z.dump): the dump was restored into a throwaway, network-isolated scratch database (a separate container, never the live one), which the script's own identity check confirmed was distinct from the backup source before touching anything. The restore completed cleanly and, at the time of this drill, repopulated all 84 application tables, including the largest operational tables with their full row counts intact (hundreds of thousands of rows in the biggest tables) — not just an empty schema. Table and row counts will grow over time; treat them as a point-in-time result, not an invariant.
This proves the backup content and the restore path both work end-to-end against real data. It deliberately stops short of booting a full app instance against the scratch database and polling /ready: that endpoint also gates on live Redis, Qdrant, the configured AI provider, Codex auth, and a real GitHub App key (see the health endpoints section of Operations). Reproducing all of those for a disposable scratch instance would mean copying real credentials into new, throwaway infrastructure, which is a bigger risk than the drill is worth. In short, the drill proves the dump can be restored and its contents inspected at the database layer; it does not exercise the app's own db readiness probe, migration boot path, or /ready response. A full disaster-recovery rehearsal still needs to verify app readiness on the target infrastructure, using the operator's own real credentials.
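A database-layer inspection of the kind described above could look like the sketch below. It is an assumption-laden illustration, not the drill's actual procedure: the function names are hypothetical, the URL comparison is a plain string check (the script's own identity check may be stricter), and the query assumes the application tables live in the public schema.

```shell
#!/bin/sh
# Sketch only: inspect a scratch restore without touching the live database.

# Count base tables in the public schema of the given database.
count_tables_sketch() {
  psql -At -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public' AND table_type = 'BASE TABLE';" "$1"
}

# Usage: inspect_scratch_restore_sketch <scratch-url> <live-url> [min-tables]
inspect_scratch_restore_sketch() {
  scratch_url="$1"; live_url="$2"; min_tables="${3:-1}"
  if [ "$scratch_url" = "$live_url" ]; then
    echo "[inspect] refusing: scratch URL equals the live URL" >&2
    return 1
  fi
  tables="$(count_tables_sketch "$scratch_url")" || return 1
  if [ "${tables:-0}" -lt "$min_tables" ]; then
    echo "[inspect] only ${tables:-0} tables restored, expected at least $min_tables" >&2
    return 1
  fi
  echo "[inspect] scratch database has $tables tables"
}
```

A table count alone does not show that rows came back, so a real drill would also compare row counts for the largest tables against the source.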
After scaling, revisit Operations and Security because network and credential boundaries change.