Security Practice

Monitoring and Backup: So You Don't Sleep Through a Failure

A service almost always fails quietly: a node drops at four in the morning, the hosting balance runs out and the server is shut off, a certificate expires, and you find out from the flood of complaints in the bot. Below is how to set up monitoring with Telegram alerts in a couple of minutes and configure a DB backup that actually saves you.

This material is about engineering your own infrastructure and is educational in nature. You are responsible for complying with the laws of your own jurisdiction.

The main trouble isn't that something breaks; everything breaks, always. The trouble is that you're the last to find out. The job of monitoring is to let you find out first, before the clients. The job of backup is to give you something to recover from when you find out too late.

What to actually monitor

Not everything at once. Here is the minimum; losing any item on it takes the service down:

  • Nodes: if one goes offline, clients have no exit. Watch the status in the panel plus an external uptime check.
  • Subscription page: if it stops responding, clients can't fetch the config. Run an HTTP check against the subscription URL.
  • Node disk and RAM: if either fills up, xray crashes. Use node_exporter or a simple script.
  • Host balance: if it drops to zero, the server gets shut off. Set reminders or auto-topup.
  • Domain and certificate: if the cert expires, everything goes down. Monitor expiry and enable auto-renew.
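The "simple script" option for disk and RAM could look roughly like this. It is a minimal sketch, not a reference implementation: the 90% thresholds and all function names are assumptions, and it assumes a Linux node whose /proc/meminfo reports MemAvailable.

```shell
#!/usr/bin/env bash
# Hypothetical thresholds; tune them per node.
DISK_LIMIT=90   # percent of the root filesystem in use
MEM_LIMIT=90    # percent of RAM in use

# Print the used percentage of the filesystem that holds path $1.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Print the percentage of RAM in use, from MemTotal and MemAvailable (Linux only).
mem_used_pct() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END { printf "%d\n", (t-a)*100/t }' /proc/meminfo
}

# Succeed when value $1 is at or above limit $2.
over_limit() {
  [ "$1" -ge "$2" ]
}

# Print one line per resource that is over its limit; print nothing when all is fine.
check_node() {
  local disk mem
  disk=$(disk_used_pct /)
  mem=$(mem_used_pct)
  if over_limit "$disk" "$DISK_LIMIT"; then echo "disk at ${disk}%"; fi
  if over_limit "$mem" "$MEM_LIMIT"; then echo "RAM at ${mem}%"; fi
}
```

Run check_node from cron every few minutes and forward any output to your alert channel; empty output means both values are under the limits.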

Uptime Kuma + Telegram alerts

A lightweight monitor with notifications to a bot comes up in a couple of minutes. It listens only on 127.0.0.1 and is exposed via a reverse proxy:

bash
docker run -d --restart=always -p 127.0.0.1:3001:3001 \
  -v uptime-kuma:/app/data --name uptime-kuma louislam/uptime-kuma:1

Then you open it behind the reverse proxy and add monitors:

  • TCP to each node — SERVER_IP:443.
  • HTTP to the subscription page — its public URL.

Under Notifications → Telegram, enter your bot's token and chat_id; a message then arrives on failure. Set the interval to 60 seconds with 2–3 retries so the monitor doesn't ping you on every network blip.
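Before entering the token and chat_id into the monitor, it can help to confirm that the pair works at all. The sketch below sends a test message through the Telegram Bot API's sendMessage method using curl. The variable names TG_TOKEN and TG_CHAT_ID and both function names are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical variables, set them before calling tg_send:
#   export TG_TOKEN=<bot token from BotFather>  TG_CHAT_ID=<numeric chat id>

# Build the Bot API endpoint of the sendMessage method for token $1.
tg_url() {
  printf 'https://api.telegram.org/bot%s/sendMessage' "$1"
}

# Send $1 as a plain-text message to the configured chat.
# Returns non-zero when the API answers with an HTTP error or the request times out.
tg_send() {
  curl -fsS --max-time 10 \
    --data-urlencode "chat_id=${TG_CHAT_ID}" \
    --data-urlencode "text=$1" \
    "$(tg_url "$TG_TOKEN")" >/dev/null
}
```

A call such as `tg_send "test alert from $(hostname)"` should produce a message in the chat. If it fails, the token or chat_id is most likely wrong, and the monitor's alerts would fail the same way.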

DB backup: mandatory and off-site

Without a DB backup, a server seizure or crash means losing all your clients. The panel's PostgreSQL database is dumped with one command, and the config files are archived alongside it:

bash
cd /opt/remnawave
docker compose exec -T remnawave-db pg_dump -U postgres postgres > backup-$(date +%F).sql
tar czf conf-$(date +%F).tgz .env docker-compose.yml

You'll forget to do this by hand, so put it on cron: every night, with compression:

cron
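# System crontab format (/etc/crontab or a file in /etc/cron.d), hence the "root" user field. /root/backups must already exist.
0 4 * * * root cd /opt/remnawave && docker compose exec -T remnawave-db pg_dump -U postgres postgres | gzip > /root/backups/db-$(date +\%F).sql.gz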
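One caveat with a plain `pg_dump | gzip` pipeline: the shell reports gzip's exit status, so a failed dump can still leave a small, valid-looking .gz file behind. A possible way around it is a wrapper that publishes the file only when the dump command itself succeeds. This is a sketch under assumptions: safe_dump is a hypothetical helper name, and it trades some temporary disk space (an uncompressed intermediate file) for portability.

```shell
#!/usr/bin/env bash
# Run a dump command, compress its output, and publish the file only on success.
# Usage: safe_dump OUTFILE COMMAND [ARGS...]
safe_dump() {
  local out tmp
  out=$1
  shift
  tmp="${out}.part"
  # Write the raw dump first so the dump command's own exit status is checked.
  if "$@" > "$tmp" && gzip -c "$tmp" > "${tmp}.gz"; then
    rm -f "$tmp"
    mv "${tmp}.gz" "$out"
  else
    # Leave no half-written file that could be mistaken for a backup.
    rm -f "$tmp" "${tmp}.gz"
    echo "backup failed: $out not written" >&2
    return 1
  fi
}
```

From /opt/remnawave it would be called as `safe_dump "/root/backups/db-$(date +%F).sql.gz" docker compose exec -T remnawave-db pg_dump -U postgres postgres`. A non-zero exit status is the signal to send an alert.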

Most importantly, upload the backup off-site, to a different provider:

bash
# rclone is configured beforehand (rclone config), then one line in the same cron
rclone copy /root/backups remote:vpnhub-backups

A backup that lives on the server that got seized or deleted is useless. Keep copies elsewhere (another cloud, another provider) and retain the last 7–14 daily dumps.
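Keeping 7–14 dumps implies deleting the older ones, which neither the cron line nor `rclone copy` does on its own. A minimal local pruning sketch follows. It assumes the db-YYYY-MM-DD.sql.gz naming used above, so that sorting by name is sorting by date; prune_backups is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Delete all but the newest $2 files named db-*.sql.gz in directory $1.
# Relies on the ISO date in the file name sorting chronologically.
prune_backups() {
  local dir=$1 keep=$2
  printf '%s\n' "$dir"/db-*.sql.gz | sort -r | tail -n "+$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}
```

Calling `prune_backups /root/backups 14` after the nightly dump keeps two weeks locally. The off-site copy needs its own pruning, since `rclone copy` never deletes files on the remote; rclone's age filters (for example `--min-age` combined with `rclone delete`) are one option, best tried with `--dry-run` first.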

Restore (and verifying it works)

A dump that's never been restored isn't a backup, it's hope. Test the restore on a spare server regularly:

bash
cd /opt/remnawave
docker compose up -d remnawave-db                 # bring up the DB only
gunzip < /root/backups/db-2026-07-02.sql.gz | \
  docker compose exec -T remnawave-db psql -U postgres postgres   # load the dump
docker compose up -d                              # bring up the panel

After loading, check with your own eyes that users, nodes, and squads are in place and that the nodes come up online. A backup that doesn't restore is no backup at all.
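A cheap pre-check can catch truncated or corrupt dumps before you spend time on a full restore. The sketch below relies on the fact that a plain-text pg_dump starts with a "PostgreSQL database dump" comment and ends with "PostgreSQL database dump complete". The function name is an assumption, and passing this check does not replace the real restore test.

```shell
#!/usr/bin/env bash
# Succeed only if $1 is an intact gzip file whose contents start and end
# like a complete plain-text pg_dump.
dump_looks_complete() {
  local f=$1
  # gzip -t verifies the compressed stream without writing anything.
  gzip -t "$f" 2>/dev/null || return 1
  # The header comment appears in the first lines of the dump.
  gunzip -c "$f" | head -n 5 | grep -q 'PostgreSQL database dump' || return 1
  # The completion marker is written last, so a cut-off dump lacks it.
  gunzip -c "$f" | tail -n 10 | grep -q 'PostgreSQL database dump complete'
}
```

For example, `dump_looks_complete /root/backups/db-2026-07-02.sql.gz || echo "dump is broken"` could run right after the nightly backup, so that a bad file is noticed the same night rather than on restore day.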

Certificates: auto-renewal

An expired cert takes everything down at once, and it is the most galling kind of outage because preventing it costs nothing:

bash
# certbot sets up the timer itself — check that it's there
systemctl list-timers | grep certbot

# once a quarter run a dry renew to check
certbot renew --dry-run
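Auto-renewal can fail silently, so an independent expiry check is a reasonable safety net. The sketch below uses openssl's `-checkend` option, which tests whether a certificate expires within a given number of seconds. Both function names are assumptions, and fetch_cert assumes the service answers TLS on port 443.

```shell
#!/usr/bin/env bash
# Succeed if the PEM certificate in file $1 is still valid $2 days from now.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# Print, as PEM, the certificate that host $1 presents on port 443.
fetch_cert() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -outform PEM
}
```

A daily job could run `fetch_cert your.domain > /tmp/cert.pem` and then alert when `cert_valid_for_days /tmp/cert.pem 14` fails; with 90-day certificates renewed around day 60, two weeks of remaining validity means several renewal attempts have already been missed.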

"Don't-burn-money" hygiene

Monitoring is also about margin:

  • Delete extra and disabled VPS instances: they keep ticking against your bill, and this is the main quiet killer of profit.
  • Set up balance reminders for your hosts — some providers send them by email or Telegram.
  • Build one dashboard with the status of all nodes, the subscription, and balances, so you see everything on one screen.

Checklist

  • [ ] Uptime Kuma is up, monitoring the nodes and the subscription page.
  • [ ] Alerts land in Telegram, 60s interval + retries.
  • [ ] DB backup on cron daily, with compression.
  • [ ] Backups are uploaded off-site, 7–14 copies kept.
  • [ ] Restore is tested on a spare server.
  • [ ] Certificate auto-renew works.

Taking a backup before any panel change is basic hardening (see "Panel Hardening"). The response to a failure you've caught (moving to a white IP, cascades, spare nodes) is covered in the sections on block circumvention and cascades.
