Monitoring a Small VPS with Prometheus and Grafana

Monitoring often gets postponed because it sounds like it belongs to a different scale of operation: a dozen servers, a dedicated ops team, dashboards in a NOC. For one small VPS running a client's application, that framing makes "set up monitoring" feel like overkill compared to just SSHing in and running htop when something feels slow. But the actual setup for a single server is small, and the difference between "SSH in when something feels slow" and "look at a graph of the last week" is bigger than the setup effort suggests.

The pieces, for one server

node_exporter runs on the server itself and exposes system metrics (CPU, memory, disk, network) over HTTP, on port 9100 by default, in a format Prometheus understands:

wget https://github.com/prometheus/node_exporter/releases/download/v1.8.0/node_exporter-1.8.0.linux-amd64.tar.gz
tar xzf node_exporter-*.tar.gz
sudo mv node_exporter-*/node_exporter /usr/local/bin/

A small systemd unit keeps it running:

# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
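
With the unit file in place, systemd still has to be told about it, and it is worth confirming that metrics are actually being served. The following is a minimal sketch rather than a definitive procedure. It assumes the unit was saved under the name shown above and that node_exporter listens on its default port, 9100. The function names enable_node_exporter and count_node_metrics are made up for illustration.

```shell
# Reload systemd so it picks up the new unit file, then start the
# service now and on every boot. Needs root, so run it on the server.
enable_node_exporter() {
  sudo systemctl daemon-reload
  sudo systemctl enable --now node_exporter
}

# Count the metric samples whose names start with "node_" in the text
# read from stdin. Comment lines (# HELP, # TYPE) are not counted.
count_node_metrics() {
  grep -c '^node_' || true
}
```

On the server, running enable_node_exporter and then curl -s http://localhost:9100/metrics | count_node_metrics should print a number well above zero. A zero, or a connection error from curl, means the exporter is not up yet.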

Prometheus itself can run on the same VPS for a single-server setup, or on a separate small monitoring VPS if you'd rather it keep working when the server it's watching has a bad day. Either way, the config is short (on a separate VPS, replace localhost with the address of the server being watched):

# prometheus.yml
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

Grafana connects to Prometheus as a data source and turns those metrics into dashboards. The community dashboard for node_exporter (Node Exporter Full, ID 1860 on grafana.com, which has been stable for years) gives you CPU, memory, disk, and network graphs without building anything from scratch.
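
The data source can be added by hand in the Grafana UI, but Grafana also reads data sources from provisioning files at startup, which keeps the setup reproducible. The sketch below writes such a file. It assumes Prometheus is on the same host on its default port, 9090. It writes into a local directory so it can be tried without root. Packaged Grafana installs typically read /etc/grafana/provisioning/datasources instead, so treat the path as something to adjust.

```shell
# Local example directory (an assumption for this sketch). On a packaged
# install, copy the file to /etc/grafana/provisioning/datasources/.
PROV_DIR=./grafana-provisioning/datasources
mkdir -p "$PROV_DIR"

# Write a provisioning file that registers Prometheus as the default
# data source.
cat > "$PROV_DIR/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Assumes Prometheus runs on this host on its default port.
    url: http://localhost:9090
    isDefault: true
EOF
```

After copying the file into place and restarting Grafana, the data source should appear without any clicking, and the community dashboard can be imported on top of it.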

What this actually buys you

The obvious answer is graphs, but the more useful answer is history. "The server feels slow today" becomes "memory usage has climbed steadily over the past two weeks and is now near the limit", which is a completely different and much more actionable statement. A memory leak, a slowly filling disk, a creeping increase in load average as traffic grows: all of these are invisible in a single htop snapshot and obvious in a week-long graph.

Alerting: the part worth doing even minimally

Dashboards require someone to look at them. Alerting means the server tells you when something's wrong. Even a minimal setup, two or three rules evaluated by Prometheus with Alertmanager handling delivery, covers the situations that actually matter for a small VPS:

# alert.rules.yml
groups:
  - name: basic
    rules:
      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.10
        for: 30m
        annotations:
          summary: "Disk space below 10% on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) > 0.90
        for: 15m
        annotations:
          summary: "Memory usage above 90% on {{ $labels.instance }}"

These two rules alone catch the two most common "the server has been slowly degrading and nobody noticed" scenarios. Alertmanager can send the result to email, Slack, or a webhook: whatever already gets attention.
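
For the rules to do anything, Prometheus has to load the rule file and know where Alertmanager is, and Alertmanager needs at least one receiver. The sketch below writes both configs under stated assumptions: Alertmanager runs on the same host on its default port (9093), alert.rules.yml sits next to prometheus.yml, and the webhook URL is a placeholder rather than a real endpoint. The files go into a local directory so the example can be tried without touching a live install.

```shell
# Local example directory (an assumption for this sketch).
CONF_DIR=./monitoring-conf
mkdir -p "$CONF_DIR"

# prometheus.yml: the scrape job from earlier, plus the rule file and
# the address of Alertmanager.
cat > "$CONF_DIR/prometheus.yml" <<'EOF'
rule_files:
  - alert.rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
EOF

# alertmanager.yml: route every alert to a single webhook receiver.
# The URL below is a placeholder; point it at whatever already gets
# attention.
cat > "$CONF_DIR/alertmanager.yml" <<'EOF'
route:
  receiver: default

receivers:
  - name: default
    webhook_configs:
      - url: 'http://localhost:5001/alerts'
EOF
```

If promtool is installed (it ships with Prometheus), running promtool check config on the resulting prometheus.yml, with alert.rules.yml in the same directory, is a cheap way to catch indentation mistakes before restarting anything.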

Scaling it up costs nothing extra later

The reason this is worth setting up even for one server is that the setup doesn't change when a second server shows up. It just gets another target under scrape_configs and another node_exporter install. There's no "monitoring project" to kick off later, no migration from "no monitoring" to "monitoring", just one more target in a config file that was already there. Starting with one server means the second, third, and tenth servers join a system that already works, rather than being the trigger for building one under pressure.
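
As a rough illustration of how small that change is, the sketch below writes the scrape section with a second target added. The hostname web2.example.internal is hypothetical, and the file goes into a separate local directory so it doesn't overwrite anything. In a real config only the targets list changes and the other top-level sections stay as they are. It also assumes port 9100 on the second server is reachable from the Prometheus host, which may mean adjusting a firewall rule.

```shell
# Local example directory (an assumption for this sketch).
EXAMPLE_DIR=./two-servers-example
mkdir -p "$EXAMPLE_DIR"

# Same 'node' job as before, now with two targets.
cat > "$EXAMPLE_DIR/prometheus.yml" <<'EOF'
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'localhost:9100'
          # Hypothetical second server; replace with its real address.
          - 'web2.example.internal:9100'
EOF
```

Because the alert rules refer to {{ $labels.instance }} and don't hard-code a host, they apply to the new target as soon as Prometheus reloads its config.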
