Monitoring IT infrastructure is crucial for keeping services available and performing well. Prometheus and Grafana are a powerful, modern combination for monitoring and observability. This article walks through a complete monitoring stack setup for Linux servers.
Monitoring Stack Architecture
Components:
1. Prometheus: time-series database that collects (scrapes) and stores metrics
2. Grafana: dashboards and visualization
3. Node Exporter: agent that exposes Linux system metrics
4. Alertmanager: alert routing and management
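All four components speak HTTP on the default ports used throughout this guide: Prometheus on 9090, Node Exporter on 9100, Alertmanager on 9093 and Grafana on 3000. Once everything is installed, a quick check of the whole stack could look like the sketch below. The function names are hypothetical; Prometheus and Alertmanager answer on `/-/healthy`, Grafana on `/api/health`, and for Node Exporter the sketch simply fetches `/metrics`. It assumes curl is installed and that every component runs on the local host.

```bash
# Hypothetical helper: map a stack component to a health-check URL
# on its default port. Returns 1 for unknown names.
component_url() {
  case "$1" in
    prometheus)    echo "http://localhost:9090/-/healthy" ;;
    node_exporter) echo "http://localhost:9100/metrics" ;;
    alertmanager)  echo "http://localhost:9093/-/healthy" ;;
    grafana)       echo "http://localhost:3000/api/health" ;;
    *)             return 1 ;;
  esac
}

# Hypothetical helper: probe every component and print OK or FAILED.
check_stack() {
  for name in prometheus node_exporter alertmanager grafana; do
    url=$(component_url "$name")
    # -f: fail on HTTP errors; the response body is discarded
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "$name: OK"
    else
      echo "$name: FAILED ($url)"
    fi
  done
}
```

Running `check_stack` prints one line per component, which is handy after a reboot or an upgrade.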
1. Installing Prometheus
Download and Set Up Prometheus
# Create a user for Prometheus
sudo useradd --no-create-home --shell /bin/false prometheus

# Download Prometheus (check for the latest version at github.com/prometheus/prometheus)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz

# Extract
tar xvfz prometheus-2.45.0.linux-amd64.tar.gz

# Copy the binaries
sudo cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
sudo cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/

# Set ownership
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool

# Create the config and data directories
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus

# Copy the console templates referenced by the systemd service below
sudo cp -r prometheus-2.45.0.linux-amd64/consoles prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/

sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
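It is worth checking the downloaded tarball, ideally before extracting it, against the checksum file published with each Prometheus release (`sha256sums.txt` on the same releases page). A minimal sketch, assuming GNU coreutils' `sha256sum` is available; `verify_tarball` is a hypothetical helper name:

```bash
# Hypothetical helper: verify a downloaded tarball against a sha256sums file.
# Usage: verify_tarball <tarball> [sums-file]
verify_tarball() {
  tarball=$1
  sums=${2:-sha256sums.txt}
  # Keep only the checksum line for this file; sha256sum -c then recomputes
  # the hash and exits non-zero on a mismatch or when no line matched.
  awk -v f="$tarball" '$2 == f' "$sums" | sha256sum -c -
}
```

For example, running `wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/sha256sums.txt` and then `verify_tarball prometheus-2.45.0.linux-amd64.tar.gz` in /tmp should print the file name followed by `OK`.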
Configuring Prometheus
sudo nano /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'prometheus'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'remote_servers'
    static_configs:
      - targets: ['192.168.1.10:9100', '192.168.1.11:9100']
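Listing every remote server under `static_configs` means editing `prometheus.yml` and reloading for each change. Prometheus can instead read targets from JSON files through `file_sd_configs`, and it picks up changes to those files without a restart. The sketch below prints such a file from a port and a list of hosts; `make_file_sd` and the output path used in the example are hypothetical.

```bash
# Hypothetical helper: print a Prometheus file_sd target list (JSON).
# Usage: make_file_sd <port> <host>...
make_file_sd() {
  local port="$1" targets="" host
  shift
  for host in "$@"; do
    # Add ", " before every entry except the first
    targets="${targets}${targets:+, }\"${host}:${port}\""
  done
  printf '[{"targets": [%s]}]\n' "$targets"
}
```

For example, `make_file_sd 9100 192.168.1.10 192.168.1.11 | sudo tee /etc/prometheus/targets/nodes.json` would describe the same two targets as the `remote_servers` job; that job would then use `file_sd_configs` with `files: ['/etc/prometheus/targets/*.json']` instead of `static_configs` (the directory has to exist first).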
Alert Rules
sudo nano /etc/prometheus/alert_rules.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 85%"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100) < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Less than 10% disk space remaining"

      # Covers both the local node_exporter job and the remote_servers job
      - alert: NodeDown
        expr: up{job=~"node_exporter|remote_servers"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Node exporter is down"
          description: "Node exporter has been down for more than 1 minute"
Systemd Service
sudo nano /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --storage.tsdb.retention.time=30d \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
# Reload systemd
sudo systemctl daemon-reload

# Enable and start
sudo systemctl enable prometheus
sudo systemctl start prometheus

# Check status
sudo systemctl status prometheus
2. Installing Node Exporter
Node Exporter is an agent that exposes Linux system metrics over HTTP (port 9100) for Prometheus to scrape.
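Node Exporter serves its metrics as plain text in the Prometheus exposition format at `http://localhost:9100/metrics`: `# HELP` and `# TYPE` comment lines, followed by one `name{labels} value` sample per line. A small sketch that pulls a single unlabelled value out of that text; `metric_value` is a hypothetical helper and deliberately ignores labelled series:

```bash
# Hypothetical helper: read exposition-format text on stdin and print the
# value of the first sample whose name is exactly $1 (no labels).
metric_value() {
  # Comment lines start with "#", so their first field never equals the name
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

Once the service set up below is running, `curl -s http://localhost:9100/metrics | metric_value node_load1` prints the current 1-minute load average.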
# Download (adjust the version as needed)
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz

# Extract and install
tar xvfz node_exporter-1.6.1.linux-amd64.tar.gz
sudo cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter
Node Exporter Systemd Service
sudo nano /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
ExecStart=/usr/local/bin/node_exporter \
    --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|run)($$|/) \
    --collector.netdev.device-exclude=^(lo|docker.*|veth.*|br-.*)$$ \
    --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
3. Installing Grafana
# Install dependencies
sudo apt-get install -y apt-transport-https software-properties-common wget

# Add the Grafana repository (apt-key is deprecated, so store the key in a keyring)
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Update and install
sudo apt-get update
sudo apt-get install -y grafana

# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

# Check status
sudo systemctl status grafana-server
4. Configuring Grafana
Accessing Grafana
Open a browser and go to:
http://your-server-ip:3000
Default credentials:
– Username: admin
– Password: admin (you will be asked to change it on first login)
Add Prometheus Data Source
- Click Configuration → Data Sources
- Click “Add data source”
- Select “Prometheus”
- Set the URL to http://localhost:9090
- Click “Save & Test”
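Grafana can also create this data source without any clicks: at startup it reads YAML provisioning files from `/etc/grafana/provisioning/datasources/`. A sketch that writes such a file; the function name and the file name `prometheus.yaml` are hypothetical, and the directory is a parameter so the output can be inspected somewhere harmless first.

```bash
# Hypothetical helper: write a Grafana data source provisioning file that
# points at the local Prometheus server.
write_prometheus_datasource() {
  dir=${1:-/etc/grafana/provisioning/datasources}
  cat > "$dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

On the server, run it as root (for example from `sudo -i`) and then restart grafana-server so the data source is loaded.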
Import Dashboard
- Click “+” → Import
- Enter dashboard ID 1860 (Node Exporter Full)
- Select the Prometheus data source
- Click Import
Other recommended dashboards:
– Node Exporter Full: ID 1860
– Docker Monitoring: ID 179
– Linux Hosts: ID 10180
– MySQL Overview: ID 7362
5. Setting Up Alertmanager
Installation
cd /tmp
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.25.0.linux-amd64.tar.gz
sudo cp alertmanager-0.25.0.linux-amd64/alertmanager /usr/local/bin/
sudo cp alertmanager-0.25.0.linux-amd64/amtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/alertmanager
sudo chown prometheus:prometheus /usr/local/bin/amtool

# Create the config directory and the data directory used by --storage.path
sudo mkdir /etc/alertmanager /var/lib/alertmanager
sudo chown prometheus:prometheus /etc/alertmanager /var/lib/alertmanager
Configuring Alertmanager
sudo nano /etc/alertmanager/alertmanager.yml

global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'your-email@gmail.com'
  smtp_auth_username: 'your-email@gmail.com'
  # For Gmail, use an App Password here rather than the account password
  smtp_auth_password: 'your-email-password'

templates:
  - '/etc/alertmanager/template/*.tmpl'

route:
  receiver: 'email-notifications'
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'alerts@example.com'
        # email_configs has no "subject" or "body" field: the subject is set
        # through headers and the plain-text body through "text"
        headers:
          Subject: 'Prometheus Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          Severity: {{ .Labels.severity }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
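Once the Alertmanager service below is running, sending a hand-made alert is a quick way to test the email route end to end. Alertmanager accepts alerts as a JSON array on its `/api/v2/alerts` endpoint. The two functions below are a hypothetical sketch (the names and label values are examples only) that builds such a payload and posts it with curl:

```bash
# Hypothetical helper: print a one-alert JSON payload for Alertmanager.
test_alert_json() {
  name=${1:-TestAlert}
  severity=${2:-warning}
  printf '[{"labels": {"alertname": "%s", "severity": "%s", "instance": "manual-test"}, "annotations": {"summary": "Test alert", "description": "Sent by hand to test routing"}}]\n' \
    "$name" "$severity"
}

# Hypothetical helper: POST the payload to Alertmanager (default: local instance).
send_test_alert() {
  url=${1:-http://localhost:9093}
  test_alert_json TestAlert warning |
    curl -fsS -X POST -H 'Content-Type: application/json' --data @- "$url/api/v2/alerts"
}
```

After `send_test_alert`, the email should go out once `group_wait` (10s in the config above) has passed. Because the payload carries no end time, the alert resolves by itself after Alertmanager's `resolve_timeout`, which defaults to 5 minutes.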
Alertmanager Service
sudo nano /etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager \
    --config.file=/etc/alertmanager/alertmanager.yml \
    --storage.path=/var/lib/alertmanager \
    --web.listen-address=0.0.0.0:9093
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
# Enable and start
sudo systemctl daemon-reload
sudo systemctl enable alertmanager
sudo systemctl start alertmanager
6. Custom Metrics with the Textfile Collector
Node Exporter can also read custom metrics from text files: every file ending in .prom in the directory given by its --collector.textfile.directory flag.
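Each file holds samples in the same text format that Node Exporter itself serves. The node_exporter documentation recommends writing such a file under a temporary name and then renaming it, so a scrape never reads a half-written file. A minimal sketch of a writer that does this and also adds the optional `# HELP` and `# TYPE` lines; `write_gauge` is a hypothetical helper, not part of node_exporter:

```bash
# Hypothetical helper: atomically write one gauge to <dir>/<name>.prom.
# Usage: write_gauge <dir> <metric_name> <value> <help text>
write_gauge() {
  dir=$1 name=$2 value=$3 help=$4
  # The temporary name does not end in .prom, so the collector ignores it
  tmp="$dir/$name.prom.$$"
  {
    printf '# HELP %s %s\n' "$name" "$help"
    printf '# TYPE %s gauge\n' "$name"
    printf '%s %s\n' "$name" "$value"
  } > "$tmp" && mv "$tmp" "$dir/$name.prom"   # rename within one dir is atomic
}
```

For example, `write_gauge /var/lib/node_exporter/textfile_collector backups_completed_today 3 "Backup archives modified in the last 24 hours"` would create backups_completed_today.prom in the collector directory.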
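# Create a script that generates the metrics
sudo nano /usr/local/bin/custom-metrics.sh

#!/bin/bash
TEXTFILE=/var/lib/node_exporter/textfile_collector
# Each file is written under a temporary name and then renamed,
# so node_exporter never reads a half-written file

# Backup metrics: archives in /backup modified in the last 24 hours
backup_count=$(find /backup -name "*.tar.gz" -mtime -1 | wc -l)
echo "backups_completed_today $backup_count" > "$TEXTFILE/backup.prom.$$"
mv "$TEXTFILE/backup.prom.$$" "$TEXTFILE/backup.prom"

# Application metrics: error lines among the last 100 log lines
# (overwrite, not append: appending would keep adding duplicate samples)
app_errors=$(tail -100 /var/log/app.log | grep -i error | wc -l)
echo "application_errors $app_errors" > "$TEXTFILE/app.prom.$$"
mv "$TEXTFILE/app.prom.$$" "$TEXTFILE/app.prom"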
# Make the script executable
sudo chmod +x /usr/local/bin/custom-metrics.sh

# Create the textfile directory
sudo mkdir -p /var/lib/node_exporter/textfile_collector
sudo chown -R prometheus:prometheus /var/lib/node_exporter
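# Add the script to root's crontab (it writes into a directory owned by prometheus)
sudo crontab -e

# Run every 5 minutes
*/5 * * * * /usr/local/bin/custom-metrics.sh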
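Editing the crontab by hand is fine once, but a provisioning script should add the entry idempotently. A sketch of a filter that reads an existing crontab on stdin and prints it with the given line present exactly once; `add_cron_line` is a hypothetical name:

```bash
# Hypothetical helper: print the crontab from stdin with $1 present exactly once.
add_cron_line() {
  line=$1
  # Drop existing copies of the exact line (-x whole line, -F fixed string)
  grep -vxF -- "$line"
  printf '%s\n' "$line"
}
```

For example: `sudo crontab -l 2>/dev/null | add_cron_line '*/5 * * * * /usr/local/bin/custom-metrics.sh' | sudo crontab -`. Running it twice leaves a single entry.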
7. Query Examples (PromQL)
CPU Usage
# CPU usage percentage
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# CPU usage per core
100 - (avg by(instance, cpu) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
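The queries work from idle time: `irate(node_cpu_seconds_total{mode="idle"}[5m])` is the share of each second a CPU spent idle, so 100 minus that share times 100 is the busy percentage. The sketch below does the same arithmetic by hand from two readings of cumulative counters, where "total" means the CPU time of all modes added together (for a single CPU that grows by about one second per second). `cpu_usage_pct` is a hypothetical helper:

```bash
# Hypothetical helper: CPU usage % from two readings of cumulative idle and
# total CPU time (same units): 100 - (idle delta / total delta) * 100.
# Usage: cpu_usage_pct <idle1> <total1> <idle2> <total2>
cpu_usage_pct() {
  awk -v i1="$1" -v t1="$2" -v i2="$3" -v t2="$4" \
    'BEGIN { printf "%.1f\n", 100 - (i2 - i1) / (t2 - t1) * 100 }'
}
```

For example, `cpu_usage_pct 1000 4000 1050 4200` prints 75.0: 50 of the 200 elapsed units were idle.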
Memory Usage
# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

# Memory usage in GiB
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / 1024 / 1024 / 1024
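Node Exporter reads these values from /proc/meminfo, where MemTotal and MemAvailable are listed in kB (the exporter converts them to bytes). Because the ratio has no unit, the same percentage can be checked directly on a host with a sketch like this; `mem_usage_pct` is a hypothetical helper and assumes a kernel that reports MemAvailable (Linux 3.14 or later):

```bash
# Hypothetical helper: memory usage % from a /proc/meminfo-style file, using
# the same formula as the query: (MemTotal - MemAvailable) / MemTotal * 100.
mem_usage_pct() {
  file=${1:-/proc/meminfo}
  awk '/^MemTotal:/ { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { printf "%.1f\n", (total - avail) / total * 100 }' "$file"
}
```

Called without an argument, it reads the live /proc/meminfo of the current host.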
Disk Usage
# Disk usage percentage
(node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100

# Free disk space in GiB
node_filesystem_avail_bytes{mountpoint="/"} / 1024 / 1024 / 1024
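On the host, `df` reports the same two quantities. One detail worth knowing: size minus available also counts the blocks reserved for root, so this percentage can be slightly higher than the Use% column of GNU `df`, which is based on used / (used + available). A sketch that reproduces the query's formula; `disk_usage_pct` is a hypothetical helper:

```bash
# Hypothetical helper: disk usage % for a mount point, computed like the
# query: (size - available) / size * 100.
disk_usage_pct() {
  mount=${1:-/}
  # df -P -k columns: filesystem, size, used, available, capacity, mount point
  df -P -k "$mount" | awk 'NR == 2 { printf "%.1f\n", ($2 - $4) / $2 * 100 }'
}
```

Comparing `disk_usage_pct /` with the Grafana panel is a quick sanity check that the dashboard reads the right filesystem.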
Network
# Network traffic rate in bytes per second (received, then transmitted)
irate(node_network_receive_bytes_total[5m])
irate(node_network_transmit_bytes_total[5m])
8. Backup and Maintenance
Backup Prometheus Data
#!/bin/bash
# backup-prometheus.sh

DATE=$(date +%Y%m%d)
BACKUP_DIR=/backup/prometheus
mkdir -p "$BACKUP_DIR"

# Stop Prometheus so the data files do not change during the copy
sudo systemctl stop prometheus

# Back up the data
tar -czf "$BACKUP_DIR/prometheus-data-$DATE.tar.gz" /var/lib/prometheus/

# Back up the config
tar -czf "$BACKUP_DIR/prometheus-config-$DATE.tar.gz" /etc/prometheus/

# Start Prometheus again
sudo systemctl start prometheus

# Keep 7 days of backups
find "$BACKUP_DIR" -name "*.tar.gz" -mtime +7 -delete
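Stopping Prometheus keeps the copy consistent but leaves a gap in the data while it is down. An alternative is the TSDB snapshot endpoint of the HTTP admin API, which writes a snapshot under /var/lib/prometheus/snapshots/ while Prometheus keeps running. It only works if Prometheus is started with `--web.enable-admin-api`, which the service file above does not set and which also exposes data-deletion endpoints, so restrict access to port 9090 if you enable it. A sketch; `prometheus_snapshot` is a hypothetical helper:

```bash
# Hypothetical helper: ask a running Prometheus for a TSDB snapshot.
# Requires Prometheus to run with --web.enable-admin-api. On success the JSON
# reply names the new directory under <storage.tsdb.path>/snapshots/.
prometheus_snapshot() {
  url=${1:-http://localhost:9090}
  # -f: non-zero exit on HTTP error responses; -sS: quiet, but show errors
  curl -fsS -X POST "$url/api/v1/admin/tsdb/snapshot"
}
```

The snapshot directory can then be archived with tar like the data directory in the script above, without stopping the service.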
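Routine Maintenance
# Check Prometheus disk usage
du -sh /var/lib/prometheus/

# Compaction: Prometheus compacts its data automatically; no manual step is needed

# Analyze the TSDB (block sizes, series cardinality, label usage); this only reads the data and does not compact it
promtool tsdb analyze /var/lib/prometheus/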
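For capacity planning, the Prometheus storage documentation gives a rough formula: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, where a sample takes about 1 to 2 bytes on average. The ingestion rate can be read from Prometheus itself, for example with `rate(prometheus_tsdb_head_samples_appended_total[2h])`. A sketch of the arithmetic; `tsdb_size_gib` is a hypothetical helper and the figures in the example are illustrative:

```bash
# Hypothetical helper: rough TSDB size estimate in GiB.
# Usage: tsdb_size_gib <retention_days> <samples_per_second> <bytes_per_sample>
tsdb_size_gib() {
  awk -v d="$1" -v s="$2" -v b="$3" \
    'BEGIN { printf "%.1f\n", d * 86400 * s * b / (1024 * 1024 * 1024) }'
}
```

With the 30-day retention set in the service file and an assumed 10,000 samples per second at 2 bytes each, `tsdb_size_gib 30 10000 2` prints 48.3, which can be compared with the du -sh output above.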
Conclusion
A Prometheus and Grafana setup provides:
- Real-time metrics for system resources
- Historical data for trend analysis
- Alerting for proactive monitoring
- Attractive dashboards for visualization
- A scalable architecture for monitoring many servers
This stack is widely used as a de facto standard for modern infrastructure monitoring and is well suited to production environments.
Written by
Hendra Wijaya