Prerequisites

Basic Linux command-line skills; system administration fundamentals (useradd, usermod, groups, systemctl, journalctl, ps, top, htop, ufw, firewalld, iptables); root or sudo access to a Linux server

How to Set Up Prometheus and Grafana on Linux?

To set up the Prometheus and Grafana monitoring stack on Linux, you install Prometheus (the metrics collection engine) on port 9090, deploy Node Exporter (port 9100) for system metrics, install Grafana (the visualization platform) on port 3000, and configure Grafana to query Prometheus as a data source. This modern monitoring architecture provides real-time visibility into Linux system performance, resource utilization, and application health through customizable dashboards.

Quick Installation Commands:

# Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.48.0.linux-amd64 /opt/prometheus

# Install Grafana (Ubuntu/Debian)
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana

# Start services (the Prometheus systemd unit is created in the detailed steps below)
sudo systemctl start prometheus grafana-server
sudo systemctl enable prometheus grafana-server

Access Points:

  • Prometheus UI: http://localhost:9090
  • Grafana Dashboard: http://localhost:3000 (default login: admin/admin)

Table of Contents

  1. What is Prometheus Grafana Monitoring Stack?
  2. Why Choose Prometheus for Linux System Monitoring?
  3. How to Install Prometheus on Linux Server
  4. How to Set Up Node Exporter for System Metrics
  5. How to Install Grafana Visualization Platform
  6. How to Configure Grafana Data Source Connection
  7. How to Create Grafana Monitoring Dashboards
  8. Prometheus Configuration Best Practices
  9. FAQ Section
  10. Troubleshooting Common Issues

What is Prometheus Grafana Monitoring Stack?

The Prometheus-Grafana monitoring stack is a powerful open-source solution for comprehensive Linux system observability. This architecture combines two industry-leading tools that work seamlessly together to provide enterprise-grade monitoring capabilities.

Prometheus serves as the metrics collection and time-series database engine, while Grafana functions as the visualization and dashboard platform. Consequently, this combination enables system administrators to collect, store, query, and visualize metrics from Linux servers, applications, and infrastructure components.

Architecture Components

Component | Function | Default Port | Purpose
Prometheus | Time-series database | 9090 | Metrics collection, storage, and querying
Grafana | Visualization platform | 3000 | Dashboard creation and data visualization
Node Exporter | System metrics exporter | 9100 | Hardware and OS metrics exposure
Alertmanager | Alert routing system | 9093 | Alert management and notifications

The stack operates on a pull-based model, where Prometheus actively scrapes metrics endpoints at configurable intervals. Subsequently, Grafana queries this stored data using PromQL (Prometheus Query Language) to generate real-time visualizations.
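
For context, each target's /metrics endpoint serves plain text in the Prometheus exposition format; the excerpt below is illustrative (the values are made up) and shows the metric name, labels, and sample value that Prometheus records on every scrape:

# Illustrative /metrics excerpt from a Node Exporter target
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 184503.19
node_cpu_seconds_total{cpu="0",mode="user"} 1523.47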

According to the Prometheus official documentation, this architecture scales efficiently from single-server deployments to massive distributed systems monitoring thousands of targets. Moreover, the Cloud Native Computing Foundation (CNCF) has graduated Prometheus as a production-ready project, validating its reliability for mission-critical environments.


Why Choose Prometheus for Linux System Monitoring?

Organizations transitioning to modern monitoring solutions benefit significantly from Prometheus’s unique advantages. Several compelling reasons make this Prometheus-Grafana stack the preferred choice for Linux system administrators.

Key Advantages

1. Pull-Based Metrics Collection Unlike traditional push-based monitoring systems, Prometheus actively pulls metrics from configured targets. Consequently, this approach simplifies network configuration and reduces the attack surface since monitored systems don’t need outbound connectivity.

2. Multi-Dimensional Data Model Prometheus stores metrics as time-series data with key-value labels, enabling flexible querying and aggregation. Therefore, you can slice and dice metrics across multiple dimensions without restructuring your data model.

3. Powerful Query Language (PromQL) PromQL provides sophisticated data manipulation capabilities that rival SQL in expressiveness. Additionally, it supports mathematical operations, aggregation functions, and complex temporal queries essential for performance analysis.
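
For instance, here are a few illustrative PromQL expressions (the node_* metrics assume the Node Exporter configured later in this guide):

# Per-instance CPU busy percentage over the last 5 minutes
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Receive throughput summed per network device
sum by(device) (rate(node_network_receive_bytes_total[5m]))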

4. Built-in Service Discovery Prometheus automatically discovers monitoring targets through various mechanisms including Kubernetes, Consul, and file-based configurations. As a result, dynamic infrastructure scales seamlessly without manual intervention.

5. No External Dependencies The single binary deployment model eliminates complex dependency chains. Furthermore, Prometheus operates independently without requiring external databases or message queues, simplifying both deployment and maintenance.

Comparison with Traditional Monitoring

Feature | Prometheus Stack | Traditional Tools (Nagios/Zabbix)
Architecture | Pull-based, distributed | Agent-based, centralized
Query Language | PromQL (powerful) | Limited query capabilities
Cloud Native | Kubernetes-integrated | Requires adaptation
Storage | Local time-series DB | External database required
Scalability | Horizontal scaling | Vertical scaling limitations

The Linux Foundation recognizes this monitoring stack as a cornerstone technology for modern infrastructure management. Similarly, the CNCF landscape positions it as the de-facto standard for cloud-native monitoring solutions.

For readers establishing baseline Linux performance understanding, review our guide on System Performance Monitoring with top and htop (Post #41) before proceeding with advanced monitoring implementations.


How to Install Prometheus on Linux Server

Installing Prometheus requires methodical execution of several interconnected steps. Initially, we’ll establish the foundational components before progressing to advanced configurations.

Prerequisites Verification

Before beginning the installation process, ensure your Linux system meets these requirements:

# Check system resources
free -h                    # Minimum 2GB RAM recommended
df -h /opt                # Minimum 20GB storage for metrics
uname -r                  # Kernel 3.10 or higher

# Verify network connectivity
curl -I https://github.com  # Internet access for downloads

Step 1: Download Prometheus Binary

Navigate to the Prometheus downloads page to identify the latest stable release. Subsequently, execute these commands:

# Create Prometheus user (security best practice)
sudo useradd --no-create-home --shell /bin/false prometheus

# Download latest version (replace version as needed)
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz

# Extract archive
tar xvfz prometheus-2.48.0.linux-amd64.tar.gz
cd prometheus-2.48.0.linux-amd64

Step 2: Configure Directory Structure

Organizing Prometheus files correctly ensures maintainability and security. Therefore, establish the following directory hierarchy:

# Create directories
sudo mkdir -p /etc/prometheus /var/lib/prometheus

# Move binaries
sudo cp prometheus promtool /usr/local/bin/

# Move configuration files
sudo cp -r consoles console_libraries /etc/prometheus/

# Set ownership
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

Step 3: Create Prometheus Configuration

The prometheus.yml configuration file defines scrape targets and global settings. Consequently, create this essential file:

sudo nano /etc/prometheus/prometheus.yml

Insert this foundational configuration:

# Global configuration
global:
  scrape_interval: 15s          # Scrape targets every 15 seconds
  evaluation_interval: 15s       # Evaluate rules every 15 seconds
  external_labels:
    monitor: 'linux-monitoring'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: []

# Rule files
rule_files:
  # - "alert_rules.yml"

# Scrape configurations
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
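
Before creating the service, you can validate the file with promtool (copied to /usr/local/bin in Step 2); it reports SUCCESS when the configuration parses cleanly:

# Validate configuration syntax
promtool check config /etc/prometheus/prometheus.yml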

Step 4: Create Systemd Service

Systemd integration enables automatic startup and process management. Additionally, this configuration ensures Prometheus restarts automatically after failures:

sudo nano /etc/systemd/system/prometheus.service

Configure the service unit:

[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --storage.tsdb.retention.time=15d \
    --web.enable-lifecycle

Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

Step 5: Start Prometheus Service

Finally, activate and verify the Prometheus service:

# Reload systemd daemon
sudo systemctl daemon-reload

# Start Prometheus
sudo systemctl start prometheus

# Enable automatic startup
sudo systemctl enable prometheus

# Verify service status
sudo systemctl status prometheus

# Check listening ports
sudo ss -tulpn | grep prometheus

Verification Steps

Access the Prometheus web interface to confirm successful installation:

# Open in browser
http://your-server-ip:9090

# Test PromQL query in web UI
up{job="prometheus"}

Additionally, verify metrics endpoint accessibility:

curl http://localhost:9090/metrics | head -20
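
The query API — the same endpoint Grafana will use later — can also be tested from the command line; jq is optional and only formats the JSON response:

# Run a PromQL query through the HTTP API
curl -s 'http://localhost:9090/api/v1/query?query=up' | jq .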

For enhanced Linux administration automation, explore our comprehensive guide on Introduction to Ansible for Linux Automation (Post #37).


How to Set Up Node Exporter for System Metrics

Node Exporter exposes hardware and operating system metrics in Prometheus-compatible format. Specifically, this exporter provides critical visibility into CPU, memory, disk, network, and filesystem performance.

Installation Process

Begin by downloading and configuring Node Exporter:

# Download Node Exporter
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz

# Extract binary
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz

# Move to system path
sudo mv node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/

# Create dedicated user
sudo useradd --no-create-home --shell /bin/false node_exporter

# Set permissions
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Systemd Service Configuration

Create a dedicated service unit for Node Exporter:

sudo nano /etc/systemd/system/node_exporter.service

Service configuration:

[Unit]
Description=Prometheus Node Exporter
Documentation=https://github.com/prometheus/node_exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter \
    --collector.filesystem.mount-points-exclude='^/(dev|proc|sys|var/lib/docker/.+)($|/)' \
    --collector.netclass.ignored-devices='^(veth.*|docker.*)$'

Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

Enable and Start Service

Activate Node Exporter and verify operation:

# Start service
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

# Verify status
sudo systemctl status node_exporter

# Test metrics endpoint
curl http://localhost:9100/metrics | grep node_cpu

Configure Prometheus Scraping

Update Prometheus configuration to scrape Node Exporter metrics:

sudo nano /etc/prometheus/prometheus.yml

Add this scrape configuration:

scrape_configs:
  # Existing prometheus job...
  
  # Node Exporter job
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: 'linux-server-01'

Reload Prometheus to apply changes:

# Hot reload configuration (if --web.enable-lifecycle enabled)
curl -X POST http://localhost:9090/-/reload

# Or restart service
sudo systemctl restart prometheus
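
After reloading, the new target should report health "up" in the targets API; the jq filter below is just one way to summarize the output (omit it if jq is not installed):

# List active scrape targets and their health
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | {job: .labels.job, health: .health}'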

Available Metrics Categories

Node Exporter provides these metric families:

Metric Family | Description | Example Metrics
node_cpu_* | CPU statistics | Usage, idle time, steal time
node_memory_* | Memory metrics | Available, cached, swap usage
node_disk_* | Disk I/O | Read/write bytes, operations
node_network_* | Network statistics | Transmitted bytes, errors, drops
node_filesystem_* | Filesystem info | Available space, inodes

For detailed disk performance analysis techniques, reference our guide on Disk I/O Performance Analysis (Post #42).


How to Install Grafana Visualization Platform

Grafana transforms raw metrics into actionable insights through customizable dashboards. Moreover, this platform supports multiple data sources beyond Prometheus, creating a unified observability solution.

Installation Methods

Method 1: APT Repository (Ubuntu/Debian)

# Install dependencies
sudo apt-get install -y software-properties-common wget

# Add Grafana GPG key
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

# Add repository
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

# Update and install
sudo apt-get update
sudo apt-get install grafana

# Start service
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Method 2: YUM Repository (RHEL/CentOS)

# Create repository file
sudo nano /etc/yum.repos.d/grafana.repo

Repository configuration:

[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt

Install Grafana:

# Install package
sudo yum install grafana

# Start and enable
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Method 3: Binary Installation

For distributions without package managers or requiring specific versions:

# Download binary
wget https://dl.grafana.com/oss/release/grafana-10.2.2.linux-amd64.tar.gz

# Extract and install
tar -zxvf grafana-10.2.2.linux-amd64.tar.gz
sudo mv grafana-10.2.2 /opt/grafana

# Create systemd service (similar to previous examples)

Firewall Configuration

Ensure Grafana’s port accessibility:

# UFW firewall
sudo ufw allow 3000/tcp

# Firewalld
sudo firewall-cmd --permanent --add-port=3000/tcp
sudo firewall-cmd --reload

# iptables
sudo iptables -A INPUT -p tcp --dport 3000 -j ACCEPT
sudo iptables-save > /etc/iptables/rules.v4

Initial Access and Security

Access Grafana web interface and secure the installation:

# Access URL
http://your-server-ip:3000

# Default credentials
Username: admin
Password: admin

Important: Immediately change the default password upon first login. Additionally, consider implementing these security hardening measures:

# Edit Grafana configuration
sudo nano /etc/grafana/grafana.ini

Key security settings:

[server]
protocol = https
cert_file = /etc/ssl/certs/grafana.crt
cert_key = /etc/ssl/private/grafana.key

[security]
admin_password = <strong-password>
secret_key = <random-secret-key>
cookie_secure = true
cookie_samesite = strict

[auth]
disable_login_form = false
disable_signout_menu = false
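
The [server] block above expects an existing certificate and key. If you don't have one yet, a self-signed pair is enough for testing (browsers will warn; use a CA-signed certificate in production). The paths and the grafana group below simply match the packaged installation and the grafana.ini example:

# Generate a self-signed certificate valid for one year (testing only)
sudo openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -keyout /etc/ssl/private/grafana.key \
  -out /etc/ssl/certs/grafana.crt \
  -subj "/CN=your-server-hostname"

# Allow the grafana service user to read the private key
sudo chown root:grafana /etc/ssl/private/grafana.key
sudo chmod 640 /etc/ssl/private/grafana.key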

For comprehensive SSH security practices applicable to web services, review our guide on SSH Server Setup and Security Hardening (Post #22).


How to Configure Grafana Data Source Connection

Establishing the Prometheus-Grafana connection enables data flow between these components. Furthermore, proper configuration ensures optimal query performance and reliable dashboard updates.

Step-by-Step Configuration

  1. Access Data Sources Settings

Navigate through Grafana’s interface:

  • Log into Grafana (http://localhost:3000)
  • Click the gear icon (⚙️) in the left sidebar
  • Select “Data Sources”
  • Click “Add data source”
  2. Select Prometheus Data Source

  • Choose “Prometheus” from the time-series database section
  • This integration provides native support for PromQL queries

  3. Configure Connection Parameters

Enter these essential settings:

Parameter | Value | Description
Name | Prometheus-Local | Identifier for this data source
URL | http://localhost:9090 | Prometheus server address
Access | Server (default) | Query through Grafana backend
Scrape interval | 15s | Match Prometheus global setting
Query timeout | 60s | Maximum query execution time
HTTP Method | POST | Recommended for large queries

  4. Advanced Settings Configuration

# Custom HTTP Headers (if authentication required)
Header: Authorization
Value: Bearer <your-token>

# Timeout configuration
Timeout: 60

# TLS/SSL Settings (if using HTTPS)
Skip TLS Verify: false (keep false in production)
TLS Client Auth: Configure if mutual TLS required

  5. Test Connection

Click the “Save & Test” button at the bottom. On success, you should see:

✓ Data source is working

Troubleshooting Connection Issues

If connection testing fails, systematically verify these components:

# 1. Check Prometheus service status
sudo systemctl status prometheus

# 2. Verify Prometheus is listening
sudo ss -tulpn | grep 9090

# 3. Test Prometheus API directly
curl http://localhost:9090/api/v1/query?query=up

# 4. Check firewall rules
sudo iptables -L -n | grep 9090

# 5. Review Grafana logs
sudo journalctl -u grafana-server -f

Multiple Data Source Configuration

For monitoring distributed systems, configure additional Prometheus instances:

# Example: Remote Prometheus server
Name: Prometheus-Remote-DC1
URL: http://prometheus-dc1.example.com:9090
Access: Server
Basic Auth: Enable if required
  User: monitoring
  Password: <secure-password>
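
If you prefer configuration-as-code over the UI, Grafana can also provision data sources from YAML files read at startup. A minimal sketch (the file name is arbitrary) might look like this; restart grafana-server afterwards so the file is picked up:

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus-Local
    type: prometheus
    access: proxy          # equivalent to the "Server" access mode in the UI
    url: http://localhost:9090
    isDefault: true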

The Grafana documentation provides comprehensive details on advanced configuration options. Similarly, monitoring best practices from Red Hat emphasize proper data source organization for enterprise environments.


How to Create Grafana Monitoring Dashboards

Dashboard creation transforms raw metrics into actionable visualizations. Consequently, well-designed dashboards enable rapid incident detection and performance optimization.

Quick Start: Import Pre-built Dashboards

Grafana community provides thousands of production-ready dashboards:

  1. Navigate to Dashboard Import
    • Click “+” icon in left sidebar
    • Select “Import”
    • Enter dashboard ID or upload JSON
  2. Recommended Dashboard IDs

Dashboard | ID | Purpose | Panels
Node Exporter Full | 1860 | Complete system metrics | 15+ panels
Node Exporter Quickstart | 13978 | Essential metrics only | 8 panels
Prometheus Stats | 3662 | Prometheus self-monitoring | 12 panels
Linux System Overview | 14513 | High-level system view | 10 panels

  3. Import Process

# Import by ID
1. Enter dashboard ID: 1860
2. Click "Load"
3. Select Prometheus data source: "Prometheus-Local"
4. Click "Import"

Creating Custom Dashboards

Build tailored dashboards for specific monitoring requirements:

Step 1: Create New Dashboard

Dashboard → New → New Dashboard → Add visualization

Step 2: Configure Panel Queries

Example: CPU Usage Panel

# PromQL query
100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Panel configuration:

Title: "CPU Usage %"
Type: Graph/Time series
Legend: "{{instance}}"
Unit: Percent (0-100)
Thresholds:
  - 80: Yellow warning
  - 90: Red critical

Step 3: Memory Monitoring Panel

# Available memory percentage
(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

Step 4: Disk I/O Panel

# Disk read throughput
rate(node_disk_read_bytes_total[5m])

# Disk write throughput
rate(node_disk_written_bytes_total[5m])

Step 5: Network Traffic Panel

# Network receive rate
rate(node_network_receive_bytes_total{device!~"lo|docker.*"}[5m])

# Network transmit rate
rate(node_network_transmit_bytes_total{device!~"lo|docker.*"}[5m])

Dashboard Organization Best Practices

Structure dashboards logically for maximum effectiveness:

1. Dashboard Hierarchy

├── Overview Dashboard (high-level metrics)
├── System Resources Dashboard (detailed CPU/memory/disk)
├── Network Dashboard (network-specific metrics)
├── Application Dashboard (service-specific metrics)
└── Alerting Dashboard (active alerts and trends)

2. Panel Arrangement Guidelines

Row | Panels | Purpose
Top Row | Key Performance Indicators (KPIs) | At-a-glance status
Middle Rows | Detailed metrics graphs | Trend analysis
Bottom Row | Tables and logs | Detailed investigation

3. Variables Configuration

Create dashboard variables for flexibility:

# Instance selector variable
Name: instance
Type: Query
Query: label_values(node_cpu_seconds_total, instance)

Use variables in queries:

up{instance="$instance"}

Advanced Dashboard Features

Templating Example

{
  "templating": {
    "list": [
      {
        "name": "instance",
        "type": "query",
        "datasource": "Prometheus-Local",
        "query": "label_values(node_uname_info, instance)",
        "refresh": 1
      }
    ]
  }
}

Alert Integration

Configure visual alerts within panels:

Alert Configuration:
  Condition: 
    - WHEN: avg() OF query(A, 5m, now)
    - IS ABOVE: 90
  
  Notifications:
    - Send to: ops-team-slack
    - Message: "High CPU usage detected on {{instance}}"

For network-specific monitoring configurations, consult our detailed guide on Network Performance Monitoring (Post #44).


Prometheus Configuration Best Practices

Optimizing Prometheus configuration ensures reliable monitoring at scale. Moreover, following industry best practices prevents common pitfalls and performance bottlenecks.

Scrape Configuration Optimization

1. Appropriate Scrape Intervals

global:
  scrape_interval: 15s      # Default: good for most cases
  
scrape_configs:
  - job_name: 'high-frequency-metrics'
    scrape_interval: 5s     # Critical services
    static_configs:
      - targets: ['app-server:8080']
  
  - job_name: 'low-frequency-metrics'
    scrape_interval: 60s    # Infrastructure metrics
    static_configs:
      - targets: ['storage-array:9100']

2. Relabeling for Metric Organization

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['server1:9100', 'server2:9100', 'server3:9100']
    relabel_configs:
      # Extract datacenter from hostname
      - source_labels: [__address__]
        regex: '(.*)-dc(\d+)-(.*)'
        target_label: datacenter
        replacement: 'dc$2'
      
      # Add environment label
      - target_label: environment
        replacement: 'production'

Storage Configuration

1. Retention Policy

# Command line flags
--storage.tsdb.path=/var/lib/prometheus/
--storage.tsdb.retention.time=15d      # Time-based retention
--storage.tsdb.retention.size=50GB     # Size-based retention

2. Storage Sizing Calculation

# Formula: ingested_samples × retention_time × bytes_per_sample
# Example: 100k samples/sec × 15 days × 2 bytes
= 100,000 × (15 × 86,400) × 2 = 259,200,000,000 bytes
≈ 259.2 GB required storage

# Add 20% overhead for compaction
Total = 259.2 × 1.2 = 311 GB
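
The same arithmetic can be scripted so you can plug in your own ingestion rate; this is just the formula above expressed in shell arithmetic:

# Rough storage estimate: samples/sec x retention x bytes/sample, plus 20% overhead
SAMPLES_PER_SEC=100000
RETENTION_DAYS=15
BYTES_PER_SAMPLE=2
BASE=$(( SAMPLES_PER_SEC * RETENTION_DAYS * 86400 * BYTES_PER_SAMPLE ))
echo "Base: $(( BASE / 1000000000 )) GB, with overhead: $(( BASE * 12 / 10 / 1000000000 )) GB"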

Query Performance Optimization

1. Efficient PromQL Patterns

# ❌ Inefficient: Wide time range without aggregation
http_requests_total[7d]

# ✅ Efficient: Aggregated over time
rate(http_requests_total[5m])

# ❌ Inefficient: Multiple regex matches
{job=~"web.*|api.*|service.*"}

# ✅ Efficient: Precise label matching
{job="web-server",environment="prod"}

2. Recording Rules for Heavy Queries

groups:
  - name: cpu_rules
    interval: 30s
    rules:
      - record: instance:cpu_usage:rate5m
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
      
      - record: job:http_requests:rate5m
        expr: sum by(job) (rate(http_requests_total[5m]))
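
Recording rules live in their own file, which must be referenced from prometheus.yml before they take effect; a minimal wiring sketch (the file name is an assumption):

# /etc/prometheus/prometheus.yml
rule_files:
  - "recording_rules.yml"

Validate the rule file and reload Prometheus:

promtool check rules /etc/prometheus/recording_rules.yml
curl -X POST http://localhost:9090/-/reload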

Security Hardening

1. Authentication Configuration

Basic authentication is not configured in prometheus.yml; it belongs in a separate web configuration file that Prometheus loads via the --web.config.file flag:

# /etc/prometheus/web.yml
basic_auth_users:
  prometheus: $2y$10$<bcrypt-hash>

Generate bcrypt password:

# Install htpasswd
sudo apt-get install apache2-utils

# Generate hash
htpasswd -nBC 10 "" | tr -d ':\n'

2. TLS Configuration

# Enable HTTPS (also placed in /etc/prometheus/web.yml)
tls_server_config:
  cert_file: /etc/prometheus/prometheus.crt
  key_file: /etc/prometheus/prometheus.key
  client_ca_file: /etc/prometheus/client_ca.crt
  client_auth_type: RequireAndVerifyClientCert
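
Both the basic-auth and TLS blocks above go into the web configuration file (assumed here to be /etc/prometheus/web.yml), which is activated by adding a single flag to the ExecStart line of the systemd unit created earlier:

# Append to ExecStart in /etc/systemd/system/prometheus.service
    --web.config.file=/etc/prometheus/web.yml

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart prometheus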

High Availability Setup

# Prometheus HA configuration
global:
  external_labels:
    replica: 'A'           # Use 'B' for second instance
    cluster: 'production'

# Run two identically configured replicas ('A' and 'B') that scrape the same targets.
# Optionally, a separate global instance can federate aggregated series from both replicas:
scrape_configs:
  - job_name: 'federated-cluster'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
        - 'prometheus-a:9090'
        - 'prometheus-b:9090'

The Prometheus best practices guide provides comprehensive recommendations. Additionally, CNCF’s monitoring whitepaper offers architectural patterns for production deployments.


FAQ Section

How does Prometheus collect metrics from Linux systems?

Prometheus uses a pull-based model where the Prometheus server actively scrapes metrics from configured HTTP endpoints. Specifically, exporters like Node Exporter expose metrics at /metrics endpoints, which Prometheus polls at regular intervals (typically 15-60 seconds). This approach differs from push-based systems and provides better reliability since Prometheus controls the scraping frequency and can detect when targets become unavailable.

What’s the difference between Prometheus and Grafana?

Prometheus functions as the metrics collection, storage, and querying engine, while Grafana serves as the visualization and dashboarding platform. Furthermore, Prometheus includes a basic built-in web UI primarily for query testing, whereas Grafana excels at creating production-grade dashboards with advanced visualization options. Organizations typically use both together: Prometheus handles the data pipeline, and Grafana provides the user-facing analytics interface.

Can I monitor multiple Linux servers with one Prometheus instance?

Yes, absolutely. A single Prometheus instance can monitor hundreds to thousands of targets depending on hardware resources and scrape interval configuration. Moreover, you configure multiple targets within scrape jobs using static configurations, file-based service discovery, or dynamic discovery mechanisms like Kubernetes, Consul, or cloud provider APIs. For very large deployments exceeding 100,000 targets, consider federation where multiple Prometheus instances aggregate metrics to a central instance.

How much disk space does Prometheus require?

Storage requirements depend on several factors: number of monitored metrics, scrape interval, retention period, and cardinality. As a general rule, expect approximately 1-2 bytes per sample. For example, ingesting 100,000 samples per second with 15-day retention requires roughly 260GB. Additionally, plan for 20-30% overhead for compaction and indexing. Use the sizing formula from the configuration section above and monitor actual usage with the prometheus_tsdb_storage_blocks_bytes metric.

What happens if Prometheus server goes down?

During Prometheus downtime, metrics collection stops and monitoring gaps occur in your time-series data. However, Prometheus doesn’t buffer metrics from exporters, so data during outages is permanently lost. To mitigate this risk, implement high-availability configurations with redundant Prometheus instances scraping identical targets. Additionally, consider remote storage solutions like Thanos or Cortex for long-term retention and cross-instance querying capabilities.

How do I secure Prometheus and Grafana in production?

Implement these security layers: (1) Enable TLS/SSL encryption for all web interfaces using valid certificates, (2) Configure strong authentication mechanisms including OAuth, LDAP, or SAML integration, (3) Implement network segmentation placing monitoring systems in isolated VLANs, (4) Enable audit logging to track access and configuration changes, (5) Regularly update both Prometheus and Grafana to patch security vulnerabilities, (6) Use firewall rules restricting access to monitoring ports. Refer to OWASP security guidelines for comprehensive application security practices.

Can Prometheus monitor applications besides system metrics?

Yes, Prometheus supports extensive application monitoring through custom exporters and client libraries. Specifically, official client libraries exist for Go, Java, Python, Ruby, and other languages enabling application instrumentation. Furthermore, community-maintained exporters provide metrics for databases (MySQL, PostgreSQL, Redis), web servers (Apache, Nginx), message queues (RabbitMQ, Kafka), and countless other services. The Prometheus exporters page lists hundreds of available integrations.

What is PromQL and why is it important?

PromQL (Prometheus Query Language) is the functional query language for selecting and aggregating time-series data. Its importance stems from enabling sophisticated analysis operations like rate calculations, statistical aggregations, mathematical transformations, and temporal functions that would be extremely complex with traditional SQL. Additionally, PromQL’s label-based filtering allows flexible metric slicing across multiple dimensions without schema changes. Mastering PromQL is essential for creating meaningful alerts and dashboards.

How often should I update Prometheus and Grafana?

Follow these update strategies: (1) Apply security patches immediately upon release, (2) Update to minor versions within 2-3 months of release after community testing, (3) Plan major version upgrades annually during maintenance windows, (4) Always test updates in staging environments before production deployment, (5) Subscribe to project security mailing lists for vulnerability notifications. Consequently, balance stability requirements against security needs based on your organization’s risk tolerance.

What are the alternatives to Prometheus and Grafana?

Several monitoring alternatives exist: (1) InfluxDB + Chronograf – Similar time-series approach with different query language, (2) Elastic Stack (ELK) – Better for log analysis but heavier resource requirements, (3) Zabbix – Traditional monitoring with agent-based architecture, (4) Nagios – Legacy monitoring focused on availability rather than metrics, (5) DataDog/New Relic – Commercial SaaS solutions with broader feature sets but ongoing costs. However, the Prometheus/Grafana combination remains the preferred open-source standard for cloud-native environments due to CNCF backing and extensive ecosystem support.


Troubleshooting Common Issues

Issue 1: Prometheus Fails to Start

Symptoms:

sudo systemctl status prometheus
# Output: Failed to start Prometheus

Diagnostic Steps:

# Check service logs
sudo journalctl -u prometheus -n 100 --no-pager

# Verify configuration syntax
/usr/local/bin/promtool check config /etc/prometheus/prometheus.yml

# Check file permissions
ls -la /etc/prometheus/
ls -la /var/lib/prometheus/

# Verify port availability
sudo ss -tulpn | grep 9090

Solutions:

# Fix configuration errors
sudo nano /etc/prometheus/prometheus.yml
# Validate after changes
promtool check config /etc/prometheus/prometheus.yml

# Correct permissions
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus

# Kill process using port 9090
sudo lsof -ti:9090 | xargs sudo kill -9

# Restart service
sudo systemctl restart prometheus

Issue 2: Node Exporter Metrics Not Appearing

Symptoms:

  • Node Exporter running but metrics not in Prometheus
  • up{job="node_exporter"} showing 0 or absent

Diagnostic Commands:

# Test Node Exporter endpoint
curl http://localhost:9100/metrics | head

# Check Prometheus targets
curl http://localhost:9090/api/v1/targets | jq

# Verify network connectivity
telnet localhost 9100

# Check firewall rules
sudo iptables -L -n -v | grep 9100

Resolution Steps:

# 1. Verify scrape configuration
sudo nano /etc/prometheus/prometheus.yml

# Ensure this section exists:
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

# 2. Reload Prometheus configuration
curl -X POST http://localhost:9090/-/reload

# 3. Check Prometheus logs
sudo journalctl -u prometheus -f

# 4. Restart services if needed
sudo systemctl restart node_exporter prometheus

Issue 3: Grafana Cannot Connect to Prometheus

Symptoms:

  • “Bad Gateway” or “Connection Refused” errors
  • Data source test fails

Troubleshooting Process:

# 1. Verify Prometheus accessibility
curl http://localhost:9090/api/v1/query?query=up

# 2. Check Grafana logs
sudo tail -f /var/log/grafana/grafana.log

# 3. Test from Grafana server (if different host)
curl http://prometheus-host:9090/api/v1/label/__name__/values

# 4. Verify DNS resolution
nslookup prometheus-host

# 5. Check network policies
sudo iptables -L OUTPUT -n -v

Fix Configurations:

# In Grafana data source settings:
URL: http://localhost:9090          # If co-located
URL: http://prometheus-ip:9090      # If remote

# Access mode:
Access: Server (default)            # Queries through Grafana backend
Access: Browser                     # Direct browser queries (less common)

# For remote Prometheus:
# Add firewall rule on Prometheus server
sudo ufw allow from grafana-ip to any port 9090

# Test connectivity
telnet prometheus-ip 9090

Issue 4: High Memory Usage by Prometheus

Symptoms:

  • OOM (Out of Memory) kills
  • Slow query performance
  • prometheus_tsdb_head_chunks metric excessively high

Analysis Commands:

# Check memory usage
free -h
ps aux | grep prometheus

# Query Prometheus internal metrics
curl 'http://localhost:9090/api/v1/query?query=process_resident_memory_bytes'

# Check cardinality
curl 'http://localhost:9090/api/v1/status/tsdb'

# Identify high cardinality metrics
curl http://localhost:9090/api/v1/label/__name__/values | jq
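
One common way to narrow down high-cardinality offenders is to count series per metric name directly in PromQL (run it in the Prometheus UI or through the query API):

# Top 10 metric names by number of active series
topk(10, count by (__name__) ({__name__=~".+"}))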

Optimization Solutions:

# 1. Reduce retention period (e.g. from 15d to 7d)
# Edit the systemd service; keep the flags free of trailing comments
ExecStart=/usr/local/bin/prometheus \
    --storage.tsdb.retention.time=7d \
    --storage.tsdb.retention.size=20GB

# 2. Optimize scrape intervals
global:
  scrape_interval: 30s      # Increased from 15s

# 3. Filter unnecessary metrics
scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
    metric_relabel_configs:
      # Drop unused metrics
      - source_labels: [__name__]
        regex: 'node_scrape_collector_.*'
        action: drop

# 4. Implement recording rules for expensive queries
groups:
  - name: precomputed_metrics
    interval: 60s
    rules:
      - record: instance:node_cpu_utilization:rate5m
        expr: (100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100))

Issue 5: Dashboards Not Updating in Real-time

Symptoms:

  • Stale data in Grafana panels
  • Graphs not refreshing automatically

Debug Process:

# 1. Check Grafana data source health
# Navigate to: Configuration → Data Sources → Test

# 2. Verify Prometheus has recent data
curl 'http://localhost:9090/api/v1/query?query=up&time='$(date +%s)

# 3. Check dashboard refresh settings
# In Grafana: Dashboard Settings → Time Range

# 4. Monitor query performance
curl -G http://localhost:9090/api/v1/query_range \
    --data-urlencode 'query=up' \
    --data-urlencode 'start='$(date -d '1 hour ago' +%s) \
    --data-urlencode 'end='$(date +%s) \
    --data-urlencode 'step=15s'

Resolution Steps:

# 1. Set appropriate dashboard refresh interval
Dashboard Settings → Time Range → Refresh: 10s

# 2. Optimize slow queries with recording rules
# See High Memory Usage section above

# 3. Increase Grafana query timeout
sudo nano /etc/grafana/grafana.ini

[dataproxy]
timeout = 90

# 4. Restart Grafana
sudo systemctl restart grafana-server


Issue 6: Alert Not Triggering

Symptoms:

  • Conditions met but no notifications
  • Alertmanager shows no active alerts

Verification Commands:

# Check alert rules syntax
promtool check rules /etc/prometheus/alert_rules.yml

# Query alert state
curl http://localhost:9090/api/v1/alerts | jq

# Check Alertmanager status
curl http://localhost:9093/api/v1/status | jq

# Test notification channel
curl -X POST http://localhost:9093/api/v1/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"TestAlert"}}]'

Troubleshooting Steps:

# 1. Verify alert rule configuration
groups:
  - name: system_alerts
    interval: 30s
    rules:
      - alert: HighCPUUsage
        expr: instance:node_cpu_utilization:rate5m > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

# 2. Check Alertmanager configuration
sudo nano /etc/alertmanager/alertmanager.yml

# Verify route and receiver configuration
route:
  group_by: ['alertname', 'cluster']
  receiver: 'team-alerts'

receivers:
  - name: 'team-alerts'
    webhook_configs:
      - url: 'http://your-webhook-url'

Reload configurations:

# Reload Prometheus rules
curl -X POST http://localhost:9090/-/reload

# Reload Alertmanager configuration
curl -X POST http://localhost:9093/-/reload

# Check logs
sudo journalctl -u alertmanager -f

For comprehensive log analysis techniques that complement monitoring, explore our guide on Log Rotation and Management (Post #39).


Additional Resources


Learning Path Recommendations

  1. Beginner: Start with Node Exporter installation and basic Grafana dashboards
  2. Intermediate: Learn PromQL query language and create custom recording rules
  3. Advanced: Implement high-availability setups and federation architectures
  4. Expert: Contribute to Prometheus ecosystem and develop custom exporters

Conclusion

Implementing a Prometheus and Grafana monitoring stack on Linux establishes enterprise-grade observability for your infrastructure. Throughout this comprehensive guide, we’ve covered the complete installation process, from Prometheus deployment and Node Exporter configuration to Grafana dashboard creation and performance optimization.

By following these procedures, you’ve gained the capability to monitor critical system metrics, create informative visualizations, and establish proactive alerting mechanisms. Furthermore, this monitoring foundation scales seamlessly from single-server deployments to massive distributed systems managing thousands of targets.

Remember that effective monitoring constitutes an ongoing process rather than a one-time implementation. Consequently, regularly review your dashboards, optimize queries for performance, and adapt your monitoring strategy as your infrastructure evolves. The combination of Prometheus’s robust data collection with Grafana’s powerful visualization capabilities provides the observability foundation necessary for maintaining reliable, high-performance Linux systems.

Next Steps:

  1. Implement alerting rules for critical metrics
  2. Explore advanced PromQL queries for deeper insights
  3. Integrate application-specific exporters for comprehensive monitoring
  4. Consider remote storage solutions for long-term metric retention
  5. Review our Log Analysis with ELK Stack (Post #47) guide for complementary log monitoring

Start building production-grade monitoring today and gain unprecedented visibility into your Linux infrastructure performance!


Last Updated: 2025
