MySQL Galera Cluster: Multi-Master Database Setup Guide
Knowledge Overview
Prerequisites
- Knowledge Prerequisites (3 Essential Areas):
  - Linux System Administration
    - Command line proficiency
    - MySQL/MariaDB basics
    - Network configuration skills
  - Database Administration Basics
    - MySQL configuration experience
    - Security fundamentals
    - Service management
  - Infrastructure Knowledge
    - High availability concepts
    - Basic networking
    - Linux security
What You'll Learn
- What Readers Will Learn (4 Main Categories):
  - Database Clustering Fundamentals
    - Multi-master replication concepts
    - Galera's certification mechanism
    - Comparison with traditional MySQL replication
  - Practical Implementation Skills
    - 3-node cluster configuration
    - HAProxy load balancing setup
    - SSL encryption implementation
    - Bootstrap procedures
  - Production Operations
    - Health monitoring techniques
    - Failure handling procedures
    - Split-brain recovery
    - Performance optimization
  - Security & Maintenance
    - SSL hardening strategies
    - Automated backup solutions
    - Monitoring script creation
    - Troubleshooting workflows
Tools Required
- Required Tools & Software (5 Categories):
  - Server Infrastructure
    - 3 Linux servers (4GB+ RAM)
    - Network connectivity (1Gbps recommended)
    - NTP/Chrony synchronization
  - Database Software
    - MariaDB 10.6+ with Galera
    - Galera Cluster 4.0+
    - mariabackup utility
  - Load Balancing & Monitoring
    - HAProxy 2.0+
    - sysbench for testing
    - netcat for connectivity
  - Security & Administration
    - OpenSSL for certificates
    - UFW/iptables for firewall
    - SSH with sudo access
  - Optional Monitoring Tools
    - Prometheus + Grafana
    - Nagios/Zabbix
    - Custom monitoring scripts
Time Investment
21 minutes reading time
42-63 minutes hands-on practice
Guide Content
MySQL Galera Cluster Configuration
MySQL Galera cluster provides synchronous multi-master database replication with automatic failover capabilities. Furthermore, this clustering solution ensures zero data loss through immediate transaction synchronization across all nodes. Additionally, the setup process involves installing MariaDB with Galera libraries, configuring wsrep settings, and initializing the cluster bootstrap sequence.
# Quick Galera cluster status check
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%'"
# Essential cluster variables
mysql -u root -p -e "SHOW VARIABLES LIKE 'wsrep%'"
# Node connectivity verification
mysql -u root -p -e "SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME IN ('wsrep_cluster_size','wsrep_ready','wsrep_connected')"
Table of Contents
- What is MySQL Galera Cluster and Why Use It?
- How to Prepare Servers for Galera Deployment?
- How to Install MariaDB Galera on Linux Systems?
- How to Configure Galera Cluster Settings?
- How to Initialize and Bootstrap the Cluster?
- How to Add New Nodes to Existing Cluster?
- How to Test Cluster Replication and Failover?
- How to Monitor Galera Cluster Performance?
- How to Handle Cluster Split-Brain Scenarios?
- What are Galera Cluster Production Best Practices?
What is MySQL Galera Cluster and Why Use It?
MySQL Galera cluster represents a synchronous multi-master database replication solution that eliminates single points of failure. Moreover, Galera provides active-active clustering where all nodes accept both read and write operations simultaneously. Consequently, this architecture ensures true high availability for mission-critical database workloads.
Key Galera Cluster Benefits
Synchronous Replication: Unlike traditional MySQL replication, Galera replicates and certifies every transaction on all cluster nodes before the commit returns. Therefore, data consistency is preserved through node failures, and during network partitions only the quorum partition continues accepting writes.
Multi-Master Architecture: Each cluster node functions as a master, accepting both read and write operations. Subsequently, this eliminates the need for complex master-slave failover procedures.
Automatic Node Recovery: Failed nodes automatically rejoin the cluster through incremental state transfer (IST) or state snapshot transfer (SST). Thus, maintenance operations become significantly simplified.
Zero Data Loss: Galera's certification-based replication guarantees that a transaction reported as committed has been replicated to every node. Consequently, the failure of a single node cannot lose committed transactions.
Galera vs Traditional MySQL Replication
| Feature | Galera Cluster | Traditional Replication |
|---|---|---|
| Replication Type | Synchronous multi-master | Asynchronous master-slave |
| Write Availability | All nodes accept writes | Only master accepts writes |
| Failover Process | Automatic and immediate | Manual intervention required |
| Data Consistency | Strong consistency guaranteed | Eventual consistency only |
| Split-Brain Protection | Built-in quorum mechanism | Requires external solutions |
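To quickly tell which model an existing server is running, a hedged check of two status views works on current MariaDB releases (use SHOW SLAVE STATUS on older versions):
# Quick replication-mode check (sketch; adjust credentials to your environment)
mysql -u root -p -e "SHOW VARIABLES LIKE 'wsrep_on'"      # ON indicates a Galera node
mysql -u root -p -e "SHOW REPLICA STATUS\G"               # non-empty only on a traditional async replica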
How to Prepare Servers for Galera Deployment?
Proper server preparation ensures optimal Galera cluster performance and prevents common configuration issues. Additionally, network connectivity and security requirements must be carefully addressed before installation begins.
System Requirements Verification
# Check system specifications
cat /proc/meminfo | grep MemTotal
cat /proc/cpuinfo | grep "processor" | wc -l
df -h /var/lib/mysql
# Verify network connectivity between nodes
ping -c 3 10.0.1.10 # Node 1
ping -c 3 10.0.1.11 # Node 2
ping -c 3 10.0.1.12 # Node 3
# Test port accessibility for Galera communication
nc -z 10.0.1.10 3306 # MySQL port
nc -z 10.0.1.10 4567 # Galera communication
nc -z 10.0.1.10 4568 # IST transfer
nc -z 10.0.1.10 4444 # SST transfer
Firewall Configuration Setup
# Configure firewall rules for Galera ports
sudo ufw allow 3306/tcp # MySQL client connections
sudo ufw allow 4567/tcp # Galera cluster communication
sudo ufw allow 4568/tcp # Incremental State Transfer
sudo ufw allow 4444/tcp # State Snapshot Transfer
# Alternative iptables configuration
sudo iptables -A INPUT -p tcp --dport 3306 -s 10.0.1.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4567 -s 10.0.1.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4568 -s 10.0.1.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4444 -s 10.0.1.0/24 -j ACCEPT
# Save iptables rules (the redirection itself must run with root privileges)
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
Time Synchronization Configuration
# Install and configure NTP for time synchronization
sudo apt update && sudo apt install -y ntp
sudo systemctl enable ntp
sudo systemctl start ntp
# Verify time synchronization across nodes
timedatectl status
ntpq -p
# Alternative: Configure chrony for better accuracy
sudo apt install -y chrony
sudo systemctl enable chrony
sudo systemctl start chrony
chronyc sources -v
Host File Configuration
# Configure /etc/hosts for consistent node resolution
sudo tee -a /etc/hosts << EOF
10.0.1.10 galera-node1
10.0.1.11 galera-node2
10.0.1.12 galera-node3
EOF
# Test hostname resolution (getent consults /etc/hosts; nslookup queries DNS only)
getent hosts galera-node1
getent hosts galera-node2
getent hosts galera-node3
How to Install MariaDB Galera on Linux Systems?
MariaDB provides native Galera support through its built-in wsrep libraries. Furthermore, the installation process varies slightly between different Linux distributions but follows consistent configuration patterns.
Ubuntu/Debian Installation Process
# Update package repositories
sudo apt update && sudo apt upgrade -y
# Install MariaDB server with Galera support
sudo apt install -y mariadb-server mariadb-client galera-4 galera-arbitrator-4
# Verify MariaDB installation
mysql --version
mysqladmin version
# Check Galera library availability
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'wsrep%'"
# Stop MariaDB service before cluster configuration
sudo systemctl stop mariadb
sudo systemctl disable mariadb
CentOS/RHEL Installation Process
# Install MariaDB repository
sudo tee /etc/yum.repos.d/MariaDB.repo << EOF
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.6/centos7-amd64
gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 1
EOF
# Install MariaDB with Galera support
sudo yum install -y MariaDB-server MariaDB-client galera-4
# Enable and start MariaDB temporarily for initial setup
sudo systemctl enable mariadb
sudo systemctl start mariadb
# Secure MariaDB installation
sudo mysql_secure_installation
# Stop MariaDB before cluster configuration
sudo systemctl stop mariadb
Initial Database Security Configuration
# Secure MariaDB installation on the first node
sudo mysql_secure_installation
# Create cluster administration user
mysql -u root -p << EOF
CREATE USER 'cluster_admin'@'localhost' IDENTIFIED BY 'SecureClusterPassword123!';
GRANT ALL PRIVILEGES ON *.* TO 'cluster_admin'@'localhost' WITH GRANT OPTION;
CREATE USER 'cluster_admin'@'10.0.1.%' IDENTIFIED BY 'SecureClusterPassword123!';
GRANT ALL PRIVILEGES ON *.* TO 'cluster_admin'@'10.0.1.%' WITH GRANT OPTION;
CREATE USER 'sst_user'@'localhost' IDENTIFIED BY 'SSTPassword456!';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';
FLUSH PRIVILEGES;
EOF
How to Configure Galera Cluster Settings?
Galera configuration requires precise wsrep parameter settings to ensure optimal cluster performance and data consistency. Moreover, proper configuration prevents common issues like split-brain scenarios and improves replication efficiency.
Primary Node Configuration (Node 1)
# Create Galera configuration file
sudo tee /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
[galera]
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="galera-node1"
wsrep_node_address="10.0.1.10"
# Replication settings
wsrep_sst_method=rsync
wsrep_sst_auth="sst_user:SSTPassword456!"
wsrep_slave_threads=16
# Performance optimizations
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
# Binary logging configuration
log-bin=/var/log/mysql/mysql-bin
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Performance tuning
innodb_buffer_pool_size=1G
innodb_log_file_size=256M
innodb_flush_log_at_trx_commit=2
query_cache_size=0
query_cache_type=0
EOF
# Set proper file permissions
sudo chmod 644 /etc/mysql/mariadb.conf.d/60-galera.cnf
sudo chown root:root /etc/mysql/mariadb.conf.d/60-galera.cnf
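Before repeating the file on the other nodes, a quick sanity check confirms that the node-specific values were written and that MariaDB will read the [galera] group; this is a sketch, and the grep pattern is only illustrative.
# Confirm the node-specific wsrep settings in the new file
grep -E "wsrep_(cluster_address|node_name|node_address)" /etc/mysql/mariadb.conf.d/60-galera.cnf
# Confirm the [galera] option group is picked up from the default option files
my_print_defaults galera | grep wsrep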
Secondary Node Configuration (Node 2)
# Configure second cluster node
sudo tee /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
[galera]
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="galera-node2"
wsrep_node_address="10.0.1.11"
# Replication settings
wsrep_sst_method=rsync
wsrep_sst_auth="sst_user:SSTPassword456!"
wsrep_slave_threads=16
# Performance optimizations
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
# Binary logging configuration
log-bin=/var/log/mysql/mysql-bin
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Performance tuning
innodb_buffer_pool_size=1G
innodb_log_file_size=256M
innodb_flush_log_at_trx_commit=2
query_cache_size=0
query_cache_type=0
EOF
Third Node Configuration (Node 3)
# Configure third cluster node
sudo tee /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
[galera]
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="galera-node3"
wsrep_node_address="10.0.1.12"
# Replication settings
wsrep_sst_method=rsync
wsrep_sst_auth="sst_user:SSTPassword456!"
wsrep_slave_threads=16
# Performance optimizations
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
# Binary logging configuration
log-bin=/var/log/mysql/mysql-bin
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Performance tuning
innodb_buffer_pool_size=1G
innodb_log_file_size=256M
innodb_flush_log_at_trx_commit=2
query_cache_size=0
query_cache_type=0
EOF
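Because the three files differ only in wsrep_node_name and wsrep_node_address, a small templating loop reduces copy-paste drift. The sketch below assumes a 60-galera.cnf.template file containing the shared settings and SSH access by hostname; adjust paths to your environment.
# Generate per-node Galera configs from one shared template (run from an admin workstation)
declare -A NODES=( [galera-node1]=10.0.1.10 [galera-node2]=10.0.1.11 [galera-node3]=10.0.1.12 )
for name in "${!NODES[@]}"; do
  sed -e "s/^wsrep_node_name=.*/wsrep_node_name=\"$name\"/" \
      -e "s/^wsrep_node_address=.*/wsrep_node_address=\"${NODES[$name]}\"/" \
      60-galera.cnf.template > "60-galera.cnf.$name"
  scp "60-galera.cnf.$name" "$name:/tmp/60-galera.cnf"   # then move into place with sudo on each node
done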
How to Initialize and Bootstrap the Cluster?
Cluster initialization requires careful bootstrap sequencing to establish proper Galera communication. Additionally, the bootstrap process creates the initial cluster state that subsequent nodes will join.
Bootstrap First Cluster Node
# Initialize the cluster on the first node
sudo galera_new_cluster
# Note: galera_new_cluster is a wrapper that starts the mariadb service with the
# --wsrep-new-cluster option; MariaDB ships no separate mariadb@bootstrap.service unit
# Verify cluster bootstrap status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%'"
# Check cluster state variables
mysql -u root -p << EOF
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_ready',
'wsrep_cluster_status',
'wsrep_cluster_size',
'wsrep_local_state_comment'
);
EOF
# Verify node is primary and ready
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
Cluster Bootstrap Verification
# Comprehensive cluster status check
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE
WHEN VARIABLE_NAME = 'wsrep_ready' AND VARIABLE_VALUE = 'ON' THEN 'OK'
WHEN VARIABLE_NAME = 'wsrep_cluster_status' AND VARIABLE_VALUE = 'Primary' THEN 'OK'
WHEN VARIABLE_NAME = 'wsrep_local_state_comment' AND VARIABLE_VALUE = 'Synced' THEN 'OK'
ELSE 'CHECK'
END as STATUS
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_ready',
'wsrep_cluster_status',
'wsrep_local_state_comment',
'wsrep_cluster_size',
'wsrep_local_index'
)
ORDER BY VARIABLE_NAME;
EOF
# Check error logs for bootstrap issues
sudo tail -f /var/log/mysql/error.log
Create Test Database and Tables
# Create test database for replication verification
mysql -u root -p << EOF
CREATE DATABASE galera_test;
USE galera_test;
CREATE TABLE cluster_nodes (
id INT AUTO_INCREMENT PRIMARY KEY,
node_name VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
data VARCHAR(255)
);
INSERT INTO cluster_nodes (node_name, data) VALUES
('galera-node1', 'Bootstrap node data'),
('galera-node1', 'Initial cluster setup complete');
SELECT * FROM cluster_nodes;
EOF
How to Add New Nodes to Existing Cluster?
Adding nodes to an existing Galera cluster requires proper sequencing and state transfer configuration. Moreover, the process involves automatic synchronization through either IST or SST mechanisms.
Join Second Node to Cluster
# Start MariaDB on the second node (Node 2)
sudo systemctl start mariadb
# Monitor the joining process
sudo tail -f /var/log/mysql/error.log
# Verify successful cluster join
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Check node synchronization status
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_cluster_size',
'wsrep_local_state_comment',
'wsrep_ready',
'wsrep_connected'
);
EOF
Join Third Node to Cluster
# Start MariaDB on the third node (Node 3)
sudo systemctl start mariadb
# Verify three-node cluster formation
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Test data replication across all nodes
mysql -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('galera-node3', 'Third node joined successfully');
SELECT node_name, data, created_at FROM cluster_nodes ORDER BY created_at;
EOF
# Verify cluster membership
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_incoming_addresses'"
Monitor State Transfer Process
# Monitor SST/IST progress during node joining
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_local_state',
'wsrep_local_state_comment',
'wsrep_local_recv_queue_avg',
'wsrep_local_send_queue_avg',
'wsrep_flow_control_paused'
);
EOF
# Check for SST/IST completion
grep -E "(SST|IST)" /var/log/mysql/error.log | tail -10
# Verify all nodes are synchronized
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
How to Test Cluster Replication and Failover?
Comprehensive testing validates Galera cluster functionality and ensures proper failover behavior. Additionally, testing scenarios should cover both planned and unplanned node failures.
Data Replication Testing
# Test write operations on Node 1
mysql -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('node1-test', 'Write test from node 1 - $(date)');
SELECT COUNT(*) as total_records FROM cluster_nodes;
EOF
# Verify replication on Node 2
mysql -h 10.0.1.11 -u root -p << EOF
USE galera_test;
SELECT node_name, data FROM cluster_nodes ORDER BY id DESC LIMIT 3;
EOF
# Test concurrent writes from Node 3
mysql -h 10.0.1.12 -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('node3-test', 'Concurrent write test - $(date)');
SELECT COUNT(*) as total_records FROM cluster_nodes;
EOF
# Verify data consistency across all nodes
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Node $node record count:"
mysql -h $node -u root -p -e "SELECT COUNT(*) FROM galera_test.cluster_nodes"
done
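The loop above prompts for the root password on every iteration. A client option file keeps the check non-interactive; this is a sketch, the password shown is a placeholder, and the file must be protected with strict permissions.
# Store client credentials once so looped checks do not prompt
cat > ~/.my.cnf << 'CNF'
[client]
user=root
password=YourPassword
CNF
chmod 600 ~/.my.cnf
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
  echo "Node $node record count:"
  mysql -h $node -e "SELECT COUNT(*) FROM galera_test.cluster_nodes"
done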
Failover Testing Procedures
# Simulate Node 2 failure
ssh 10.0.1.11 'sudo systemctl stop mariadb'
# Verify cluster continues with 2 nodes
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Test write operations during failover
mysql -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('failover-test', 'Data written during node2 failure - $(date)');
EOF
# Restart failed node and verify rejoin
ssh 10.0.1.11 'sudo systemctl start mariadb'
# Wait for node recovery and check cluster size
sleep 30
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Verify data synchronization after rejoin
mysql -h 10.0.1.11 -u root -p << EOF
USE galera_test;
SELECT node_name, data FROM cluster_nodes WHERE data LIKE '%failover%';
EOF
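While the node is still down, it is also worth confirming that the surviving members kept Primary status and stayed synced; a quick check on either remaining node:
# Run on a surviving node during the simulated outage
mysql -u root -p -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_status','wsrep_cluster_size','wsrep_local_state_comment')"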
Performance Benchmark Testing
# Install sysbench for database benchmarking
sudo apt install -y sysbench
# Prepare benchmark database
mysql -u root -p -e "CREATE DATABASE sysbench_test"
# Run sysbench preparation
sysbench oltp_read_write \
--db-driver=mysql \
--mysql-host=10.0.1.10 \
--mysql-port=3306 \
--mysql-user=root \
--mysql-password=YourPassword \
--mysql-db=sysbench_test \
--tables=4 \
--table-size=10000 \
prepare
# Execute benchmark test
sysbench oltp_read_write \
--db-driver=mysql \
--mysql-host=10.0.1.10 \
--mysql-port=3306 \
--mysql-user=root \
--mysql-password=YourPassword \
--mysql-db=sysbench_test \
--tables=4 \
--table-size=10000 \
--threads=8 \
--time=60 \
--report-interval=10 \
run
# Cleanup benchmark data
sysbench oltp_read_write \
--db-driver=mysql \
--mysql-host=10.0.1.10 \
--mysql-port=3306 \
--mysql-user=root \
--mysql-password=YourPassword \
--mysql-db=sysbench_test \
--tables=4 \
cleanup
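Rather than benchmarking as root, a dedicated account limited to the test schema is safer; the sketch below uses a placeholder password and the 10.0.1.% network from earlier.
# Create a dedicated benchmark user, then pass it to the sysbench commands above
mysql -u root -p << EOF
CREATE USER 'sysbench'@'10.0.1.%' IDENTIFIED BY 'SysbenchPassword789!';
GRANT ALL PRIVILEGES ON sysbench_test.* TO 'sysbench'@'10.0.1.%';
FLUSH PRIVILEGES;
EOF
# Example: --mysql-user=sysbench --mysql-password='SysbenchPassword789!'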
How to Monitor Galera Cluster Performance?
Effective monitoring ensures optimal cluster performance and early detection of potential issues. Furthermore, proper monitoring strategies help maintain cluster health and prevent service disruptions.
Essential Cluster Monitoring Commands
# Real-time cluster status monitoring
watch -n 2 'mysql -u root -p -e "SHOW STATUS LIKE \"wsrep%\"" | grep -E "(cluster_size|ready|local_state|flow_control)"'
# Comprehensive cluster health check
mysql -u root -p << EOF
SELECT
'Cluster Size' as Metric, wsrep_cluster_size as Value
FROM (SELECT VARIABLE_VALUE as wsrep_cluster_size
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_cluster_size') t1
UNION ALL
SELECT
'Local State', wsrep_local_state_comment
FROM (SELECT VARIABLE_VALUE as wsrep_local_state_comment
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_local_state_comment') t2
UNION ALL
SELECT
'Ready Status', wsrep_ready
FROM (SELECT VARIABLE_VALUE as wsrep_ready
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_ready') t3
UNION ALL
SELECT
'Connected', wsrep_connected
FROM (SELECT VARIABLE_VALUE as wsrep_connected
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_connected') t4;
EOF
# Monitor replication queue lengths
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE
WHEN VARIABLE_NAME LIKE '%queue%' AND CAST(VARIABLE_VALUE AS DECIMAL) > 1000 THEN 'CRITICAL'
WHEN VARIABLE_NAME LIKE '%queue%' AND CAST(VARIABLE_VALUE AS DECIMAL) > 100 THEN 'WARNING'
ELSE 'OK'
END as STATUS
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_local_recv_queue',
'wsrep_local_send_queue',
'wsrep_local_recv_queue_avg',
'wsrep_local_send_queue_avg'
);
EOF
Automated Monitoring Script
# Create comprehensive cluster monitoring script
sudo tee /usr/local/bin/galera_monitor.sh << 'EOF'
#!/bin/bash
# Galera Cluster Monitoring Script
# MYSQL_ROOT_PASSWORD must be exported in the environment before this script runs
LOG_FILE="/var/log/galera_monitor.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Galera Cluster Health Check" >> $LOG_FILE
# Check cluster connectivity
CLUSTER_SIZE=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | grep wsrep | awk '{print $2}')
LOCAL_STATE=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | grep wsrep | awk '{print $2}')
READY_STATUS=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_ready'" | grep wsrep | awk '{print $2}')
# Log cluster status
echo "[$DATE] Cluster Size: $CLUSTER_SIZE" >> $LOG_FILE
echo "[$DATE] Local State: $LOCAL_STATE" >> $LOG_FILE
echo "[$DATE] Ready Status: $READY_STATUS" >> $LOG_FILE
# Check for issues
if [ "$CLUSTER_SIZE" -lt "3" ]; then
echo "[$DATE] WARNING: Cluster size reduced to $CLUSTER_SIZE" >> $LOG_FILE
# Send alert notification
echo "Galera cluster size warning: $CLUSTER_SIZE nodes" | mail -s "Galera Alert" admin@example.com
fi
if [ "$LOCAL_STATE" != "Synced" ]; then
echo "[$DATE] ERROR: Node not synchronized - $LOCAL_STATE" >> $LOG_FILE
echo "Galera node synchronization error: $LOCAL_STATE" | mail -s "Galera Critical" admin@example.com
fi
# Check flow control
FLOW_CONTROL=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_flow_control_paused'" | grep wsrep | awk '{print $2}')
if (( $(echo "$FLOW_CONTROL > 0.1" | bc -l) )); then
echo "[$DATE] WARNING: Flow control active - $FLOW_CONTROL" >> $LOG_FILE
fi
echo "[$DATE] Health check completed" >> $LOG_FILE
EOF
# Make script executable
sudo chmod +x /usr/local/bin/galera_monitor.sh
# Schedule monitoring via cron (append so existing root crontab entries are preserved)
( sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/galera_monitor.sh" ) | sudo crontab -
Performance Metrics Collection
# Create performance metrics collection script
sudo tee /usr/local/bin/galera_metrics.sh << 'EOF'
#!/bin/bash
# MYSQL_ROOT_PASSWORD must be exported in the environment before this script runs
METRICS_FILE="/var/log/galera_metrics.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
# Collect key performance metrics
mysql -u root -p${MYSQL_ROOT_PASSWORD} << SQL >> $METRICS_FILE
SELECT
'$TIMESTAMP' as timestamp,
VARIABLE_NAME,
VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_replicated',
'wsrep_replicated_bytes',
'wsrep_received',
'wsrep_received_bytes',
'wsrep_commit_window',
'wsrep_cert_deps_distance',
'wsrep_apply_window'
);
SQL
# Log current connections
CONNECTIONS=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'Threads_connected'" | grep Threads | awk '{print $2}')
echo "$TIMESTAMP,connections,$CONNECTIONS" >> $METRICS_FILE
EOF
sudo chmod +x /usr/local/bin/galera_metrics.sh
# Schedule metrics collection every minute (append to the existing root crontab)
( sudo crontab -l 2>/dev/null; echo "* * * * * /usr/local/bin/galera_metrics.sh" ) | sudo crontab -
How to Handle Cluster Split-Brain Scenarios?
Split-brain scenarios require immediate attention and proper recovery procedures to maintain data integrity. Moreover, Galera's built-in quorum mechanism helps prevent most split-brain conditions automatically.
Detect Split-Brain Conditions
# Check for non-primary cluster status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
# Verify cluster connectivity
mysql -u root -p << EOF
SELECT
CASE
WHEN VARIABLE_VALUE = 'Primary' THEN 'HEALTHY'
WHEN VARIABLE_VALUE = 'non-Primary' THEN 'SPLIT-BRAIN'
ELSE 'UNKNOWN'
END as cluster_health,
VARIABLE_VALUE as cluster_status
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_cluster_status';
EOF
# Check node connectivity across cluster
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Checking node $node:"
mysql -h $node -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status'" 2>/dev/null || echo "Node $node unreachable"
done
Split-Brain Recovery Procedures
# Method 1: Bootstrap from the most recent node
# First, stop all MariaDB services
ssh 10.0.1.10 'sudo systemctl stop mariadb'
ssh 10.0.1.11 'sudo systemctl stop mariadb'
ssh 10.0.1.12 'sudo systemctl stop mariadb'
# Check grastate.dat files to find the most recent node
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Node $node grastate.dat:"
ssh $node 'sudo cat /var/lib/mysql/grastate.dat | grep seqno'
done
# Bootstrap from the node with highest seqno
sudo galera_new_cluster # Run on the selected node
# Start other nodes normally
ssh 10.0.1.11 'sudo systemctl start mariadb'
ssh 10.0.1.12 'sudo systemctl start mariadb'
# Verify cluster recovery
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
Manual Bootstrap Recovery
# Method 2: Force bootstrap when automatic detection fails
# Edit grastate.dat on the chosen bootstrap node
sudo systemctl stop mariadb
sudo sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
# Verify the change
sudo cat /var/lib/mysql/grastate.dat
# Bootstrap the cluster
sudo galera_new_cluster
# Start remaining nodes
ssh 10.0.1.11 'sudo systemctl start mariadb'
ssh 10.0.1.12 'sudo systemctl start mariadb'
# Test data consistency after recovery
mysql -u root -p << EOF
USE galera_test;
SELECT COUNT(*) as record_count FROM cluster_nodes;
INSERT INTO cluster_nodes (node_name, data) VALUES
('recovery-test', 'Post split-brain recovery test - $(date)');
EOF
# Verify replication works across all nodes
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Node $node record count after recovery:"
mysql -h $node -u root -p -e "SELECT COUNT(*) FROM galera_test.cluster_nodes"
done
Prevent Future Split-Brain Issues
# Configure the Galera arbitrator (garbd) for clusters with an even number of data nodes;
# run it on a separate host so it can break ties, never on one of the database nodes
sudo tee /etc/default/garb << EOF
# Galera Arbitrator configuration
GALERA_NODES="galera-node1:4567,galera-node2:4567"
GALERA_GROUP="production_cluster"
GALERA_OPTIONS=""
LOG_FILE="/var/log/garbd.log"
EOF
# Start Galera arbitrator service
sudo systemctl enable garbd
sudo systemctl start garbd
# Verify arbitrator connectivity
sudo systemctl status garbd
sudo tail -f /var/log/garbd.log
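Service unit and config file names for garbd vary slightly between distributions, so it can help to test the arbitrator once in the foreground. This is a sketch using garbd's documented long options; run it on a host that is not one of the database nodes.
# Foreground test of the arbitrator; stop with Ctrl+C once it reports joining the cluster
garbd --address "gcomm://10.0.1.10:4567,10.0.1.11:4567" \
      --group "production_cluster" \
      --log /tmp/garbd-test.log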
What are Galera Cluster Production Best Practices?
Production deployments require careful consideration of performance optimization, security hardening, and operational procedures. Additionally, these best practices ensure reliable cluster operation and simplified maintenance workflows.
Security Hardening Configuration
# Create dedicated Galera SSL certificates
sudo mkdir -p /etc/mysql/ssl
sudo openssl genrsa -out /etc/mysql/ssl/galera-ca-key.pem 4096
sudo openssl req -new -x509 -nodes -days 365000 \
-key /etc/mysql/ssl/galera-ca-key.pem \
-out /etc/mysql/ssl/galera-ca-cert.pem \
-subj "/C=US/ST=State/L=City/O=Organization/CN=Galera-CA"
# Generate node certificates
for node in node1 node2 node3; do
sudo openssl genrsa -out /etc/mysql/ssl/galera-$node-key.pem 4096
sudo openssl req -new \
-key /etc/mysql/ssl/galera-$node-key.pem \
-out /etc/mysql/ssl/galera-$node-req.pem \
-subj "/C=US/ST=State/L=City/O=Organization/CN=galera-$node"
sudo openssl x509 -req -days 365000 \
-in /etc/mysql/ssl/galera-$node-req.pem \
-CA /etc/mysql/ssl/galera-ca-cert.pem \
-CAkey /etc/mysql/ssl/galera-ca-key.pem \
-CAcreateserial \
-out /etc/mysql/ssl/galera-$node-cert.pem
done
# Configure SSL in Galera settings
sudo tee -a /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
# SSL Configuration
wsrep_provider_options="socket.ssl_key=/etc/mysql/ssl/galera-node1-key.pem;socket.ssl_cert=/etc/mysql/ssl/galera-node1-cert.pem;socket.ssl_ca=/etc/mysql/ssl/galera-ca-cert.pem;socket.ssl_cipher=AES128-SHA"
# MySQL SSL settings
ssl-ca=/etc/mysql/ssl/galera-ca-cert.pem
ssl-cert=/etc/mysql/ssl/galera-node1-cert.pem
ssl-key=/etc/mysql/ssl/galera-node1-key.pem
require_secure_transport=ON
EOF
# Set proper SSL file permissions
sudo chmod 600 /etc/mysql/ssl/*.pem
sudo chown mysql:mysql /etc/mysql/ssl/*.pem
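Before restarting the nodes with SSL enabled, verifying that each node certificate chains to the cluster CA catches signing mistakes early:
# Verify every node certificate against the CA created above
for node in node1 node2 node3; do
  openssl verify -CAfile /etc/mysql/ssl/galera-ca-cert.pem /etc/mysql/ssl/galera-$node-cert.pem
done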
Performance Optimization Settings
# Optimize Galera-specific parameters
sudo tee -a /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
# Advanced Performance Tuning
# wsrep_applier_threads is the current name for wsrep_slave_threads; set one, not both
wsrep_applier_threads=16
# Flow control and network tuning
# NOTE: only the last wsrep_provider_options line in the configuration takes effect, so
# merge these settings (and any socket.ssl_* options added for SSL) into a single line
wsrep_provider_options="gcs.fc_limit=32;gcs.fc_factor=0.8;gcs.fc_master_slave=YES;evs.keepalive_period=PT3S;evs.inactive_check_period=PT10S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M"
# Memory optimization
innodb_buffer_pool_size=2G
innodb_buffer_pool_instances=8
innodb_log_buffer_size=64M
innodb_flush_method=O_DIRECT
# Transaction optimization
transaction-isolation=READ-COMMITTED
EOF
# Apply memory tuning based on system resources
TOTAL_RAM=$(free -g | awk '/^Mem:/{print $2}')
BUFFER_POOL=$((TOTAL_RAM * 70 / 100))G
sudo sed -i "s/innodb_buffer_pool_size=2G/innodb_buffer_pool_size=$BUFFER_POOL/" /etc/mysql/mariadb.conf.d/60-galera.cnf
Backup Strategy Implementation
# Create automated Galera backup script
sudo tee /usr/local/bin/galera_backup.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7
# Assumes BACKUP_PASSWORD is exported and a backup_user account exists with the
# privileges mariabackup needs (RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT)
# Create backup directory
mkdir -p $BACKUP_DIR
# Perform consistent backup using mariabackup
mariabackup --backup \
--target-dir=$BACKUP_DIR/full_backup_$DATE \
--user=backup_user \
--password=$BACKUP_PASSWORD \
--galera-info \
--slave-info \
--safe-slave-backup
# Verify backup integrity
mariabackup --prepare --target-dir=$BACKUP_DIR/full_backup_$DATE
# Compress backup
tar -czf $BACKUP_DIR/galera_backup_$DATE.tar.gz -C $BACKUP_DIR full_backup_$DATE
rm -rf $BACKUP_DIR/full_backup_$DATE
# Remove old backups
find $BACKUP_DIR -name "galera_backup_*.tar.gz" -mtime +$RETENTION_DAYS -delete
# Log backup completion
echo "$(date): Backup completed - galera_backup_$DATE.tar.gz" >> /var/log/galera_backup.log
EOF
sudo chmod +x /usr/local/bin/galera_backup.sh
# Schedule daily backups (append to the existing root crontab)
( sudo crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/galera_backup.sh" ) | sudo crontab -
Load Balancer Integration
# Configure HAProxy for Galera load balancing
sudo tee /etc/haproxy/haproxy.cfg << EOF
global
daemon
user haproxy
group haproxy
log 127.0.0.1:514 local0
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
log global
# MySQL Galera cluster configuration
listen galera_cluster
bind *:3306
mode tcp
option mysql-check user haproxy_check
balance leastconn
server galera-node1 10.0.1.10:3306 check weight 1
server galera-node2 10.0.1.11:3306 check weight 1
server galera-node3 10.0.1.12:3306 check weight 1
# Statistics interface
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats admin if TRUE
EOF
# Create HAProxy health check user
mysql -u root -p << EOF
CREATE USER 'haproxy_check'@'%';
GRANT USAGE ON *.* TO 'haproxy_check'@'%';
FLUSH PRIVILEGES;
EOF
# Start and enable HAProxy
sudo systemctl enable haproxy
sudo systemctl start haproxy
sudo systemctl status haproxy
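A quick way to confirm that HAProxy actually spreads connections is to ask each routed connection which node answered. The sketch below assumes HAProxy listens on 10.0.1.20 and reuses the cluster_admin credentials created earlier.
# Each new connection should report a backend node name; with leastconn they rotate
for i in 1 2 3 4 5 6; do
  mysql -h 10.0.1.20 -u cluster_admin -p'SecureClusterPassword123!' -e "SELECT @@wsrep_node_name"
done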
Frequently Asked Questions
What is the difference between Galera and traditional MySQL replication?
Galera provides synchronous multi-master replication with immediate consistency guarantees, while traditional MySQL replication uses asynchronous master-slave architecture. Consequently, Galera ensures zero data loss and eliminates replication lag, whereas traditional replication may experience data loss during failover scenarios. Additionally, Galera enables writes on all nodes simultaneously, improving application scalability.
How many nodes should I deploy in a Galera cluster?
Production Galera clusters should contain an odd number of nodes (3, 5, or 7) to maintain proper quorum for split-brain protection. Moreover, three nodes provide optimal cost-to-benefit ratio for most applications, while larger clusters increase network overhead. Furthermore, consider deploying nodes across different availability zones for geographic redundancy.
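To see the quorum arithmetic on a running cluster, the size and weight status variables show how many votes the current component holds (a quick hedged check; wsrep_cluster_weight is exposed by recent Galera provider versions):
mysql -u root -p -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_cluster_weight','wsrep_cluster_status')"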
Can I mix different server specifications in a Galera cluster?
While technically possible, mixing server specifications creates performance bottlenecks because Galera's flow control throttles the entire cluster to the pace of its slowest member. Therefore, use identical hardware configurations across all cluster nodes for optimal performance. If mixed specifications are unavoidable, size cluster-wide settings and expected throughput around the weakest node.
How does Galera handle network partitions?
Galera automatically detects network partitions and promotes the partition containing the majority of nodes to primary status. Subsequently, minority partitions enter non-primary mode and refuse write operations. Additionally, this quorum-based approach prevents split-brain scenarios and maintains data consistency.
What happens when a Galera node fails and rejoins?
Failed nodes automatically rejoin the cluster through State Snapshot Transfer (SST) or Incremental State Transfer (IST). Moreover, IST transfers only missing transactions if the node was offline briefly, while SST performs full synchronization for extended outages. Furthermore, the process occurs transparently without manual intervention.
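Whether a rejoining node gets the cheaper IST depends largely on how much write-set history the donor still holds in its gcache. Enlarging it is a single provider option; the 2G figure below is only an example, and all provider options must be merged onto one wsrep_provider_options line.
# Example gcache sizing to favour IST over SST (merge with any existing provider options)
wsrep_provider_options="gcache.size=2G"
# After a node rejoins, confirm which transfer method was actually used
grep -E "IST|SST" /var/log/mysql/error.log | tail -5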
How do I handle schema changes in Galera clusters?
Schema changes should be executed on one node and will replicate automatically to all cluster members. However, large schema modifications may cause performance impact during replication. Therefore, consider performing schema changes during maintenance windows and monitor cluster performance during the process.
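Galera controls how DDL replicates through wsrep_OSU_method: the default TOI applies the statement cluster-wide in total order (blocking concurrent writes for its duration), while RSU applies it locally on one node at a time. The sketch below shows a rolling change; RSU requires the old and new table definitions to remain replication-compatible.
-- Rolling schema upgrade on one node at a time (repeat on each node, then restore TOI)
SET GLOBAL wsrep_OSU_method='RSU';
ALTER TABLE galera_test.cluster_nodes ADD COLUMN note VARCHAR(100);
SET GLOBAL wsrep_OSU_method='TOI';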
What are the storage requirements for Galera clusters?
Each Galera node requires sufficient storage for the complete dataset plus additional space for write-set caching and state transfer operations. Moreover, plan for 20-30% additional storage capacity beyond the base dataset size. Additionally, use high-performance storage (SSD) for optimal cluster performance.
How do I optimize Galera cluster performance?
Optimize performance through proper network configuration, adequate memory allocation, and appropriate wsrep parameter tuning. Furthermore, configure dedicated network interfaces for cluster communication and ensure low-latency connectivity between nodes. Additionally, monitor queue lengths and flow control metrics regularly.
Troubleshooting Common Issues
Cluster Bootstrap Failures
Problem: Cluster fails to initialize during bootstrap process.
Solution: Check network connectivity, verify configuration file syntax, and examine MySQL error logs.
# Verify configuration syntax
sudo mysqld --print-defaults
# Check network connectivity between nodes
nc -zv 10.0.1.10 4567
nc -zv 10.0.1.11 4567
nc -zv 10.0.1.12 4567
# Examine error logs for specific issues
sudo tail -f /var/log/mysql/error.log | grep -i "galera\|wsrep"
# Reset cluster state if necessary
sudo rm -f /var/lib/mysql/grastate.dat
sudo galera_new_cluster
Node Synchronization Problems
Problem: Node shows "non-Primary" status or fails to synchronize.
Solution: Check cluster connectivity, verify authentication credentials, and restart problematic nodes.
# Check node connectivity status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_connected'"
# Verify cluster authentication
mysql -u root -p -e "SHOW VARIABLES LIKE 'wsrep_sst_auth'"
# Force node resynchronization
sudo systemctl stop mariadb
sudo rm -f /var/lib/mysql/galera.cache
sudo systemctl start mariadb
# Monitor rejoin process
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
Performance Degradation Issues
Problem: Cluster experiences slow query performance or high replication lag.
Solution: Analyze flow control metrics, optimize memory allocation, and check network latency.
# Check flow control status
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE
WHEN VARIABLE_NAME = 'wsrep_flow_control_paused' AND CAST(VARIABLE_VALUE AS DECIMAL) > 0.1 THEN 'HIGH'
WHEN VARIABLE_NAME LIKE '%queue%' AND CAST(VARIABLE_VALUE AS DECIMAL) > 100 THEN 'CONGESTED'
ELSE 'NORMAL'
END as STATUS
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_flow_control_paused',
'wsrep_local_recv_queue',
'wsrep_local_send_queue'
);
EOF
# Monitor certification conflicts
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_cert_failures'"
# Analyze slow queries
mysql -u root -p -e "SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY sum_timer_wait DESC LIMIT 10"
SSL Connection Errors
Problem: Nodes fail to connect using SSL configuration.
Solution: Verify certificate validity, check file permissions, and validate SSL configuration.
# Test SSL certificate validity
openssl x509 -in /etc/mysql/ssl/galera-node1-cert.pem -text -noout
# Check certificate file permissions
ls -la /etc/mysql/ssl/
# Verify SSL configuration
mysql -u root -p -e "SHOW VARIABLES LIKE '%ssl%'"
# Test SSL connectivity manually (the MySQL protocol negotiates TLS after connect,
# so openssl needs the -starttls mysql option, available in OpenSSL 1.1.1+)
openssl s_client -starttls mysql -connect 10.0.1.10:3306 -cert /etc/mysql/ssl/galera-node1-cert.pem -key /etc/mysql/ssl/galera-node1-key.pem -CAfile /etc/mysql/ssl/galera-ca-cert.pem
Additional Resources
Official Documentation
- MariaDB Galera Cluster Documentation - Comprehensive official guide for MariaDB Galera implementation
- Galera Cluster Official Documentation - Complete Galera cluster reference documentation
- MySQL Group Replication - Oracle's alternative clustering solution
Performance Tuning Resources
- MariaDB Performance Optimization - Database performance tuning best practices
- Galera Performance Tuning - Cluster-specific optimization guidelines
- Database Monitoring Best Practices - Comprehensive monitoring strategies
Community Resources
- MariaDB Community Forum - Active community support and discussions
- Galera Cluster Mailing List - Official developer discussions
- Stack Overflow Galera Tags - Community troubleshooting and solutions
Related LinuxTips.pro Articles
- Linux Clustering with Pacemaker and Corosync - High availability infrastructure setup
- MySQL/MariaDB Administration on Linux - Database server management fundamentals
- Database Backup and Recovery Strategies - Data protection best practices
This article is part of the Linux Mastery 100 series, providing comprehensive guides for Linux system administration and DevOps practices.