MySQL Galera Cluster: Multi-Master Database Setup Guide
Knowledge Overview
Prerequisites
- Knowledge Prerequisites (3 Essential Areas):
  - Linux System Administration
    - Command line proficiency
    - MySQL/MariaDB basics
    - Network configuration skills
  - Database Administration Basics
    - MySQL configuration experience
    - Security fundamentals
    - Service management
  - Infrastructure Knowledge
    - High availability concepts
    - Basic networking
    - Linux security
What You'll Learn
- What Readers Will Learn (4 Main Categories):
  - Database Clustering Fundamentals
    - Multi-master replication concepts
    - Galera's certification mechanism
    - Comparison with traditional MySQL replication
  - Practical Implementation Skills
    - 3-node cluster configuration
    - HAProxy load balancing setup
    - SSL encryption implementation
    - Bootstrap procedures
  - Production Operations
    - Health monitoring techniques
    - Failure handling procedures
    - Split-brain recovery
    - Performance optimization
  - Security & Maintenance
    - SSL hardening strategies
    - Automated backup solutions
    - Monitoring script creation
    - Troubleshooting workflows
Tools Required
- Required Tools & Software (5 Categories):
  - Server Infrastructure
    - 3 Linux servers (4GB+ RAM)
    - Network connectivity (1Gbps recommended)
    - NTP/Chrony synchronization
  - Database Software
    - MariaDB 10.6+ with Galera
    - Galera Cluster 4.0+
    - mariabackup utility
  - Load Balancing & Monitoring
    - HAProxy 2.0+
    - sysbench for testing
    - netcat for connectivity
  - Security & Administration
    - OpenSSL for certificates
    - UFW/iptables for firewall
    - SSH with sudo access
  - Optional Monitoring Tools
    - Prometheus + Grafana
    - Nagios/Zabbix
    - Custom monitoring scripts
Time Investment
21 minutes reading time
42-63 minutes hands-on practice
Guide Content
MySQL Galera Cluster Configuration
MySQL Galera cluster provides synchronous multi-master database replication with automatic failover capabilities. Furthermore, this clustering solution ensures zero data loss through immediate transaction synchronization across all nodes. Additionally, the setup process involves installing MariaDB with Galera libraries, configuring wsrep settings, and initializing the cluster bootstrap sequence.
# Quick Galera cluster status check
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%'"
# Essential cluster variables
mysql -u root -p -e "SHOW VARIABLES LIKE 'wsrep%'"
# Node connectivity verification
mysql -u root -p -e "SELECT * FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME IN ('wsrep_cluster_size','wsrep_ready','wsrep_connected')"
Table of Contents
- What is MySQL Galera Cluster and Why Use It?
- How to Prepare Servers for Galera Deployment?
- How to Install MariaDB Galera on Linux Systems?
- How to Configure Galera Cluster Settings?
- How to Initialize and Bootstrap the Cluster?
- How to Add New Nodes to Existing Cluster?
- How to Test Cluster Replication and Failover?
- How to Monitor Galera Cluster Performance?
- How to Handle Cluster Split-Brain Scenarios?
- What are Galera Cluster Production Best Practices?
What is MySQL Galera Cluster and Why Use It?
MySQL Galera cluster represents a synchronous multi-master database replication solution that eliminates single points of failure. Moreover, Galera provides active-active clustering where all nodes accept both read and write operations simultaneously. Consequently, this architecture ensures true high availability for mission-critical database workloads.
Key Galera Cluster Benefits
Synchronous Replication: Unlike traditional MySQL replication, Galera replicates and certifies every transaction on all cluster nodes before the commit returns. Therefore, data consistency is preserved through node failures, and during network partitions only the quorum partition continues accepting writes.
Multi-Master Architecture: Each cluster node functions as a master, accepting both read and write operations. Subsequently, this eliminates the need for complex master-slave failover procedures.
Automatic Node Recovery: Failed nodes automatically rejoin the cluster through incremental state transfer (IST) or state snapshot transfer (SST). Thus, maintenance operations become significantly simplified.
Zero Data Loss: Galera's certification-based replication guarantees that a transaction reported as committed has been replicated to every node. Consequently, the failure of a single node cannot lose committed transactions.
Galera vs Traditional MySQL Replication
| Feature | Galera Cluster | Traditional Replication |
|---|---|---|
| Replication Type | Synchronous multi-master | Asynchronous master-slave |
| Write Availability | All nodes accept writes | Only master accepts writes |
| Failover Process | Automatic and immediate | Manual intervention required |
| Data Consistency | Strong consistency guaranteed | Eventual consistency only |
| Split-Brain Protection | Built-in quorum mechanism | Requires external solutions |
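To quickly tell which model an existing server is running, a hedged check of two status views works on current MariaDB releases (use SHOW SLAVE STATUS on older versions):
# Quick replication-mode check (sketch; adjust credentials to your environment)
mysql -u root -p -e "SHOW VARIABLES LIKE 'wsrep_on'"      # ON indicates a Galera node
mysql -u root -p -e "SHOW REPLICA STATUS\G"               # non-empty only on a traditional async replica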
How to Prepare Servers for Galera Deployment?
Proper server preparation ensures optimal Galera cluster performance and prevents common configuration issues. Additionally, network connectivity and security requirements must be carefully addressed before installation begins.
System Requirements Verification
# Check system specifications
cat /proc/meminfo | grep MemTotal
cat /proc/cpuinfo | grep "processor" | wc -l
df -h /var/lib/mysql
# Verify network connectivity between nodes
ping -c 3 10.0.1.10 # Node 1
ping -c 3 10.0.1.11 # Node 2
ping -c 3 10.0.1.12 # Node 3
# Test port accessibility for Galera communication
nc -z 10.0.1.10 3306 # MySQL port
nc -z 10.0.1.10 4567 # Galera communication
nc -z 10.0.1.10 4568 # IST transfer
nc -z 10.0.1.10 4444 # SST transfer
Firewall Configuration Setup
# Configure firewall rules for Galera ports
sudo ufw allow 3306/tcp # MySQL client connections
sudo ufw allow 4567/tcp # Galera cluster communication
sudo ufw allow 4568/tcp # Incremental State Transfer
sudo ufw allow 4444/tcp # State Snapshot Transfer
# Alternative iptables configuration
sudo iptables -A INPUT -p tcp --dport 3306 -s 10.0.1.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4567 -s 10.0.1.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4568 -s 10.0.1.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4444 -s 10.0.1.0/24 -j ACCEPT
# Save iptables rules (the redirection itself must run with root privileges)
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'
Time Synchronization Configuration
# Install and configure NTP for time synchronization
sudo apt update && sudo apt install -y ntp
sudo systemctl enable ntp
sudo systemctl start ntp
# Verify time synchronization across nodes
timedatectl status
ntpq -p
# Alternative: Configure chrony for better accuracy
sudo apt install -y chrony
sudo systemctl enable chrony
sudo systemctl start chrony
chronyc sources -v
Host File Configuration
# Configure /etc/hosts for consistent node resolution
sudo tee -a /etc/hosts << EOF
10.0.1.10 galera-node1
10.0.1.11 galera-node2
10.0.1.12 galera-node3
EOF
# Test hostname resolution (getent consults /etc/hosts; nslookup queries DNS only)
getent hosts galera-node1
getent hosts galera-node2
getent hosts galera-node3
How to Install MariaDB Galera on Linux Systems?
MariaDB provides native Galera support through its built-in wsrep libraries. Furthermore, the installation process varies slightly between different Linux distributions but follows consistent configuration patterns.
Ubuntu/Debian Installation Process
# Update package repositories
sudo apt update && sudo apt upgrade -y
# Install MariaDB server with Galera support
sudo apt install -y mariadb-server mariadb-client galera-4 galera-arbitrator-4
# Verify MariaDB installation
mysql --version
mysqladmin version
# Check Galera library availability
mysql -u root -p -e "SHOW GLOBAL VARIABLES LIKE 'wsrep%'"
# Stop MariaDB service before cluster configuration
sudo systemctl stop mariadb
sudo systemctl disable mariadb
CentOS/RHEL Installation Process
# Install MariaDB repository
sudo tee /etc/yum.repos.d/MariaDB.repo << EOF
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.6/centos7-amd64
gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 1
EOF
# Install MariaDB with Galera support
sudo yum install -y MariaDB-server MariaDB-client galera-4
# Enable and start MariaDB temporarily for initial setup
sudo systemctl enable mariadb
sudo systemctl start mariadb
# Secure MariaDB installation
sudo mysql_secure_installation
# Stop MariaDB before cluster configuration
sudo systemctl stop mariadb
Initial Database Security Configuration
# Secure MariaDB installation on the first node
sudo mysql_secure_installation
# Create cluster administration user
mysql -u root -p << EOF
CREATE USER 'cluster_admin'@'localhost' IDENTIFIED BY 'SecureClusterPassword123!';
GRANT ALL PRIVILEGES ON *.* TO 'cluster_admin'@'localhost' WITH GRANT OPTION;
CREATE USER 'cluster_admin'@'10.0.1.%' IDENTIFIED BY 'SecureClusterPassword123!';
GRANT ALL PRIVILEGES ON *.* TO 'cluster_admin'@'10.0.1.%' WITH GRANT OPTION;
CREATE USER 'sst_user'@'localhost' IDENTIFIED BY 'SSTPassword456!';
GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO 'sst_user'@'localhost';
FLUSH PRIVILEGES;
EOF
How to Configure Galera Cluster Settings?
Galera configuration requires precise wsrep parameter settings to ensure optimal cluster performance and data consistency. Moreover, proper configuration prevents common issues like split-brain scenarios and improves replication efficiency.
Primary Node Configuration (Node 1)
# Create Galera configuration file
sudo tee /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
[galera]
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="galera-node1"
wsrep_node_address="10.0.1.10"
# Replication settings
wsrep_sst_method=rsync
wsrep_sst_auth="sst_user:SSTPassword456!"
wsrep_slave_threads=16
# Performance optimizations
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
# Binary logging configuration
log-bin=/var/log/mysql/mysql-bin
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Performance tuning
innodb_buffer_pool_size=1G
innodb_log_file_size=256M
innodb_flush_log_at_trx_commit=2
query_cache_size=0
query_cache_type=0
EOF
# Set proper file permissions
sudo chmod 644 /etc/mysql/mariadb.conf.d/60-galera.cnf
sudo chown root:root /etc/mysql/mariadb.conf.d/60-galera.cnf
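Before repeating the file on the other nodes, a quick sanity check confirms that the node-specific values were written and that MariaDB will read the [galera] group; this is a sketch, and the grep pattern is only illustrative.
# Confirm the node-specific wsrep settings in the new file
grep -E "wsrep_(cluster_address|node_name|node_address)" /etc/mysql/mariadb.conf.d/60-galera.cnf
# Confirm the [galera] option group is picked up from the default option files
my_print_defaults galera | grep wsrep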
Secondary Node Configuration (Node 2)
# Configure second cluster node
sudo tee /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
[galera]
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="galera-node2"
wsrep_node_address="10.0.1.11"
# Replication settings
wsrep_sst_method=rsync
wsrep_sst_auth="sst_user:SSTPassword456!"
wsrep_slave_threads=16
# Performance optimizations
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
# Binary logging configuration
log-bin=/var/log/mysql/mysql-bin
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Performance tuning
innodb_buffer_pool_size=1G
innodb_log_file_size=256M
innodb_flush_log_at_trx_commit=2
query_cache_size=0
query_cache_type=0
EOF
Third Node Configuration (Node 3)
# Configure third cluster node
sudo tee /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
[galera]
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="production_cluster"
wsrep_cluster_address="gcomm://10.0.1.10,10.0.1.11,10.0.1.12"
wsrep_node_name="galera-node3"
wsrep_node_address="10.0.1.12"
# Replication settings
wsrep_sst_method=rsync
wsrep_sst_auth="sst_user:SSTPassword456!"
wsrep_slave_threads=16
# Performance optimizations
wsrep_certify_nonPK=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647
wsrep_debug=0
wsrep_convert_LOCK_to_trx=0
wsrep_retry_autocommit=1
wsrep_auto_increment_control=1
# Binary logging configuration
log-bin=/var/log/mysql/mysql-bin
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Performance tuning
innodb_buffer_pool_size=1G
innodb_log_file_size=256M
innodb_flush_log_at_trx_commit=2
query_cache_size=0
query_cache_type=0
EOF
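Because the three files differ only in wsrep_node_name and wsrep_node_address, a small templating loop reduces copy-paste drift. The sketch below assumes a 60-galera.cnf.template file containing the shared settings and SSH access by hostname; adjust paths to your environment.
# Generate per-node Galera configs from one shared template (run from an admin workstation)
declare -A NODES=( [galera-node1]=10.0.1.10 [galera-node2]=10.0.1.11 [galera-node3]=10.0.1.12 )
for name in "${!NODES[@]}"; do
  sed -e "s/^wsrep_node_name=.*/wsrep_node_name=\"$name\"/" \
      -e "s/^wsrep_node_address=.*/wsrep_node_address=\"${NODES[$name]}\"/" \
      60-galera.cnf.template > "60-galera.cnf.$name"
  scp "60-galera.cnf.$name" "$name:/tmp/60-galera.cnf"   # then move into place with sudo on each node
done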
How to Initialize and Bootstrap the Cluster?
Cluster initialization requires careful bootstrap sequencing to establish proper Galera communication. Additionally, the bootstrap process creates the initial cluster state that subsequent nodes will join.
Bootstrap First Cluster Node
# Initialize the cluster on the first node
sudo galera_new_cluster
# Note: galera_new_cluster is a wrapper that starts the mariadb service with the
# --wsrep-new-cluster option; MariaDB ships no separate mariadb@bootstrap.service unit
# Verify cluster bootstrap status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep%'"
# Check cluster state variables
mysql -u root -p << EOF
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_ready',
'wsrep_cluster_status',
'wsrep_cluster_size',
'wsrep_local_state_comment'
);
EOF
# Verify node is primary and ready
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
Cluster Bootstrap Verification
# Comprehensive cluster status check
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE
WHEN VARIABLE_NAME = 'wsrep_ready' AND VARIABLE_VALUE = 'ON' THEN 'OK'
WHEN VARIABLE_NAME = 'wsrep_cluster_status' AND VARIABLE_VALUE = 'Primary' THEN 'OK'
WHEN VARIABLE_NAME = 'wsrep_local_state_comment' AND VARIABLE_VALUE = 'Synced' THEN 'OK'
ELSE 'CHECK'
END as STATUS
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_ready',
'wsrep_cluster_status',
'wsrep_local_state_comment',
'wsrep_cluster_size',
'wsrep_local_index'
)
ORDER BY VARIABLE_NAME;
EOF
# Check error logs for bootstrap issues
sudo tail -f /var/log/mysql/error.log
Create Test Database and Tables
# Create test database for replication verification
mysql -u root -p << EOF
CREATE DATABASE galera_test;
USE galera_test;
CREATE TABLE cluster_nodes (
id INT AUTO_INCREMENT PRIMARY KEY,
node_name VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
data VARCHAR(255)
);
INSERT INTO cluster_nodes (node_name, data) VALUES
('galera-node1', 'Bootstrap node data'),
('galera-node1', 'Initial cluster setup complete');
SELECT * FROM cluster_nodes;
EOF
How to Add New Nodes to Existing Cluster?
Adding nodes to an existing Galera cluster requires proper sequencing and state transfer configuration. Moreover, the process involves automatic synchronization through either IST or SST mechanisms.
Join Second Node to Cluster
# Start MariaDB on the second node (Node 2)
sudo systemctl start mariadb
# Monitor the joining process
sudo tail -f /var/log/mysql/error.log
# Verify successful cluster join
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Check node synchronization status
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_cluster_size',
'wsrep_local_state_comment',
'wsrep_ready',
'wsrep_connected'
);
EOF
Join Third Node to Cluster
# Start MariaDB on the third node (Node 3)
sudo systemctl start mariadb
# Verify three-node cluster formation
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Test data replication across all nodes
mysql -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('galera-node3', 'Third node joined successfully');
SELECT node_name, data, created_at FROM cluster_nodes ORDER BY created_at;
EOF
# Verify cluster membership
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_incoming_addresses'"
Monitor State Transfer Process
# Monitor SST/IST progress during node joining
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_local_state',
'wsrep_local_state_comment',
'wsrep_local_recv_queue_avg',
'wsrep_local_send_queue_avg',
'wsrep_flow_control_paused'
);
EOF
# Check for SST/IST completion
grep -E "(SST|IST)" /var/log/mysql/error.log | tail -10
# Verify all nodes are synchronized
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
How to Test Cluster Replication and Failover?
Comprehensive testing validates Galera cluster functionality and ensures proper failover behavior. Additionally, testing scenarios should cover both planned and unplanned node failures.
Data Replication Testing
# Test write operations on Node 1
mysql -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('node1-test', 'Write test from node 1 - $(date)');
SELECT COUNT(*) as total_records FROM cluster_nodes;
EOF
# Verify replication on Node 2
mysql -h 10.0.1.11 -u root -p << EOF
USE galera_test;
SELECT node_name, data FROM cluster_nodes ORDER BY id DESC LIMIT 3;
EOF
# Test concurrent writes from Node 3
mysql -h 10.0.1.12 -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('node3-test', 'Concurrent write test - $(date)');
SELECT COUNT(*) as total_records FROM cluster_nodes;
EOF
# Verify data consistency across all nodes
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Node $node record count:"
mysql -h $node -u root -p -e "SELECT COUNT(*) FROM galera_test.cluster_nodes"
done
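The loop above prompts for the root password on every iteration. A client option file keeps the check non-interactive; this is a sketch, the password shown is a placeholder, and the file must be protected with strict permissions.
# Store client credentials once so looped checks do not prompt
cat > ~/.my.cnf << 'CNF'
[client]
user=root
password=YourPassword
CNF
chmod 600 ~/.my.cnf
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
  echo "Node $node record count:"
  mysql -h $node -e "SELECT COUNT(*) FROM galera_test.cluster_nodes"
done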
Failover Testing Procedures
# Simulate Node 2 failure
ssh 10.0.1.11 'sudo systemctl stop mariadb'
# Verify cluster continues with 2 nodes
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Test write operations during failover
mysql -u root -p << EOF
USE galera_test;
INSERT INTO cluster_nodes (node_name, data) VALUES
('failover-test', 'Data written during node2 failure - $(date)');
EOF
# Restart failed node and verify rejoin
ssh 10.0.1.11 'sudo systemctl start mariadb'
# Wait for node recovery and check cluster size
sleep 30
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
# Verify data synchronization after rejoin
mysql -h 10.0.1.11 -u root -p << EOF
USE galera_test;
SELECT node_name, data FROM cluster_nodes WHERE data LIKE '%failover%';
EOF
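While the node is still down, it is also worth confirming that the surviving members kept Primary status and stayed synced; a quick check on either remaining node:
# Run on a surviving node during the simulated outage
mysql -u root -p -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_status','wsrep_cluster_size','wsrep_local_state_comment')"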
Performance Benchmark Testing
# Install sysbench for database benchmarking
sudo apt install -y sysbench
# Prepare benchmark database
mysql -u root -p -e "CREATE DATABASE sysbench_test"
# Run sysbench preparation
sysbench oltp_read_write \
--db-driver=mysql \
--mysql-host=10.0.1.10 \
--mysql-port=3306 \
--mysql-user=root \
--mysql-password=YourPassword \
--mysql-db=sysbench_test \
--tables=4 \
--table-size=10000 \
prepare
# Execute benchmark test
sysbench oltp_read_write \
--db-driver=mysql \
--mysql-host=10.0.1.10 \
--mysql-port=3306 \
--mysql-user=root \
--mysql-password=YourPassword \
--mysql-db=sysbench_test \
--tables=4 \
--table-size=10000 \
--threads=8 \
--time=60 \
--report-interval=10 \
run
# Cleanup benchmark data
sysbench oltp_read_write \
--db-driver=mysql \
--mysql-host=10.0.1.10 \
--mysql-port=3306 \
--mysql-user=root \
--mysql-password=YourPassword \
--mysql-db=sysbench_test \
--tables=4 \
cleanup
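Rather than benchmarking as root, a dedicated account limited to the test schema is safer; the sketch below uses a placeholder password and the 10.0.1.% network from earlier.
# Create a dedicated benchmark user, then pass it to the sysbench commands above
mysql -u root -p << EOF
CREATE USER 'sysbench'@'10.0.1.%' IDENTIFIED BY 'SysbenchPassword789!';
GRANT ALL PRIVILEGES ON sysbench_test.* TO 'sysbench'@'10.0.1.%';
FLUSH PRIVILEGES;
EOF
# Example: --mysql-user=sysbench --mysql-password='SysbenchPassword789!'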
How to Monitor Galera Cluster Performance?
Effective monitoring ensures optimal cluster performance and early detection of potential issues. Furthermore, proper monitoring strategies help maintain cluster health and prevent service disruptions.
Essential Cluster Monitoring Commands
# Real-time cluster status monitoring
watch -n 2 'mysql -u root -p -e "SHOW STATUS LIKE \"wsrep%\"" | grep -E "(cluster_size|ready|local_state|flow_control)"'
# Comprehensive cluster health check
mysql -u root -p << EOF
SELECT
'Cluster Size' as Metric, wsrep_cluster_size as Value
FROM (SELECT VARIABLE_VALUE as wsrep_cluster_size
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_cluster_size') t1
UNION ALL
SELECT
'Local State', wsrep_local_state_comment
FROM (SELECT VARIABLE_VALUE as wsrep_local_state_comment
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_local_state_comment') t2
UNION ALL
SELECT
'Ready Status', wsrep_ready
FROM (SELECT VARIABLE_VALUE as wsrep_ready
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_ready') t3
UNION ALL
SELECT
'Connected', wsrep_connected
FROM (SELECT VARIABLE_VALUE as wsrep_connected
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_connected') t4;
EOF
# Monitor replication queue lengths
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE
WHEN VARIABLE_NAME LIKE '%queue%' AND CAST(VARIABLE_VALUE AS DECIMAL) > 1000 THEN 'CRITICAL'
WHEN VARIABLE_NAME LIKE '%queue%' AND CAST(VARIABLE_VALUE AS DECIMAL) > 100 THEN 'WARNING'
ELSE 'OK'
END as STATUS
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_local_recv_queue',
'wsrep_local_send_queue',
'wsrep_local_recv_queue_avg',
'wsrep_local_send_queue_avg'
);
EOF
Automated Monitoring Script
# Create comprehensive cluster monitoring script
sudo tee /usr/local/bin/galera_monitor.sh << 'EOF'
#!/bin/bash
# Galera Cluster Monitoring Script
# MYSQL_ROOT_PASSWORD must be exported in the environment before this script runs
LOG_FILE="/var/log/galera_monitor.log"
DATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "[$DATE] Galera Cluster Health Check" >> $LOG_FILE
# Check cluster connectivity
CLUSTER_SIZE=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | grep wsrep | awk '{print $2}')
LOCAL_STATE=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_local_state_comment'" | grep wsrep | awk '{print $2}')
READY_STATUS=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_ready'" | grep wsrep | awk '{print $2}')
# Log cluster status
echo "[$DATE] Cluster Size: $CLUSTER_SIZE" >> $LOG_FILE
echo "[$DATE] Local State: $LOCAL_STATE" >> $LOG_FILE
echo "[$DATE] Ready Status: $READY_STATUS" >> $LOG_FILE
# Check for issues
if [ "$CLUSTER_SIZE" -lt "3" ]; then
echo "[$DATE] WARNING: Cluster size reduced to $CLUSTER_SIZE" >> $LOG_FILE
# Send alert notification
echo "Galera cluster size warning: $CLUSTER_SIZE nodes" | mail -s "Galera Alert" admin@example.com
fi
if [ "$LOCAL_STATE" != "Synced" ]; then
echo "[$DATE] ERROR: Node not synchronized - $LOCAL_STATE" >> $LOG_FILE
echo "Galera node synchronization error: $LOCAL_STATE" | mail -s "Galera Critical" admin@example.com
fi
# Check flow control
FLOW_CONTROL=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'wsrep_flow_control_paused'" | grep wsrep | awk '{print $2}')
if (( $(echo "$FLOW_CONTROL > 0.1" | bc -l) )); then
echo "[$DATE] WARNING: Flow control active - $FLOW_CONTROL" >> $LOG_FILE
fi
echo "[$DATE] Health check completed" >> $LOG_FILE
EOF
# Make script executable
sudo chmod +x /usr/local/bin/galera_monitor.sh
# Schedule monitoring via cron (append so existing root crontab entries are preserved)
( sudo crontab -l 2>/dev/null; echo "*/5 * * * * /usr/local/bin/galera_monitor.sh" ) | sudo crontab -
Performance Metrics Collection
# Create performance metrics collection script
sudo tee /usr/local/bin/galera_metrics.sh << 'EOF'
#!/bin/bash
# MYSQL_ROOT_PASSWORD must be exported in the environment before this script runs
METRICS_FILE="/var/log/galera_metrics.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
# Collect key performance metrics
mysql -u root -p${MYSQL_ROOT_PASSWORD} << SQL >> $METRICS_FILE
SELECT
'$TIMESTAMP' as timestamp,
VARIABLE_NAME,
VARIABLE_VALUE
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_replicated',
'wsrep_replicated_bytes',
'wsrep_received',
'wsrep_received_bytes',
'wsrep_commit_window',
'wsrep_cert_deps_distance',
'wsrep_apply_window'
);
SQL
# Log current connections
CONNECTIONS=$(mysql -u root -p${MYSQL_ROOT_PASSWORD} -e "SHOW STATUS LIKE 'Threads_connected'" | grep Threads | awk '{print $2}')
echo "$TIMESTAMP,connections,$CONNECTIONS" >> $METRICS_FILE
EOF
sudo chmod +x /usr/local/bin/galera_metrics.sh
# Schedule metrics collection every minute (append to the existing root crontab)
( sudo crontab -l 2>/dev/null; echo "* * * * * /usr/local/bin/galera_metrics.sh" ) | sudo crontab -
How to Handle Cluster Split-Brain Scenarios?
Split-brain scenarios require immediate attention and proper recovery procedures to maintain data integrity. Moreover, Galera's built-in quorum mechanism helps prevent most split-brain conditions automatically.
Detect Split-Brain Conditions
# Check for non-primary cluster status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status'"
# Verify cluster connectivity
mysql -u root -p << EOF
SELECT
CASE
WHEN VARIABLE_VALUE = 'Primary' THEN 'HEALTHY'
WHEN VARIABLE_VALUE = 'non-Primary' THEN 'SPLIT-BRAIN'
ELSE 'UNKNOWN'
END as cluster_health,
VARIABLE_VALUE as cluster_status
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME = 'wsrep_cluster_status';
EOF
# Check node connectivity across cluster
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Checking node $node:"
mysql -h $node -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_status'" 2>/dev/null || echo "Node $node unreachable"
done
Split-Brain Recovery Procedures
# Method 1: Bootstrap from the most recent node
# First, stop all MariaDB services
ssh 10.0.1.10 'sudo systemctl stop mariadb'
ssh 10.0.1.11 'sudo systemctl stop mariadb'
ssh 10.0.1.12 'sudo systemctl stop mariadb'
# Check grastate.dat files to find the most recent node
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Node $node grastate.dat:"
ssh $node 'sudo cat /var/lib/mysql/grastate.dat | grep seqno'
done
# Bootstrap from the node with highest seqno
sudo galera_new_cluster # Run on the selected node
# Start other nodes normally
ssh 10.0.1.11 'sudo systemctl start mariadb'
ssh 10.0.1.12 'sudo systemctl start mariadb'
# Verify cluster recovery
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
Manual Bootstrap Recovery
# Method 2: Force bootstrap when automatic detection fails
# Edit grastate.dat on the chosen bootstrap node
sudo systemctl stop mariadb
sudo sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
# Verify the change
sudo cat /var/lib/mysql/grastate.dat
# Bootstrap the cluster
sudo galera_new_cluster
# Start remaining nodes
ssh 10.0.1.11 'sudo systemctl start mariadb'
ssh 10.0.1.12 'sudo systemctl start mariadb'
# Test data consistency after recovery
mysql -u root -p << EOF
USE galera_test;
SELECT COUNT(*) as record_count FROM cluster_nodes;
INSERT INTO cluster_nodes (node_name, data) VALUES
('recovery-test', 'Post split-brain recovery test - $(date)');
EOF
# Verify replication works across all nodes
for node in 10.0.1.10 10.0.1.11 10.0.1.12; do
echo "Node $node record count after recovery:"
mysql -h $node -u root -p -e "SELECT COUNT(*) FROM galera_test.cluster_nodes"
done
Prevent Future Split-Brain Issues
# Configure the Galera arbitrator (garbd) for clusters with an even number of data nodes;
# run it on a separate host so it can break ties, never on one of the database nodes
sudo tee /etc/default/garb << EOF
# Galera Arbitrator configuration
GALERA_NODES="galera-node1:4567,galera-node2:4567"
GALERA_GROUP="production_cluster"
GALERA_OPTIONS=""
LOG_FILE="/var/log/garbd.log"
EOF
# Start Galera arbitrator service
sudo systemctl enable garbd
sudo systemctl start garbd
# Verify arbitrator connectivity
sudo systemctl status garbd
sudo tail -f /var/log/garbd.log
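Service unit and config file names for garbd vary slightly between distributions, so it can help to test the arbitrator once in the foreground. This is a sketch using garbd's documented long options; run it on a host that is not one of the database nodes.
# Foreground test of the arbitrator; stop with Ctrl+C once it reports joining the cluster
garbd --address "gcomm://10.0.1.10:4567,10.0.1.11:4567" \
      --group "production_cluster" \
      --log /tmp/garbd-test.log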
What are Galera Cluster Production Best Practices?
Production deployments require careful consideration of performance optimization, security hardening, and operational procedures. Additionally, these best practices ensure reliable cluster operation and simplified maintenance workflows.
Security Hardening Configuration
# Create dedicated Galera SSL certificates
sudo mkdir -p /etc/mysql/ssl
sudo openssl genrsa -out /etc/mysql/ssl/galera-ca-key.pem 4096
sudo openssl req -new -x509 -nodes -days 365000 \
-key /etc/mysql/ssl/galera-ca-key.pem \
-out /etc/mysql/ssl/galera-ca-cert.pem \
-subj "/C=US/ST=State/L=City/O=Organization/CN=Galera-CA"
# Generate node certificates
for node in node1 node2 node3; do
sudo openssl genrsa -out /etc/mysql/ssl/galera-$node-key.pem 4096
sudo openssl req -new \
-key /etc/mysql/ssl/galera-$node-key.pem \
-out /etc/mysql/ssl/galera-$node-req.pem \
-subj "/C=US/ST=State/L=City/O=Organization/CN=galera-$node"
sudo openssl x509 -req -days 365000 \
-in /etc/mysql/ssl/galera-$node-req.pem \
-CA /etc/mysql/ssl/galera-ca-cert.pem \
-CAkey /etc/mysql/ssl/galera-ca-key.pem \
-CAcreateserial \
-out /etc/mysql/ssl/galera-$node-cert.pem
done
# Configure SSL in Galera settings
sudo tee -a /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
# SSL Configuration
wsrep_provider_options="socket.ssl_key=/etc/mysql/ssl/galera-node1-key.pem;socket.ssl_cert=/etc/mysql/ssl/galera-node1-cert.pem;socket.ssl_ca=/etc/mysql/ssl/galera-ca-cert.pem;socket.ssl_cipher=AES128-SHA"
# MySQL SSL settings
ssl-ca=/etc/mysql/ssl/galera-ca-cert.pem
ssl-cert=/etc/mysql/ssl/galera-node1-cert.pem
ssl-key=/etc/mysql/ssl/galera-node1-key.pem
require_secure_transport=ON
EOF
# Set proper SSL file permissions
sudo chmod 600 /etc/mysql/ssl/*.pem
sudo chown mysql:mysql /etc/mysql/ssl/*.pem
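Before restarting the nodes with SSL enabled, verifying that each node certificate chains to the cluster CA catches signing mistakes early:
# Verify every node certificate against the CA created above
for node in node1 node2 node3; do
  openssl verify -CAfile /etc/mysql/ssl/galera-ca-cert.pem /etc/mysql/ssl/galera-$node-cert.pem
done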
Performance Optimization Settings
# Optimize Galera-specific parameters
sudo tee -a /etc/mysql/mariadb.conf.d/60-galera.cnf << EOF
# Advanced Performance Tuning
# wsrep_applier_threads is the current name for wsrep_slave_threads; set one, not both
wsrep_applier_threads=16
# Flow control and network tuning
# NOTE: only the last wsrep_provider_options line in the configuration takes effect, so
# merge these settings (and any socket.ssl_* options added for SSL) into a single line
wsrep_provider_options="gcs.fc_limit=32;gcs.fc_factor=0.8;gcs.fc_master_slave=YES;evs.keepalive_period=PT3S;evs.inactive_check_period=PT10S;evs.suspect_timeout=PT30S;evs.inactive_timeout=PT1M"
# Memory optimization
innodb_buffer_pool_size=2G
innodb_buffer_pool_instances=8
innodb_log_buffer_size=64M
innodb_flush_method=O_DIRECT
# Transaction optimization
transaction-isolation=READ-COMMITTED
EOF
# Apply memory tuning based on system resources
TOTAL_RAM=$(free -g | awk '/^Mem:/{print $2}')
BUFFER_POOL=$((TOTAL_RAM * 70 / 100))G
sudo sed -i "s/innodb_buffer_pool_size=2G/innodb_buffer_pool_size=$BUFFER_POOL/" /etc/mysql/mariadb.conf.d/60-galera.cnf
Backup Strategy Implementation
# Create automated Galera backup script
sudo tee /usr/local/bin/galera_backup.sh << 'EOF'
#!/bin/bash
BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7
# Assumes BACKUP_PASSWORD is exported and a backup_user account exists with the
# privileges mariabackup needs (RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT)
# Create backup directory
mkdir -p $BACKUP_DIR
# Perform consistent backup using mariabackup
mariabackup --backup \
--target-dir=$BACKUP_DIR/full_backup_$DATE \
--user=backup_user \
--password=$BACKUP_PASSWORD \
--galera-info \
--slave-info \
--safe-slave-backup
# Verify backup integrity
mariabackup --prepare --target-dir=$BACKUP_DIR/full_backup_$DATE
# Compress backup
tar -czf $BACKUP_DIR/galera_backup_$DATE.tar.gz -C $BACKUP_DIR full_backup_$DATE
rm -rf $BACKUP_DIR/full_backup_$DATE
# Remove old backups
find $BACKUP_DIR -name "galera_backup_*.tar.gz" -mtime +$RETENTION_DAYS -delete
# Log backup completion
echo "$(date): Backup completed - galera_backup_$DATE.tar.gz" >> /var/log/galera_backup.log
EOF
sudo chmod +x /usr/local/bin/galera_backup.sh
# Schedule daily backups (append to the existing root crontab)
( sudo crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/galera_backup.sh" ) | sudo crontab -
Load Balancer Integration
# Configure HAProxy for Galera load balancing
sudo tee /etc/haproxy/haproxy.cfg << EOF
global
daemon
user haproxy
group haproxy
log 127.0.0.1:514 local0
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
log global
# MySQL Galera cluster configuration
listen galera_cluster
bind *:3306
mode tcp
option mysql-check user haproxy_check
balance leastconn
server galera-node1 10.0.1.10:3306 check weight 1
server galera-node2 10.0.1.11:3306 check weight 1
server galera-node3 10.0.1.12:3306 check weight 1
# Statistics interface
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 30s
stats admin if TRUE
EOF
# Create HAProxy health check user
mysql -u root -p << EOF
CREATE USER 'haproxy_check'@'%';
GRANT USAGE ON *.* TO 'haproxy_check'@'%';
FLUSH PRIVILEGES;
EOF
# Start and enable HAProxy
sudo systemctl enable haproxy
sudo systemctl start haproxy
sudo systemctl status haproxy
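A quick way to confirm that HAProxy actually spreads connections is to ask each routed connection which node answered. The sketch below assumes HAProxy listens on 10.0.1.20 and reuses the cluster_admin credentials created earlier.
# Each new connection should report a backend node name; with leastconn they rotate
for i in 1 2 3 4 5 6; do
  mysql -h 10.0.1.20 -u cluster_admin -p'SecureClusterPassword123!' -e "SELECT @@wsrep_node_name"
done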
Frequently Asked Questions
What is the difference between Galera and traditional MySQL replication?
Galera provides synchronous multi-master replication with immediate consistency guarantees, while traditional MySQL replication uses asynchronous master-slave architecture. Consequently, Galera ensures zero data loss and eliminates replication lag, whereas traditional replication may experience data loss during failover scenarios. Additionally, Galera enables writes on all nodes simultaneously, improving application scalability.
How many nodes should I deploy in a Galera cluster?
Production Galera clusters should contain an odd number of nodes (3, 5, or 7) to maintain proper quorum for split-brain protection. Moreover, three nodes provide optimal cost-to-benefit ratio for most applications, while larger clusters increase network overhead. Furthermore, consider deploying nodes across different availability zones for geographic redundancy.
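To see the quorum arithmetic on a running cluster, the size and weight status variables show how many votes the current component holds (a quick hedged check; wsrep_cluster_weight is exposed by recent Galera provider versions):
mysql -u root -p -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_cluster_weight','wsrep_cluster_status')"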
Can I mix different server specifications in a Galera cluster?
While technically possible, mixing server specifications creates performance bottlenecks because Galera's flow control throttles the entire cluster to the pace of its slowest member. Therefore, use identical hardware configurations across all cluster nodes for optimal performance. If mixed specifications are unavoidable, size cluster-wide settings and expected throughput around the weakest node.
How does Galera handle network partitions?
Galera automatically detects network partitions and promotes the partition containing the majority of nodes to primary status. Subsequently, minority partitions enter non-primary mode and refuse write operations. Additionally, this quorum-based approach prevents split-brain scenarios and maintains data consistency.
What happens when a Galera node fails and rejoins?
Failed nodes automatically rejoin the cluster through State Snapshot Transfer (SST) or Incremental State Transfer (IST). Moreover, IST transfers only missing transactions if the node was offline briefly, while SST performs full synchronization for extended outages. Furthermore, the process occurs transparently without manual intervention.
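Whether a rejoining node gets the cheaper IST depends largely on how much write-set history the donor still holds in its gcache. Enlarging it is a single provider option; the 2G figure below is only an example, and all provider options must be merged onto one wsrep_provider_options line.
# Example gcache sizing to favour IST over SST (merge with any existing provider options)
wsrep_provider_options="gcache.size=2G"
# After a node rejoins, confirm which transfer method was actually used
grep -E "IST|SST" /var/log/mysql/error.log | tail -5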
How do I handle schema changes in Galera clusters?
Schema changes should be executed on one node and will replicate automatically to all cluster members. However, large schema modifications may cause performance impact during replication. Therefore, consider performing schema changes during maintenance windows and monitor cluster performance during the process.
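Galera controls how DDL replicates through wsrep_OSU_method: the default TOI applies the statement cluster-wide in total order (blocking concurrent writes for its duration), while RSU applies it locally on one node at a time. The sketch below shows a rolling change; RSU requires the old and new table definitions to remain replication-compatible.
-- Rolling schema upgrade on one node at a time (repeat on each node, then restore TOI)
SET GLOBAL wsrep_OSU_method='RSU';
ALTER TABLE galera_test.cluster_nodes ADD COLUMN note VARCHAR(100);
SET GLOBAL wsrep_OSU_method='TOI';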
What are the storage requirements for Galera clusters?
Each Galera node requires sufficient storage for the complete dataset plus additional space for write-set caching and state transfer operations. Moreover, plan for 20-30% additional storage capacity beyond the base dataset size. Additionally, use high-performance storage (SSD) for optimal cluster performance.
How do I optimize Galera cluster performance?
Optimize performance through proper network configuration, adequate memory allocation, and appropriate wsrep parameter tuning. Furthermore, configure dedicated network interfaces for cluster communication and ensure low-latency connectivity between nodes. Additionally, monitor queue lengths and flow control metrics regularly.
Troubleshooting Common Issues
Cluster Bootstrap Failures
Problem: Cluster fails to initialize during bootstrap process.
Solution: Check network connectivity, verify configuration file syntax, and examine MySQL error logs.
# Verify configuration syntax
sudo mysqld --print-defaults
# Check network connectivity between nodes
nc -zv 10.0.1.10 4567
nc -zv 10.0.1.11 4567
nc -zv 10.0.1.12 4567
# Examine error logs for specific issues
sudo tail -f /var/log/mysql/error.log | grep -i "galera\|wsrep"
# Reset cluster state if necessary
sudo rm -f /var/lib/mysql/grastate.dat
sudo galera_new_cluster
Node Synchronization Problems
Problem: Node shows "non-Primary" status or fails to synchronize.
Solution: Check cluster connectivity, verify authentication credentials, and restart problematic nodes.
# Check node connectivity status
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_connected'"
# Verify cluster authentication
mysql -u root -p -e "SHOW VARIABLES LIKE 'wsrep_sst_auth'"
# Force node resynchronization
sudo systemctl stop mariadb
sudo rm -f /var/lib/mysql/galera.cache
sudo systemctl start mariadb
# Monitor rejoin process
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_state_comment'"
Performance Degradation Issues
Problem: Cluster experiences slow query performance or high replication lag.
Solution: Analyze flow control metrics, optimize memory allocation, and check network latency.
# Check flow control status
mysql -u root -p << EOF
SELECT
VARIABLE_NAME,
VARIABLE_VALUE,
CASE
WHEN VARIABLE_NAME = 'wsrep_flow_control_paused' AND CAST(VARIABLE_VALUE AS DECIMAL) > 0.1 THEN 'HIGH'
WHEN VARIABLE_NAME LIKE '%queue%' AND CAST(VARIABLE_VALUE AS DECIMAL) > 100 THEN 'CONGESTED'
ELSE 'NORMAL'
END as STATUS
FROM INFORMATION_SCHEMA.GLOBAL_STATUS
WHERE VARIABLE_NAME IN (
'wsrep_flow_control_paused',
'wsrep_local_recv_queue',
'wsrep_local_send_queue'
);
EOF
# Monitor certification conflicts
mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_local_cert_failures'"
# Analyze slow queries
mysql -u root -p -e "SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY sum_timer_wait DESC LIMIT 10"
SSL Connection Errors
Problem: Nodes fail to connect using SSL configuration.
Solution: Verify certificate validity, check file permissions, and validate SSL configuration.
# Test SSL certificate validity
openssl x509 -in /etc/mysql/ssl/galera-node1-cert.pem -text -noout
# Check certificate file permissions
ls -la /etc/mysql/ssl/
# Verify SSL configuration
mysql -u root -p -e "SHOW VARIABLES LIKE '%ssl%'"
# Test SSL connectivity manually (the MySQL protocol negotiates TLS after connect,
# so openssl needs the -starttls mysql option, available in OpenSSL 1.1.1+)
openssl s_client -starttls mysql -connect 10.0.1.10:3306 -cert /etc/mysql/ssl/galera-node1-cert.pem -key /etc/mysql/ssl/galera-node1-key.pem -CAfile /etc/mysql/ssl/galera-ca-cert.pem
Additional Resources
Official Documentation
- MariaDB Galera Cluster Documentation - Comprehensive official guide for MariaDB Galera implementation
- Galera Cluster Official Documentation - Complete Galera cluster reference documentation
- MySQL Group Replication - Oracle's alternative clustering solution
Performance Tuning Resources
- MariaDB Performance Optimization - Database performance tuning best practices
- Galera Performance Tuning - Cluster-specific optimization guidelines
- Database Monitoring Best Practices - Comprehensive monitoring strategies
Community Resources
- MariaDB Community Forum - Active community support and discussions
- Galera Cluster Mailing List - Official developer discussions
- Stack Overflow Galera Tags - Community troubleshooting and solutions
Related LinuxTips.pro Articles
- Linux Clustering with Pacemaker and Corosync - High availability infrastructure setup
- MySQL/MariaDB Administration on Linux - Database server management fundamentals
- Database Backup and Recovery Strategies - Data protection best practices
This article is part of the Linux Mastery 100 series, providing comprehensive guides for Linux system administration and DevOps practices.