Cluster setup

A cluster is a group of servers and other resources that act as a single system. Using a cluster can increase data reliability, availability, and scalability. This article focuses on Ceph, a distributed object store and file system, and MariaDB Galera, a multi-master database solution, which together provide high availability and performance. This cluster setup is the basis for the website that you are currently reading.

Prerequisites

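You will need at least three servers running a Linux OS. Fewer than three servers is not recommended: both Ceph monitors and Galera need a majority of nodes to agree, and with only two nodes the loss of either one leaves no majority, which can stall the cluster or lead to split-brain issues.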
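The three-server minimum follows from majority voting. The sketch below only does the arithmetic; the function names quorum_size and tolerated_failures are made up for this illustration and are not part of Ceph or Galera.

```shell
#!/bin/sh
# Illustrative arithmetic only; function names are hypothetical.

# Smallest number of members that forms a strict majority of $1 members.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that may fail while a majority still remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
    printf '%s nodes: quorum %s, tolerated failures %s\n' \
        "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

Two nodes tolerate no failure at all, while three tolerate one. Four nodes still tolerate only one, which is why odd cluster sizes are usually preferred.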

Ceph

Ceph is a highly scalable, fault-tolerant, and highly available storage system.

How Ceph works

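A Ceph Storage Cluster is based on several types of daemons: monitors (ceph-mon), managers (ceph-mgr), object storage daemons (ceph-osd) and, when CephFS is used, metadata servers (ceph-mds).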

All these daemons are installed on multiple servers and interact with each other to form the cluster.

Use Cases of Ceph

Ceph is commonly used in clouds of all sizes and types due to its versatility, massive scalability, and robust data protection.

Performance optimizations

The following optimizations made reads of data stored on my cluster, which holds mostly websites and emails, 10 times faster.

Tuning encrypted OSDs on SSDs

SSDs are typically faster than HDDs and have much lower latency. For historical reasons, dm-crypt in the Linux kernel uses work queues, which offload encryption and disk read/write work to separate kernel threads. This makes sense for slower HDDs, but on SSDs the overhead of the extra context switches can harm performance.

To disable this behaviour, first find the relevant device with the command dmsetup ls --tree. If an OSD is running, its underlying device shows up in the list printed by dmsetup. To get only the UUID of the Ceph device, use dmsetup ls | grep ceph | sed 's/.*osd--block--//' | sed 's/--/-/g' | awk '{print $1;}'. As a final step, set persistent options for the device with cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue --persistent refresh ${PARENT_DEVICE} -d <(ceph tell mon config-key get dm-crypt/osd/${DEVICE_UUID}/luks), where PARENT_DEVICE is the dm-crypt mapping found in the first step and DEVICE_UUID is the UUID from the second.
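The steps above can be collected into two small helpers. This is a sketch under assumptions, not a tested procedure: the function names osd_block_uuids and disable_workqueues are hypothetical, the ceph command is copied as-is from the text above, and the key is piped to cryptsetup on standard input (-d -) instead of using bash process substitution so that the snippet also works in plain sh.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical.

# Read `dmsetup ls` output on stdin and print the OSD block UUIDs,
# using the same grep/sed/awk pipeline as in the text above.
osd_block_uuids() {
    grep ceph | sed 's/.*osd--block--//' | sed 's/--/-/g' | awk '{print $1;}'
}

# Disable the dm-crypt read and write work queues for one OSD and
# store the flags persistently.
#   $1: dm-crypt mapping name (the parent device from `dmsetup ls --tree`)
#   $2: OSD block UUID as printed by osd_block_uuids
disable_workqueues() {
    # Assumption: the LUKS key is read from stdin via "-d -".
    ceph tell mon config-key get "dm-crypt/osd/$2/luks" |
        cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue \
            --persistent refresh "$1" -d -
}
```

A possible way to use it: run dmsetup ls | osd_block_uuids to list the UUIDs, look up the matching mapping name in dmsetup ls --tree, then call disable_workqueues with both values as root on the OSD host. Trying this on a single OSD first and checking the result with cryptsetup status is a sensible precaution.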

Tuning CephFS for many files in one directory

CephFS performance suffers when many files are stored in the same directory, as with most file systems. The option mds_bal_split_size sets the number of entries at which the MDS splits a directory fragment into smaller ones, which improves performance when operating in directories with many files. I have lowered it from the default of 10000 to 5000.
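A minimal sketch of how this could be applied, assuming the cluster uses the centralized configuration database (ceph config set). The helper names dirs_over_limit and set_split_size are hypothetical; the first one only counts directory entries on a mounted file system to show which directories exceed a given threshold.

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical.

# Print "count<TAB>path" for every directory under $1 that holds more
# than $2 entries. Assumes GNU or BSD find and paths without newlines.
dirs_over_limit() {
    find "$1" -type d | while IFS= read -r dir; do
        count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l)
        if [ "$count" -gt "$2" ]; then
            printf '%s\t%s\n' "$count" "$dir"
        fi
    done
}

# Set the directory fragment split threshold for all MDS daemons.
#   $1: new value, for example 5000
set_split_size() {
    ceph config set mds mds_bal_split_size "$1"
}
```

For example, dirs_over_limit /mnt/cephfs 5000 (with /mnt/cephfs standing in for your own mount point) lists the directories that would be affected, and set_split_size 5000 applies the value used on my cluster.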

MariaDB Galera Cluster

MariaDB Galera Cluster is an open-source database system focusing on high availability, failing over between servers seamlessly, and ensuring data consistency between those servers. It is a multi-master cluster that uses synchronous replication.

How Galera Works

In a Galera Cluster, every database instance (or "node") is a master, meaning data can be written to or read from any node, with changes automatically replicated to all other nodes. Replication is synchronous: a transaction is sent to every node and certified at commit time, so all nodes apply the same changes in the same order and hold consistent data.
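Whether a node is actually taking part in replication can be read from its wsrep status variables. The following is a minimal sketch: the function name galera_node_ok is hypothetical, and it expects tab-separated name/value lines such as those printed by the mariadb client in batch mode.

```shell
#!/bin/sh
# Sketch only; the helper name is hypothetical.

# Read tab-separated "name<TAB>value" status lines on stdin and succeed
# only if the node sees $1 cluster members, belongs to the primary
# component, and reports itself as synced.
galera_node_ok() {
    awk -F '\t' -v want="$1" '
        $1 == "wsrep_cluster_size"        { size = $2 }
        $1 == "wsrep_cluster_status"      { status = $2 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit !(size == want && status == "Primary" && state == "Synced") }
    '
}
```

A possible call on a three-node cluster is mariadb -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | galera_node_ok 3, which returns a non-zero exit status if the node has dropped out of the primary component or is still catching up.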

Use Cases for MariaDB Galera Cluster

MariaDB Galera Cluster is best suited for applications where data availability, consistency, and durability across multiple nodes are critical, such as a busy web application that spreads its load over several database servers, or a database that must stay available when a node fails.

Hire an expert

If you're considering implementing this for your business, don't hesitate to seek help from an expert. Please contact me for assistance and consultation with your implementation.