A cluster is a group of servers and other resources that act as a single system. Using a cluster can increase data reliability, availability, and scalability. This article focuses on Ceph, a distributed object store and file system, and MariaDB Galera, a multi-master database solution, as a way to achieve high availability and performance. This cluster setup is the basis for the website that you are currently reading.
Prerequisites
You will need at least three servers running Linux. Fewer than three servers is not recommended: with only two, the loss of one node leaves the survivor without a majority, which can break the cluster or lead to split-brain issues.
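The three-server minimum follows from majority voting, which both Ceph monitors and Galera rely on to avoid split-brain. The following is a minimal sketch of that arithmetic; the function names `quorum_size` and `tolerated_failures` are made up for this illustration and are not part of either project.

```shell
#!/usr/bin/env bash
# Majority quorum: more than half of the voting nodes must be reachable.
quorum_size() {
    local nodes=$1
    echo $(( nodes / 2 + 1 ))
}

# Number of nodes that may fail while the rest still form a majority.
tolerated_failures() {
    local nodes=$1
    echo $(( nodes - (nodes / 2 + 1) ))
}

for n in 2 3 4 5; do
    echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two nodes tolerate no failure at all, and four nodes tolerate no more than three do, which is why odd node counts are the usual choice.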
Ceph
Ceph is a highly scalable, fault-tolerant, and highly available storage system.
How Ceph works
A Ceph Storage Cluster is based on several types of daemons:
- Ceph OSD (Object Storage Daemon): These are the heart of Ceph because they handle data storage, data replication, recovery, rebalancing, and provide some monitoring statistics to Ceph monitors.
- Ceph MON (Monitor): They maintain the master copy of the cluster map. A cluster usually has an odd number of monitors running (e.g., 3, 5, 7).
- Ceph MDS (Metadata Server): These servers are optional and store metadata for the Ceph File System (not block devices or objects).
- Ceph MGR (Manager): This daemon is responsible for keeping track of runtime metrics, managing the cluster's state, and providing additional interfaces to external monitoring and management systems.
All these daemons are installed on multiple servers and interact with each other to form the cluster.
Use Cases of Ceph
Ceph is commonly used in clouds of all sizes and types due to its versatility, massive scalability, and robust data protection.
- Object Storage: Ceph provides features such as replication and erasure coding, tiering, and the ability to set up watch/notify and object-level key-value mappings.
- Block Storage: Ceph's RADOS Block Device (RBD) supports snapshots and replication, and can significantly improve read performance by using the caches of the client and the Ceph OSD.
- File System: Ceph's file system (CephFS) provides highly available and reliable storage, with all data written to and read from the object store.
Performance optimizations
The following optimizations made reads of the data stored on my cluster, which holds mostly websites and emails, 10 times faster.
Tuning encrypted OSDs on SSDs
SSDs are typically faster than HDDs and have much lower latency. For historical reasons, the Linux kernel uses work queues in dm-crypt, which offload encryption and disk read/write work to different threads in the kernel. This makes sense for slower HDDs, but on SSDs it can harm performance because of the overhead of context switches.
To disable this behaviour, first find the relevant device with:
dmsetup ls --tree
If an OSD is running, its underlying device shows up in the list printed by dmsetup. To get only the UUID of the Ceph device, use:
dmsetup ls | grep ceph | sed 's/.*osd--block--//' | sed 's/--/-/g' | awk '{print $1;}'
As a final step, set persistent options for the device, with PARENT_DEVICE and DEVICE_UUID set to the device and UUID found above:
cryptsetup --perf-no_read_workqueue --perf-no_write_workqueue --persistent refresh ${PARENT_DEVICE} -d <(ceph tell mon config-key get dm-crypt/osd/${DEVICE_UUID}/luks)
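To see what the sed and awk pipeline above does, it can be wrapped in a function and fed a sample line. This is only a sketch: the function name `extract_osd_uuid` and the sample device name are hypothetical, and real names and UUIDs differ on every host. The doubled hyphens come from device-mapper, which escapes each hyphen inside an LVM name by doubling it.

```shell
#!/usr/bin/env bash
# Turn `dmsetup ls` output (read from stdin) into the OSD block UUID.
# Same pipeline as in the text: strip everything up to "osd--block--",
# collapse the doubled hyphens, keep only the first column.
extract_osd_uuid() {
    grep ceph \
        | sed 's/.*osd--block--//' \
        | sed 's/--/-/g' \
        | awk '{ print $1 }'
}

# Hypothetical sample line in the "name<TAB>(major:minor)" layout.
printf 'ceph--11111111--2222--3333--4444--555555555555-osd--block--aaaaaaaa--bbbb--cccc--dddd--eeeeeeeeeeee\t(253:0)\n' \
    | extract_osd_uuid
# prints: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
```

On a real host the same function would be used as `dmsetup ls | extract_osd_uuid`, run as root.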
Tuning CephFS for many files in one directory
CephFS performance suffers when many files are stored in the same directory, as with other file systems. The option mds_bal_split_size, however, makes it possible to split directory indexes into multiple parts, which improves performance when operating in directories with many files. I have lowered it to a value of 5000.
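One way to apply this setting is through the centralized configuration database, as sketched below. This assumes a Ceph release that supports `ceph config set`; older setups would place the option in the `[mds]` section of ceph.conf instead. The `DRY_RUN` switch and the function name `set_mds_split_size` are hypothetical additions for this example, so that the snippet prints the command by default rather than changing a live cluster.

```shell
#!/usr/bin/env bash
# DRY_RUN is a hypothetical switch for this sketch: with DRY_RUN=1 (the default)
# the command is only printed; set DRY_RUN=0 on an admin node to apply it.
DRY_RUN=${DRY_RUN:-1}

set_mds_split_size() {
    local size=$1
    if [ "$DRY_RUN" = "1" ]; then
        echo "ceph config set mds mds_bal_split_size $size"
    else
        ceph config set mds mds_bal_split_size "$size"
    fi
}

set_mds_split_size 5000
```

The value currently in effect can then be read back with `ceph config get mds mds_bal_split_size`.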
MariaDB Galera Cluster
MariaDB Galera Cluster is an open-source database system focusing on high availability, failing over between servers seamlessly, and ensuring data consistency between those servers. It is a multi-master cluster that uses synchronous replication.
How Galera Works
In a Galera Cluster, every database instance (or "node") is a master, meaning data can be written to or read from any node, with changes automatically replicated across all nodes. Using synchronous replication, MariaDB Galera Cluster ensures that all nodes hold the same data.
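Whether a node actually takes part in replication can be checked from its wsrep status variables. The sketch below reads status lines in the tab-separated form printed by the client in batch mode and checks `wsrep_cluster_size` and `wsrep_local_state_comment`. The function name `galera_node_ok` and the expected size of 3 are assumptions for this example.

```shell
#!/usr/bin/env bash
# Reads "name<TAB>value" status lines on stdin.
# Succeeds only if the cluster has the expected size and this node is Synced.
# Usage: galera_node_ok EXPECTED_CLUSTER_SIZE
galera_node_ok() {
    local expected=$1
    awk -v expected="$expected" '
        $1 == "wsrep_cluster_size"        { size = $2 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit !(size == expected && state == "Synced") }
    '
}

# Hypothetical sample input for a healthy three-node cluster.
printf 'wsrep_cluster_size\t3\nwsrep_local_state_comment\tSynced\n' \
    | galera_node_ok 3 && echo "node is synced"
```

On a real node the input would come from something like `mariadb -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | galera_node_ok 3`.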
Use Cases for MariaDB Galera Cluster
MariaDB Galera Cluster is best suited for applications where data availability, consistency, and durability are critical across multiple nodes, such as when deploying a web application with multiple database servers to scale up a busy service, or when deploying a high-availability database with multiple nodes.
Hire an expert
If you're considering implementing this for your business, don't hesitate to seek help from an expert. Please contact me for assistance and consultation with your implementation.