Traditionally, deploying and managing a Cassandra cluster has been complex. But with Kubernetes, it’s much more straightforward.
The StatefulSet manifest in this tutorial creates a Cassandra ring with three pods. The pods are scheduled by the Kubernetes control plane.
Scalability
Cassandra is a powerful, open-source NoSQL database that can serve data to many users at the same time with high throughput and low latency. Its distributed, masterless architecture lets it scale horizontally (by adding more nodes) as well as vertically (by adding more storage to each node), and its replication factor and tunable consistency levels let you balance availability against consistency as the cluster grows.
Cassandra's replication layer clusters a set of nodes into a ring. Data is replicated across the ring according to the keyspace's replication factor, so requests can be sent to any node and still receive a consistent response. When a node dies in a Kubernetes deployment, the StatefulSet controller starts a replacement pod and reattaches its persistent volume, while Cassandra's replication keeps the data that lived on that node available in the meantime.
Cassandra also sustains performance at scale through flushing and compaction. When a Memtable becomes too full, it is flushed to disk as an immutable SSTable; compaction later reads groups of SSTable files, merge-sorts their contents into new consolidated SSTables, and deletes the obsolete ones.
Portworx enables storage admins to capture consistent snapshots across every node of a multi-node Cassandra database running as a StatefulSet in Kubernetes. Because the snapshot covers the whole group of volumes at once, it supports quick backups and restores without introducing a performance bottleneck.
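As a rough sketch of what such a group snapshot request can look like, the example below assumes Portworx's Stork component is installed in the cluster and that the Cassandra persistent volume claims carry an app: cassandra label; the exact API group, version, and fields depend on your Portworx and Stork release.

```yaml
# Hypothetical group snapshot of all Cassandra PVCs in the "default" namespace.
# Assumes Stork (Portworx's storage orchestrator) is running and that the
# StatefulSet's volumeClaimTemplates label each PVC with app: cassandra.
apiVersion: stork.libopenstorage.org/v1alpha1
kind: GroupVolumeSnapshot
metadata:
  name: cassandra-group-snapshot
  namespace: default
spec:
  pvcSelector:
    matchLabels:
      app: cassandra
```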
Security
Since containerized application deployment has been standardized on Kubernetes, many developers want to bring the data layer under that same umbrella. This approach suits application lifecycle management, but it requires careful attention to ensure the database does not become a performance bottleneck.
Cassandra is a distributed database that operates as a ring, or cluster, of nodes that can span multiple locations, known as data centers. This architecture provides fault tolerance: by setting a replication factor, you ensure that even if one data center fails, nodes elsewhere hold the same data and can continue to serve requests.
When deploying Cassandra in your Kubernetes cluster, the easiest way to manage it is with a StatefulSet. A StatefulSet is a Kubernetes resource that declares a set of identical pods with stable identities, which the Kubernetes control plane schedules onto your infrastructure. In this tutorial, we deploy a StatefulSet with a custom Cassandra seed provider that lets the database discover new Cassandra Pods as they appear inside your cluster.
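To make the shape of that manifest concrete, here is a condensed sketch of a three-pod Cassandra StatefulSet. The image tag, storage class name, and resource sizes are placeholders, and instead of the custom seed provider it simply points CASSANDRA_SEEDS at the first pod's stable DNS name; adjust it to match the full manifest used in this tutorial.

```yaml
# Condensed sketch of a three-pod Cassandra StatefulSet (placeholders marked).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra          # must match the headless service name
  replicas: 3                     # three pods form the ring
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.1    # placeholder; use the image from the tutorial
          env:
            - name: CASSANDRA_SEEDS
              # First pod's stable DNS name (assumes the "default" namespace).
              value: cassandra-0.cassandra.default.svc.cluster.local
          volumeMounts:
            - name: cassandra-data
              mountPath: /var/lib/cassandra
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: portworx-sc   # assumed Portworx StorageClass name
        resources:
          requests:
            storage: 100Gi
```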
The tutorial also creates a headless service that provides the network identity for the pods in the StatefulSet and binds Portworx persistent volumes to them. Once the pods are running, we can use kubectl to confirm that each one is associated with the correct persistent volume claim.
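The headless service itself can be as small as the sketch below; its name must match the StatefulSet's serviceName and its selector must match the pod labels.

```yaml
# Headless service (clusterIP: None) that gives each Cassandra pod a stable
# DNS name such as cassandra-0.cassandra.default.svc.cluster.local.
apiVersion: v1
kind: Service
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  clusterIP: None
  ports:
    - port: 9042      # CQL native transport
      name: cql
  selector:
    app: cassandra
```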
Performance
Cassandra is a highly scalable and performant database. Both Cassandra and Kubernetes support horizontal and vertical scaling based on the number of nodes in a cluster, and both are self-healing and can scale without third-party management tools or downtime. Cassandra is also deployment-agnostic, so it can run on-premises, in the cloud, or in hybrid models.
The challenge with running Cassandra on Kubernetes is that Kubernetes has no built-in understanding of key database functionality, so reproducing that behavior with stock Kubernetes primitives takes significant scripting effort. The open-source community reduces these difficulties by providing an operator for Cassandra, which automatically performs tasks that would otherwise be done manually, such as deploying a new database pod, sizing a rack, implementing backups and repairs, and monitoring the system for any discrepancies.
In addition, a cloud-native storage platform lets Kubernetes manage stateful workloads with stable network identities, consistent and durable storage across multiple locations, and automatic rolling updates. In this tutorial, you will learn how to deploy a Cassandra StatefulSet using Portworx on a 3-node Kubernetes cluster deployed in AWS through RKE.
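For reference, a Portworx-backed StorageClass that the StatefulSet's volume claims could use might look like the sketch below. The provisioner name and parameters are assumptions based on common Portworx setups (the name matches the placeholder storageClassName used in the StatefulSet sketch above) and should be checked against your Portworx version.

```yaml
# Hypothetical Portworx StorageClass for Cassandra data volumes.
# repl: "2" keeps two replicas of each volume at the storage layer;
# io_profile: "db" is a common tuning hint for database workloads.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: portworx-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  io_profile: "db"
  priority_io: "high"
```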
Administration
Cassandra requires a large amount of disk space, both to store data and to carry out the compaction and streaming of data between nodes that keep partitions balanced. A common recommendation is to keep roughly 50% of each node's disk free at all times; running low on free space can cause performance issues and may even prevent the database from compacting or rebalancing.
The Cassandra operator for Kubernetes handles much of the administrative work of deploying and managing Cassandra clusters. It configures backups, monitors the Cassandra node health status, and manages automatic repairs. It also simplifies scalability, security configurations, network setup, seed nodes, and TLS certificates.
The operator's controller talks to the Kubernetes API server and watches for changes to StatefulSet definitions, Pods, and its custom resources (CRDs). If a node in the cluster is decommissioned, the controller recognizes the difference and adjusts the pod definitions and custom resources to reflect the smaller number of nodes.
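As an illustration of how such a custom resource can look, the following sketch is modeled on the CassandraDatacenter resource used by the community cass-operator; the API group, version, and field names are assumptions and will depend on which operator you run.

```yaml
# Hypothetical custom resource describing a Cassandra datacenter to an operator.
# Changing "size" is what triggers the controller to scale the underlying
# StatefulSet up or down on your behalf.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: cassandra
  serverType: cassandra
  serverVersion: "4.0.1"
  size: 3                             # desired number of Cassandra nodes
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: portworx-sc   # assumed StorageClass from earlier
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
```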
Once a Cassandra StatefulSet has been deployed, it can be validated by checking that the pods are running and that persistent volume claims are bound to them. To do this, use kubectl (for example, "kubectl get pods" followed by "kubectl get pvc") to examine the persistent volume claims attached to the Pods and confirm that each one is in the Bound state. These are the Portworx volumes that the Cassandra nodes will use to store their data.