Dial C* for Operator - Creating a Cassandra Cluster with Cass Operator
In this post we are going to take a deep dive into provisioning a Cassandra cluster using the DataStax Kubernetes operator for Cassandra, Cass Operator. We will set up a multi-rack cluster, with each rack in a different availability zone.
For the examples, I will use a nine-node, regional cluster in Google Kubernetes Engine (GKE) that is spread across three zones. Here is what my Kubernetes cluster looks like:
$ kubectl get nodes --label-columns failure-domain.beta.kubernetes.io/region,failure-domain.beta.kubernetes.io/zone | awk {'print $1" "$6" "$7'} | column -t
NAME REGION ZONE
gke-cass-dev-default-pool-3cab2f1f-3swp us-east1 us-east1-d
gke-cass-dev-default-pool-3cab2f1f-408v us-east1 us-east1-d
gke-cass-dev-default-pool-3cab2f1f-pv6v us-east1 us-east1-d
gke-cass-dev-default-pool-63ec3f9d-5781 us-east1 us-east1-b
gke-cass-dev-default-pool-63ec3f9d-blrh us-east1 us-east1-b
gke-cass-dev-default-pool-63ec3f9d-g4cb us-east1 us-east1-b
gke-cass-dev-default-pool-b1ee1c3c-5th7 us-east1 us-east1-c
gke-cass-dev-default-pool-b1ee1c3c-ht20 us-east1 us-east1-c
gke-cass-dev-default-pool-b1ee1c3c-xp2v us-east1 us-east1-c
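If you want to build a similar cluster to follow along, a gcloud command along these lines should get you close. This is a sketch rather than the exact command I used, and the machine type is an assumption you will want to size for your own workload:
$ gcloud container clusters create cass-dev \
    --region us-east1 \
    --node-locations us-east1-b,us-east1-c,us-east1-d \
    --num-nodes 3 \
    --machine-type n1-standard-4
Because the cluster is regional, --num-nodes is the node count per zone, so three zones with three nodes each yields the nine workers shown above.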
Operator Concepts
Without getting into too much detail, I want to quickly cover some fundamental concepts for some of the things we will discuss in this post. Kubernetes is made up of controllers. A controller manages the state of one or more Kubernetes resource types. The controller executes an infinite loop, continually trying to converge the desired state of resources with their actual state. The controller watches for changes of interest in the Kubernetes cluster, i.e., a resource being added, deleted, or updated. When there is a change, a key uniquely identifying the affected resource is added to a work queue. The controller eventually gets the key from the queue and begins whatever work is necessary.
Sometimes a controller has to perform potentially long-running operations like pulling an image from a remote registry. Rather than blocking until the operation completes, the controller usually requeues the key so that it can continue with other work while the operation completes in the background. When there is no more work to do for a resource, i.e. the desired state matches the actual state, the controller removes the key from the work queue.
An operator consists of one or more controllers that manage the state of one or more custom resources. Every controller has a Reconciler object that implements a reconcile loop. The reconcile loop is passed a request, which is the resource key.
A Few Words on Terminology
A Kubernetes worker node, Kubernetes worker, or worker node is a machine that runs services necessary to run and manage pods. These services include:
- kubelet
- kube-proxy
- container runtime, e.g., Docker
A Cassandra node is the process running in a container.
A Cassandra container is the container, i.e., Docker container, in which the Cassandra node is running.
A Cassandra pod is a Kubernetes pod that includes one or more containers. One of those containers is running the Cassandra node.
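Once the cluster we build later in this post is running, you can see this distinction for yourself by listing the containers in one of the Cassandra pods (the pod name below comes from that cluster):
$ kubectl -n cass-operator get pod multi-rack-multi-rack-us-east1-b-sts-0 \
    -o jsonpath='{.spec.containers[*].name}'
One of the names printed is cassandra, the container running the Cassandra node; the others are supporting containers that share the pod.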
Installing the Operator
Apply the cass-operator-manifests.yaml manifests as follows:
$ kubectl create -f https://raw.githubusercontent.com/datastax/cass-operator/b96bfd77775b5ba909bd9172834b4a56ef15c319/docs/user/cass-operator-manifests.yaml
namespace/cass-operator created
serviceaccount/cass-operator created
secret/cass-operator-webhook-config created
customresourcedefinition.apiextensions.k8s.io/cassandradatacenters.cassandra.datastax.com created
clusterrole.rbac.authorization.k8s.io/cass-operator-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/cass-operator created
role.rbac.authorization.k8s.io/cass-operator created
rolebinding.rbac.authorization.k8s.io/cass-operator created
service/cassandradatacenter-webhook-service created
deployment.apps/cass-operator created
validatingwebhookconfiguration.admissionregistration.k8s.io/cassandradatacenter-webhook-registration created
Note: The operator is deployed in the cass-operator namespace.
Make sure that the operator has deployed successfully. You should see output similar to this:
$ kubectl -n cass-operator get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
cass-operator 1/1 1 1 2m8s
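If you prefer a command that blocks until the rollout finishes, you can wait on the deployment instead of polling:
$ kubectl -n cass-operator rollout status deployment/cass-operator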
Create a Storage Class
We need to create a StorageClass that is suitable for Cassandra. Place the following in a file named server-storageclass.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: server-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: none
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
One thing to note here is volumeBindingMode: WaitForFirstConsumer. The default value is Immediate and should not be used, as it can prevent Cassandra pods from being scheduled on a worker node. If a pod fails to run and its status reports a message like "had volume node affinity conflict", then check the volumeBindingMode of the StorageClass being used. See Topology-Aware Volume Provisioning in Kubernetes for more details.
Create the StorageClass with:
$ kubectl -n cass-operator apply -f server-storageclass.yaml
storageclass.storage.k8s.io/server-storage created
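You can confirm the StorageClass exists with the binding mode we want before moving on:
$ kubectl get storageclass server-storage -o jsonpath='{.volumeBindingMode}'
WaitForFirstConsumer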
The Spec
Most Kubernetes resources define spec and status properties. The spec declares the desired state of a resource, which includes configuration settings provided by the user, default values expanded by the system, and other properties initialized by other internal components after resource creation. We will talk about the status in a little bit.
The manifest below declares a CassandraDatacenter custom resource. It does not include all possible properties. It includes the minimum necessary to create a multi-zone cluster.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: multi-rack
spec:
  clusterName: multi-rack
  serverType: cassandra
  serverVersion: 3.11.6
  managementApiAuth:
    insecure: {}
  size: 9
  racks:
  - name: us-east1-b
    zone: us-east1-b
  - name: us-east1-c
    zone: us-east1-c
  - name: us-east1-d
    zone: us-east1-d
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: server-storage
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
This spec declares a single Cassandra datacenter. Cass Operator does support multi-DC clusters; it requires creating a separate CassandraDatacenter for each datacenter. Discussion of multi-DC clusters is outside the scope of this post.
The size property specifies the total number of Cassandra nodes in the datacenter.
racks is an array of Rack objects which consist of name and zone properties. The zone should be the name of a zone in GCP (or an AWS zone if the cluster was running in AWS, for example). The operator will use this to pin Cassandra pods to Kubernetes workers in the zone. More on this later.
Create the CassandraDatacenter
Put the above manifest in a file named multi-rack-cassdc.yaml and then run:
$ kubectl -n cass-operator apply -f multi-rack-cassdc.yaml
cassandradatacenter.cassandra.datastax.com/multi-rack created
This creates a CassandraDatacenter object named multi-rack in the Kubernetes API server. The API server provides a REST API with which clients, like kubectl, interact. The API server maintains state in etcd; creating a Kubernetes resource ultimately means persisting state in etcd. When the CassandraDatacenter object is persisted, the API server notifies any clients watching for changes, namely Cass Operator. From here the operator takes over. The new object is added to the operator's internal work queue. The job of the operator is to make sure the desired state, i.e., the spec, matches the actual state of the CassandraDatacenter.
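A quick get confirms the API server accepted the resource; cassdc is the short name registered for the cassandradatacenters custom resource:
$ kubectl -n cass-operator get cassdc multi-rack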
Now that we have created the CassandraDatacenter, it is time to focus our attention on what Cass Operator is doing to build the Cassandra cluster.
Monitor the Progress
We will look at a couple things to monitor the progress of the provisioning or scaling up of the cluster:
- Changes in the status of the CassandraDatacenter
- Kubernetes events emitted by the operator
We have already discussed that the spec describes a resource's desired state. The status, on the other hand, describes the object's current, observed state. Earlier I mentioned that the Kubernetes API server provides a REST API to clients. A Kubernetes object or resource is a REST resource. The status of a Kubernetes resource is typically implemented as a REST subresource that can only be modified by internal, system components. In the case of a CassandraDatacenter, Cass Operator manages the status property.
An event is a Kubernetes resource that is created when objects like pods change state, or when an error occurs. Like other resources, events get stored in the API server. Cass Operator generates a number of events for a CassandraDatacenter.
Understanding both the changes in a CassandraDatacenter's status and the events emitted by the operator provides valuable insight into what is actually happening during the provisioning process. That understanding also makes it easier to resolve issues when things go wrong. This applies not only to CassandraDatacenters, but to other Kubernetes resources as well.
Monitor Status Updates
We can watch for changes in the status with:
$ kubectl -n cass-operator get -w cassdc multi-rack -o yaml
In the following sections we will discuss each of the status updates that occur while the operator works to create the Cassandra cluster.
Initial Status Update
Here is what the status looks like initially after creating the CassandraDatacenter:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
nodeStatuses: {}
cassandraOperatorProgress can have one of two values, Ready or Updating. It will change to Ready when the operator has no more work to do for the resource. This simple detail is really important, particularly if you are performing any automation with Cass Operator. For example, I have used Cassandra operators to provision clusters for integration tests. With Cass Operator my test setup code could simply poll cassandraOperatorProgress to know when the cluster is ready.
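As a rough sketch of that kind of automation, a small shell loop can block until the operator reports it is done (the ten-second poll interval is an arbitrary choice):
$ until [ "$(kubectl -n cass-operator get cassdc multi-rack \
    -o jsonpath='{.status.cassandraOperatorProgress}')" = "Ready" ]; do
    sleep 10
  done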
conditions is an array of DatacenterCondition objects. A lot of Kubernetes resources use conditions in their statuses. Conditions represent the latest observations of an object's state. They should minimally include type and status fields. The status field can have as its value either True, False, or Unknown. lastTransitionTime is the time the condition transitioned from one status to another. type identifies the condition. CassandraDatacenter currently has the following condition types:
- Ready
- Initialized
- ReplacingNodes
- ScalingUp
- Updating
- Stopped
- Resuming
- RollingRestart
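A JSONPath filter expression is a convenient way to read a single condition out of the status, for example the current value of the Ready condition:
$ kubectl -n cass-operator get cassdc multi-rack \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'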
Implementing, understanding, and using conditions are often points of confusion. It is intuitive to think of and model a resource's state as a state machine. Reminding yourself that conditions are observations, not a state machine, will go a long way toward avoiding some of that confusion. It is worth noting there has been a lot of debate in the Kubernetes community about whether conditions should be removed. Some of the latest discussions in this ticket indicate that they will remain.
lastRollingRestart is only updated when a rolling restart is explicitly requested. As we will see, its value will remain unchanged, and therefore we will be ignoring it for this post.
nodeStatuses is a map that provides some details for each node. We will see it get updated as nodes are deployed.
Cassandra Node Starting
With the next update we see that a lastServerNodeStarted property has been added to the status:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:41:24Z"
nodeStatuses: {}
lastServerNodeStarted gets updated when a Cassandra node is starting up. The operator also adds the label cassandra.datastax.com/node-state: Starting to the Cassandra pod. The astute reader may have noted that I said lastServerNodeStarted is updated when the Cassandra node is starting up rather than when the pod is starting up. For Cass Operator, there is an important distinction between the Cassandra node and the Cassandra container. The Cassandra Container section at the end of the post goes over this in some detail.
Cassandra Node Started
In the next update lastServerNodeStarted is modified and the first entry is added to nodeStatuses:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:41:50Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
The entry is keyed by the pod name, multi-rack-multi-rack-us-east1-b-sts-2. The value consists of two fields: the node's host ID and its IP address.
When Cass Operator determines the Cassandra node is up and running, it updates the node-state label to cassandra.datastax.com/node-state: Started. After the label update, the operator uses a label selector query to see which pods have been started and are running. When the operator finds a node running, its host ID and IP address are added to nodeStatuses.
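Because the operator maintains the node-state label, a label selector query gives you the same view it uses, which is a quick way to check how many Cassandra nodes have been started so far:
$ kubectl -n cass-operator get pods -l cassandra.datastax.com/node-state=Started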
Remaining Nodes Started
In this section we follow the progression of the rest of the Cassandra cluster being started. lastServerNodeStarted changes with each of these status updates, in addition to nodeStatuses being updated.
multi-rack-multi-rack-us-east1-c-sts-0 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:42:49Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
Next, multi-rack-multi-rack-us-east1-d-sts-0 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:43:53Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
Next, multi-rack-multi-rack-us-east1-c-sts-2 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:44:54Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
Next, multi-rack-multi-rack-us-east1-d-sts-1 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:45:50Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
With five out of the nine nodes started, now is a good time to point out a couple of things. First, we see one node at a time being added to nodeStatuses. Based on this it stands to reason that Cass Operator is starting nodes serially. That is precisely what is happening.
Secondly, there is roughly a minute difference between the values of lastServerNodeStarted in each status update. It is taking about a minute or so to start each node, which means it should take somewhere between nine and ten minutes for the cluster to be ready. These times will almost certainly vary depending on a number of factors like the type of disks used, the machine type, etc. It is helpful though, particularly for larger cluster sizes, to be able to gauge how long it will take to get the entire cluster up and running.
Next, multi-rack-multi-rack-us-east1-d-sts-2 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:46:51Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
Next, multi-rack-multi-rack-us-east1-b-sts-0 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:48:00Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-0:
hostID: 3b1b60e0-62c6-47fb-93ff-3d164825035a
nodeIP: 10.32.1.4
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
Next, multi-rack-multi-rack-us-east1-c-sts-1 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:48:57Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-0:
hostID: 3b1b60e0-62c6-47fb-93ff-3d164825035a
nodeIP: 10.32.1.4
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-1:
hostID: a55082ba-0692-4ee9-97a2-a1bb16383d31
nodeIP: 10.32.7.6
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
Finally, multi-rack-multi-rack-us-east1-b-sts-1 is started:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:48:57Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-0:
hostID: 3b1b60e0-62c6-47fb-93ff-3d164825035a
nodeIP: 10.32.1.4
multi-rack-multi-rack-us-east1-b-sts-1:
hostID: d7246bca-ae64-45ec-8533-7c3a2540b5ef
nodeIP: 10.32.2.6
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-1:
hostID: a55082ba-0692-4ee9-97a2-a1bb16383d31
nodeIP: 10.32.7.6
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
Although all nine nodes are now started, the operator still has more work to do. This is evident based on the ScalingUp condition still being True and cassandraOperatorProgress still having a value of Updating.
Cassandra Super User Created
With the next update the superUserUpserted property is added to the status:
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:40:51Z"
status: "True"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:48:57Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-0:
hostID: 3b1b60e0-62c6-47fb-93ff-3d164825035a
nodeIP: 10.32.1.4
multi-rack-multi-rack-us-east1-b-sts-1:
hostID: d7246bca-ae64-45ec-8533-7c3a2540b5ef
nodeIP: 10.32.2.6
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-1:
hostID: a55082ba-0692-4ee9-97a2-a1bb16383d31
nodeIP: 10.32.7.6
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
superUserUpserted: "2020-05-06T16:49:55Z"
superUserUpserted is the timestamp at which the operator creates a super user in Cassandra. We will explore this in a little more detail when we go through the events.
ScalingUp Transition
In this update the ScalingUp condition transitions to False. This condition changes only after all nodes have been started and after the super user has been created.
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:49:55Z"
status: "False"
type: ScalingUp
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:48:57Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-0:
hostID: 3b1b60e0-62c6-47fb-93ff-3d164825035a
nodeIP: 10.32.1.4
multi-rack-multi-rack-us-east1-b-sts-1:
hostID: d7246bca-ae64-45ec-8533-7c3a2540b5ef
nodeIP: 10.32.2.6
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-1:
hostID: a55082ba-0692-4ee9-97a2-a1bb16383d31
nodeIP: 10.32.7.6
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
superUserUpserted: "2020-05-06T16:49:55Z"
Add Initialized and Ready Conditions
Next, the operator adds the Initialized and Ready conditions to the status. Initialized means the CassandraDatacenter was successfully created. The transition for this condition should only happen once. Ready means the cluster can start serving client requests. The Ready condition will remain True during a rolling restart, for example, but will transition to False when all nodes are stopped. See The Cassandra Container section at the end of the post for more details on starting and stopping Cassandra nodes.
status:
cassandraOperatorProgress: Updating
conditions:
- lastTransitionTime: "2020-05-06T16:49:55Z"
status: "False"
type: ScalingUp
- lastTransitionTime: "2020-05-06T16:49:55Z"
status: "True"
type: Initialized
- lastTransitionTime: "2020-05-06T16:49:55Z"
status: "True"
type: Ready
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:48:57Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-0:
hostID: 3b1b60e0-62c6-47fb-93ff-3d164825035a
nodeIP: 10.32.1.4
multi-rack-multi-rack-us-east1-b-sts-1:
hostID: d7246bca-ae64-45ec-8533-7c3a2540b5ef
nodeIP: 10.32.2.6
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-1:
hostID: a55082ba-0692-4ee9-97a2-a1bb16383d31
nodeIP: 10.32.7.6
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
superUserUpserted: "2020-05-06T16:49:55Z"
Final Status Update
In the last update, the value of cassandraOperatorProgress is changed to Ready:
status:
cassandraOperatorProgress: Ready
conditions:
- lastTransitionTime: "2020-05-06T16:49:55Z"
status: "False"
type: ScalingUp
- lastTransitionTime: "2020-05-06T16:49:55Z"
status: "True"
type: Initialized
- lastTransitionTime: "2020-05-06T16:49:55Z"
status: "True"
type: Ready
lastRollingRestart: "2020-05-06T16:40:51Z"
lastServerNodeStarted: "2020-05-06T16:48:57Z"
nodeStatuses:
multi-rack-multi-rack-us-east1-b-sts-0:
hostID: 3b1b60e0-62c6-47fb-93ff-3d164825035a
nodeIP: 10.32.1.4
multi-rack-multi-rack-us-east1-b-sts-1:
hostID: d7246bca-ae64-45ec-8533-7c3a2540b5ef
nodeIP: 10.32.2.6
multi-rack-multi-rack-us-east1-b-sts-2:
hostID: 62399b3b-80f0-42f2-9930-6c4f2477c9bd
nodeIP: 10.32.0.5
multi-rack-multi-rack-us-east1-c-sts-0:
hostID: dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76
nodeIP: 10.32.6.3
multi-rack-multi-rack-us-east1-c-sts-1:
hostID: a55082ba-0692-4ee9-97a2-a1bb16383d31
nodeIP: 10.32.7.6
multi-rack-multi-rack-us-east1-c-sts-2:
hostID: facbbaa0-ffa7-403c-b323-e83e4cab8756
nodeIP: 10.32.8.5
multi-rack-multi-rack-us-east1-d-sts-0:
hostID: c7e43757-92ee-4ca3-adaa-46a128045d4d
nodeIP: 10.32.4.4
multi-rack-multi-rack-us-east1-d-sts-1:
hostID: 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf
nodeIP: 10.32.3.3
multi-rack-multi-rack-us-east1-d-sts-2:
hostID: 8e8733ab-6f7b-4102-946d-c855adaabe49
nodeIP: 10.32.5.4
superUserUpserted: "2020-05-06T16:49:55Z"
We now know the operator has completed its work to scale up the cluster. We also know the cluster is initialized and ready for use. Let's verify that the desired state of the CassandraDatacenter matches the actual state. We can do this with nodetool status and kubectl get pods.
$ kubectl -n cass-operator exec -it multi-rack-multi-rack-us-east1-b-sts-0 -c cassandra -- nodetool status
Datacenter: multi-rack
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.32.4.4 84.43 KiB 1 4.8% c7e43757-92ee-4ca3-adaa-46a128045d4d us-east1-d
UN 10.32.1.4 70.2 KiB 1 7.4% 3b1b60e0-62c6-47fb-93ff-3d164825035a us-east1-b
UN 10.32.6.3 65.36 KiB 1 32.5% dfd6ebfb-2e2c-4f7a-92f8-9fe60fb24e76 us-east1-c
UN 10.32.3.3 103.54 KiB 1 34.0% 785e30ca-5772-4a57-b4bc-4bd7b3b24ebf us-east1-d
UN 10.32.7.6 70.34 KiB 1 18.1% a55082ba-0692-4ee9-97a2-a1bb16383d31 us-east1-c
UN 10.32.8.5 65.36 KiB 1 19.8% facbbaa0-ffa7-403c-b323-e83e4cab8756 us-east1-c
UN 10.32.2.6 65.36 KiB 1 36.5% d7246bca-ae64-45ec-8533-7c3a2540b5ef us-east1-b
UN 10.32.0.5 65.36 KiB 1 39.9% 62399b3b-80f0-42f2-9930-6c4f2477c9bd us-east1-b
UN 10.32.5.4 65.36 KiB 1 7.0% 8e8733ab-6f7b-4102-946d-c855adaabe49 us-east1-d
nodetool status reports nine nodes up across three racks. That looks good. Now let's verify the pods are running where we expect them to be.
$ kubectl -n cass-operator get pods -l "cassandra.datastax.com/cluster=multi-rack" -o wide | awk {'print $1" "$7'} | column -t
NAME NODE
multi-rack-multi-rack-us-east1-b-sts-0 gke-cass-dev-default-pool-63ec3f9d-5781
multi-rack-multi-rack-us-east1-b-sts-1 gke-cass-dev-default-pool-63ec3f9d-blrh
multi-rack-multi-rack-us-east1-b-sts-2 gke-cass-dev-default-pool-63ec3f9d-g4cb
multi-rack-multi-rack-us-east1-c-sts-0 gke-cass-dev-default-pool-b1ee1c3c-5th7
multi-rack-multi-rack-us-east1-c-sts-1 gke-cass-dev-default-pool-b1ee1c3c-ht20
multi-rack-multi-rack-us-east1-c-sts-2 gke-cass-dev-default-pool-b1ee1c3c-xp2v
multi-rack-multi-rack-us-east1-d-sts-0 gke-cass-dev-default-pool-3cab2f1f-3swp
multi-rack-multi-rack-us-east1-d-sts-1 gke-cass-dev-default-pool-3cab2f1f-408v
multi-rack-multi-rack-us-east1-d-sts-2 gke-cass-dev-default-pool-3cab2f1f-pv6v
Look carefully at the output, and you will see each pod is in fact running on a separate worker node. Furthermore, the pods are running on worker nodes in the expected zones.
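One way to cross-check the zone placement is to look up the zone label of each worker node that is hosting a Cassandra pod:
$ for n in $(kubectl -n cass-operator get pods -l "cassandra.datastax.com/cluster=multi-rack" \
      -o jsonpath='{.items[*].spec.nodeName}'); do
    kubectl get node "$n" -L failure-domain.beta.kubernetes.io/zone --no-headers
  done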
Monitor Events
The operator reports a number of events useful for monitoring and debugging the provisioning process. As we will see, the events provide additional insights absent from the status updates alone.
There are some nuances with events that can make working with them a bit difficult. First, events are persisted with a TTL; they expire after one hour. Secondly, events can be listed out of order. The ordering appears to be done on the client side with a sort on the Age column. We will go through the events in the order in which the operator generates them. Lastly, while working on this post, I discovered that some events can get dropped. I created this ticket to investigate the issue. Kubernetes has some throttling mechanisms in place to prevent the system from getting overwhelmed by too many events. We won't go through every single event as there are a lot. We will, however, cover enough, including some that may be dropped, to get an overall sense of what is going on.
We can list all of the events for the CassandraDatacenter with the describe command as follows:
$ kubectl -n cass-operator describe cassdc multi-rack
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingUpRack 12m cass-operator Scaling up rack us-east1-b
Normal CreatedResource 12m cass-operator Created service multi-rack-seed-service
Normal CreatedResource 12m cass-operator Created service multi-rack-multi-rack-all-pods-service
Normal CreatedResource 12m cass-operator Created statefulset multi-rack-multi-rack-us-east1-b-sts
Normal CreatedResource 12m cass-operator Created statefulset multi-rack-multi-rack-us-east1-c-sts
Normal CreatedResource 12m cass-operator Created statefulset multi-rack-multi-rack-us-east1-d-sts
Normal CreatedResource 12m cass-operator Created service multi-rack-multi-rack-service
Normal ScalingUpRack 12m cass-operator Scaling up rack us-east1-c
Normal ScalingUpRack 12m cass-operator Scaling up rack us-east1-d
Normal LabeledPodAsSeed 12m cass-operator Labeled pod a seed node multi-rack-multi-rack-us-east1-b-sts-2
Normal StartingCassandra 12m cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-b-sts-2
Normal StartedCassandra 11m cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-b-sts-2
Normal StartingCassandra 11m cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-c-sts-0
Normal StartingCassandra 10m cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-0
Normal StartedCassandra 10m cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-c-sts-0
Normal LabeledPodAsSeed 10m cass-operator Labeled as seed node pod multi-rack-multi-rack-us-east1-c-sts-0
Normal LabeledPodAsSeed 9m44s cass-operator Labeled as seed node pod multi-rack-multi-rack-us-east1-d-sts-0
Normal StartedCassandra 9m43s cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-0
Normal StartingCassandra 9m43s cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-c-sts-2
Normal StartedCassandra 8m43s cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-c-sts-2
Normal StartingCassandra 8m43s cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-1
Normal StartedCassandra 7m47s cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-1
Normal StartingCassandra 7m46s cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-2
Normal StartedCassandra 6m45s cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-2
Normal StartingCassandra 6m45s cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-b-sts-0
Normal LabeledPodAsSeed 5m36s cass-operator Labeled as seed node pod multi-rack-multi-rack-us-east1-b-sts-0
In the following sections we will go through several of these events as well as some that are missing.
Create Headless Services
The first thing that Cass Operator does during the initial reconciliation loop is create a few headless services:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatedResource 10m cass-operator Created service multi-rack-seed-service
Normal CreatedResource 10m cass-operator Created service multi-rack-multi-rack-all-pods-service
Normal CreatedResource 10m cass-operator Created service multi-rack-multi-rack-service
multi-rack-seed-service exposes all pods running seed nodes. This service is used by Cassandra to configure seed nodes.
multi-rack-multi-rack-all-pods-service exposes all pods that are part of the CassandraDatacenter, regardless of whether they are ready. It is used to scrape metrics with Prometheus.
multi-rack-multi-rack-service exposes ready pods. CQL clients should use this service to establish connections to the cluster.
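In-cluster clients can reach that last service at its usual DNS name, multi-rack-multi-rack-service.cass-operator.svc.cluster.local. For a quick test from a workstation you can port-forward to it instead; keep in mind the forward goes to a single pod, so it is only suitable for experimentation:
$ kubectl -n cass-operator port-forward svc/multi-rack-multi-rack-service 9042:9042
With the port-forward running, a locally installed cqlsh can connect to 127.0.0.1 on port 9042 using the superuser credentials covered later in this post.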
Create StatefulSets
Next the operator creates three StatefulSets, one for each rack:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatedResource 12m cass-operator Created statefulset multi-rack-multi-rack-us-east1-b-sts
Normal CreatedResource 12m cass-operator Created statefulset multi-rack-multi-rack-us-east1-c-sts
Normal CreatedResource 12m cass-operator Created statefulset multi-rack-multi-rack-us-east1-d-sts
I mentioned earlier that the operator will use the zone property specified for each rack to pin pods to Kubernetes workers in the respective zones. The operator uses affinity rules to accomplish this.
Let's take a look at the spec for multi-rack-multi-rack-us-east1-c-sts to see how this is accomplished:
$ kubectl -n cass-operator get sts multi-rack-multi-rack-us-east1-c-sts -o yaml
...
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: failure-domain.beta.kubernetes.io/zone
            operator: In
            values:
            - us-east1-c
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: cassandra.datastax.com/cluster
            operator: Exists
          - key: cassandra.datastax.com/datacenter
            operator: Exists
          - key: cassandra.datastax.com/rack
            operator: Exists
        topologyKey: kubernetes.io/hostname
...
The nodeAffinity property constrains the worker nodes on which pods in the StatefulSet can be scheduled. requiredDuringSchedulingIgnoredDuringExecution is a NodeSelector, which basically declares a query based on labels. In this case, if a node has the label failure-domain.beta.kubernetes.io/zone with a value of us-east1-c, then pods can be scheduled on that node.
Note: failure-domain.beta.kubernetes.io/zone is one of a number of well-known labels that are used by the Kubernetes runtime.
I said pods can be scheduled, rather than will be scheduled, because of the podAntiAffinity property that is also declared. It constrains the worker nodes on which the pods can be scheduled based on the labels of pods currently running on the nodes. The requiredDuringSchedulingIgnoredDuringExecution property is a PodAffinityTerm that defines labels which determine which pods cannot be co-located on a particular host. In short, this prevents pods from being scheduled on any node on which pods from a CassandraDatacenter are already running. In other words, no two Cassandra nodes should be running on the same Kubernetes worker node.
Note: You can run multiple Cassandra pods on a single worker node by setting .spec.allowMultipleNodesPerWorker to true.
Scale up the Racks
The next events involve scaling up the racks:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingUpRack 12m cass-operator Scaling up rack us-east1-b
Normal ScalingUpRack 12m cass-operator Scaling up rack us-east1-c
Normal ScalingUpRack 12m cass-operator Scaling up rack us-east1-d
The StatefulSets are initially created with zero replicas. They are subsequently scaled up to the desired replica count, which is three (per StatefulSet) in this case.
Label the First Seed Node Pod
After the StatefulSet controller starts creating pods, Cass Operator applies the following label to a pod to designate it as a Cassandra seed node:
cassandra.datastax.com/seed-node: "true"
At this stage in the provisioning process, no pods have the seed-node label. The following event indicates that the operator designates the pod to be a seed node:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal LabeledPodAsSeed 12m cass-operator Labeled pod a seed node multi-rack-multi-rack-us-east1-b-sts-2
Note: You can use a label selector to query for all seed node pods, e.g., kubectl -n cass-operator get pods -l cassandra.datastax.com/seed-node="true".
Start the First Seed Node
Next the operator starts the first seed node:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal StartingCassandra 12m cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-b-sts-2
The operator applies the label cassandra.datastax.com/node-state: Starting to the pod. The operator then requeues the request with a short delay, allowing time for the Cassandra node to start. Requeuing the request ends the current reconciliation.
If you are familiar with Kubernetes, this step of starting the Cassandra node may seem counter-intuitive because pods/containers cannot exist in a stopped state. See The Cassandra Container section at the end of the post for more information.
Update Status of First Seed Node Pod
In a subsequent reconciliation loop the operator finds that multi-rack-multi-rack-us-east1-b-sts-2 has been started and records the following event:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal StartedCassandra 11m cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-b-sts-2
Then the cassandra.datastax.com/node-state label is updated to a value of Started to indicate the Cassandra node is now running. The event is recorded and the label is updated only when the Cassandra container's readiness probe passes. If the readiness probe fails, the operator will requeue the request, ending the current reconciliation loop.
Start One Node Per Rack
After the first node, multi-rack-multi-rack-us-east1-b-sts-2, is running, the operator makes sure there is a node per rack running. Here is the sequence of events for a given node:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal StartingCassandra 8m43s cass-operator Starting Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-1
Normal StartedCassandra 7m47s cass-operator Started Cassandra for pod multi-rack-multi-rack-us-east1-d-sts-1
Normal LabeledPodAsSeed 9m44s cass-operator Labeled as seed node pod multi-rack-multi-rack-us-east1-d-sts-1
Let’s break down what is happening here.
- The cassandra.datastax.com/node-state: Starting label is applied to multi-rack-multi-rack-us-east1-d-sts-1
- Cassandra is started
- The request is requeued
- On a subsequent reconciliation loop, when Cassandra is running (as determined by the readiness probe), two things happen:
  - The cassandra.datastax.com/seed-node="true" label is applied to the pod, making it a seed node
  - The cassandra.datastax.com/node-state label is updated to a value of Started
The operator will repeat this process for another rack which does not yet have a node running.
Now is a good time to discuss how the operator determines how many seeds there should be in total for the datacenter as well as how many seeds there should be per rack.
If the datacenter consists of only one or two nodes, then there will be one or two seeds respectively. If there are more than three racks, then the number of seeds will be set to the number of racks. If neither of those conditions holds, then there will be three seeds.
The seeds per rack are calculated as follows:
seedsPerRack = totalSeedCount / numRacks
extraSeeds = totalSeedCount % numRacks
For the example cluster in this post, totalSeedCount will be three. Then seedsPerRack will be one, and extraSeeds will be zero.
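The division is integer division, so the numbers for this cluster work out as follows (plain shell arithmetic, just to make the example concrete):
$ totalSeedCount=3; numRacks=3
$ echo "seedsPerRack=$((totalSeedCount / numRacks)) extraSeeds=$((totalSeedCount % numRacks))"
seedsPerRack=1 extraSeeds=0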
Start Remaining Nodes
After we have a Cassandra node up and running in each rack, the operator proceeds to start the remaining non-seed nodes. I will skip over listing events here because they are the same as the previous ones. At this point the operator iterates over the pods without worrying about the racks. For each pod in which Cassandra is not already running, it will start Cassandra following the same process previously described.
Create a PodDisruptionBudget
After all Cassandra nodes have been started, Cass Operator creates a PodDisruptionBudget. It generates an event like this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatedResource 10m6s cass-operator Created PodDisruptionBudget multi-rack-pdb
Note: This is one of the dropped events.
A PodDisruptionBudget limits the number of pods that can be down from a voluntary disruption. Examples of voluntary disruptions include accidentally deleting a pod or draining a worker node for upgrade or repair.
All Cassandra pods in the CassandraDatacenter are managed by the disruption budget. When creating the PodDisruptionBudget, Cass Operator sets the .spec.minAvailable property. This specifies the number of pods that must be available after a pod eviction. Cass Operator sets this to the total number of Cassandra nodes minus one.
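You can read the budget back to confirm; for this nine-node datacenter it should report a minAvailable of 8:
$ kubectl -n cass-operator get pdb multi-rack-pdb -o jsonpath='{.spec.minAvailable}'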
Create a Cassandra Super User
The final thing that Cass Operator does is to create a super user in Cassandra:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal CreatedSuperuser 10m6s cass-operator Created superuser
Earlier in the provisioning process, Cass Operator creates the super user credentials and stores them in a secret. The secret name can be specified by setting .spec.superuserSecretName.
The username is set to <.spec.clusterName>-superuser, which will be multi-rack-superuser for our example. The password is a random UTF-8 string less than or equal to 55 characters.
Note: Cass Operator disables the default super user, cassandra.
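Assuming the secret follows the same <.spec.clusterName>-superuser naming convention as the username (that is, you did not set .spec.superuserSecretName yourself), you can pull the credentials out of it like this:
$ kubectl -n cass-operator get secret multi-rack-superuser -o jsonpath='{.data.username}' | base64 --decode
$ kubectl -n cass-operator get secret multi-rack-superuser -o jsonpath='{.data.password}' | base64 --decode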
The Cassandra Container
Each Cassandra pod runs a container named cassandra. We need to talk about sidecars before we talk about the cassandra container. The sidecar pattern is a well-known and widely used architectural pattern in Kubernetes. A pod consists of one or more containers. The containers in a pod share the same volumes and network interfaces. Examples of sidecars include log aggregation, gRPC proxies, and backup/restore, to name a few. Cass Operator utilizes the sidecar pattern, but in a more unconventional manner.
We can take a look at the spec of one of the Cassandra pods to learn more about the cassandra container. Because we are only focused on this one part, most of the output is omitted.
$ kubectl -n cass-operator get pod multi-rack-multi-rack-us-east1-b-sts-0 -o yaml
apiVersion: v1
kind: Pod
...
spec:
  ...
  containers:
  - env:
    - name: DS_LICENSE
      value: accept
    - name: DSE_AUTO_CONF_OFF
      value: all
    - name: USE_MGMT_API
      value: "true"
    - name: MGMT_API_EXPLICIT_START
      value: "true"
    - name: DSE_MGMT_EXPLICIT_START
      value: "true"
    image: datastax/cassandra-mgmtapi-3_11_6:v0.1.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/v0/probes/liveness
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 15
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 1
    name: cassandra
    ports:
    - containerPort: 9042
      name: native
      protocol: TCP
    - containerPort: 8609
      name: inter-node-msg
      protocol: TCP
    - containerPort: 7000
      name: intra-node
      protocol: TCP
    - containerPort: 7001
      name: tls-intra-node
      protocol: TCP
    - containerPort: 8080
      name: mgmt-api-http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /api/v0/probes/readiness
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /config
      name: server-config
    - mountPath: /var/log/cassandra
      name: server-logs
    - mountPath: /var/lib/cassandra
      name: server-data
...
There are two lines in the output on which we want to focus. The first line is:
name: cassandra
This is the name of the container. There are other containers listed in the output, but we are only concerned with the cassandra one.
The second line that we are interested in is:
image: datastax/cassandra-mgmtapi-3_11_6:v0.1.0
The image property specifies the image that the cassandra container is running. This is different from the Cassandra images such as the ones found on Docker Hub. This image is for the Management API Sidecar. There have been lots of discussions on the Cassandra community mailing lists about management sidecars. In fact, there is even a Cassandra Enhancement Proposal (CEP) for providing an official, community-based sidecar. The Management API Sidecar, or management sidecar for short, was not designed specifically for Kubernetes.
The process started in the cassandra container is the management sidecar rather than the CassandraDaemon process. The sidecar is responsible for starting/stopping the node. In addition to providing lifecycle management, the sidecar also provides configuration management, health checks, and per-node actions (i.e., nodetool).
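The liveness and readiness endpoints in the pod spec above are part of that HTTP API, so you can hit one directly to see what the kubelet sees. This assumes curl is available inside the image:
$ kubectl -n cass-operator exec multi-rack-multi-rack-us-east1-b-sts-0 -c cassandra -- \
    curl -s localhost:8080/api/v0/probes/readiness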
There is plenty more to say about the management sidecar, but that is for another post.
Wrapping Up
Hopefully this post gives you a better understanding of Cass Operator and Kubernetes in general. While we covered a lot of ground, there is plenty more to discuss, like multi-DC clusters and the management sidecar. If you want to hear more about Cassandra and Kubernetes, Patrick McFadin put together a series of interviews where he talks to early adopters in the field. Check out "Why Tomorrow's Cassandra Deployments Will Be on Kubernetes". It will be available for streaming as part of the DataStax Accelerate online conference: https://dtsx.io/3ex1Eop.