Cassandra Certificate Management Part 1 - How to Rotate Keys Without Downtime

Welcome to this three part blog series where we dive into the management of certificates in an Apache Cassandra cluster. For this first post in the series we will focus on how to rotate keys in an Apache Cassandra cluster without downtime.

Usability and Security at Odds

If you have downloaded and installed a vanilla installation of Apache Cassandra, you may have noticed that when it is first started all security is disabled. Your “Hello World” application works out of the box because the Cassandra project chose usability over security. This is deliberate: everyone benefits from the usability, while the security requirements for each deployment differ. Some deployments require multiple layers of security, while others require no security features to be enabled.

Security of a system is applied in layers. For example, one layer is isolating the nodes in a cluster behind a proxy. Another layer is locking down OS permissions. Encrypting connections between nodes, and between nodes and the application, is another layer that can be applied. If encryption is the only layer applied, it leaves other areas of a system insecure. When securing a Cassandra cluster, we recommend pursuing an informed approach which offers defence-in-depth. Consider additional aspects such as encryption at rest (e.g. disk encryption), authorization, authentication, network architecture, and hardware, host and OS security.

Encrypting connections between two hosts can be difficult to set up as it involves a number of tools and commands to generate the necessary assets for the first time. We covered this process in previous posts: Hardening Cassandra Step by Step - Part 1 Inter-Node Encryption and Hardening Cassandra Step by Step - Part 2 Hostname Verification for Internode Encryption. I recommend reading both posts before reading through the rest of the series, as we will build off concepts explained in them.

Here is a quick summary of the basic steps to create the assets necessary to encrypt connections between two hosts.

  1. Create the Root Certificate Authority (CA) key pair from a configuration file using openssl.
  2. Create a keystore for each host (node or client) using keytool.
  3. Export the Public Certificate from each host keystore as a “Signing Request” using keytool.
  4. Sign each Public Certificate “Signing Request” with our Root CA to generate a Signed Certificate using openssl.
  5. Import the Root CA Public Certificate and the Signed Certificate into each keystore using keytool.
  6. Create a common truststore and import the CA Public Certificate into it using keytool.

Security Requires Ongoing Maintenance

Setting up SSL encryption for the various connections to Cassandra is only half the story. As with all other software out in the wild, ongoing maintenance is required to ensure the SSL encrypted connections continue to work.

At some point you will need to update the certificates and stores used to implement the SSL encrypted connections because they will expire. If the certificates for a node expire, it will be unable to communicate with other nodes in the cluster. This will lead to data inconsistencies at best or, in the worst case, unavailable data.

This point is specifically called out towards the end of the Inter-Node Encryption blog post. The note refers to steps 1, 2 and 4 in the above summary of commands to set up the certificates and stores. The validity periods are set for the certificates and stores in their respective steps.

One Certificate Authority to Rule Them All

Before we jump into how we handle expiring certificates and stores in a cluster, we first need to understand the role a certificate plays in securing a connection.

Certificates (and encryption) are often considered a hard topic. However, there are only a few concepts that you need to bear in mind when managing certificates.

Consider the case where two parties A and B wish to communicate with one another. Both parties distrust each other and each needs a way to prove that they are who they claim to be, as well as verify the other party is who they claim to be. To do this a mutually trusted third party needs to be brought in. In our case the trusted third party is the Certificate Authority (CA); often referred to as the Root CA.

The Root CA is effectively just a key pair; similar to an SSH key pair. The main difference is the public portion of the key pair has additional fields detailing who the public key belongs to. It has the following two components.

  • Certificate Authority Private Signing Key (CA PSK) - Private component of the CA key pair. Used to sign a keystore’s public certificate.
  • Certificate Authority Public Certificate (CA PC) - Public component of the CA key pair. Used to provide the issuer name when signing a keystore’s public certificate, as well as by a node to confirm that a third party public certificate (when presented) has been signed by the Root CA PSK.

When you run openssl to create your CA key pair using a certificate configuration file, this is the command that is run.

$ openssl req \
      -config path/to/ca_certificate.config \
      -new \
      -x509 \
      -keyout path/to/ca_psk \
      -out path/to/ca_pc \
      -days <valid_days>

In the above command the -keyout specifies the path to the CA PSK, and the -out specifies the path to the CA PC.
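To make the two components concrete, here is a minimal sketch that creates a throwaway CA key pair and inspects the resulting CA PC. It is not the exact command from the earlier posts: the `-subj`, `-newkey` and `-nodes` options stand in for the configuration file and leave the key unencrypted, and the `ExampleRootCA` name and temporary paths are assumptions for illustration only.

```shell
# Throwaway working directory so nothing touches real assets.
workdir=$(mktemp -d)

# Create a self-signed CA key pair. -subj replaces the config file and
# -nodes skips key encryption; both are shortcuts for this demo only.
openssl req -new -x509 -nodes -newkey rsa:2048 \
    -subj "/CN=ExampleRootCA" \
    -keyout "$workdir/ca_psk" \
    -out "$workdir/ca_pc" \
    -days 365 2>/dev/null

# The CA PC carries identity fields alongside the public key.
ca_subject=$(openssl x509 -in "$workdir/ca_pc" -noout -subject | sed 's/^subject= *//')
ca_issuer=$(openssl x509 -in "$workdir/ca_pc" -noout -issuer | sed 's/^issuer= *//')
echo "subject: $ca_subject"
echo "issuer:  $ca_issuer"

# The expiry date set by -days is recorded in the certificate.
openssl x509 -in "$workdir/ca_pc" -noout -enddate
```

Because a Root CA signs its own certificate, the subject and issuer printed above are the same.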

And in the Darkness Sign Them

In addition to a common Root CA key pair, each party has its own certificate key pair to uniquely identify it and to encrypt communications. In the Cassandra world, two components are used to store the information needed to perform the above verification check and communication encryption; the keystore and the truststore.

The keystore contains a key pair which is made up of the following two components.

  • Keystore Private Signing Key (KS PSK) - Hidden in keystore. Used to sign messages sent by the node, and decrypt messages received by the node.
  • Keystore Public Certificate (KS PC) - Exported for signing by the Root CA. Used by a third party to encrypt messages sent to the node that owns this keystore.

When created, the keystore will contain the PC and the PSK. The PC signed by the Root CA, and the CA PC, are added to the keystore in subsequent operations to complete the trust chain. The certificates are always public and are presented to other parties, while the PSK always remains secret. In an asymmetric/public key encryption system, messages can be encrypted with the PC but can only be decrypted using the PSK. In this way, a node can initiate encrypted communications without needing to share a secret.

The truststore stores one or more CA PCs of the parties which the node has chosen to trust, since they are the source of trust for the cluster. If a party tries to communicate with the node, it will refer to its truststore to see if it can validate the attempted communication using a CA PC that it knows about.

For a node’s KS PC to be trusted and verified by another node using the CA PC in the truststore, the KS PC needs to be signed by the Root CA key pair. Furthermore, the same CA key pair is used to sign the KS PC of each party.

When you run openssl to sign an exported Keystore PC, this is the command that is run.

$ openssl x509 \
    -req \
    -CAkey path/to/ca_psk \
    -CA path/to/ca_pc \
    -in path/to/exported_ks_pc_sign_request \
    -out path/to/signed_ks_pc \
    -days <valid_days> \
    -CAcreateserial \
    -passin pass:<ca_psk_password>

In the above command both the Root CA PSK and CA PC are used via -CAkey and -CA respectively when signing the KS PC.
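The signing step can be checked end to end with a small sketch. This is an illustration under assumptions, not the keystore workflow itself: a plain openssl key and signing request stand in for the ones keytool would export from a keystore, keys are left unencrypted, and the names `ExampleRootCA` and `node1.example` are made up.

```shell
workdir=$(mktemp -d)

# Throwaway Root CA key pair (unencrypted, demo only).
openssl req -new -x509 -nodes -newkey rsa:2048 \
    -subj "/CN=ExampleRootCA" \
    -keyout "$workdir/ca_psk" -out "$workdir/ca_pc" -days 365 2>/dev/null

# Stand-in for the signing request that keytool would export.
openssl req -new -nodes -newkey rsa:2048 \
    -subj "/CN=node1.example" \
    -keyout "$workdir/node_psk" -out "$workdir/node_sign_request" 2>/dev/null

# Sign the request with the CA PSK and CA PC, as in the command above.
openssl x509 -req \
    -CAkey "$workdir/ca_psk" \
    -CA "$workdir/ca_pc" \
    -in "$workdir/node_sign_request" \
    -out "$workdir/signed_ks_pc" \
    -days 365 \
    -CAcreateserial 2>/dev/null

# Confirm the signed certificate chains back to the CA PC.
verify_result=$(openssl verify -CAfile "$workdir/ca_pc" "$workdir/signed_ks_pc" 2>&1)
echo "$verify_result"
```

The verification step only needs the CA PC, which is why the truststore never has to hold the CA PSK.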

More Than One Way to Secure a Connection

Now that we have a deeper understanding of the assets that are used to encrypt communications, we can examine various ways to implement it. There are multiple ways to implement SSL encryption in an Apache Cassandra cluster. Regardless of the encryption approach, the objective when applying this type of security to a cluster is to ensure:

  • Hosts (nodes or clients) can determine whether they should trust other hosts in the cluster.
  • Any intercepted communication between two hosts is indecipherable.

The three most common methods vary in both ease of deployment and resulting level of security. They are as follows.

The Cheats Way

The easiest and least secure method for rolling out SSL encryption can be done in the following way.

Generation

  • Single CA for the cluster.
  • Single truststore containing the CA PC.
  • Single keystore which has been signed by the CA.

Deployment

  • The same keystore and truststore are deployed to each node.

In this method a single Root CA and a single keystore are deployed to all nodes in the cluster. This means any node can decipher communications intended for any other node. If a bad actor gains control of a node in the cluster then they will be able to impersonate any other node. That is, compromise of one host will compromise all of them. Depending on your threat model, this approach can be better than no encryption at all. It will ensure that a bad actor with access to only the network will no longer be able to eavesdrop on traffic.

We would use this method as a stop gap to get internode encryption enabled in a cluster. The idea would be to quickly deploy internode encryption with the view of updating the deployment in the near future to be more secure.

Best Bang for Buck

Arguably the most popular and well documented method for rolling out SSL encryption is as follows.

Generation

  • Single CA for the cluster.
  • Single truststore containing the CA PC.
  • Unique keystore for each node all of which have been signed by the CA.

Deployment

  • Each keystore is deployed to its associated node.
  • The same truststore is deployed to each node.

Similar to the previous method, this method uses a cluster wide CA. However, unlike the previous method each node will have its own keystore. Each keystore has its own certificate that is signed by a Root CA common to all nodes. The process to generate and deploy the keystores in this way is practiced widely and well documented.

We would use this method as it provides better security over the previous method. Each keystore can have its own password and host verification, which further enhances the security that can be applied.

Fort Knox

The method that offers the strongest security of the three can be rolled out in the following way.

Generation

  • Unique CA for each node.
  • A single truststore containing the Public Certificate for each of the CAs.
  • Unique keystore for each node that has been signed by the CA specific to the node.

Deployment

  • Each keystore with its unique CA PC is deployed to its associated node.
  • The same truststore is deployed to each node.

Unlike the other two methods, this one uses a Root CA per host. Similar to the previous method, each node will have its own keystore. Each keystore has its own PC that is signed by a Root CA unique to the node. The Root CA PC of each node needs to be added to the truststore that is deployed to all nodes. For large cluster deployments this encryption configuration is cumbersome and will result in a large truststore being generated. Deployments of this encryption configuration are less common in the wild.

We would use this method as it provides all the advantages of the previous method and, in addition, provides the ability to isolate a node from the cluster. This can be done by simply rolling out a new truststore which excludes a specific node’s CA PC. In this way a compromised node could be isolated from the cluster by simply changing the truststore. Under the previous two approaches, isolation of a compromised node in this fashion would require a rollout of an entirely new Root CA and one or more new keystores. Furthermore, each new KS PC would need to be signed by the new Root CA.

WARNING: Ensure your Certificate Authority is secure!

Regardless of the deployment method chosen, the whole setup will depend on the security of the Root CA. Ideally both components should be secured, or at the very least the PSK needs to be secured properly after it is generated since all trust is based on it. If both components are compromised by a bad actor, then that actor can potentially impersonate another node in the cluster. The good news is, there are a variety of ways to secure the Root CA components, however that topic goes beyond the scope of this post.

The Need for Rotation

If we are following best practices when generating our CAs and keystores, they will have an expiry date. This is a good thing because it forces us to regenerate and roll out our new encryption assets (stores, certificates, passwords) to the cluster. By doing this we minimise the exposure that any one of the components has. For example, if a password for a keystore is unknowingly leaked, the password is only good up until the keystore expiry. Having a scheduled expiry reduces the chance of a security leak becoming a breach, and increases the difficulty for a bad actor to gain persistence in the system. In the worst case it limits the validity of compromised credentials.

Always Read the Expiry Label

The only catch to having an expiry date on our encryption assets is that we need to rotate (update) them before they expire. Otherwise, our data will be unavailable or may be inconsistent in our cluster for a period of time. Expired encryption assets, when forgotten, can be a silent, sinister problem. If, for example, our SSL certificates expire unnoticed we will only discover this blunder when we restart the Cassandra service. In this case the Cassandra service will fail to connect to the cluster on restart and an SSL expiry error will appear in the logs. At this point there is nothing we can do without incurring some data unavailability or inconsistency in the cluster. We will cover what to do in this case in a subsequent post. However, it is best to avoid this situation by rotating the encryption assets before they expire.
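One way to avoid being surprised is to check certificates against a warning window. The sketch below is an assumption-laden illustration: it generates a throwaway certificate valid for 10 days and uses the `-checkend` option of `openssl x509` to test whether it expires within a 30 day window (the `warn_days` value is an arbitrary choice). For a real keystore the certificate would first need to be exported to a PEM file, for example with `keytool -exportcert -rfc`.

```shell
workdir=$(mktemp -d)

# Throwaway certificate that is only valid for 10 days (demo only).
openssl req -new -x509 -nodes -newkey rsa:2048 \
    -subj "/CN=ExampleRootCA" \
    -keyout "$workdir/ca_psk" -out "$workdir/ca_pc" -days 10 2>/dev/null

# Hypothetical warning window: flag anything expiring within 30 days.
warn_days=30

# -checkend exits 0 if the certificate is still valid after the given
# number of seconds, and non-zero if it will have expired by then.
if openssl x509 -in "$workdir/ca_pc" -noout \
        -checkend $((warn_days * 86400)) >/dev/null; then
    expiry_status="ok"
else
    expiry_status="expiring"
fi
echo "certificate status: $expiry_status"
```

A check like this could be run on a schedule so that rotation is planned well ahead of the expiry date.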

How to Play Musical Certificates

Assuming we are going to rotate our SSL certificates before they expire, we can perform this operation live on the cluster without downtime. This process requires the replication factor and consistency level to be configured to allow for a single node to be down for a short period of time in the cluster. Hence, it works best when we use a replication factor >= 3 and a consistency level <= QUORUM or LOCAL_QUORUM, depending on the cluster configuration.
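The replication factor requirement follows from how a quorum is calculated: floor(RF / 2) + 1 replicas must respond. A short sketch of the arithmetic, using hypothetical helper names `quorum` and `tolerated_down`:

```shell
# Number of replicas that must respond for a QUORUM request.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of replicas that can be down while QUORUM still succeeds.
tolerated_down() {
    echo $(( $1 - $(quorum "$1") ))
}

for rf in 1 2 3 5; do
    echo "RF=$rf quorum=$(quorum "$rf") tolerated_down=$(tolerated_down "$rf")"
done
```

With a replication factor of 3, a quorum is 2 replicas, so one node can be restarted at a time. With a replication factor of 1 or 2, no replica can be down without QUORUM requests failing for some of the data. For LOCAL_QUORUM the same arithmetic applies to the replication factor of the local datacenter.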

  1. Create the NEW encryption assets; NEW CA, NEW keystores, and NEW truststore, using the process described earlier.
  2. Import the NEW CA to the OLD truststore already deployed in the cluster using keytool. The OLD truststore will increase in size, as it has both the OLD and NEW CAs in it.
    $ keytool -keystore <old_truststore> -alias CARoot -importcert -file <new_ca_pc> -keypass <new_ca_psk_password> -storepass <old_truststore_password> -noprompt
    

    Where:

    • <old_truststore>: The path to the OLD truststore already deployed in the cluster. This can be just a copy of the OLD truststore deployed.
    • <new_ca_pc>: The path to the NEW CA PC generated.
    • <new_ca_psk_password>: The password for the NEW CA PSK.
    • <old_truststore_password>: The password for the OLD truststore.
  3. Deploy the updated OLD truststore to all the nodes in the cluster. Specifically, perform these steps on a single node, then repeat them on the next node until all nodes are updated. Once this step is complete, all nodes in the cluster will be able to establish connections using both the OLD and NEW CAs.
    1. Drain the node using nodetool drain.
    2. Stop the Cassandra service on the node.
    3. Copy the updated OLD truststore to the node.
    4. Start the Cassandra service on the node.
  4. Deploy the NEW keystores to their respective nodes in the cluster. Perform this operation one node at a time in the same way the OLD truststore was deployed in the previous step. Once this step is complete, all nodes in the cluster will be using their NEW SSL certificate to establish encrypted connections with each other.
  5. Deploy the NEW truststore to all the nodes in the cluster. Once again, perform this operation one node at a time in the same way the OLD truststore was deployed in Step 3.
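The per-node sub-steps in Step 3 can be sketched as a dry run that only prints the commands it would execute. Everything environment-specific here is an assumption: the node addresses, the use of ssh and scp, the `systemctl` service name, and the target path are hypothetical and will differ between deployments.

```shell
# Print the rolling deployment plan for a store file, one node at a time.
# Arguments: <store_file> <target_path> <node>...
rolling_deploy_plan() {
    store_file="$1"
    target_path="$2"
    shift 2
    for node in "$@"; do
        # Hypothetical commands; nothing is executed, only printed.
        echo "ssh $node nodetool drain"
        echo "ssh $node sudo systemctl stop cassandra"
        echo "scp $store_file $node:$target_path"
        echo "ssh $node sudo systemctl start cassandra"
    done
}

plan=$(rolling_deploy_plan common-truststore.jks \
    /etc/cassandra/conf/common-truststore.jks \
    10.0.0.1 10.0.0.2 10.0.0.3)
echo "$plan"
```

In a real rollout we would also wait for each node to return to the UN state before moving on to the next one, so that only a single node is ever down.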

The key to ensuring uptime in the rotation is in Steps 2 and 3. That is, we have the OLD and the NEW CAs all in the truststore and deployed on every node prior to rolling out the NEW keystores. This allows nodes to communicate regardless of whether they have the OLD or NEW keystore. This is because both the OLD and NEW assets are trusted by all nodes. The process still works whether our NEW CAs are per host or cluster wide. If the NEW CAs are per host, then they all need to be added to the OLD truststore.
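The effect of trusting both CAs at once can be demonstrated with openssl alone. In this sketch a PEM bundle plays the role of the truststore, which is an analogy rather than the JKS truststore Cassandra actually loads. The helper names (`make_ca`, `sign_node`, `trusts`) and certificate names are hypothetical, and keys are left unencrypted for brevity.

```shell
workdir=$(mktemp -d)

# Create a throwaway CA key pair named $1.
make_ca() {
    openssl req -new -x509 -nodes -newkey rsa:2048 -subj "/CN=$1" \
        -keyout "$workdir/$1.key" -out "$workdir/$1.pem" -days 365 2>/dev/null
}

# Create a node certificate named $1, signed by the CA named $2.
sign_node() {
    openssl req -new -nodes -newkey rsa:2048 -subj "/CN=$1" \
        -keyout "$workdir/$1.key" -out "$workdir/$1.csr" 2>/dev/null
    openssl x509 -req -CAkey "$workdir/$2.key" -CA "$workdir/$2.pem" \
        -in "$workdir/$1.csr" -out "$workdir/$1.pem" \
        -days 365 -CAcreateserial 2>/dev/null
}

# Succeeds when certificate $2 validates against CA bundle $1.
trusts() {
    openssl verify -CAfile "$1" "$2" 2>&1 | grep -q ': OK$'
}

make_ca old_ca
make_ca new_ca
sign_node node_with_old_keystore old_ca
sign_node node_with_new_keystore new_ca

# The "updated OLD truststore": both CA PCs in one bundle.
cat "$workdir/old_ca.pem" "$workdir/new_ca.pem" > "$workdir/updated_old_truststore.pem"

for cert in node_with_old_keystore node_with_new_keystore; do
    if trusts "$workdir/updated_old_truststore.pem" "$workdir/$cert.pem"; then
        echo "$cert: trusted by updated OLD truststore"
    fi
done

# A bundle holding only the NEW CA rejects the OLD certificate.
if ! trusts "$workdir/new_ca.pem" "$workdir/node_with_old_keystore.pem"; then
    echo "node_with_old_keystore: rejected by NEW truststore"
fi
```

The last check also shows why the NEW truststore is rolled out only after every node has its NEW keystore.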

Example Certificate Rotation on a Cluster

Now that we understand the theory, let’s see the process in action. We will use ccm to create a three node cluster running Cassandra 3.11.10 with internode encryption configured.

As a pre-cluster setup task we will generate the keystores and truststore to implement the internode encryption. Rather than carry out the steps manually to generate the stores, we have developed a script called generate_cluster_ssl_stores.sh that does the job for us.

The script requires us to supply the node IP addresses, and a certificate configuration file. Our certificate configuration file, test_ca_cert.conf has the following contents:

[ req ]
distinguished_name     = req_distinguished_name
prompt                 = no
output_password        = mypass
default_bits           = 2048

[ req_distinguished_name ]
C                      = AU
ST                     = NSW
L                      = Sydney
O                      = TLP
OU                     = SSLTestCluster
CN                     = SSLTestClusterRootCA
emailAddress           = info@thelastpickle.com

The command used to call the generate_cluster_ssl_stores.sh script is as follows.

$ ./generate_cluster_ssl_stores.sh -g -c -n 127.0.0.1,127.0.0.2,127.0.0.3 test_ca_cert.conf

Let’s break down the options in the above command.

  • -g - Generate passwords for each keystore and the truststore.
  • -c - Create a Root CA for the cluster and sign each keystore PC with it.
  • -n - List of nodes to generate keystores for.

The above command generates the following encryption assets.

$ ls -alh ssl_artifacts_20210602_125353
total 72
drwxr-xr-x   9 anthony  staff   288B  2 Jun 12:53 .
drwxr-xr-x   5 anthony  staff   160B  2 Jun 12:53 ..
-rw-r--r--   1 anthony  staff    17B  2 Jun 12:53 .srl
-rw-r--r--   1 anthony  staff   4.2K  2 Jun 12:53 127-0-0-1-keystore.jks
-rw-r--r--   1 anthony  staff   4.2K  2 Jun 12:53 127-0-0-2-keystore.jks
-rw-r--r--   1 anthony  staff   4.2K  2 Jun 12:53 127-0-0-3-keystore.jks
drwxr-xr-x  10 anthony  staff   320B  2 Jun 12:53 certs
-rw-r--r--   1 anthony  staff   1.0K  2 Jun 12:53 common-truststore.jks
-rw-r--r--   1 anthony  staff   219B  2 Jun 12:53 stores.password

With the necessary stores generated we can create our three node cluster in ccm. Prior to starting the cluster our nodes should look something like this.

$ ccm status
Cluster: 'SSLTestCluster'
-------------------------
node1: DOWN (Not initialized)
node2: DOWN (Not initialized)
node3: DOWN (Not initialized)

We can configure internode encryption in the cluster by modifying the cassandra.yaml files for each node as follows. The passwords for each store are in the stores.password file created by the generate_cluster_ssl_stores.sh script.

node1 - cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210602_125353/127-0-0-1-keystore.jks
  keystore_password: HQR6xX4XQrYCz58CgAiFkWL9OTVDz08e
  truststore: /ssl_artifacts_20210602_125353/common-truststore.jks
  truststore_password: 8dPhJ2oshBihAYHcaXzgfzq6kbJ13tQi
...

node2 - cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210602_125353/127-0-0-2-keystore.jks
  keystore_password: Aw7pDCmrtacGLm6a1NCwVGxohB4E3eui
  truststore: /ssl_artifacts_20210602_125353/common-truststore.jks
  truststore_password: 8dPhJ2oshBihAYHcaXzgfzq6kbJ13tQi
...

node3 - cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210602_125353/127-0-0-3-keystore.jks
  keystore_password: 1DdFk27up3zsmP0E5959PCvuXIgZeLzd
  truststore: /ssl_artifacts_20210602_125353/common-truststore.jks
  truststore_password: 8dPhJ2oshBihAYHcaXzgfzq6kbJ13tQi
...

Now that we configured internode encryption in the cluster, we can start the nodes and monitor the logs to make sure they start correctly.

$ ccm node1 start && touch ~/.ccm/SSLTestCluster/node1/logs/system.log && tail -n 40 -f ~/.ccm/SSLTestCluster/node1/logs/system.log
...
$ ccm node2 start && touch ~/.ccm/SSLTestCluster/node2/logs/system.log && tail -n 40 -f ~/.ccm/SSLTestCluster/node2/logs/system.log
...
$ ccm node3 start && touch ~/.ccm/SSLTestCluster/node3/logs/system.log && tail -n 40 -f ~/.ccm/SSLTestCluster/node3/logs/system.log

In all cases we see the following message in the logs indicating that internode encryption is enabled.

INFO  [main] ... MessagingService.java:704 - Starting Encrypted Messaging Service on SSL port 7001

Once all the nodes have started, we can check the cluster status. We are looking to see that all nodes are up and in a normal state.

$ ccm node1 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  90.65 KiB  16           65.8%             2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  66.31 KiB  16           65.5%             f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  71.46 KiB  16           68.7%             46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1
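When scripting a rotation it can help to check the status output automatically rather than by eye. The following is a small sketch with a hypothetical helper, `count_not_normal`, which counts node rows whose status/state code is anything other than UN. The sample rows are taken from the output above.

```shell
# Count node rows that are not Up/Normal in nodetool status output
# read from stdin. Node rows start with a two-letter code such as UN.
count_not_normal() {
    awk '/^[UD][NLJM] / && $1 != "UN" { n++ } END { print n + 0 }'
}

status_output=$(cat <<'EOF'
UN  127.0.0.1  90.65 KiB  16           65.8%             2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  66.31 KiB  16           65.5%             f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  71.46 KiB  16           68.7%             46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1
EOF
)

echo "nodes not in UN state: $(printf '%s\n' "$status_output" | count_not_normal)"
```

Against a running ccm cluster the same helper could be fed directly, for example `ccm node1 nodetool status | count_not_normal`.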

We will create a NEW Root CA along with a NEW set of stores for the cluster. As part of this process, we will add the NEW Root CA PC to the OLD (current) truststore that is already in use in the cluster. Once again we can use our generate_cluster_ssl_stores.sh script to do this, including the additional step of adding the NEW Root CA PC to our OLD truststore. This can be done with the following commands.

# Make the password to our old truststore available to script so we can add the new Root CA to it.

$ export EXISTING_TRUSTSTORE_PASSWORD=$(cat ssl_artifacts_20210602_125353/stores.password | grep common-truststore.jks | cut -d':' -f2)
$ ./generate_cluster_ssl_stores.sh -g -c -n 127.0.0.1,127.0.0.2,127.0.0.3 -e ssl_artifacts_20210602_125353/common-truststore.jks test_ca_cert.conf 

We call our script using a similar command to the first time we used it. The difference now is we are using one additional option; -e.

  • -e - Path to our OLD (existing) truststore which we will add the new Root CA PC to. This option requires us to set the OLD truststore password in the EXISTING_TRUSTSTORE_PASSWORD variable.
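The password lookup used above can be wrapped in a small helper. This sketch assumes each line of stores.password has the form `<store file name>:<password>`, which is inferred from the `cut -d':' -f2` in the command above rather than confirmed. The helper name `store_password` and the sample passwords are made up.

```shell
workdir=$(mktemp -d)

# Sample file in the ASSUMED "<store name>:<password>" format.
cat > "$workdir/stores.password" <<'EOF'
127-0-0-1-keystore.jks:example-keystore-password
common-truststore.jks:example-truststore-password
EOF

# Print the password for an exact store name.
# Arguments: <password_file> <store_name>
store_password() {
    awk -F':' -v store="$2" '$1 == store { print $2 }' "$1"
}

EXISTING_TRUSTSTORE_PASSWORD=$(store_password "$workdir/stores.password" common-truststore.jks)
export EXISTING_TRUSTSTORE_PASSWORD
echo "truststore password loaded: ${#EXISTING_TRUSTSTORE_PASSWORD} characters"
```

Matching on the exact store name avoids accidentally picking up another entry whose name merely contains the same text.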

The above command generates the following new encryption assets. These files are located in a different directory to the old ones. The directory with the old encryption assets is ssl_artifacts_20210602_125353 and the directory with the new encryption assets is ssl_artifacts_20210603_070951.

$ ls -alh ssl_artifacts_20210603_070951
total 72
drwxr-xr-x   9 anthony  staff   288B  3 Jun 07:09 .
drwxr-xr-x   6 anthony  staff   192B  3 Jun 07:09 ..
-rw-r--r--   1 anthony  staff    17B  3 Jun 07:09 .srl
-rw-r--r--   1 anthony  staff   4.2K  3 Jun 07:09 127-0-0-1-keystore.jks
-rw-r--r--   1 anthony  staff   4.2K  3 Jun 07:09 127-0-0-2-keystore.jks
-rw-r--r--   1 anthony  staff   4.2K  3 Jun 07:09 127-0-0-3-keystore.jks
drwxr-xr-x  10 anthony  staff   320B  3 Jun 07:09 certs
-rw-r--r--   1 anthony  staff   1.0K  3 Jun 07:09 common-truststore.jks
-rw-r--r--   1 anthony  staff   223B  3 Jun 07:09 stores.password

When we look at our OLD truststore we can see that it has increased in size. Originally, it was 1.0K and it is now 2.0K in size after adding the new Root CA PC to it.

$ ls -alh ssl_artifacts_20210602_125353/common-truststore.jks
-rw-r--r--  1 anthony  staff   2.0K  3 Jun 07:09 ssl_artifacts_20210602_125353/common-truststore.jks

We can now roll out the updated OLD truststore. In a production Cassandra deployment we would copy the updated OLD truststore to a node and restart the Cassandra service. We would then repeat this process on the other nodes in the cluster, one node at a time. In our case, our locally running nodes are already pointing to the updated OLD truststore, so we only need to restart the Cassandra service.

$ for i in $(ccm status | grep UP | cut -d':' -f1); do echo "restarting ${i}" && ccm ${i} stop && sleep 3 && ccm ${i} start; done
restarting node1
restarting node2
restarting node3

After the restart, our nodes are up and in a normal state.

$ ccm node1 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  140.35 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  167.23 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  173.7 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

Our nodes are using the updated OLD truststore which has the old Root CA PC and the new Root CA PC. This means that nodes will be able to communicate using either the old (current) keystore or the new keystore. We can now roll out the new keystore one node at a time and still have all our data available.

To do the new keystore roll out we will stop the Cassandra service, update its configuration to point to the new keystore, and then start the Cassandra service. A few notes before we start:

  • The node will need to point to the new keystore located in the directory with the new encryption assets; ssl_artifacts_20210603_070951.
  • The node will still need to use the OLD truststore, so its path will remain unchanged.

node1 - stop Cassandra service

$ ccm node1 stop
$ ccm node2 status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
DN  127.0.0.1  140.35 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  142.19 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  148.66 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

node1 - update keystore path to point to new keystore in cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210603_070951/127-0-0-1-keystore.jks
  keystore_password: V3fKP76XfK67KTAti3CXAMc8hVJGJ7Jg
  truststore: /ssl_artifacts_20210602_125353/common-truststore.jks
  truststore_password: 8dPhJ2oshBihAYHcaXzgfzq6kbJ13tQi
...

node1 - start Cassandra service

$ ccm node1 start
$ ccm node2 status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  179.23 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  142.19 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  148.66 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

At this point we have node1 using the new keystore while node2 and node3 are using the old keystore. Our nodes are once again up and in a normal state, so we can proceed to update the certificates on node2.

node2 - stop Cassandra service

$ ccm node2 stop
$ ccm node3 status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  224.48 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
DN  127.0.0.2  188.46 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  194.35 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

node2 - update keystore path to point to new keystore in cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210603_070951/127-0-0-2-keystore.jks
  keystore_password: 3uEjkTiR0xI56RUDyo23TENJjtMk8VbY
  truststore: /ssl_artifacts_20210602_125353/common-truststore.jks
  truststore_password: 8dPhJ2oshBihAYHcaXzgfzq6kbJ13tQi
...

node2 - start Cassandra service

$ ccm node2 start
$ ccm node3 status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  224.48 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  227.12 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  194.35 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

At this point we have node1 and node2 using the new keystore while node3 is using the old keystore. Our nodes are once again up and in a normal state, so we can proceed to update the certificates on node3.

node3 - stop Cassandra service

$ ccm node3 stop
$ ccm node1 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  225.42 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  191.31 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
DN  127.0.0.3  194.35 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

node3 - update keystore path to point to new keystore in cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210603_070951/127-0-0-3-keystore.jks
  keystore_password: hkjMwpn2y2aYllePAgCNzkBnpD7Vxl6f
  truststore: /ssl_artifacts_20210602_125353/common-truststore.jks
  truststore_password: 8dPhJ2oshBihAYHcaXzgfzq6kbJ13tQi
...

node3 - start Cassandra service

$ ccm node3 start
$ ccm node1 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  225.42 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  191.31 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  239.3 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

The keystore rotation is now complete on all nodes in our cluster. However, all nodes are still using the updated OLD truststore. To ensure that our old Root CA can no longer be used to intercept messages in our cluster we need to roll out the NEW truststore to all nodes. This can be done in the same way we deployed the new keystores.

node1 - stop Cassandra service

$ ccm node1 stop
$ ccm node2 status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
DN  127.0.0.1  225.42 KiB  16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  191.31 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  185.37 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

node1 - update truststore path to point to NEW truststore in cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210603_070951/127-0-0-1-keystore.jks
  keystore_password: V3fKP76XfK67KTAti3CXAMc8hVJGJ7Jg
  truststore: /ssl_artifacts_20210603_070951/common-truststore.jks
  truststore_password: 0bYmrrXaKIPJQ5UrtQQTFpPLepMweaLc
...

node1 - start Cassandra service

$ ccm node1 start
$ ccm node2 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  150 KiB    16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  191.31 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  185.37 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1
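
Each status check above is read by eye. As a rough sketch, the same check could be scripted so that the next node is only touched once every node reports UN. The function name `all_nodes_up` is hypothetical, and it assumes the `nodetool status` layout shown in this post, where each node line starts with a two-letter status/state code.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin.
# Returns 0 only if at least one node line is present and every node is UN.
all_nodes_up() {
  awk '
    # Node lines begin with a status letter (U/D) and a state letter (N/L/J/M).
    /^[UD][NLJM][[:space:]]/ { nodes++; if ($1 != "UN") bad++ }
    END { exit !(nodes > 0 && bad == 0) }
  '
}

# Example usage:
# ccm node2 nodetool status | all_nodes_up && echo "safe to continue"
```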

Now we update the truststore for node2.

node2 - stop Cassandra service

$ ccm node2 stop
$ ccm node3 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  150 KiB    16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
DN  127.0.0.2  191.31 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  185.37 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

node2 - update truststore path to point to NEW truststore in cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210603_070951/127-0-0-2-keystore.jks
  keystore_password: 3uEjkTiR0xI56RUDyo23TENJjtMk8VbY
  truststore: /ssl_artifacts_20210603_070951/common-truststore.jks
  truststore_password: 0bYmrrXaKIPJQ5UrtQQTFpPLepMweaLc
...

node2 - start Cassandra service

$ ccm node2 start
$ ccm node3 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  150 KiB    16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  294.05 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  185.37 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

Now we update the truststore for node3.

node3 - stop Cassandra service

$ ccm node3 stop
$ ccm node1 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  150 KiB    16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  208.83 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
DN  127.0.0.3  185.37 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

node3 - update truststore path to point to NEW truststore in cassandra.yaml

...
server_encryption_options:
  internode_encryption: all
  keystore: /ssl_artifacts_20210603_070951/127-0-0-3-keystore.jks
  keystore_password: hkjMwpn2y2aYllePAgCNzkBnpD7Vxl6f
  truststore: /ssl_artifacts_20210603_070951/common-truststore.jks
  truststore_password: 0bYmrrXaKIPJQ5UrtQQTFpPLepMweaLc
...

node3 - start Cassandra service

$ ccm node3 start
$ ccm node1 nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  150 KiB    16           100.0%            2661807a-d8d3-4bba-8639-6c0fada2ac88  rack1
UN  127.0.0.2  208.83 KiB  16           100.0%            f3db4bbe-1f35-4edb-8513-cb55a05393a7  rack1
UN  127.0.0.3  288.6 KiB  16           100.0%            46c2f4b5-905b-42b4-8bb9-563a03c4b415  rack1

The certificate rotation is now complete, and at no point was more than one node down. This process can be used for all three of the deployment variations. In addition, it can be used to move between the different deployment variations without incurring downtime.
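
The truststore rollout above repeats the same three steps on every node: stop, edit cassandra.yaml, start. As a sketch only, those steps could be wrapped in a small script. The function names `set_truststore` and `rotate_truststore` are hypothetical, the caller supplies the path to each node's cassandra.yaml, and the `sed` expression assumes the path and password contain no `|`, `&` or `\` characters.

```shell
# Sketch only: roll a new truststore across nodes, one node at a time.
CCM="${CCM:-ccm}"   # assumption: ccm is on PATH; set CCM=echo for a dry run

# set_truststore YAML NEW_PATH NEW_PASSWORD
# Rewrites truststore and truststore_password inside the
# server_encryption_options block only, keeping a .bak copy of the file.
set_truststore() {
  sed -i.bak -E \
    -e "/^server_encryption_options:/,/^[^[:space:]#]/ s|^([[:space:]]+truststore:[[:space:]]*).*|\1$2|" \
    -e "/^server_encryption_options:/,/^[^[:space:]#]/ s|^([[:space:]]+truststore_password:[[:space:]]*).*|\1$3|" \
    "$1"
}

# rotate_truststore NODE YAML NEW_PATH NEW_PASSWORD
# Stops the node, points it at the new truststore, then starts it again.
rotate_truststore() {
  "$CCM" "$1" stop &&
    set_truststore "$2" "$3" "$4" &&
    "$CCM" "$1" start
}

# Example usage (the yaml location is a placeholder for your own layout):
# rotate_truststore node1 /path/to/node1/conf/cassandra.yaml \
#   /ssl_artifacts_20210603_070951/common-truststore.jks \
#   0bYmrrXaKIPJQ5UrtQQTFpPLepMweaLc
```

In practice, confirm that the restarted node reports UN in `nodetool status` before moving on to the next one, so that only a single node is ever down.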

Conclusion

Internode encryption plays an important role in securing the internal communication of a cluster. When deployed, it is crucial that certificate expiry dates be tracked so the certificates can be rotated before they expire. Failure to do so will result in unavailability and inconsistencies.
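
Expiry tracking can be partly automated. Below is a minimal sketch, assuming `openssl` is installed and the node certificate has already been exported from the keystore to a PEM file (for example with `keytool -exportcert -rfc`). The function name `cert_expires_within` and the file name in the usage comment are hypothetical.

```shell
OPENSSL="${OPENSSL:-openssl}"   # assumption: openssl is on PATH

# cert_expires_within PEM_FILE DAYS
# Returns 0 (true) if the certificate expires within DAYS days, 1 otherwise.
# An unreadable file is also reported as expiring, which errs on the safe side.
cert_expires_within() {
  # -checkend N exits 0 when the certificate is still valid N seconds from now.
  if "$OPENSSL" x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}

# Example usage:
# if cert_expires_within node1-cert.pem 30; then
#   echo "node1 certificate expires within 30 days, schedule a rotation"
# fi
```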

With the process discussed in this post and the appropriate tooling, internode encryption can be deployed and its certificates rotated with little effort. The same process can also be used to move between the different encryption deployments.

Regardless of the reason for using the process, it can be executed without incurring downtime in common Cassandra use cases.
