Alex Dejanovski

Introduction to cstar

01 Oct 2018

Spotify is a long time user of Apache Cassandra at very large scale. It is also a creative company which tries to open source most of the tools they build for internal needs. They released Cassandra Reaper a few years ago to give the community a reliable way of repairing clusters, which we now love and actively maintain. Their latest open sourced tool for Cassandra is cstar, a parallel-ssh equivalent (distributed shell) that is Cassandra topology aware. At TLP, we love it already and are sure you soon will too.

What is cstar?

Running distributed databases requires good automation, especially at scale. But even with small clusters, running the same command or roll restarting a cluster can quickly get tedious. Sure, you can use tools like dsh and pssh, but they run commands on all servers at the same time (or just a given number) and you need to keep a list of the nodes to connect to locally. Each time your cluster scales out/in or if nodes get replaced you need to update the list. If you forget to update you may run commands that won’t touch the whole cluster without noticing.

All commands cannot run on all nodes at the same time either. For instance upgrading sstables, running cleanup, major compaction or restarting nodes will have an impact on either latencies or availability and require more granularity of execution.

Cstar doesn’t suffer any of the above problems. It will discover the topology of the cluster dynamically and tune concurrency based on replication settings. In addition, cstar will run from a single machine (not necessarily within the cluster) that has SSH access to all nodes in the cluster, and perform operations through SSH and SFTP. It requires no dependency, other than nodetool, to be installed on the Cassandra nodes.

Installing cstar

You’ll need to have Python 3 and pip3 installed on your server/laptop and then follow the README instructions which will, in the simplest case, boil down to:

pip3 install cstar

Running cstar

Cstar is built with Python 3 and offers a straightforward way to run simple commands or complex scripts on an Apache Cassandra cluster using a single contact point.

The following command, for example, will perform a rolling restart of Cassandra in the cluster, one node at a time using the one strategy:

cstar run --command="sudo service cassandra restart" --seed-host=<contact_point_ip> --strategy=one

During the execution, cstar will update progress with a clear and pleasant output:

 +  Done, up      * Executing, up      !  Failed, up      . Waiting, up
 -  Done, down    / Executing, down    X  Failed, down    : Waiting, down
Cluster: Test Cluster
DC: dc1
+....
....
....
DC: dc2
+....
....
....
DC: dc3
*....
....
....
2 done, 0 failed, 1 executing

If we want to perform cleanup with topology awareness and have only one replica at a time, running the command for each token range (leaving a quorum of unaffected replicas at RF=3), we can use the default topology strategy:

cstar run --command="nodetool cleanup" --seed-host=<contact_point_ip> --strategy=topology

This way, we’ll have several nodes processing the command to minimize the overall time spent on the operation and still ensure low impact on latencies:

 +  Done, up      * Executing, up      !  Failed, up      . Waiting, up
 -  Done, down    / Executing, down    X  Failed, down    : Waiting, down
Cluster: Test Cluster
DC: dc1
****.
....
....
DC: dc2
++++.
****
....
DC: dc3
+****
....
....
5 done, 0 failed, 12 executing

Finally, if we want to run a command that doesn’t involve pressure on latencies and display the outputs locally, we can use strategy all and add the -v flag to display the command outputs:

cstar run --command="nodetool getcompactionthroughput" --seed-host=<contact_point_ip> --strategy=all -v

Which will give us the following output:

 +  Done, up      * Executing, up      !  Failed, up      . Waiting, up
 -  Done, down    / Executing, down    X  Failed, down    : Waiting, down
Cluster: Test Cluster
DC: dc1
*****
****
****
DC: dc2
*****
****
****
DC: dc3
*****
****
****
0 done, 0 failed, 39 executing
Host node1.mycompany.com finished successfully
stdout:
Current compaction throughput: 0 MB/s

Host node21.mycompany.com finished successfully
stdout:
Current compaction throughput: 0 MB/s

Host node10.mycompany.com finished successfully
stdout:
Current compaction throughput: 0 MB/s

...
...

Host node7.mycompany.com finished successfully
stdout:
Current compaction throughput: 0 MB/s

Host node18.mycompany.com finished successfully
stdout:
Current compaction throughput: 0 MB/s

 +  Done, up      * Executing, up      !  Failed, up      . Waiting, up
 -  Done, down    / Executing, down    X  Failed, down    : Waiting, down
Cluster: Test Cluster
DC: dc1
+++++
++++
++++
DC: dc2
+++++
++++
++++
DC: dc3
+++++
++++
++++
39 done, 0 failed, 0 executing
Job cff7f435-1b9a-416f-99e4-7185662b88b2 finished successfully

How cstar does its magic

When you run a cstar command it will first connect to the seed node you provided and run a set of nodetool commands through SSH.

First, nodetool ring will give it the cluster topology with the state of each node. By default, cstar will stop the execution if one node in the cluster is down or unresponsive. If you’re aware that nodes are down and want to run a command nonetheless, you can add the --ignore-down-nodes flag to bypass the check.

Then cstar will list the keyspaces using nodetool cfstats and build a map of the replicas for all token ranges for each of them. This will allow it to identify which nodes contain the same token ranges, using nodetool describering, and apply the topology strategy accordingly. As shown before, the topology strategy will not allow two nodes that are replicas for the same token to run the command at the same time. If the cluster does not use vnodes, the topology strategy will run the command every RF node. If the cluster uses vnodes but is not using NetworkTopologyStrategy (NTS) for all keyspaces nor spreading across racks, chances are only one node will be able to run the command at once, even with the topology strategy.If both NTS and racks are in use, the topology strategy will run the command on a whole rack at a time.

By default, cstar will process the datacenters in parallel, so 2 nodes being replicas for the same tokens but residing in different datacenters can be processed at the same time.

Once the cluster has been fully mapped execution will start in token order. Cstar is very resilient because it uploads a script on each remote node through SFTP and runs it using nohup. Each execution will write output (std and err) files along with the exit code for cstar to check on regularly. If the command is interrupted on the server that runs cstar, it can be resumed safely as cstar will first check if the script is still running or has finished already on each node that hasn’t gone through yet.
Note that interrupting the command on the cstar host will not stop it on the remote nodes that are already running it.
Resuming an interrupted command is done simply by executing : cstar continue <job_id>

Each time a node finishes running the command cstar will check if the cluster health is still good and if the node is up. This way, if you perform a rolling restart and one of the nodes doesn’t come back up properly, although the exit code of the restart command is 0, cstar will wait indefinitely to protect the availability of the cluster. That is unless you specified a timeout on the job. In such a case, the job will fail. Once the node is up after the command has run, cstar will look for the next candidate node in the ring to run the command.

A few handy flags

Two steps execution

Some commands may be scary to run on the whole cluster and you may want to run them on a subset of the nodes first, check that they are in the expected state manually, and then continue the execution on the rest of the cluster. The --stop-after=<number-of-nodes> flag will do just that. Setting it to --stop-after=1 will run the command on a single node and exit. Once you’ve verified that you’re happy with the execution on that one node you can process the rest of the cluster using cstar continue <job_id>.

Retry failed nodes

Some commands might fail mid-course due to transient problems. By default, cstar continue <job_id> will halt if there is any failed execution in the history of the job. In order to resume the job and retry the execution on the failed nodes, add the --retry-failed flag.

Run the command on a specific datacenter

To process only a specific datacenter add the --dc-filter=<datacenter-name> flag. All other datacenters will be ignored by cstar.

Datacenter parallelism

By default, cstar will process the datacenters in parallel. If you only want only one datacenter to process the command at a time, add the --dc-serial flag.

Specifying a maximum concurrency

You can forcefully limit the number of nodes running the command at the same time, regardless of topology, by adding the --max-concurrency=<number-of-nodes> flag.

Wait between each node

You may want to delay executions between nodes in order to give some room for the cluster to recover from the command. The --node-done-pause-time=<time-in-seconds> flag will allow to specify a pause time that cstar will apply before looking for the next node to run the command on.

Run the command regardless down nodes

If you want to run a command while nodes are down in the cluster add the --ignore-down-nodes flag to cstar.

Run on specific nodes only

If the command is meant to run on some specific nodes only you can use either the --host or the --host-file flags.

Control the verbosity of the output

By default, cstar will only display the progress of the execution as shown above in this post. To get the output of the remote commands, add the -v flag. If you want to get more verbosity on the executions and get debug loggings use either -vv (very verbose) or -vvv (extra verbose).

You haven’t installed it already?

Cstar is the tool that all Apache Cassandra operators have been waiting for to manage clusters of all sizes. We were happy to collaborate closely with Spotify to help them open source it. It has been built and matured at one of the most smart and successful start-ups in the world and was developed to manage hundreds of clusters of all sizes. It requires no dependency to be installed on the cluster and uses SSH exclusively. Thus, it will comply nicely with any security policy and you should be able to run it within minutes on any cluster of any size.

We love cstar so much we are already working on integrating it with Reaper as you can see in the following video :

We’ve seen in this blog post how to run simple one line commands with cstar, but there is much more than meets the eye. In an upcoming blog post we will introduce complex command scripts that perform operations like upgrading a Cassandra cluster, selectively clearing snapshots, or safely switching compaction strategies in a single cstar invocation.

cassandra cstar distributed shell operations topology