Running Cassandra on DC/OS

Mesosphere recently open sourced their Datacenter Operating System (DC/OS), a platform for managing data center resources. DC/OS is built on Apache Mesos, which provides tooling to “Program against your datacenter like it’s a single pool of resources”. While Mesos provides primitives to request resources from a pool, DC/OS provides common applications as packages in a repository called the Universe. It also provides a web UI and a CLI to manage these resources. One helpful way to understand Mesos and DC/OS is to imagine packaging an application as a container: you want tools to deploy and configure this container without having to deal directly with provisioning and system-level configuration.

Apache Cassandra is a good example of an application that can be managed with DC/OS. To run Cassandra, software must be installed and configured across one or more nodes. DC/OS has a package for Cassandra that provides a wizard-based install and, once the process is up and running, tools to help with configuration and management tasks. This blog will show how to set up DC/OS in the Amazon cloud, how to install Apache Cassandra on a DC/OS cluster, and finally new ways to interact with Apache Cassandra after it is installed.

Setting up DC/OS

For this blog we installed DC/OS using the DC/OS on AWS documentation; however, as you would expect, it supports multiple deployment scenarios. While the instructions are easy to work through, we mention a few steps below that took a little more thought (or were just interesting). A few places in the process require the user to complete an OAuth dialog; we skipped those for brevity.

In the instructions listed above, Step 2 provides a link to the DC/OS CloudFormation template to launch a cluster in AWS. When you click it, your first choice is the Amazon Region in which to launch the CloudFormation template. You will normally want to pick the same Region as the rest of your infrastructure.

AWS Location

After you have clicked on one of the links above you will be taken to the Amazon CloudFormation wizard to launch the Stack. A Stack is the CloudFormation term for a set of related resources that can be managed as a single unit. As you proceed through the installation wizard you will be prompted with a settings screen. The SlaveInstanceCount setting controls the number of “follower nodes”, the nodes on which DC/OS will install managed services like Cassandra. DC/OS treats these nodes as a pool of resources which have RAM, CPU, and hard disk space. For testing purposes I lowered the value of SlaveInstanceCount from 5 to 3, because to get the full distributed Cassandra experience you need a minimum of three nodes.

Cloud Formation

Keep in mind that the DC/OS CloudFormation template creates multiple servers with moderate computing capabilities. The template currently uses m3.xlarge instances, so the configuration above costs approximately $1000.00 USD to run for a month; destroy this stack when you are done evaluating. Also, if you use this template for a production installation, remember to choose instance types that make the most sense for your applications.
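As a rough check on that figure, the sketch below multiplies an hourly rate by the number of instances and the hours in a month. Both inputs are assumptions and do not come from the template. The rate of $0.266 per hour is an on-demand m3.xlarge price from around the time of writing. The count of five instances assumes three followers, one public follower, and one master. Check current AWS pricing before relying on the result.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(instance_count: int, hourly_rate_usd: float) -> float:
    """Estimate the monthly on-demand cost of a group of identical instances."""
    return instance_count * hourly_rate_usd * HOURS_PER_MONTH


# Assumed values: 5 instances at an assumed $0.266 per hour each.
estimate = monthly_cost(instance_count=5, hourly_rate_usd=0.266)
print(f"${estimate:,.2f} per month")
```

With these assumed inputs the estimate lands a little under $1000, and it does not include storage, load balancers, or data transfer.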

After you complete the rest of the wizard, Amazon begins to prepare the systems described in the template. Once the Status column reaches CREATE_COMPLETE, the entire stack is set up. At that point you can connect to the cluster and manage it.
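If you would rather script the wait than watch the console, a polling loop along these lines is one option. This is a minimal sketch, not part of the DC/OS instructions. The name wait_for_stack and its parameters are hypothetical, and the status lookup is passed in as a function so the loop does not depend on any particular AWS client.

```python
import time
from typing import Callable


def wait_for_stack(get_status: Callable[[], str],
                   poll_seconds: float = 30.0,
                   max_polls: int = 120) -> str:
    """Poll until the stack leaves an in-progress state, then return its status."""
    for _ in range(max_polls):
        status = get_status()
        # CloudFormation reports transitional states such as CREATE_IN_PROGRESS.
        if not status.endswith("_IN_PROGRESS"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("stack did not settle within the allowed number of polls")
```

The caller still has to compare the returned value with CREATE_COMPLETE, because a failed creation settles on a rollback status instead. With boto3, the status function could read StackStatus from the first entry returned by the CloudFormation client's describe_stacks call.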

Take the address from the Output tab listed as DnsAddress and open it using your web browser.
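The same value can be read programmatically. The sketch below assumes a stack description shaped like the CloudFormation DescribeStacks response, where each output is an OutputKey and OutputValue pair. The helper name stack_output and the hostname in the example are hypothetical placeholders.

```python
def stack_output(stack: dict, key: str) -> str:
    """Return the value of one named output from a stack description."""
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == key:
            return output["OutputValue"]
    raise KeyError(f"stack has no output named {key!r}")


# Hypothetical stack description; the hostname is a placeholder.
example_stack = {
    "StackStatus": "CREATE_COMPLETE",
    "Outputs": [
        {"OutputKey": "DnsAddress", "OutputValue": "dcos-example.elb.amazonaws.com"},
    ],
}
print(stack_output(example_stack, "DnsAddress"))
```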

Cloud Formation

After passing the OAuth dialog process you are presented with the main dashboard.

Main Dashboard

DC/OS CLI

You may have noticed that DC/OS provisions servers in the non-internet-routable address space 10.X.X.X. Now that the cluster is running we wish to interact with it. A great way to do this is the DC/OS CLI. Click the email address on the bottom left of the DC/OS display (in the image above) and choose the option to Install CLI. A popup window appears with install steps to run on your management station (I am managing from my laptop).

$ mkdir -p dcos && cd dcos && 
>   curl -O https://downloads.dcos.io/dcos-cli/install.sh && 
>   bash ./install.sh . https://dcos-stag-elasticl-LONGNAME.us-east-1.elb.amazonaws.com && 
>   source ./bin/env-setup
Installing DCOS CLI from PyPI...

Modify your bash profile to add DCOS to your PATH? [yes/no] yes
Finished installing and configuring DCOS CLI.

We now can use the DC/OS command line tool (from my laptop).

$ dcos help
Command line utility for the Mesosphere Datacenter Operating
System (DCOS). The Mesosphere DCOS is a distributed operating
system built around Apache Mesos. This utility provides tools
for easy management of a DCOS installation.

Available DCOS commands:

auth           	Authenticate to DCOS cluster
config         	Manage the DCOS configuration file
help           	Display help information about DCOS
marathon       	Deploy and manage applications to DCOS
node           	Administer and manage DCOS cluster nodes
package        	Install and manage DCOS software packages
service        	Manage DCOS services
task           	Manage DCOS tasks

Get detailed command description with 'dcos <command> --help'.

One helpful command is node, which shows the nodes in the entire DC/OS cluster.

$ dcos node
 HOSTNAME       IP                         ID                    
10.0.2.150  10.0.2.150  3da74652-c497-4625-81ab-5a61e93d2100-S2  
10.0.2.151  10.0.2.151  3da74652-c497-4625-81ab-5a61e93d2100-S0  
10.0.2.152  10.0.2.152  3da74652-c497-4625-81ab-5a61e93d2100-S3  
10.0.7.78   10.0.7.78   3da74652-c497-4625-81ab-5a61e93d2100-S1 
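The Mesos IDs in the last column are needed later when connecting to a node, so it can be handy to look them up by IP address. The sketch below parses the table shown above. It assumes three whitespace-separated columns after a single header row, which is inferred from this sample output and is not a documented format. The names Node and parse_node_table are hypothetical.

```python
from typing import NamedTuple


class Node(NamedTuple):
    hostname: str
    ip: str
    mesos_id: str


def parse_node_table(text: str) -> list[Node]:
    """Parse the whitespace-separated table printed by `dcos node`."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The first non-empty row is the HOSTNAME / IP / ID header, so skip it.
    return [Node(*row) for row in rows[1:] if len(row) == 3]


SAMPLE = """\
 HOSTNAME       IP                         ID
10.0.2.150  10.0.2.150  3da74652-c497-4625-81ab-5a61e93d2100-S2
10.0.2.151  10.0.2.151  3da74652-c497-4625-81ab-5a61e93d2100-S0
"""

nodes = parse_node_table(SAMPLE)
ids_by_ip = {node.ip: node.mesos_id for node in nodes}
print(ids_by_ip["10.0.2.150"])
```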

Installing Cassandra

Now we can begin installing our favorite NoSQL database, Apache Cassandra.

The Universe is a distributed repository of all the packages a user can install on the DC/OS cluster. After clicking Universe in the left menu, you can see the Cassandra package which, at the time of writing, had the version number 1.0.2-2.2.5. The first number is the version of the DC/OS installer package (i.e. 1.0.2), and the second number is the version of Cassandra that will be installed (i.e. 2.2.5). (Notice that not every package follows this scheme.)
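For scripts that need the two parts separately, the split is straightforward. The helper below is a hypothetical sketch. It raises an error for versions without a hyphen, because not every package uses this scheme.

```python
def split_package_version(version: str) -> tuple[str, str]:
    """Split 'installer-application' into (installer version, application version)."""
    installer, separator, application = version.partition("-")
    if not separator or not installer or not application:
        raise ValueError(f"{version!r} does not follow the installer-application scheme")
    return installer, application


installer_version, cassandra_version = split_package_version("1.0.2-2.2.5")
print(installer_version, cassandra_version)
```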

Cassandra install

After selecting to install Cassandra, choose the advanced options. This allows you to adjust a number of the common settings found in the cassandra.yaml file. Additionally, you have the chance to set the properties for the container, such as how much memory and disk space the container will have.

Settings

You will notice that each setting has an (i) icon next to it, which explains the meaning of the setting. Once you have chosen the settings you like, quickly move to the Nodes view in the DC/OS left menu and you can see Cassandra containers being provisioned before your eyes!

Settings

Installing and using the Cassandra specific CLI functions

Some DC/OS packages have a companion helper library to install. The helper library extends the commands available in the DC/OS CLI. Cassandra has such a helper library that needs to be separately installed.

$ dcos package install cassandra  --cli
[core.dcos_acs_token]: set
Installing CLI subcommand for package [cassandra] version [1.0.2-2.2.5]
New command available: dcos cassandra

We can use the help options to see what new commands are available.

$ dcos cassandra --help
Usage: dcos-cassandra cassandra [OPTIONS] COMMAND [ARGS]...

Options:
  --info / --no-info
  --name TEXT         Name of the Cassandra instance to query.
  --config-schema     Prints the config schema for Cassandra.
  --help              Show this message and exit.

Commands:
  backup      Backup Cassandra data
  cleanup     Cleanup old token mappings
  connection  Provides connection information
  node        Manage Cassandra nodes
  repair      Perform primary range repair.
  restore     Restore Cassandra cluster from backup
  seeds       Retrieve seed node information

First, we can list the nodes that have had Cassandra installed on them.

$ dcos cassandra node list
[
    "node-0",
    "node-1",
    "node-2"
]

Next, we can check the status of a node. In this case the library extracts its output from Cassandra’s nodetool info command.

$ dcos cassandra node status 0
{
    "data_center": "dc1",
    "endpoint": "10.0.2.150",
    "gossip_initialized": true,
    "gossip_running": true,
    "host_id": "78a93f1b-0721-47c3-b718-3781924b4192",
    "joined": true,
    "mode": "NORMAL",
    "native_transport_running": true,
    "rack": "rac1",
    "rpc_running": true,
    "token_count": 256,
    "version": "2.2.5"
}
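Because the status comes back as JSON, it is easy to turn into a simple health check. The sketch below is one possible reading of the fields. Treating a node as healthy when it is in NORMAL mode, has joined, and has gossip and the native transport running is my assumption, not a rule defined by DC/OS or Cassandra. The function name is_healthy is hypothetical.

```python
import json


def is_healthy(status: dict) -> bool:
    """Treat a node as healthy when it has joined the ring and serves clients."""
    return (
        status.get("mode") == "NORMAL"
        and status.get("joined") is True
        and status.get("gossip_running") is True
        and status.get("native_transport_running") is True
    )


# A subset of the fields from the status output shown above.
SAMPLE = """
{
    "data_center": "dc1",
    "endpoint": "10.0.2.150",
    "gossip_running": true,
    "joined": true,
    "mode": "NORMAL",
    "native_transport_running": true,
    "version": "2.2.5"
}
"""

status = json.loads(SAMPLE)
print(status["endpoint"], "healthy" if is_healthy(status) else "unhealthy")
```

In a real script the JSON text could come from running dcos cassandra node status through Python's subprocess module in place of the embedded sample.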

If you study the output above you will notice that the Cassandra node has an endpoint of 10.0.2.150. If you are familiar with computer networking you know that 10.X.X.X is a non-routable IP range. Normally you would have to manually set up an SSH proxy or some type of VPN to connect to this system. DC/OS provides helpers that give you a proxied SSH connection. This makes it easy to connect to the containers the system provisions.

$ dcos node --mesos-id 3da74652-c497-4625-81ab-5a61e93d2100-S2 --master-proxy ssh 
Running `ssh -A -t core@XXXX ssh -A -t core@10.0.2.150`
The authenticity of host '10.0.2.150 (10.0.2.150)' can't be established.
ED25519 key fingerprint is ef:cc:48:14:4f:5a:66:f2:cc:23:ee:4f:75:66:23:0e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.2.150' (ED25519) to the list of known hosts.
CoreOS stable (766.5.0)
Update Strategy: No Reboots
core@ip-10-0-2-150 ~ $

Now that we have connected to the system, you may notice that even though it is running Cassandra it does not seem to have client programs like cqlsh or the cli on the path. This is because this container is stripped down to essentials. Luckily we can use the docker run command to retrieve a container that has cqlsh and launch that container with a single (albeit long) command.

core@ip-10-0-2-150 ~ $ docker run -i -t --net=host --entrypoint=/usr/bin/cqlsh spotify/cassandra -e "select * from system.schema_keyspaces" 10.0.2.151 9160
Unable to find image 'spotify/cassandra:latest' locally
latest: Pulling from spotify/cassandra
1e58eecba27a: Pull complete 
a0e9fe2f8803: Pull complete 
Digest: sha256:8bf8303c57f2a1f1a6c2faa880e9d2f9a1843777097e53847e8725d92e400503
Status: Downloaded newer image for spotify/cassandra:latest

 keyspace_name      | durable_writes | strategy_class | strategy_options
--------------------+----------------+----------------+----------------------------
        system_auth |           True | SimpleStrategy | {"replication_factor":"1"}
 system_distributed |           True | SimpleStrategy | {"replication_factor":"3"}
             system |           True | LocalStrategy  |                         {}
      system_traces |           True | SimpleStrategy | {"replication_factor":"2"}
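The strategy_options column holds small JSON documents, so the replication factor of each keyspace can be pulled out of this output and compared with the three nodes in the cluster. The sketch below assumes the pipe-separated layout shown above. The function name replication_factors is hypothetical, and keyspaces without a replication_factor option (such as system) are skipped.

```python
import json


def replication_factors(table: str) -> dict[str, int]:
    """Map keyspace name to replication factor from cqlsh tabular output."""
    factors = {}
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 4 or cells[0] == "keyspace_name":
            continue  # skip the header, the separator line, and blank lines
        options = json.loads(cells[3])
        if "replication_factor" in options:
            factors[cells[0]] = int(options["replication_factor"])
    return factors


SAMPLE = """
 keyspace_name      | durable_writes | strategy_class | strategy_options
--------------------+----------------+----------------+----------------------------
        system_auth |           True | SimpleStrategy | {"replication_factor":"1"}
 system_distributed |           True | SimpleStrategy | {"replication_factor":"3"}
             system |           True | LocalStrategy  |                         {}
      system_traces |           True | SimpleStrategy | {"replication_factor":"2"}
"""

for keyspace, factor in replication_factors(SAMPLE).items():
    print(f"{keyspace}: replication factor {factor}")
```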

Subsequent invocations will not need to re-download the container. A simple CQL shell can be started like this:

core@ip-10-0-2-150 ~ $ docker run -i -t --net=host --entrypoint=/usr/bin/cqlsh spotify/cassandra

Wrap up

In addition to Apache Cassandra, DC/OS provides many other packages in the Universe repository, including Apache Spark and Apache Kafka. DC/OS is a nice way to ‘hit the ground running’, especially if you need a staging or development environment and do not want to get into the particulars of installing and configuring a SMACK stack to prototype an application. Managing applications as containers also helps from the administrative and utilization standpoint by making installations repeatable and enabling applications to cohabitate on servers with the proper dedicated resources.
