Running Cassandra on DC/OS
Mesosphere recently open sourced their Datacenter Operating System (DC/OS), a platform for managing data center resources. DC/OS is built on Apache Mesos, which provides tooling to “Program against your datacenter like it’s a single pool of resources”. While Mesos provides primitives to request resources from a pool, DC/OS provides common applications as packages in a repository called the Universe. It also provides a web UI and a CLI to manage these resources. One helpful way to understand Mesos and DC/OS is to imagine packaging an application as a container: you want tools to deploy and configure this container without having to deal directly with provisioning and system-level configuration.
Apache Cassandra is a good example of an application that can be managed with DC/OS. To run Cassandra, software must be installed and configured across one or more nodes. DC/OS has a package for Cassandra that provides a wizard-based install and, once the process is up and running, tools to help with configuration and management tasks. This blog will show how to set up DC/OS in the Amazon cloud, how to install Apache Cassandra on a DC/OS cluster, and finally new ways to interact with Apache Cassandra after it is installed.
Setting up DC/OS
For this blog we installed DC/OS using the DC/OS on AWS documentation; however, as you would expect, it supports multiple deployment scenarios. While the instructions are easy to work through, we mention below a few steps that took a little more thought (or were just interesting). There are a few places in the process that require the user to complete an OAuth dialog; we skipped those for brevity.
In the instructions listed above, Step 2 provides a link to the DC/OS CloudFormation template to launch a cluster in AWS. When you click it, your first choice is the Amazon Region in which to launch the CloudFormation template. You will normally want to pick the same Region as the rest of your infrastructure.
After you have clicked on one of the links above you will be taken to the Amazon CloudFormation wizard to launch the Stack. A Stack is the CloudFormation term for a set of related resources that can be managed as a single unit. As you proceed through the installation wizard you will be prompted with a settings screen. The SlaveInstanceCount setting controls the number of “follower nodes”, the nodes on which DC/OS will install managed services like Cassandra. DC/OS treats these nodes as a pool of resources which have RAM, CPU, and hard disk space. For testing purposes I lowered the value of SlaveInstanceCount from 5 to 3, because to get the full distributed Cassandra experience you need a minimum of three nodes.
Keep in mind that the DC/OS CloudFormation template creates multiple servers with moderate computing capabilities. The template currently uses m3.xlarge instances, so the configuration above costs approximately $1,000 USD to run for a month; destroy this stack when you are done evaluating. Also, if you use this template for a production installation, remember to choose instance types that make the most sense for your applications.
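As a rough sanity check on that monthly figure, the arithmetic can be sketched in Python. Both inputs below are assumptions chosen for illustration and are not taken from the template or from current AWS pricing: an instance count of five (the three agents plus an assumed master and public agent) and an hourly rate of $0.27.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(instance_count, hourly_rate_usd, hours=HOURS_PER_MONTH):
    """Estimate the on-demand cost in USD of running the instances all month."""
    return round(instance_count * hourly_rate_usd * hours, 2)


# Assumed values for illustration only: 5 instances at $0.27 per hour.
estimate = monthly_cost(instance_count=5, hourly_rate_usd=0.27)
print(f"Estimated monthly cost: ${estimate:,.2f}")
```

Substitute the instance count from your own stack and the current on-demand price for your Region to get a figure you can rely on.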
After you complete the rest of the wizard, Amazon begins to prepare the systems described in the template. Once the Status column reaches CREATE_COMPLETE, the entire stack is set up. At that point you can connect to the cluster and manage it.
Take the address listed as DnsAddress on the Outputs tab and open it in your web browser.
After passing the OAuth dialog process you are presented with the main dashboard.
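The stack status and the DnsAddress output described above can also be read from a script. The sketch below assumes you already have a response shaped like the one returned by the CloudFormation DescribeStacks call (for example via the AWS CLI or an SDK); the helper name stack_output and the sample hostname are hypothetical.

```python
def stack_output(describe_stacks_response, key):
    """Return a stack output value, or None if the stack is not ready or the key is absent."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks or stacks[0].get("StackStatus") != "CREATE_COMPLETE":
        return None
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == key:
            return output.get("OutputValue")
    return None


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "Stacks": [
        {
            "StackStatus": "CREATE_COMPLETE",
            "Outputs": [
                {"OutputKey": "DnsAddress",
                 "OutputValue": "example-elb.us-east-1.elb.amazonaws.com"},
            ],
        }
    ]
}

print(stack_output(sample_response, "DnsAddress"))
```

Returning None while the stack is still being created keeps the caller from opening an address that is not serving yet.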
DC/OS CLI
You may have noticed that DC/OS provisions servers in the non-internet-routable address space 10.X.X.X. Now that the cluster is running, we wish to interact with it. A great way to do this is with the DC/OS CLI. Click the email address at the bottom left of the DC/OS display (in the image above) and choose the option Install CLI. A popup window appears with install steps to run on your management station (I am managing from my laptop).
$ mkdir -p dcos && cd dcos &&
> curl -O https://downloads.dcos.io/dcos-cli/install.sh &&
> bash ./install.sh . https://dcos-stag-elasticl-LONGNAME.us-east-1.elb.amazonaws.com &&
> source ./bin/env-setup
Installing DCOS CLI from PyPI...
Modify your bash profile to add DCOS to your PATH? [yes/no] yes
Finished installing and configuring DCOS CLI.
We now can use the DC/OS command line tool (from my laptop).
$ dcos help
Command line utility for the Mesosphere Datacenter Operating
System (DCOS). The Mesosphere DCOS is a distributed operating
system built around Apache Mesos. This utility provides tools
for easy management of a DCOS installation.
Available DCOS commands:
auth Authenticate to DCOS cluster
config Manage the DCOS configuration file
help Display help information about DCOS
marathon Deploy and manage applications to DCOS
node Administer and manage DCOS cluster nodes
package Install and manage DCOS software packages
service Manage DCOS services
task Manage DCOS tasks
Get detailed command description with 'dcos <command> --help'.
One helpful command is node, which shows the nodes in the entire DC/OS cluster.
$ dcos node
HOSTNAME IP ID
10.0.2.150 10.0.2.150 3da74652-c497-4625-81ab-5a61e93d2100-S2
10.0.2.151 10.0.2.151 3da74652-c497-4625-81ab-5a61e93d2100-S0
10.0.2.152 10.0.2.152 3da74652-c497-4625-81ab-5a61e93d2100-S3
10.0.7.78 10.0.7.78 3da74652-c497-4625-81ab-5a61e93d2100-S1
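All of the addresses in this listing fall inside the private 10.0.0.0/8 block, which is why they cannot be reached directly from a laptop. A minimal check with Python's standard ipaddress module, using the four addresses from the listing above:

```python
import ipaddress

# Node addresses copied from the `dcos node` listing.
node_ips = ["10.0.2.150", "10.0.2.151", "10.0.2.152", "10.0.7.78"]

# The RFC 1918 private block that covers every 10.X.X.X address.
private_block = ipaddress.ip_network("10.0.0.0/8")

for ip in node_ips:
    address = ipaddress.ip_address(ip)
    print(ip, "private" if address in private_block else "public")
```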
Installing Cassandra
Now we can begin installing our favorite NoSQL database, Apache Cassandra.
The Universe is a distributed repository of all the packages a user can install on the DC/OS cluster. After clicking Universe in the left menu, you can see the Cassandra package which, at the time of writing, had the version number 1.0.2-2.2.5. The first number is the version of the DC/OS installer package (i.e. 1.0.2); the second number is the version of Cassandra that will be installed (i.e. 2.2.5). (Notice that not every package follows this scheme.)
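The two-part version scheme can be split apart mechanically. This is a small sketch with a hypothetical helper name; because not every package follows the scheme, it raises an error when the second part is missing instead of guessing.

```python
def split_package_version(version):
    """Split 'PACKAGE-CASSANDRA' into (installer package version, Cassandra version)."""
    package_version, _, cassandra_version = version.partition("-")
    if not package_version or not cassandra_version:
        raise ValueError(f"{version!r} does not follow the PACKAGE-CASSANDRA scheme")
    return package_version, cassandra_version


package_version, cassandra_version = split_package_version("1.0.2-2.2.5")
print("installer package:", package_version)
print("Cassandra:", cassandra_version)
```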
After selecting to install Cassandra, choose the advanced options. This allows you to adjust a number of the common settings found in the cassandra.yaml file. Additionally, you have the chance to set the properties for the container, such as how much memory and disk space the container will have.
You will notice that each setting has an (i) icon next to it, which explains the meaning of the setting. Once you have chosen the settings you like, quickly move to the Nodes view in the DC/OS left menu and you can see Cassandra containers being provisioned before your eyes!
Installing and using the Cassandra-specific CLI functions
Some DC/OS packages have a companion helper library to install. The helper library extends the commands available in the DC/OS CLI. Cassandra has such a helper library that needs to be separately installed.
$ dcos package install cassandra --cli
[core.dcos_acs_token]: set
Installing CLI subcommand for package [cassandra] version [1.0.2-2.2.5]
New command available: dcos cassandra
We can use the help options to see what new commands are available.
$ dcos cassandra --help
Usage: dcos-cassandra cassandra [OPTIONS] COMMAND [ARGS]...
Options:
--info / --no-info
--name TEXT Name of the Cassandra instance to query.
--config-schema Prints the config schema for Cassandra.
--help Show this message and exit.
Commands:
backup Backup Cassandra data
cleanup Cleanup old token mappings
connection Provides connection information
node Manage Cassandra nodes
repair Perform primary range repair.
restore Restore Cassandra cluster from backup
seeds Retrieve seed node information
First, we can list the nodes on which Cassandra has been installed.
$ dcos cassandra node list
[
"node-0",
"node-1",
"node-2"
]
Next, we can check the status of a node; in this case the library extracts the output of Cassandra’s nodetool info command.
$ dcos cassandra node status 0
{
"data_center": "dc1",
"endpoint": "10.0.2.150",
"gossip_initialized": true,
"gossip_running": true,
"host_id": "78a93f1b-0721-47c3-b718-3781924b4192",
"joined": true,
"mode": "NORMAL",
"native_transport_running": true,
"rack": "rac1",
"rpc_running": true,
"token_count": 256,
"version": "2.2.5"
}
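Because the status comes back as JSON, it is easy to check from a script. The sketch below uses a trimmed copy of the fields shown above; the function name is_serving and the choice of which fields count as "serving" are assumptions made for illustration, not a definition used by DC/OS or Cassandra.

```python
import json


def is_serving(status):
    """Return True when the node has joined the ring and accepts client traffic."""
    return (
        status.get("joined") is True
        and status.get("mode") == "NORMAL"
        and status.get("gossip_running") is True
        and status.get("native_transport_running") is True
    )


# Fields copied from the node status output (trimmed).
raw_status = """
{
  "endpoint": "10.0.2.150",
  "gossip_running": true,
  "joined": true,
  "mode": "NORMAL",
  "native_transport_running": true
}
"""

status = json.loads(raw_status)
print(status["endpoint"], "serving" if is_serving(status) else "not serving")
```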
If you study the node status output above you will notice that the Cassandra node has an endpoint of 10.0.2.150. If you are familiar with computer networking you know that 10.X.X.X is a private IP range that is not routable over the internet. Normally you would have to manually set up an SSH proxy or some type of VPN to connect to this system. DC/OS provides helpers that give you a proxied SSH connection. This makes it easy to connect to the containers the system provisions.
$ dcos node --mesos-id 3da74652-c497-4625-81ab-5a61e93d2100-S2 --master-proxy ssh
Running `ssh -A -t core@XXXX ssh -A -t core@10.0.2.150`
The authenticity of host '10.0.2.150 (10.0.2.150)' can't be established.
ED25519 key fingerprint is ef:cc:48:14:4f:5a:66:f2:cc:23:ee:4f:75:66:23:0e.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.0.2.150' (ED25519) to the list of known hosts.
CoreOS stable (766.5.0)
Update Strategy: No Reboots
core@ip-10-0-2-150 ~ $
Now that we have connected to the system you may notice that even though it is running Cassandra it does not seem to have client programs like cqlsh or the cli on the path. This is because this container is stripped down to essentials. Luckily we can use the docker run command to retrieve a container that has cqlsh and launch it with a single (albeit long) command.
core@ip-10-0-2-150 ~ $ docker run -i -t --net=host --entrypoint=/usr/bin/cqlsh spotify/cassandra -e "select * from system.schema_keyspaces" 10.0.2.151 9160
Unable to find image 'spotify/cassandra:latest' locally
latest: Pulling from spotify/cassandra
1e58eecba27a: Pull complete
a0e9fe2f8803: Pull complete
Digest: sha256:8bf8303c57f2a1f1a6c2faa880e9d2f9a1843777097e53847e8725d92e400503
Status: Downloaded newer image for spotify/cassandra:latest
keyspace_name | durable_writes | strategy_class | strategy_options
--------------------+----------------+----------------+----------------------------
system_auth | True | SimpleStrategy | {"replication_factor":"1"}
system_distributed | True | SimpleStrategy | {"replication_factor":"3"}
system | True | LocalStrategy | {}
system_traces | True | SimpleStrategy | {"replication_factor":"2"}
Subsequent invocations will not need to re-download the image. A simple CQL shell can be started like this:
core@ip-10-0-2-150 ~ $ docker run -i -t --net=host --entrypoint=/usr/bin/cqlsh spotify/cassandra
Wrap up
In addition to Apache Cassandra, DC/OS provides many other packages in the Universe repository, including Apache Spark and Apache Kafka. DC/OS is a nice way to hit the ground running, especially if you need a staging or development environment and do not want to get into the particulars of installing and configuring a SMACK stack to prototype an application. Managing applications as containers also helps from the administrative and utilization standpoint by making installations repeatable and enabling applications to cohabitate on servers with the proper dedicated resources.