After having spent quite a bit of time learning Docker and after hearing strong community interest for the technology even though few have played with it, I figured it’d be best to share what I’ve learned. Hopefully the knowledge transfer helps newcomers get up and running with Cassandra in a concise, yet deeply informed manner.
A few years ago I finally started playing with Docker by way of Vagrant. That entire experience was weird. Don’t do it.
Later Docker Compose was released and all the roadblocks I previously encountered immediately melted away and the power of Docker was made very aware to me. Since then I’ve been like my cat, but instead of “Tuna Tuna Tuna Tuna” it’s more like: “Docker Docker Docker Docker.”
But the more I spoke about Docker and asked around about Docker, the sadder I became since:
- Few really used Docker.
- Fewer had even heard of Docker Compose.
- Everyone was worried about how Docker performance would be in production.
- Some were waiting for the Mesos and Kubernetes war to play out.
- Kubernetes won by the way. Read any news around Docker-Kubernetes and AWS-Kubernetes to make your own judgements.
Within The Last Pickle, I advocate for Docker as best I can. Development project? “Why not use Docker?” Quick test? “cough Docker cough.” Want to learn everything you can about Grafana, Graphite, and monitoring dashboards? “Okay, Docker it is!”
About a year later, we’re here and guess what? Now you get to be here with me as well! :tada:
Docker Cassandra Bootstrap
In October, Nick Bailey invited me to present at the local Austin Cassandra Users Meetup and I figured this was the time to consolidate my recent knowledge and learnings into a simplified project. Since I had already spent time on such an intricate project, I could save others time and give them a clean environment they could play with, develop on, then move into production.
That’s how the docker-cassandra-bootstrap project was born.
I will stay away from how to run the Docker Cassandra Bootstrap project within this blog post since the instructions are already within the GitHub project’s README.md. Instead, I’ll focus on the individual components, what’s hidden in which nooks, and which stubs are commented out in which crannies for future use and development.
The docker-compose.yml is the core building block for any Docker Compose project.
What Docker provided was the Dockerfile, which allowed image definitions to run containers from. (Note the differentiation: containers are images that are running.) Building an image using Docker was pretty straightforward:
docker build .
However, building many images would require tagging and additional docker parameters. And that can get confusing really quickly and definitely isn’t user-friendly.
Instead, Docker Compose lets you build entire ecosystems of services with a single docker-compose build command.
Now with Docker Compose, you don’t have to keep track of image tagging, Dockerfile location, or anything else that I gratefully have too little experience with. Instead, images are defined by the image (from Docker Hub) and build (from a local Dockerfile) parameters within a Docker Compose service.
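As a quick sketch of the two styles (the layout here is illustrative, not copied verbatim from the project):

```yaml
services:
  cassandra:
    image: cassandra:3.11      # pulled from Docker Hub
  pickle-factory:
    build: ./pickle-factory    # built from a local Dockerfile
```

Docker Compose handles the tagging and rebuild logic for both styles behind a single command.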
Along with the simplification of image definitions, Docker Compose introduces the env_file parameter, which takes a not-quite-bash environment-variable definition file. It’s slightly different: bash commands will not resolve, you can’t use an envar within another’s definition, and you shouldn’t use quotes since those will be considered part of the envar’s value. While these env_files come with their own limitations, they mean I no longer have to write ugly, long, complicated lines like:
docker -e KEY=VAL -e KEY2=VAL2 -e WHAT=AMI -e DOING=NOW ...
Through the use of multiple env_files, one can create:
- settings.env, which has all generalized settings.
- dev.env, which has dev-specific settings.
- prod.env, which has production-specific settings.
- nz.env, which will define variables to flip all gifs by 180 degrees to counteract the fact that New Zealanders are upside down.
At the end of the day, the filenames, segregation, and environmental variable values are for you to use and define within your own ecosystem.
But that’s getting ahead of ourselves. Just know that you can place whatever environmental variables you want in these files as you create production-specific env_files which may not get used today, but will be utilized when you move into production.
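As an illustrative sketch (the variable names are just examples, not the project’s actual settings), an env_file and its wiring might look like:

```yaml
# settings.env is not YAML — it holds plain KEY=value lines
# (no quotes, no $VAR expansion, no bash commands):
#
#   CASSANDRA_CLUSTER_NAME=pickle-cluster
#   MAX_HEAP_SIZE=1G
#
# docker-compose.yml fragment layering a general file with a dev-specific one:
services:
  cassandra:
    image: cassandra:3.11
    env_file:
      - settings.env
      - dev.env
```

Later files in the list win on conflicts, which is what makes the dev/prod split workable.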
Within Docker, everything in a container that is not stored in volumes is temporary. This means that if we launch a new container from any given static Docker image, we can manipulate multiple aspects of system configuration, data, file placement, etc., without worrying about changing our stable static environment. If something ever breaks, we simply kill the container, launch a new container based off the same image, and we’re back to our familiar static and stable state. However, if we ever want to persist data, storing this valuable data within Docker volumes ensures that the data is accessible across container restarts.
The above statement probably makes up about 90% of my love for Docker (image layer caching probably makes up a majority of the remaining 10%).
What my declaration of love means is that while a container is running: it is your pet. It does what you ask of it (unless it’s a cat) and it will live with you by your side crunching on the code you fed it. But once your pet has finished processing the provided code, it will vanish into the ether and you can replace it like you would cattle.
This methodology is a perfect fit for short-lived microservice workers. However, if you want to persist data, or provide configuration files to the worker microservices, you’ll want to use volumes.
While you can use named volumes to allow Docker to handle the data location for you, not using named volumes will put you in full control of where the data directory will resolve to, can provide performance benefits, and will remove one layer of indirection in case anything should go wrong.
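A sketch of the difference, assuming Cassandra’s default data directory (the host path is a made-up example):

```yaml
services:
  cassandra:
    image: cassandra:3.11
    volumes:
      # Bind mount: you control exactly where the data lives on the host.
      - ./data/cassandra:/var/lib/cassandra
      # Named volume alternative: Docker manages the location for you.
      # - cassandra_data:/var/lib/cassandra

# Named volumes must also be declared at the top level:
# volumes:
#   cassandra_data:
```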
When people ask about the performance of Docker in production, volumes are the key component. If your service relies on disk access, use volumes to ensure a higher level of performance. For all other cases, if there even is a CPU-based performance hit, it should be mild, and you can always scale horizontally. The ultimate performance benefit of using Docker is that containerized applications are extremely effective at horizontal scaling. Also, consider the time and effort costs to test, deploy, and roll back any production changes when using containers. Although this isn’t strictly performance-related, it does increase codebase velocity, which may be as valuable as raw performance metrics.
Back to looking at the docker-compose.yml: entrypoints define which program will be executed when a machine begins running. (You can think of SSH’s entrypoint as being sshd.) For the cqlsh service, the default entrypoint is overwritten by cqlsh cassandra, where cassandra is the name of the Cassandra Docker Compose service. This means that we want to use the cassandra:3.11 image, but not the bash script that sets up the cassandra.yaml and other Cassandra settings. Instead, the service will utilize cassandra:3.11’s image and start the container with cqlsh cassandra. This allows the following shorthand command to be run, all within a Docker container, without any local dependencies other than Docker and Docker Compose:
docker-compose run cqlsh
The above command starts up the cqlsh service, calls the cqlsh binary, and provides the cassandra hostname as the contact point.
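A sketch of what such a helper service definition might look like (the project’s actual definition may differ slightly):

```yaml
services:
  cqlsh:
    image: cassandra:3.11        # reused only for its cqlsh binary
    entrypoint: cqlsh cassandra  # skip the image's Cassandra startup script
    links:
      - cassandra:cassandra      # make the cassandra hostname resolvable
```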
The nodetool service is a good example of creating a shorthand command for an otherwise complicated process. Instead of having:
docker-compose run --entrypoint bash cassandra
$ nodetool -h cassandra -u cassandraUser -pw cassandraPass status
We can simply run:
docker-compose run nodetool status
Any parameters following the service name are appended to the defined entrypoint and replace the service’s command parameter. For the nodetool service the default command is help, but in the above line status is the command that is appended to the entrypoint.
If we have a service that relies on communication with another service, the way the cassandra-reaper service must be in contact with the cassandra service that Reaper will be monitoring, we can use the links service parameter to define both the hostname and service name within this link. I like to be both simple and explicit, which is why I use the same name for the hostname and service name, like:
links:
  - cassandra:cassandra
The above line will allow the cassandra-reaper service’s container to contact the cassandra service by way of the cassandra hostname.
Ports are a way to expose a service’s port to another service, or bind a service’s port to a local port.
Because the cassandra-reaper service is meant to be used via its web UI, we bind the service’s 8080 and 8081 ports onto the local machine’s 8080 and 8081 ports using the following lines:
ports:
  - "8080:8080"
  - "8081:8081"
This means if we visit http://localhost:8080/webui/ from our local machine, we’ll be processing code from within a container to service that web request.
The restart parameter is more of a Docker Compose-specific scheduler setting that dictates what a service should do if its container is abruptly terminated.
In the case of the grafana and logspout services, if Grafana or Logspout ever die, the containers will exit and the grafana and logspout services will automatically restart and come back online.
While this parameter may be ideal for some microservices, it may not be ideal for services that power data stores.
The docker-compose.yml defines the cassandra service as having a few mounted configuration files.
The two configuration files that are enabled by default include configurations for:
- The collectd metrics collector.
- The Prometheus JMX exporter.
The two configuration files that are disabled by default are for:
- The Graphite metrics reporter.
- The Filebeat log reporter for ELK.
The cassandra/config/collectd.cassandra.conf configuration file loads a few plugins that TLP has found to be useful for enterprise metric dashboards.
A few packages need to be installed for collectd to be fully functional with the referenced plugins. The service must also be started from within cassandra/docker-entrypoint.sh, or simply with service collectd start on hardware.
The metrics that are collected include information on:
- CPU load.
- Network traffic.
- Memory usage.
- System logs.
collectd is then configured to write to a Prometheus backend by default. Commented code is included for writing to a Graphite backend.
For further information on each of the plugins, visit the collectd wiki.
Prometheus JMX Exporter
The cassandra/config/prometheus.yml configuration file defines the JMX endpoints that will be collected by the JMX exporter and exposed via a REST API for Prometheus ingestion.
The following jar needs to be used and referenced within cassandra-env.sh:
Only the most important enterprise-centric metrics are collected for Prometheus ingestion. The metric name formats do not follow the standard Prometheus naming scheme, but instead something between the Graphite dot-naming scheme and the use of Prometheus-styled labels when relevant. Do note that Prometheus will automatically convert all dot-separated metrics to underscore-separated metrics since Prometheus does not understand the concept of hierarchical metric keys.
Graphite Metrics Reporter
The cassandra/config/graphite.cassandra.yaml configuration file is included and commented out by default.
For reporting to work correctly, this file requires the following jars to be placed into Cassandra’s lib directory and have Java 8 installed:
Once properly enabled, the cassandra service can report enterprise-centric metrics to a Graphite host. However, we did not include a graphite service within this project because our experience with Prometheus has been smoother than dealing with Graphite and because Prometheus seems to scale better than Graphite in production environments. Recent Grafana releases have also added plenty of Prometheus-centric features and support.
Filebeat Log Reporter
The cassandra/config/filebeat.yml configuration file is set up to contact a pre-existing ELK stack.
The Filebeat package is required and can be started within a Docker container by referencing the Filebeat service from within cassandra-env.sh.
The Filebeat configuration file is by no means complete (assume it is incorrect!), but it does provide a good starting point for learning how to ingest log files into Logstash using Filebeat. Please consider this configuration file fully experimental.
Because Reaper for Apache Cassandra will be contacting the JMX port of this cassandra service, we will also need to add authentication files for JMX in two locations and set the LOCAL_JMX variable to no to expose the JMX port externally while requiring authentication.
The cqlsh service is a helper service that simply uses the cassandra:3.11 image hosted on Docker Hub to provide the cqlsh binary, while mounting the local ./cassandra/schema.cql file into the container for simple schema creation and data querying.
The defined setup allows us to run the following command with ease:
docker-compose run cqlsh -f /schema.cql
The nodetool service is another helper service that is provided to automatically fill in the host, username, and password parameters. By default, it includes the help command to provide a list of options for nodetool. Running the following command will contact the cassandra node, automatically authenticate, and show a list of options:
docker-compose run nodetool
By including an additional parameter on the command line we will overwrite the default command and run the requested command instead, like:
docker-compose run nodetool status
The cassandra-reaper service does not use a locally built and customized image, but instead uses an image hosted on Docker Hub. The specific image used is tagged ab0fff2. However, you can choose the latest release or the master image if you want the bleeding edge version.
The configuration is all handled via environmental variables for easy Docker consumption within cassandra-reaper/cassandra-reaper.env. For a list of all configuration options, see the Reaper documentation.
The ports that are exposed onto localhost are 8080 and 8081 for the web UI and administration UI, respectively.
The grafana service uses the grafana/grafana image from Docker Hub and is configured using grafana/grafana.env. For a list of all configuration options via environmental variables, visit the Grafana documentation.
The two scripts included in grafana/bin/ are referenced in the README.md and are used to create data sources that rely on the Prometheus data store and to upload all the grafana/dashboard JSON files into Grafana.
The JSON files were meticulously created for two of our enterprise customers’ production deployments (shared here with permission). They include 7 dashboards that highlight specific Cassandra metrics in a drill-down fashion:
- Overview
- Read Path
- Write Path
- Client Connections
- Alerts
- Reaper
- Big Picture
At TLP we typically start off with the Overview dashboard. Depending on our metrics, we’ll switch to the Read Path, Write Path, or Client Connections dashboards for further investigation. Do keep in mind that while these dashboards are a work in progress, they do include proper x/y-axis labeling, units, and tooltips/descriptions. The tooltips/descriptions should provide info in the following order, when relevant:
- False Positives
- Required Actions
The text is meant to accompany any alerts that are triggered via the auto-generated Alerts dashboard. This way Slack, PagerDuty, or email alerts contain some context into why the alert was triggered, what most likely culprits may be involved, and how to resolve the alert.
Do note that while the other dashboards may have triggers set up, only the Alerts dashboard will fire the triggers since all the dashboards make use of Templating Variables for easy Environment, Data Center, and Host selection, which Grafana alerts do not yet support.
If there are ever any issues around repair, take a look at the Reaper dashboard as well to monitor the Reaper for Apache Cassandra service.
The Big Picture dashboard provides a 30,000 foot view of the cluster and is nice to reason about, but ultimately provides very little value other than showing the high-level trends of the Cassandra deployment being monitored.
The logspout service is a unique service in multiple respects. While this section will cover some of its special use cases, feel free to skim it, but do try to absorb as much information as possible since these sorts of use cases will arise in your future docker-compose.yml creations, even if not today.
At the top of the docker-compose.yml, under the logspout section, we define a build parameter. This build parameter references logspout/Dockerfile, which has no real information since the FROM image that our local image refers to uses a few ONBUILD commands. These commands are defined in the parent image and make use of logspout/modules.go. Our custom logspout/modules.go installs the required Logstash dependencies for use with pre-existing Logstash deployments.
While logspout/build.sh was not required to be duplicated since the parent image already had the file pre-baked, I did so for safekeeping.
The logspout/logspout.env file includes a few commented-out settings I felt would be interesting to look at if I were to experiment with Logstash in the future. These might be a good starting point for further investigation.
The logspout service also uses the restart: always setting to ensure that any possible issues with the log redirection service will automatically be resolved by restarting the service immediately after failure.
Within the logspout service, we are redirecting the container’s port 80 to our localhost’s port 8000. This allows us to curl http://localhost:8000/logs from our local machine and grab the logs that the container exposes on its own REST API under port 80.
In order for any of the logspout container magic to work, we need to bind our localhost’s /var/run/docker.sock into the container as a read-only mount. Even though the mount is read-only, there are still security concerns with doing such an operation. Since this line is required for allowing this logging redirection to occur, I did include two links to further clarify the security risks involved with including this container within production environments:
Perhaps in the future the Docker/Moby team will provide a more secure workaround.
The last line that has not been mentioned is the command option, which is commented out by default within the docker-compose.yml. By uncommenting the command option, we stop using the default command provided in the parent Dockerfile, which exposes the logs via a REST API, and we begin to send our logs to the PaperTrail website, which provides 100 MB/month of hosted logs for free.
In order to isolate checked-in code from each developer’s $PAPERTRAIL_PORT, each developer must locally create a copy of .env.template within their project’s directory under the filename .env. Within this .env file we can define environmental variables that will be used by docker-compose.yml. Once you have an account for PaperTrail, set PAPERTRAIL_PORT to the port assigned by PaperTrail to begin seeing your logs within PaperTrail.com. And since .env is set up to be ignored via .gitignore, we do not have to worry about sending multiple developers’ logs to the same location.
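A sketch of such a .env file (the port value is a placeholder — PaperTrail assigns the real one when you create a log destination):

```
# .env — git-ignored, one copy per developer
PAPERTRAIL_PORT=12345
```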
The prometheus service is one of the simpler services within this Docker Compose environment. It:
- Uses a Docker Hub image.
- Will require access to both the cassandra and cassandra-reaper services to monitor their processes.
- Will expose the container’s port 9090 onto localhost’s port 9090.
- Will persist all data within the container’s /prometheus directory onto a local volume mount.
- Will configure the container using a locally-stored copy of the prometheus.yml.
The prometheus.yml defines which REST endpoints Prometheus will consume to gather metrics from as well as a few other documented configurations. Prometheus will collect metrics from the following services:
- Cassandra internal metrics.
- collectd metrics.
- Reaper for Apache Cassandra.
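A minimal sketch of what such a scrape configuration could look like — the job names and ports here are assumptions for illustration, not the project’s actual values:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: cassandra               # Prometheus JMX exporter REST endpoint
    static_configs:
      - targets: ["cassandra:7070"]
  - job_name: collectd                # collectd's Prometheus write plugin
    static_configs:
      - targets: ["cassandra:9103"]
  - job_name: reaper                  # Reaper for Apache Cassandra
    static_configs:
      - targets: ["cassandra-reaper:8081"]
```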
pickle-factory Sample Write Application Service
pickle-factory, which is a misnomer and should be “pickle-farm” in hindsight, is a sample application that shows an ideal write pattern.
For production and development purposes, the pickle-factory/Dockerfile uses the line COPY . . to copy all of the “microservice’s” code into the Docker image. The docker-compose.yml uses the volumes key to overwrite any pre-baked code to allow for a simpler development workflow.
The ideal development-to-production workflow would be to modify the pickle-factory/factory.py file directly and have each saved copy refreshed within the running container. Once all changes are tested and committed to master, a continuous integration (CI) job builds the new Docker images and pushes them to Docker Hub for both development and production consumption. The next developer to modify pickle-factory/factory.py will grab the latest image but use their local copy of pickle-factory/factory.py. The next time the production image gets updated it will include the latest copy of pickle-factory/factory.py, as found in master, directly in the Docker image.
Included in the Dockerfile is commented-out code for installing gosu, a sudo replacement for Docker written in Go. gosu allows the main user to spawn another process under a non-root user to better contain attackers. Theoretically, if an attacker gains access into the pickle-factory container, they would do so under a non-privileged user and not be able to potentially break out of the container into the Docker internals and onto the host machine. Practically, gosu limits the possible attack surface on a Docker container. To fully use the provided functionality of gosu, pickle-factory/docker-entrypoint.sh needs to be modified as well.
The pickle-factory/factory.py makes a few important design decisions:
- Grabs configurations from environmental variables instead of configuration files for easier Docker-first configurations.
- The Cluster() object uses the DCAwareRoundRobinPolicy in preparation for multiple data centers.
- The C-based LibevConnection event loop provider is used since it’s more performant than the Python-native default event loop provider.
- All statements are prepared in advance to minimize request payload sizes.
- In the background, the Cassandra nodes will send the statements around to each node in the cluster.
- The statements will be indexed via a simple integer.
- All subsequent requests will map the simple integer to the pre-parsed CQL statement.
- Employee records are generated and written asynchronously instead of synchronously in an effort to improve throughput.
- Workforce data is denormalized and written asynchronously with a max-buffer to ensure we don’t overload Cassandra with too many in-flight requests.
- If any requests did not complete successfully, the future.result() call will throw the matching exception.
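The max-buffer idea above can be sketched without a live cluster by standing in for the driver’s execute_async() with a thread pool — write_record and MAX_IN_FLIGHT are made-up names for illustration, not the project’s actual code:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Stand-in for session.execute_async(); a real application would submit a
# prepared statement through the DataStax Python driver instead.
executor = ThreadPoolExecutor(max_workers=4)

def write_record(record):
    return ("ok", record)

MAX_IN_FLIGHT = 8  # hypothetical max-buffer size
in_flight = set()
results = []

for i in range(100):
    # Block whenever the buffer is full so the cluster never sees more
    # than MAX_IN_FLIGHT concurrent requests.
    if len(in_flight) >= MAX_IN_FLIGHT:
        done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
        results.extend(f.result() for f in done)  # result() re-raises failures
    in_flight.add(executor.submit(write_record, {"employee_id": i}))

# Drain whatever is still in flight.
done, _ = wait(in_flight)
results.extend(f.result() for f in done)
```

Calling result() on each completed future is what surfaces any write exception, mirroring the future.result() behavior described above.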
pickle-shop Sample Read Application Service
Much like the pickle-factory microservice, the pickle-shop service follows a similar workflow. However, the pickle-shop service definition within docker-compose.yml will not overwrite the application code within /usr/src/app and instead requires rebuilding the local image with docker-compose build.
This workflow was chosen as a way to differentiate between a development workflow (pickle-factory) and a production workflow (pickle-shop). For production images, all code should ideally be baked into the Docker image and shipped without any external dependencies such as a codebase dependency.
The sample read application:
- Performs a synchronous read request to grab all the employee IDs.
- Performs 10 asynchronous read queries, one for each of 10 employees.
- Processes all results at roughly the same time.
If using a non-asynchronous driver, one would probably follow the alternate workflow:
- Perform 1 synchronous read query for 1 of the 10 employees.
- Process the results of that employee query.
- Repeat for each remaining employee.
However, following the synchronous workflow will take roughly:
O(N), where N is the number of employees waiting to be processed.
Following the asynchronous workflow will roughly take:
O(max(N)), where max(N) is the maximum amount of time a single employee query will take.
Simplified, the asynchronous workflow will roughly take constant time with respect to the number of employees.
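The difference can be sketched with plain asyncio standing in for the driver’s asynchronous reads — read_employee and the returned record shape are made up for illustration:

```python
import asyncio

async def read_employee(employee_id):
    # Stand-in for an asynchronous Cassandra read; a real application would
    # await the result of the DataStax driver's execute_async() instead.
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"employee_id": employee_id, "name": f"employee-{employee_id}"}

async def read_all(employee_ids):
    # Fire all queries at once and process the results together: the total
    # latency is roughly the slowest single query, not the sum of them all.
    return await asyncio.gather(*(read_employee(i) for i in employee_ids))

employees = asyncio.run(read_all(range(10)))
```

Swapping gather() for a sequential loop of awaits reproduces the synchronous O(N) workflow described above.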
By the end of this post, we have covered:
- Docker Compose
- A few docker-compose.yml settings.
- Dockerized Cassandra.
- Helper Docker Compose services.
- Dockerized Reaper for Apache Cassandra.
- Dockerized Grafana.
- Logspout for Docker-driven external logging.
- Dockerized Prometheus.
- A sample write-heavy asynchronous application using Cassandra.
- A sample read-heavy asynchronous application using Cassandra.
TLP’s hope is that the above documentation and minimal Docker Compose ecosystem will provide a starting point for future community Proof of Concepts utilizing Apache Cassandra. With this project, each developer can create, maintain, and develop within their own local environment without any external overhead other than Docker and Docker Compose.
Once the POC has reached a place of stability, simply adding the project to a Continuous Integration workflow to publish tested images will allow for proper Release Management. After that point, the responsibility typically falls onto the DevOps team to grab the latest Docker image, replace any previously existing Docker containers, and launch the new Docker image. This would all occur without any complicated hand-off consisting of dependencies, configuration files (since we’re using environmental variables), or OS-specific requirements.
Hopefully you will find Docker Compose to be as powerful as I’ve found it to be! Best of luck in cranking out the new POC you’ve been itching to create!