In this post we’ll explore a new compaction strategy available in Apache Cassandra. We’ll dig into it’s use cases, limitations, and share our experiences of using it with various production clusters.
In this post I’ll introduce you to an advanced option in Apache Cassandra called user defined compaction. As the name implies, this is a process by which we tell Cassandra to create a compaction task for one or more tables explicitly. This task is then handed off to the Cassandra runtime to be executed like any other compaction.
Two weeks ago marked another Cassandra summit. As usual I submitted a handful of talks, and surprisingly they all got accepted. The first talk (video linked) I gave was an introduction to a tool I started back at DataStax called Dataset Manager for Apache Cassandra, further referred to as CDM. CDM started as a a simple question - what can we do to help people learn how to use Apache Cassandra? How can new users avoid the headaches of incorrect data modeling, repeated production deployments, and costly schema migrations.
As explained “in extenso” by Alain in his installment on how Apache Cassandra deletes data, removing rows or cells from a table is done by a special kind of write called a tombstone. But did you know that inserting a null value into a field from a CQL statement also generates a tombstone? This happens because Cassandra cannot decide whether inserting a null value means that we are trying to void a field that previously had a value or that we do not want to insert a value for that specific field.