Two weeks ago marked another Cassandra summit. As usual I submitted a handful of talks, and surprisingly they all got accepted. The first talk (video linked) I gave was an introduction to a tool I started back at DataStax called Dataset Manager for Apache Cassandra, further referred to as CDM. CDM started as a a simple question - what can we do to help people learn how to use Apache Cassandra? How can new users avoid the headaches of incorrect data modeling, repeated production deployments, and costly schema migrations.
As explained “in extenso” by Alain in his installment on how Apache Cassandra deletes data, removing rows or cells from a table is done by a special kind of write called a tombstone. But did you know that inserting a null value into a field from a CQL statement also generates a tombstone? This happens because Cassandra cannot decide whether inserting a null value means that we are trying to void a field that previously had a value or that we do not want to insert a value for that specific field.
There’s nothing fun about installing some software for the first time and feeling like it doesn’t work at all. Unfortunately if you’ve just installed Cassandra 2.2 or 3.0 on a recent Linux distribution, you may run into a not-so-friendly
Connection error when trying to use
Deleting distributed and replicated data from a system such as Apache Cassandra is far trickier than in a relational database. The process of deletion becomes more interesting when we consider that Cassandra stores its data in immutable files on disk. In such a system, to record the fact that a delete happened, a special value called a “tombstone” needs to be written as an indicator that previous values are to be considered deleted. Though this may seem quite unusual and/or counter-intuitive (particularly when you realize that a delete actually takes up space on disk), we’ll use this blog post to explain what is actually happening along side examples that you can follow on your own.