A handy feature was silently added to Apache Cassandra’s
nodetool just over a year ago. The feature added was the
-j (jobs) option. This little gem controls the number of compaction threads to use when running either a
upgradesstables. The option was added to
nodetool via CASSANDRA-11179 to version 3.5. It has been back ported to Apache Cassandra versions 2.1.14, 2.2.6, and 3.5.
One of the big challenges people face when starting out working with Cassandra and time series data is understanding the impact of how your write workload will affect your cluster. Writing too quickly to a single partition can create hot spots that limit your ability to scale out. Partitions that get too large can lead to issues with repair, streaming, and read performance. Reading from the middle of a large partition carries a lot of overhead, and results in increased GC pressure. Cassandra 4.0 should improve the performance of large partitions, but it won’t fully solve the other issues I’ve already mentioned. For the foreseeable future, we will need to consider their performance impact and plan for them accordingly.
Since we created our hard fork of Spotify’s great repair tool, Reaper, we’ve been committed to make it the “de facto” community tool to manage repairing Apache Cassandra clusters.
This required Reaper to support all versions of Apache Cassandra (starting from 1.2) and some features it lacked like incremental repair.
Another thing we really wanted to bring in was to remove the dependency on a Postgres database to store Reaper data. As Apache Cassandra users, it felt natural to store these in our favorite database.
Auto bootstrapping is a handy feature when it comes to growing an Apache Cassandra cluster. There are some unknowns about how this feature works which can lead to data inconsistencies in the cluster. In this post I will go through a bit about the history of the feature, the different knobs and levers available to operate it, and resolving some of the common issues that may arise.