One of the usual suspects for performance issues in the read path of Apache Cassandra is the presence of tombstones. We are used to check how many tombstones are accessed per read early in the process, to identify the possible cause of excessive GC pauses or high read latencies.
While trying to understand unexpected high read latencies for a customer a few months ago, we found out that one special (although fairly common) kind of tombstone was not counted in the metrics nor traced in the logs : primary key deletes.
Apache Cassandra versions 3.x and below have an all or nothing approach when it comes the datacenter user authorization security model. That is, a user has access to all datacenters in the cluster or no datacenters in the cluster. This has changed to something a little more fine grained for versions 4.0 and above, all thanks to Blake Eggleston and the work he has done on CASSANDRA-13985.
This is our second post in our series on performance tuning with Apache Cassandra. In the first post, we examined a fantastic tool for helping with performance analysis, the flame graph. We specifically looked at using Swiss Java Knife to generate them.
Data is critical to modern business and operational teams need to have a Disaster Recovery Plan (DRP) to deal with the risks of potential data loss. At TLP, we are regularly involved in the data recovery and restoration process and in this post we will share information we believe will be useful for those interested in initiating or improving their backup and restore strategy for Apache Cassandra. We will consider some common solutions, and detail the solution we consider the most efficient in AWS + EBS environments as it allows the best Recovery Time Objective (RTO) and is relatively easy to implement.