One of the challenges of running large-scale distributed systems is pinpointing problems. It's all too common to blame a random component (usually a database) whenever there's a hiccup, even when there's no evidence to support the claim. We've already discussed the importance of monitoring tools, graphing and alerting on metrics, and using distributed tracing systems like Zipkin to correctly identify the source of a problem in a complex system.

Contains cassandra, performance tuning, flame graphs

What impact on latency should you expect from applying the kernel patches for the Meltdown security vulnerability?

Contains cassandra, meltdown, cassandra-stress, ccm, ubuntu

After seeing many questions about incremental repair on the mailing list and observing several outages caused by it, we figured it would be good to write down our advice in a blog post.

Contains cassandra, repair, incremental

We had the pleasure of releasing our monitoring dashboards for Apache Cassandra on Datadog last week. It is a good opportunity to share our thoughts on Cassandra dashboard design, as it is a recurring question in the community.

Contains cassandra, operations, monitoring, datadog, grafana, metrics