Aaron Morton

Cassandra Reading 02-01-2012

02 Jan 2012

Recent Cassandra reading.

CQL: SQL In Cassandra

By Eric Evans

Good examples of simple statements.
Performance comparisons show CQL has 5% to 10% lower throughput and higher latency.
List of drivers and where to get them.
More improvements coming.

By P. Taylor Goetz

By Edward Capriolo

10TB of data and growing.
Latency requirements of less than 120ms.
“Distributed not Duplicated” is a great phrase.
Ed still loves the Cacti :)
Good idea to use JMX to modify the (Cassandra) cache settings for a single node to compare against the others.
Pay attention to how the cache hit rate varies with the cache size; at some point it may be better to accept a (say) 90% hit rate and give more memory to another CF.
Tuning IO performance for peak and off-peak periods by adjusting nodetool setcompactionlimit to improve compaction performance.
Running nighttime major compactions, like Urban Airship. I wonder how leveled compaction would work with the mixed workload?
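
The cache sizing point can be made concrete with a rough sketch. This is not from Ed's talk: the function names and the measurements below are made up for illustration, and the idea is only to show where extra cache stops paying for itself.

```python
def marginal_gains(samples):
    """Hit rate gained per extra cached key between consecutive measurements.

    samples: list of (cache_size, hit_rate) pairs sorted by cache_size.
    Returns a list of (cache_size, hit_rate, gain_per_key) tuples.
    """
    gains = []
    for (prev_size, prev_rate), (size, rate) in zip(samples, samples[1:]):
        gains.append((size, rate, (rate - prev_rate) / (size - prev_size)))
    return gains


def smallest_size_for(samples, target_rate):
    """Smallest measured cache size that meets target_rate, or None."""
    for size, rate in samples:
        if rate >= target_rate:
            return size
    return None


# Hypothetical measurements for one CF: (cached keys, observed hit rate).
samples = [(100_000, 0.70), (200_000, 0.85), (400_000, 0.90), (800_000, 0.92)]

for size, rate, gain in marginal_gains(samples):
    print(f"{size:>7} keys: hit rate {rate:.2f}, gain per extra key {gain:.2e}")

print("smallest size for a 90% hit rate:", smallest_size_for(samples, 0.90))
```

With numbers shaped like these, each doubling of the cache buys less hit rate than the last, which is the point at which the memory may be better spent on another CF.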

By Daniel Abadi

A discussion about latency and consistency.
“there’s no way to perform consistent replication across database replicas without some level of synchronous network communication.”
Not sure I agree that in Dynamo / Cassandra (not sure about Riak) “updates generally go to the same node, and are then propagated synchronously to W other nodes (case (2)(c))”.
I think how inconsistencies are handled during read requests is another source of latency.

By Nathan Milford

14 nodes in two DCs.
In production since Cassandra version 0.4, awesome!
A good list of things to monitor in a cluster.
Good best practices for shutting down a node that give the fastest startup, also see.

By Dan Pritchett

By B. Todd Burruss

By Matthew Dennis

I always check out Matthew’s data model presentations to see what the best practices are.
“Usually better to keep a record that something happened as opposed to changing a value”.
Good advice on time series and the XACT_LOG.
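
A minimal sketch of the "record that something happened" idea, using a plain in-memory structure rather than Cassandra. The EventLog class and its method names are hypothetical and are not taken from Matthew's presentation; the point is only that the current value is derived from the recorded events and is never overwritten.

```python
from collections import defaultdict


class EventLog:
    """Append-only log: one row per key, one entry per event."""

    def __init__(self):
        # row key -> list of (timestamp, event, amount) entries
        self._rows = defaultdict(list)

    def record(self, key, timestamp, event, amount):
        """Append a new event; existing entries are never modified."""
        self._rows[key].append((timestamp, event, amount))

    def events(self, key):
        """All events for a key, ordered by timestamp."""
        return sorted(self._rows[key])

    def balance(self, key):
        """Current value, derived by summing every recorded amount."""
        return sum(amount for _, _, amount in self._rows[key])


log = EventLog()
log.record("account-1", 1, "deposit", 100)
log.record("account-1", 2, "withdrawal", -30)
print(log.events("account-1"))
print(log.balance("account-1"))
```

Keeping the events means the history can be replayed or audited later, which is lost if a single balance value is updated in place.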

By Walt Jones

By Sam Overton
