Cassandra Reading 02-01-2012

Recent Cassandra reading.

CQL: SQL In Cassandra

And a blog post

By Eric Evans

  • Good examples of simple statements.
  • Performance comparisons show CQL has 5% to 10% lower throughput and higher latency.
  • List of drivers and where to get them.
  • More improvements coming.

Storm Cassandra Integration

By P. Taylor Goetz

  • A storm bolt to persist data to Cassandra.

Cassandra in Online Advertising: Real Time Bidding

By Edward Capriolo

  • 10TB of data and growing.
  • Latency requirements of less than 120ms.
  • “Distributed not Duplicated” is a great phrase.
  • Ed still loves the Cacti :)
  • Good idea to use JMX to modify the (Cassandra) cache settings for a single node to compare against the others.
  • Pay attention to how the cache hit rate varies with regard to the cache size, at some point may be better to accept a (say) 90% hit rate and give more memory to another CF.
  • Tuning IO performance for peak and off-peak by modifying nodetool setcompactionlimit to improve compaction performance.
  • Running night time major compactions like Urban Airship. I wonder how leveled compaction would work with the mixed workload?

Replication and the latency-consistency tradeoff

By Daniel Abadi

  • A discussion about latency and consistency.
  • “there’s no way to perform consistent replication across database replicas without some level of synchronous network communication.”
  • Not sure I agree that in Dynamo / Cassandra (not sure about Riak) “updates generally go to the same node, and are then propagated synchronously to W other nodes (case (2)(c))”.
  • I think how inconsistencies are handled during read requests is another source of latency.

Cassandra for sys admins

By Nathan Milford

  • 14 nodes in two DC’s
  • In productions since Cassandra version 0.4, awesome!
  • A good list of things to monitor in a cluster.
  • Good best practices for shutting down a node that give the fastest startup, also see.

Cassandra for LOBS

By Dan Pritchett

  • Expensive SAN storage bombshell.

Expedia Hotel Price Cache

By B. Todd Burruss

  • Pre-calculate, trading space for time.
  • A rolling window of 2.8 billion data points.
  • Test, measure, tune.

Data Modeling Examples

By Matthew Dennis

  • I always check out Matthew’s data model presentations to see what the best practices are.
  • “Usually better to keep a record that something happened as opposed to changing a value”.
  • Good advice on time series and the XACT_LOG.

Cassandra In Production: Things We Learned

By Walt Jones

  • Using Cassandra 0.7, there are a lot of improvements in Cassandra 1.0.
  • 12 AWS EC2 m1.xlarge nodes with 5TB of data.
  • S3 archive
  • The memory footprint issues has been eliminated in Cassandra 1.0.
  • Please avoid using Super Columns.
  • Please use the Random Partitioner.

Data Modeling with Cassandra

By Sam Overton

  • De-normalize for a brighter future.
  • No SQL “Hello World” twitter example.
cassandra