Cassandra 2.0 Meetup Recap
Here in ATX (Austin, Texas) we have the distinct pleasure of having the majority of DataStax’s engineering team in town. This includes Apache Cassandra project chair Jonathan Ellis, who last night gave a presentation at our monthly Cassandra meetup entitled “You got your transactions in my NoSQL - An Introduction to Cassandra 2.0.” For those of you not based in Austin, I’ve uploaded a video of the presentation to YouTube.
Though it rehashed a bit more of the previous Cassandra 1.2 presentations than I would have liked (probably suitable for the audience, though - most of whom do not do this every day), Jonathan did spend a good bit of time on the details of what will make the 2.0 release special.
Most important among these - as the provocative title suggests - is “transactions.” For some background, and a little light reading on the theory of distributed transactions, take a look at CASSANDRA-5062 (or at least know it’s there, as it’s one of the longer issues I’ve encountered on ASF’s Jira instance). Long story short, we now have a new ConsistencyLevel.SERIAL which will trigger a Paxos-style (somewhat - really, read the issue above if you want details) transaction.
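If “Paxos-style” doesn’t ring a bell, here is a minimal sketch of a single Paxos round in plain Python. To be clear, this is textbook single-decree Paxos run in one process, not Cassandra’s code: the names Acceptor and propose are mine, and the real implementation in the issue above does more than this toy. It should still give a feel for why a second writer cannot clobber a value that has already been chosen.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    """One replica's Paxos state (illustrative names, not Cassandra classes)."""
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, Any]] = None  # (ballot, value) last accepted

    def prepare(self, ballot: int) -> Tuple[bool, Optional[Tuple[int, Any]]]:
        # Promise to ignore older ballots; report any value already accepted.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: Any) -> bool:
        # Accept unless a newer ballot has been promised in the meantime.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one round; return the chosen value, or None if the round lost."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1: collect promises from a majority of replicas.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    if len(promises) < quorum:
        return None

    # If any replica already accepted a value, we must finish that one
    # instead of pushing our own.
    earlier = [p for p in promises if p is not None]
    if earlier:
        value = max(earlier, key=lambda p: p[0])[1]

    # Phase 2: ask the replicas to accept the value.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None


replicas = [Acceptor() for _ in range(3)]
first = propose(replicas, ballot=1, value="alice")
second = propose(replicas, ballot=2, value="bob")    # learns the earlier value
stale = propose(replicas, ballot=1, value="carol")   # old ballot, no promises
print(first, second, stale)
```

The second proposer comes in with a higher ballot but still ends up with the first proposer’s value, which is the property that makes this usable for “only write if nobody beat me to it” operations.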
Also coming, and much needed from my perspective, is a first cut at triggers. Again, since this is just a summary post, I’ll kindly direct you to CASSANDRA-1311 for details. But the short take is that you can now grab a row operation in mid-flight and return a collection of RowMutation objects which will be processed by the storage engine.
I will say that this is a double-edged sword and should only be for the brave. Here’s why: this is the first externally exposed API to Cassandra internals. It’s no mistake that the documentation, when it comes out on DataStax’s site, will say “experimental” on it. These internal APIs have a tendency to move around a lot - necessarily so most times, but just know what you are getting into if triggers sound appealing to you.
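To make the shape of the trigger idea concrete, here is a toy model in plain Python. Everything in it is hypothetical: Mutation, ToyStore and audit_trigger are stand-ins I made up, and the real trigger API is Java living inside Cassandra, so treat this as a sketch of the pattern (intercept a write, hand back extra mutations, let the store apply them all) rather than anything you could deploy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Mutation:
    """Hypothetical stand-in for a row mutation: one write to one row."""
    table: str
    key: str
    columns: Dict[str, str]


# A trigger sees the in-flight mutation and returns extra mutations to apply.
Trigger = Callable[[Mutation], List[Mutation]]


def audit_trigger(m: Mutation) -> List[Mutation]:
    # For every write to "users", also record which columns were touched.
    if m.table != "users":
        return []
    changed = ",".join(sorted(m.columns))
    return [Mutation("users_audit", m.key, {"changed": changed})]


class ToyStore:
    """In-memory stand-in for the storage engine."""

    def __init__(self, triggers: List[Trigger]) -> None:
        self.triggers = list(triggers)
        self.tables: Dict[str, Dict[str, Dict[str, str]]] = {}

    def _apply(self, m: Mutation) -> None:
        row = self.tables.setdefault(m.table, {}).setdefault(m.key, {})
        row.update(m.columns)

    def write(self, m: Mutation) -> None:
        # Collect whatever the triggers return, then apply the original
        # mutation and the extras together. Extras are not re-triggered here.
        extras = [x for trigger in self.triggers for x in trigger(m)]
        for mutation in [m, *extras]:
            self._apply(mutation)


store = ToyStore([audit_trigger])
store.write(Mutation("users", "u1", {"name": "Ada", "email": "ada@example.com"}))
print(store.tables)
```

Note how much the trigger has to know about the mutation’s internals even in this toy. That coupling is exactly what makes the real thing fragile when the internals move.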
Moving on, 2.0 will also see CQL cursors and something called “eager retries.” The former is a pretty direct analog to the RDBMS concept of cursors and solves a major PITA for folks doing lots of client paging operations. The latter is more complicated. For brevity, I’ll again direct the reader to the issue: CASSANDRA-4705. A quick synopsis is that you now have an additional tuning knob on reads: a slow read can be retried against another replica rather than waiting out the timeout, which should be really useful in cases where load is highly transient. (That is only my initial take, as I still haven’t played with this yet - see the last couple of comments on the issue above if you want to set up some experiments.)
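Here is how I picture eager retries, as a small Python sketch. It assumes the feature boils down to “if the first replica has not answered within some threshold, ask another one and take whichever answers first”; the function name speculative_read, the threshold value, and the fake replicas are all mine, not anything from Cassandra.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Sequence


def speculative_read(replicas: Sequence[Callable[[], str]], threshold_s: float) -> str:
    """Ask the first replica; after threshold_s of silence, also ask the next."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        pending = {pool.submit(replicas[0])}
        for backup in replicas[1:]:
            done, _ = wait(pending, timeout=threshold_s, return_when=FIRST_COMPLETED)
            if done:
                return done.pop().result()
            # Nobody answered in time: fire the same read at another replica.
            pending.add(pool.submit(backup))
        # Out of backups: block until any outstanding request finishes.
        done, _ = wait(pending, return_when=FIRST_COMPLETED)
        return done.pop().result()
    finally:
        # Do not block on stragglers; their results are simply ignored.
        pool.shutdown(wait=False)


def overloaded_replica() -> str:
    time.sleep(0.5)   # simulates a node having a bad moment
    return "replica-a"


def healthy_replica() -> str:
    time.sleep(0.01)
    return "replica-b"


result = speculative_read([overloaded_replica, healthy_replica], threshold_s=0.05)
print(result)
```

The trade-off is visible even here: a lower threshold hides slow nodes sooner but sends more duplicate reads, which is presumably why it is exposed as a knob rather than hard-coded.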
Finally, there are some improvements to the compaction process and a number of general performance improvements which should help everyone. Again, I’m still coming up to speed myself, so hopefully I’ll have more out on 2.0 in the near future. As always, keep your eye on Planet Cassandra, and if you really can’t wait, see the changelog on the cassandra-2.0 branch.