The k8ssandra team is happy to announce the release of Medusa for Apache Cassandra™ v0.16. This is a special release, as we did a major overhaul of the storage classes implementation. We now have less code (and fewer dependencies) while providing much faster and more resilient storage communications.
Back to ~basics~ official SDKs
Medusa has been an open source project for about 4 years now, and a private one for a few more. Over such a long time, the software around it evolves too, whether Medusa depends on it or not. More specifically, the SDKs of the major cloud providers have evolved a lot. We decided to check whether we could replace a lot of custom code, which handled asynchronous parallelisation and called the cloud storage CLI utilities, with the official SDKs.
Until now, our storage classes relied on two different ways of interacting with the object storage backends:
- Apache Libcloud, which provided a Python API that abstracted away the different protocols. It was convenient and fast for uploading many small files, but very inefficient for large transfers.
- Cloud vendors’ own CLIs, which were much more efficient for large file transfers but had to be invoked through subprocesses. The per-invocation overhead made them inefficient for small file transfers. Relying on subprocesses also made the implementation much more brittle, which led the community to report many issues we’ve been struggling to fix.
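To make the small-file overhead concrete, here is a rough, hypothetical micro-benchmark (not Medusa code). It assumes that starting a fresh Python interpreter is a fair stand-in for invoking a vendor CLI, and compares it with a plain function call standing in for an in-process SDK call. Absolute timings depend on the machine; the gap between the two is what matters.

```python
import subprocess
import sys
import time


def in_process_call(payload: bytes) -> int:
    # Stand-in for an SDK call: the work happens inside the running interpreter.
    return len(payload)


def subprocess_call(payload: bytes) -> int:
    # Stand-in for shelling out to a CLI: a new process is started for every
    # single file, and that start-up cost is paid however small the file is.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()))"],
        input=payload,
        capture_output=True,
        check=True,
    )
    return int(result.stdout.decode().strip())


def average_seconds(fn, payload: bytes, repeats: int = 5) -> float:
    # Average wall-clock time per call over a few repetitions.
    start = time.perf_counter()
    for _ in range(repeats):
        fn(payload)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    small_file = b"x" * 1024  # a 1 KiB "file"
    print(f"in-process: {average_seconds(in_process_call, small_file):.6f} s/call")
    print(f"subprocess: {average_seconds(subprocess_call, small_file):.6f} s/call")
```

With many small SSTable component files per node, that fixed per-process cost adds up, whereas it is negligible next to a single multi-GB transfer.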
To cut a long story short, we did it!
- We started by looking at S3, where we went for the official boto3. As it turns out, boto3 does all the chunking, throttling, retries and parallelisation for us. Yay!
- Next we looked at GCP. Here we went with TalkIQ’s gcloud-aio-storage. It works very well for everything, including large files. The only thing missing is throughput throttling.
- Finally, we used Azure’s official SDK to cover Azure compatibility. Sadly, it does not support throttling either.
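gcloud-aio-storage is built on asyncio (and the Azure SDK also ships asynchronous clients), so concurrency is driven by the caller’s event loop rather than by spawning processes. The sketch below shows the general shape of bounded, retrying parallel uploads with plain asyncio. `upload_file`, `MAX_CONCURRENCY` and `MAX_ATTEMPTS` are hypothetical placeholders, and this is an illustration of the pattern, not Medusa’s actual storage code.

```python
import asyncio
import random
from typing import Dict, List, Tuple

MAX_CONCURRENCY = 4  # hypothetical cap on simultaneous transfers
MAX_ATTEMPTS = 3     # hypothetical retry budget per file


class TransientError(Exception):
    """Stands in for a retryable storage error, such as a throttled request."""


async def upload_file(name: str, fail_rate: float) -> str:
    # Hypothetical transfer: a real async client call would be awaited here.
    await asyncio.sleep(0.01)
    if random.random() < fail_rate:
        raise TransientError(name)
    return name


async def upload_with_retries(
    name: str, semaphore: asyncio.Semaphore, stats: Dict[str, int], fail_rate: float
) -> str:
    async with semaphore:  # at most MAX_CONCURRENCY uploads in flight
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            for attempt in range(1, MAX_ATTEMPTS + 1):
                try:
                    return await upload_file(name, fail_rate)
                except TransientError:
                    if attempt == MAX_ATTEMPTS:
                        raise
                    # Exponential backoff before the next attempt.
                    await asyncio.sleep(0.01 * 2 ** attempt)
        finally:
            stats["in_flight"] -= 1


async def upload_all(names: List[str], fail_rate: float = 0.0) -> Tuple[List[str], int]:
    # The semaphore is created inside the running event loop on purpose.
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(upload_with_retries(n, semaphore, stats, fail_rate) for n in names)
    )
    return list(results), stats["peak"]


if __name__ == "__main__":
    files = [f"sstable-{i}-Data.db" for i in range(10)]
    uploaded, peak = asyncio.run(upload_all(files))
    print(f"uploaded {len(uploaded)} files, at most {peak} at a time")
```

The semaphore bounds how many transfers share the network at once, while `asyncio.gather` keeps results in input order. For S3, boto3 takes care of this kind of bookkeeping internally.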
Right after finishing these replacements, we saw the following improvements:
- The integration test runs against the storage backends dropped from ~45 min to ~15 min.
  - This means Medusa became far more efficient.
  - Much less time is now spent managing storage interactions, thanks to them being fully asynchronous.
- The uncompressed Medusa image we bundle into k8ssandra shrank from ~2GB to ~600MB, and its build time went from 2 hours to about 15 minutes.
  - Aside from giving us much faster feedback loops when working on k8ssandra, this should help k8ssandra itself move a little faster.
- File transfers are now much faster.
  - We observed up to several hundred MB/s per node when moving data from a VM to blob storage within the same provider. Available network bandwidth is now the limiting factor.
  - We are also aware that consuming the whole network throughput is not great. That’s why we now have proper throttling for S3 and are working on a solution for the other backends too.
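Bandwidth throttling is commonly built on a token bucket: a chunk may only be sent once enough byte "tokens" have accumulated at the configured rate. The sketch below is a minimal, hypothetical asyncio version; the `ByteRateLimiter` name and its parameters are illustrative and do not describe the S3 throttling Medusa ships or the solution being worked on for the other backends.

```python
import asyncio
import time


class ByteRateLimiter:
    """Hypothetical token bucket that caps throughput at `rate` bytes per second."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate          # bytes added to the bucket per second
        self.capacity = burst     # largest burst allowed, in bytes
        self.tokens = float(burst)
        self.last = time.monotonic()
        self._lock = asyncio.Lock()  # create the limiter inside a running loop

    async def acquire(self, nbytes: int) -> None:
        if nbytes > self.capacity:
            raise ValueError("chunk is larger than the burst size")
        async with self._lock:  # callers are served one at a time
            while True:
                now = time.monotonic()
                # Refill for the time elapsed since the last check, up to capacity.
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                # Sleep roughly as long as the missing bytes take to accrue.
                await asyncio.sleep((nbytes - self.tokens) / self.rate)


async def send_chunks(chunk_size: int, count: int, rate: float) -> float:
    limiter = ByteRateLimiter(rate=rate, burst=chunk_size)
    start = time.monotonic()
    for _ in range(count):
        await limiter.acquire(chunk_size)
        # A real implementation would write the chunk to storage here.
    return time.monotonic() - start


if __name__ == "__main__":
    # 6 x 100 kB at 1 MB/s: the first chunk uses the initial burst and the other
    # 500 kB must wait for tokens, so this takes roughly half a second.
    elapsed = asyncio.run(send_chunks(100_000, 6, 1_000_000))
    print(f"took {elapsed:.2f} s")
```

Sharing one limiter across all concurrent uploads on a node caps their combined throughput, which is what a per-node bandwidth limit needs.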
The only compromise we had to make was dropping Python 3.6 support, because the asyncio features we rely on only arrived in Python 3.7.
The other good stuff
Even though we are happiest about the storage backends, a number of other changes deserve a mention:
- We fixed a bug with hierarchical storage containers in Azure. This flavor of blob storage works more like a regular file system, meaning it has a concept of directories. None of the other backends have this concept (including regular, non-hierarchical Azure storage), and Medusa was not handling it gracefully.
- We are now able to build Medusa images for multiple architectures, including arm64.
- Medusa can now purge backups of nodes that have been decommissioned, meaning they are no longer present in the most recent backups. Use the new `medusa purge-decommissioned` command to trigger such a purge.
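The selection rule for decommissioned nodes boils down to a set difference: a node that appears in older backups but not in the most recent one is treated as gone. This is a hypothetical sketch over an in-memory list of `ClusterBackup` records (a made-up type), assuming "most recent" means the single latest backup. It only computes which backups would be affected and does not reflect how `medusa purge-decommissioned` is actually implemented.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class ClusterBackup:
    # Hypothetical, simplified view of one cluster-wide backup.
    name: str
    started: int            # e.g. epoch seconds
    nodes: FrozenSet[str]   # nodes that took part in this backup


def decommissioned_nodes(backups: List[ClusterBackup]) -> Set[str]:
    if not backups:
        return set()
    latest = max(backups, key=lambda b: b.started)
    seen = set().union(*(b.nodes for b in backups))
    # Nodes seen at some point but absent from the latest backup.
    return seen - latest.nodes


def purge_plan(backups: List[ClusterBackup]) -> Dict[str, List[str]]:
    # Map each decommissioned node to the backups (oldest first) holding its data.
    gone = decommissioned_nodes(backups)
    plan: Dict[str, List[str]] = {node: [] for node in sorted(gone)}
    for backup in sorted(backups, key=lambda b: b.started):
        for node in backup.nodes & gone:
            plan[node].append(backup.name)
    return plan


if __name__ == "__main__":
    history = [
        ClusterBackup("mon", 1, frozenset({"n1", "n2", "n3"})),
        ClusterBackup("tue", 2, frozenset({"n1", "n2", "n3"})),
        ClusterBackup("wed", 3, frozenset({"n1", "n2", "n4"})),
    ]
    print(purge_plan(history))  # n3 was replaced by n4 before "wed"
```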
We encourage all Medusa users to upgrade to version 0.16 to benefit from these storage improvements, which make it much faster and more reliable.
Medusa v0.16 is the default version in the newly released k8ssandra-operator v1.9.0, and it can be used with previous releases by setting the `.spec.medusa.containerImage.tag` field in your K8ssandraCluster manifests.
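If you would rather patch a running cluster than edit its manifest, the field path maps directly onto a nested JSON merge patch. The helper below is a hypothetical sketch that builds it; the image tag `0.16.0` and the cluster name `demo` are placeholders, so check which Medusa image tag you actually want to run.

```python
import json
from typing import Any, Dict


def merge_patch_for(path: str, value: Any) -> Dict[str, Any]:
    """Turn a dotted field path such as '.spec.a.b' into a nested merge patch."""
    keys = [key for key in path.split(".") if key]  # drop the leading empty segment
    patch: Any = value
    for key in reversed(keys):
        patch = {key: patch}  # wrap from the innermost field outwards
    return patch


if __name__ == "__main__":
    # "0.16.0" is a placeholder tag value.
    patch = merge_patch_for(".spec.medusa.containerImage.tag", "0.16.0")
    print(json.dumps(patch))
    # The printed JSON could then be applied with something like
    # (the resource name "demo" is a placeholder):
    #   kubectl patch k8ssandracluster demo --type merge -p '<printed JSON>'
```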