It’s been a while since we introduced our tlp-stress tool, and we wanted to give a status update to showcase the improvements we’ve made since our first blog post in October. In this post we’ll show you what we’ve been working on these last 8 months.
Duration Based Test Runs
One of the first things we realized was that tlp-stress needed a way of running for a length of time rather than for a fixed number of operations. We added a -d flag which takes human readable shorthand, such as 4h for four hours or 3d for three days. The flag accepts any combination of days (d), hours (h), minutes (m), and seconds (s). This has been extremely helpful when running tests over several days to evaluate performance during the development of Cassandra 4.0.
tlp-stress run KeyValue -d 3d
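The shorthand format is easy to reason about; here is a minimal sketch in Python (illustrative only, not tlp-stress's actual Kotlin implementation) of how strings like 4h or 1d12h30m can be turned into a number of seconds:

```python
import re

# Seconds per unit: days, hours, minutes, seconds
UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def parse_duration(text):
    """Parse shorthand like '3d', '4h', or '1d12h30m' into total seconds."""
    matches = re.findall(r"(\d+)([dhms])", text)
    # Reject input with leftover characters the pattern didn't consume
    if not matches or "".join(n + u for n, u in matches) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * UNITS[u] for n, u in matches)

print(parse_duration("4h"))  # 14400
print(parse_duration("3d"))  # 259200
```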
Dynamic Workload Parameters
When writing a workload, it became very apparent that we would have to either create thousands of different workloads or make workload parameters customizable to the user. We opted for the latter. Workloads can now annotate their variables to be overridden at runtime, allowing for extremely flexible operations.
For example, one of our workloads, BasicTimeSeries, has a SELECT that looks like this:
getPartitionHead = session.prepare("SELECT * from sensor_data_udt WHERE sensor_id = ? LIMIT ?")
The limit clause above was hard coded initially:
var limit = 500
Changing it required recompiling tlp-stress. This was clearly not fun. We’ve added an annotation that can be applied to variables in the stress workload:
@WorkloadParameter("Number of rows to fetch back on SELECT queries") var limit = 500
Once this annotation is applied, the field can be overridden at run time via the --workload command line parameter:
$ tlp-stress run BasicTimeSeries --workload.limit=1000
You can find out what workload parameters are available by running tlp-stress info with the workload name:
$ tlp-stress info BasicTimeSeries
CREATE TABLE IF NOT EXISTS sensor_data (
    sensor_id text,
    timestamp timeuuid,
    data text,
    primary key(sensor_id, timestamp))
    WITH CLUSTERING ORDER BY (timestamp DESC)
Default read rate: 0.01 (override with -r)

Dynamic workload parameters (override with --workload.name=X)
Name  | Description                                    | Type
limit | Number of rows to fetch back on SELECT queries | kotlin.Int
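Under the hood this boils down to reflection over annotated fields. A rough Python analog (field names and mechanics are illustrative, not tlp-stress's actual code) of mapping --workload.name=X arguments onto a workload object:

```python
class BasicTimeSeries:
    # Analogous to @WorkloadParameter("Number of rows to fetch back on SELECT queries")
    limit = 500

def apply_workload_overrides(workload, args):
    """Set workload fields from '--workload.name=value' style arguments."""
    for arg in args:
        if arg.startswith("--workload."):
            name, _, value = arg[len("--workload."):].partition("=")
            current = getattr(workload, name)  # raises if the field is unknown
            setattr(workload, name, type(current)(value))  # coerce to the field's type
    return workload

w = apply_workload_overrides(BasicTimeSeries(), ["--workload.limit=1000"])
print(w.limit)  # 1000
```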
TTL Support
When tlp-stress finally matured enough to be usable for testing Cassandra 4.0, we immediately realized that the biggest problem with long running tests was that if they were constantly inserting new data, we’d eventually run out of disk. This happens sooner rather than later - tlp-stress can easily saturate a 1GB/s network card. We needed an option to add a TTL to the table schema. Providing --ttl with a time in seconds will add a default_time_to_live option to the tables created. This limits the amount of data that lives in the cluster and allows for long running tests.
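As an illustration, a table created with --ttl 86400 (one day) would carry a schema option along these lines (the exact DDL belongs to the workload; the option itself is standard Cassandra):

```sql
-- default_time_to_live expires rows automatically 86400 seconds (1 day) after insert
CREATE TABLE IF NOT EXISTS keyvalue (
    key text PRIMARY KEY,
    value text
) WITH default_time_to_live = 86400;
```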
Row and Key Cache Support
On many workloads we found disabling key cache can improve performance by up to 20%. Being able to dynamically configure this to test different workloads has been extremely informative. The nuance here is that the data model matters a lot. A key value table with a 100 byte text field has a much different performance profile than a time series table spanning 12 SSTables. Figuring out the specific circumstances where these caches can be beneficial (or harmful) is necessary to correctly optimize your cluster.
You can configure row cache or key cache by doing the following when running tlp-stress:
$ tlp-stress run KeyValue --rowcache 'ALL' --keycache 'NONE'
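Behind those flags, tlp-stress sets Cassandra's standard caching table option; the equivalent DDL for the command above would look something like this (column layout assumed from the KeyValue workload):

```sql
-- 'rows_per_partition': 'ALL' enables the row cache; 'keys': 'NONE' disables the key cache
CREATE TABLE IF NOT EXISTS keyvalue (
    key text PRIMARY KEY,
    value text
) WITH caching = {'keys': 'NONE', 'rows_per_partition': 'ALL'};
```

Note that the row cache only takes effect if row_cache_size_in_mb is set to a non-zero value in cassandra.yaml.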
We’ll be sure to follow up with a detailed post on this topic to show the results of our testing of these caches.
Populate Data Before Test
It’s not useful to run a read heavy workload without any data, so we added a --populate flag that can load the cluster with data before starting the measured workload. Consider a case where we want to examine heap allocations when running a read dominated workload across ten thousand partitions:
$ tlp-stress run KeyValue --populate 10k -p 10k -r .99 -d 1d
The above will run the KeyValue workload, performing 99% reads for one day, but only after inserting the initial ten thousand rows.
Workloads can define their own custom logic for prepopulating fields or allow tlp-stress to do it on its own.
Coordinator Only Mode
Coordinator only mode allows us to test a lesser known and lesser used feature: “coordinator nodes”. Eric Lubow described how using dedicated nodes as coordinators could improve performance at the 2016 Cassandra Summit (slides here). We thought it would be useful to be able to test this type of workload, and passing the --coordinatoronly flag lets us do exactly that.
We will post a follow up with detailed information on how and why this works.
Pass Extra CQL
We know sometimes it would be helpful to test a specific feature in the context of an existing workload. For example, we might want to test how a Materialized View or a SASI index works with a time series workload. If we were to create an index on our keyvalue table we might want to do something like this:
--cql "CREATE CUSTOM INDEX fn_prefix ON keyvalue(value) USING 'org.apache.cassandra.index.sasi.SASIIndex'"
For now, we won’t be able to query the table using this workload. We’re only able to see the impact of index maintenance, but we’re hoping to improve this in the future.
Random Access Workload
As you can see above, the new workload parameters have already been extremely useful, allowing you to customize workloads even further than before. They also let us model a new pattern and data model: the random access workload.
Not every Cassandra use case is time series. In fact, we see a significant number of non time series workloads. A friends list is a good example of this pattern. We may want to select a single friend out of a friends list, or read the entire partition out at once. This new workload allows us to do either.
We’ll check the workload to see what workload specific parameters exist:
$ tlp-stress info RandomPartitionAccess
CREATE TABLE IF NOT EXISTS random_access (
    partition_id text,
    row_id int,
    value text,
    primary key (partition_id, row_id)
)
Default read rate: 0.01 (override with -r)

Dynamic workload parameters (override with --workload.name=X)
Name   | Description                                                                   | Type
rows   | Number of rows per partition, defaults to 100                                 | kotlin.Int
select | Select random row or the entire partition. Acceptable values: row, partition | kotlin.String
We can override the number of rows per partition as well as how we’ll read the data: either a single row (a fast query) or the entire partition, which gets slower and more memory hungry as the partition grows. If we consider a case where the users in our system have 500 friends, and we want to test the performance of selecting the entire friends list, we can do something like this:
$ tlp-stress run RandomPartitionAccess --workload.rows=500 --workload.select=partition
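The two select modes map to two query shapes against the schema shown above:

```sql
-- select=row: fetch one random row from a partition (fast, constant-size result)
SELECT * FROM random_access WHERE partition_id = ? AND row_id = ?;

-- select=partition: fetch the whole partition (cost grows with --workload.rows)
SELECT * FROM random_access WHERE partition_id = ?;
```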
We could, of course, run a second workload that queries individual rows if we wanted to, throttling it, having it be read heavy, or whatever we needed that will most closely simulate our production environment.
LWT Locking Workload
This workload is a mix of a pattern and a feature: using LWT for locking. We did some work researching LWT performance for a customer who was already making heavy use of them. The goal was to identify the root cause of performance issues on clusters using a lot of lightweight transactions. We were able to find some very interesting results, which led to the creation of CASSANDRA-15080.
We will dive into the details of tuning a cluster for LWTs in a separate post; there’s a lot going on there.
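The locking pattern itself is straightforward to sketch with lightweight transactions (table and column names here are illustrative, not the workload's actual schema):

```sql
-- Acquire: only succeeds if nobody holds the lock (runs a Paxos round under the hood)
INSERT INTO locks (name, owner) VALUES ('resource-1', 'client-a') IF NOT EXISTS;

-- Release: only succeeds if we still own the lock
DELETE FROM locks WHERE name = 'resource-1' IF owner = 'client-a';
```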
CSV Output
Sometimes it’s easiest to run basic reports across CSV data rather than using reporting dashboards, especially if there’s only a single stress server. One stress server can push a 9 node cluster pretty hard, so for small tests you’ll usually only use a single instance. CSV is easily consumed from Python with Pandas or gnuplot, so we added it as an option. Use the --csv option to log all results to a text file in CSV format which you can save and analyze later.
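For example, a few lines of Python are enough to summarize a run (the column name here is illustrative; check the header of the file tlp-stress actually writes):

```python
import csv
import statistics

def summarize(path, column="p99_latency_ms"):
    """Mean of one numeric column from a CSV results file."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f) if row[column]]
    return statistics.mean(values)

# Demo with a fabricated results file; real column names come from the CSV header.
with open("stress-results.csv", "w") as f:
    f.write("p99_latency_ms\n1.0\n3.0\n")
print(summarize("stress-results.csv"))  # 2.0
```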
Prometheus Support
Prometheus has quickly become the new standard in metrics collection, and we wanted to make sure we could aggregate and graph statistics from multiple stress instances. tlp-stress exposes an HTTP endpoint on port 9500 that Prometheus can scrape for metrics.
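A minimal Prometheus scrape configuration pointing at a stress instance would look like this (the job name and hostname are placeholders):

```yaml
scrape_configs:
  - job_name: 'tlp-stress'
    static_configs:
      # tlp-stress serves metrics on port 9500 on each stress instance
      - targets: ['stress-host-1:9500']
```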
Special Bonus: We have a mailing list!
Last, but not least, we’ve set up a mailing list on Google Groups for Q&A as well as development discussion. We’re eager to hear your questions and feedback, so please join and help us build a better tool and community.
This list is also used to discuss another tool we’ve been working on, tlp-cluster, which we’ll cover in a follow up blog post.
We’re committed to improving the tlp-stress tooling and are looking forward to our first official release of the software. We’ve got a bit of cleanup to do and documentation to write, but we don’t expect the architecture or functionality to change significantly before then.