tlp-stress is a workload-centric stress tool written in Kotlin. Workloads are easy to write, and because they are written in code, you have ultimate flexibility. Workloads can be tweaked via command line parameters to fit your environment more closely.
One of the goals of tlp-stress is to provide enough pre-designed workloads out of the box that it’s unnecessary to code up a workload for most use cases. For instance, it’s very common to have a key-value workload and want to test it. tlp-stress allows you to customize a pre-configured key-value workload, using simple parameters to fit your needs. Several workloads are included, such as:
- Time Series
- Key / Value
- Materialized Views
- Collections (maps)
- Counters
The tool is flexible enough to design workloads which leverage many tables (even thousands), hitting them as needed. Statistics are captured automatically by the Dropwizard Metrics library.
Quickstart Example
The goal of this project is to let you test common workloads (time series, key value) against a Cassandra cluster in 15 minutes or less.
Installation
Installing a Package
The easiest way to get started is to use your favorite package manager.
The current version is 4.0.0.
Deb Packages
$ echo "deb https://dl.bintray.com/thelastpickle/tlp-tools-deb weezy main" | sudo tee -a /etc/apt/sources.list
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 2895100917357435
$ sudo apt update
$ sudo apt install tlp-stress
RPM Packages
You’ll need the Bintray repo set up on your machine. Create the file /etc/yum.repos.d/tlp-tools.repo with the following contents:
[bintraybintray-thelastpickle-tlp-tools-rpm]
name=bintray-thelastpickle-tlp-tools-rpm
baseurl=https://dl.bintray.com/thelastpickle/tlp-tools-rpm
gpgcheck=0
repo_gpgcheck=0
enabled=1
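If you prefer a single step, the same file can be written with a heredoc. The sketch below writes to the current directory so it works without root; move the file into place with sudo afterwards:

```shell
# Write the repo definition locally, then move it into place with:
#   sudo mv tlp-tools.repo /etc/yum.repos.d/
cat > tlp-tools.repo <<'EOF'
[bintraybintray-thelastpickle-tlp-tools-rpm]
name=bintray-thelastpickle-tlp-tools-rpm
baseurl=https://dl.bintray.com/thelastpickle/tlp-tools-rpm
gpgcheck=0
repo_gpgcheck=0
enabled=1
EOF
```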
Then run the following to install:
$ yum install tlp-stress
Further information can be found on the Bintray website.
Tarball Install
If you’re using a Mac, for now you’ll need to grab our tarball using:
$ curl -L -O "https://dl.bintray.com/thelastpickle/tlp-tools-tarball/tlp-stress-4.0.0.tar"
$ tar -xzf tlp-stress-4.0.0.tar
Building / Using the Stress Tool from Source
This is advisable only if you’re comfortable debugging the bash scripts, Gradle, and Kotlin yourself and either want to be on the bleeding edge or want to add a feature. If not, that’s OK; we recommend using one of the packages above instead.
First you’ll need to clone and build the repo. You can grab the source here and build via the included gradle script:
$ git clone https://github.com/thelastpickle/tlp-stress.git
$ cd tlp-stress
$ ./gradlew shadowJar
You can now run the stress tool via the bin/tlp-stress script. This is not the same script you’ll be running if you’ve installed from a package or the tarball.
Run Your First Stress Workload
Assuming you have either a CCM cluster or are running a single node locally, you can run this quickstart.
Either add the bin directory to your PATH, or from within the tlp-stress directory run the following command to execute 10,000 queries:
$ bin/tlp-stress run KeyValue -n 10000
You’ll see the output of the keyspaces and tables that are created as well as some statistical information regarding the workload:
Creating tlp_stress:
CREATE KEYSPACE
IF NOT EXISTS tlp_stress
WITH replication = {'class': 'SimpleStrategy', 'replication_factor':3 }
Creating schema
Executing 10000 operations with consistency level LOCAL_ONE
Connected
Creating Tables
CREATE TABLE IF NOT EXISTS keyvalue (
key text PRIMARY KEY,
value text
) WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} AND default_time_to_live = 0
Preparing queries
Initializing metrics
Connecting
Creating generator random
Preparing statements.
1 threads prepared.
Starting main runner
Running
[Thread 0]: Running the profile for 10000 iterations...
Writes Reads Deletes Errors
Count Latency (p99) 1min (req/s) | Count Latency (p99) 1min (req/s) | Count Latency (p99) 1min (req/s) | Count 1min (errors/s)
5101 56.02 0 | 4899 28.78 0 | 0 0 0 | 0 0
5101 56.02 0 | 4899 28.78 0 | 0 0 0 | 0 0
Stress complete, 1.
If you’ve made it this far, congrats! You’ve run your first workload.
Usage
You’ll probably want to do a bit more than simply run a few thousand queries against a KeyValue table with default settings. The nice part about tlp-stress is that it not only comes with a variety of workloads that you can run to test your cluster, but also allows you to change many of the parameters. In the quickstart example we used the -n flag to change the total number of operations tlp-stress will execute against the database. There are many more options available; this section will cover some of them.
General Help
tlp-stress will display the help if the tlp-stress command is run without any arguments or if the --help flag is passed:
$ bin/tlp-stress
Usage: tlp-stress [options] [command] [command options]
Options:
--help, -h
Shows this help.
Default: false
Commands:
run Run a tlp-stress profile
Usage: run [options]
Options:
--cl
Consistency level for reads/writes (Defaults to LOCAL_ONE).
Default: LOCAL_ONE
Possible Values: [ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, LOCAL_ONE]
--compaction
Compaction option to use. Double quotes will auto convert to
single for convenience. A shorthand is also available: stcs, lcs,
twcs. See the full documentation for all possibilities.
Default: <empty string>
--compression
Compression options
Default: <empty string>
--concurrency, -c
Concurrent queries allowed. Increase for larger clusters.
Default: 100
--coordinatoronly, --co
      Coordinator only mode. This will cause tlp-stress to round robin
between nodes without tokens. Requires using -Djoin_ring=false in
cassandra-env.sh. When using this option you must only provide a
coordinator to --host.
Default: false
--cql
Additional CQL to run after the schema is created. Use for DDL
modifications such as creating indexes.
Default: []
--csv
Write metrics to this file in CSV format.
Default: <empty string>
--dc
The data center to which requests should be sent
Default: <empty string>
--deleterate, --deletes
Deletion Rate, 0-1. Workloads may have their own defaults.
Default is dependent on workload.
--drop
Drop the keyspace before starting.
Default: false
--duration, -d
Duration of the stress test. Expressed in format 1d 3h 15m
Default: 0
--field.
Override a field's data generator. Example usage:
--field.tablename.fieldname='book(100,200)'
Syntax: --field.key=value
Default: {}
-h, --help
Show this help
--host
Default: 127.0.0.1
--id
Identifier for this run, will be used in partition keys. Make
unique for when starting concurrent runners.
Default: 001
--iterations, -i, -n
Number of operations to run.
Default: 0
--keycache
Key cache setting
Default: ALL
--keyspace
Keyspace to use
Default: tlp_stress
--max-requests
Sets the max requests per connection
Default: 32768
--core-connections
Sets the number of core connections per host
      Default: 4
--max-connections
Sets the number of max connections per host
      Default: 8
--no-schema
Skips schema creation
Default: false
--paging
Override the driver's default page size.
--partitiongenerator, --pg
Method of generating partition keys. Supports random, normal
(gaussian), and sequence.
Default: random
--partitions, -p
Max value of integer component of first partition key.
Default: 1000000
--password, -P
Default: cassandra
--populate
      Pre-populate the DB with N rows before starting the load test.
Default: 0
--port
Override the cql port. Defaults to 9042.
Default: 9042
--prometheusport
Override the default prometheus port.
Default: 9500
--rate
Rate limiter, accepts human numbers. 0 = disabled
Default: 0
--readrate, --reads, -r
Read Rate, 0-1. Workloads may have their own defaults. Default
is dependent on workload.
--replication
Replication options
Default: {'class': 'SimpleStrategy', 'replication_factor':3 }
--rowcache
Row cache setting
Default: NONE
--ssl
Enable SSL
Default: false
--threads, -t
Threads to run
Default: 1
--ttl
Table level TTL, 0 to disable.
Default: 0
--username, -U
Default: cassandra
--workload., -w.
Override workload specific parameters.
Syntax: --workload.key=value
Default: {}
info Get details of a specific workload.
Usage: info
list List all workloads.
Usage: list
fields null
Usage: fields
Listing All Workloads
$ bin/tlp-stress list
Available Workloads:
AllowFiltering
BasicTimeSeries
CountersWide
KeyValue
LWT
Locking
Maps
MaterializedViews
RandomPartitionAccess
Sets
UdtTimeSeries
You can run any of these workloads by running tlp-stress run WORKLOAD.
Getting information about a workload
It’s possible to get (some) information about a workload by using the info command. This area is a bit lacking at the moment: it currently only provides the schema and the default read rate.
$ bin/tlp-stress info KeyValue
CREATE TABLE IF NOT EXISTS keyvalue (
key text PRIMARY KEY,
value text
)
Default read rate: 0.5 (override with -r)
No dynamic workload parameters.
Running a Customized Stress Workload
Whenever possible we try to use human-friendly numbers. Typing out -n 1000000000 is error prone and hard to read; -n 1B is much easier.
Suffix | Implication | Example | Equivalent
---|---|---|---
k | Thousand | 1k | 1,000
m | Million | 1m | 1,000,000
b | Billion | 1b | 1,000,000,000
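The expansion is simple multiplication by powers of a thousand. A minimal shell sketch of the convention (illustrative only, not tlp-stress’s actual parser):

```shell
# Expand the k/m/b suffix convention into a plain integer.
expand() {
  case $1 in
    *k) echo $(( ${1%k} * 1000 )) ;;
    *m) echo $(( ${1%m} * 1000000 )) ;;
    *b) echo $(( ${1%b} * 1000000000 )) ;;
    *)  echo "$1" ;;           # no suffix: already a plain number
  esac
}
expand 1k   # 1000
expand 1b   # 1000000000
```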
Running a stress test for a given duration instead of a number of operations
You might need to run a stress test for a given duration instead of providing a number of operations, especially for multi-threaded stress runs. This is done by providing the duration in a human readable format with the -d argument. The minimum duration is 1 minute.
For example, running a test for 1 hour and 30 minutes is done as follows:
$ tlp-stress run KeyValue -d "1h 30m"
To run a test for 1 day, 3 hours and 15 minutes (why not?), run tlp-stress as follows:
$ tlp-stress run KeyValue -d "1d 3h 15m"
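The duration string is additive across its units. A minimal shell sketch of how such a string breaks down into total minutes (my own illustration, not tlp-stress’s parser):

```shell
# Sum a "1d 3h 15m" style duration string into total minutes.
to_minutes() {
  total=0
  for part in $1; do               # split on whitespace: "1d" "3h" "15m"
    case $part in
      *d) total=$(( total + ${part%d} * 1440 )) ;;
      *h) total=$(( total + ${part%h} * 60 )) ;;
      *m) total=$(( total + ${part%m} )) ;;
    esac
  done
  echo "$total"
}
to_minutes "1d 3h 15m"   # 1635
```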
Partition Keys
A very useful feature is controlling how many partitions are read and written to for a given stress test. Doing a billion operations across a billion partitions is going to have a much different performance profile than writing to one hundred partitions, especially when mixed with different compaction settings. Using -p we can control how many partition keys a stress test will leverage. The keys are currently chosen at random.
Read Rate
It’s possible to specify the read rate of a test as a double. For example, if you want to use 1% reads, you’d specify -r .01. The sum of the read rate and delete rate must be less than or equal to 1.0.
Delete Rate
It’s possible to specify the delete rate of a test as a double. For example, if you want to use 1% deletes, you’d specify --deleterate .01. The sum of the read rate and delete rate must be less than or equal to 1.0.
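Since the read and delete fractions must sum to at most 1.0, the remaining fraction is available for writes. A quick sanity check of that arithmetic (my own sketch, not part of tlp-stress):

```shell
# With 1% reads and 5% deletes, the remaining 94% of operations are writes.
reads=0.01
deletes=0.05
writes=$(awk -v r="$reads" -v d="$deletes" 'BEGIN { printf "%.2f", 1 - r - d }')
echo "$writes"   # 0.94
```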
Compaction
It’s possible to change the compaction strategy used with the --compaction flag. At the moment this changes the compaction strategy of every table in the test; this will be made more flexible in the future.
The --compaction flag can accept a raw string along these lines:
--compaction "{'class':'LeveledCompactionStrategy'}"
Alternatively, a shortcut format exists as of version 2.0:
--compaction lcs
The following shorthand formats are available:

Syntax | Expansion
---|---
stcs | {'class': 'SizeTieredCompactionStrategy'}
stcs,4,32 | {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 4, 'max_threshold': 32}
lcs | {'class': 'LeveledCompactionStrategy'}
lcs,160 | {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}
lcs,160,10 | {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160, 'fanout_size': 10}
twcs | {'class': 'TimeWindowCompactionStrategy'}
twcs,1,days | {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': 1, 'compaction_window_unit': 'DAYS'}
Compression
It’s possible to change the compression options used with the --compression flag. At the moment this changes the compression options of every table in the test; this will be made more flexible in the future.
Customizing Fields
To some extent, workloads can be customized by leveraging the --field flag. For instance, if we look at the KeyValue workload, we have a table called keyvalue which has a value field.
To customize the data we use for this field, we provide a generator at the command line. By default, the value field will use 100-200 characters of random text. What if we’re storing blobs of text instead? Ideally we’d like to tweak this workload to be closer to our production use case. Let’s use random sections from various books:
$ tlp-stress run KeyValue --field.keyvalue.value='book(20,40)'
Instead of using random strings of garbage, the KeyValue workload will now use 20-40 words extracted from books.
There are other generators available, such as names, gaussian numbers, and cities. Not every generator applies to every type. It’s up to the workload to specify which fields can be used this way.
Logging
tlp-stress uses the Log4J 2 logging framework. You can find the default Log4J config in conf; it should be suitable for most use cases.
To use your own logging configuration, set the shell variable TLP_STRESS_LOG4J to the path of your configuration file before running tlp-stress.
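For example, assuming a custom config at a path of your choosing (the path below is a placeholder):

```shell
# Point tlp-stress at a custom Log4J 2 configuration for this shell session.
export TLP_STRESS_LOG4J="$HOME/tlp-stress-log4j2.yaml"
# Subsequent invocations in this shell will pick it up, e.g.:
# bin/tlp-stress run KeyValue -n 10000
echo "$TLP_STRESS_LOG4J"
```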
For more information on how to configure Log4J 2 please see the configuration documentation.
Debian Package
The Debian package installs a basic configuration file to /etc/tlp-stress/log4j2.yaml.
Exporting Metrics
tlp-stress automatically runs an HTTP server exporting metrics in Prometheus format on port 9500 (the default for --prometheusport).
Workload Restrictions
The BasicTimeSeries workload only supports Cassandra versions 3.0 and above, because it issues range deletes at runtime and range deletes are only supported in Cassandra 3.0 and later. An exception is thrown if this workload is run against a Cassandra version below 3.0.
Developer Docs
Building the documentation
First generate the command examples with the following shell script:
$ manual/generate_examples.sh
There’s a docker service to build the HTML manual:
$ docker-compose up docs
Writing a Custom Workload
tlp-stress is a work in progress. Writing a stress workload isn’t documented yet as it is still changing.