The Fine Print When Using Multiple Data Directories
One of the longest lived features in Cassandra is the ability for a node to store data on more than one directory or disk. This feature can help increase cluster capacity or prevent a node from running out of space when bootstrapping a new one would take too long to complete. Recently I was working on a cluster and saw how this feature has the potential to silently cause problems. In this post we will go through some of the fine print to be aware of when configuring Cassandra to use multiple disks.
Jay… what?
The feature which allows Cassandra to store data on multiple disks is commonly referred to as JBOD [pronounced jay-bod], which stands for “Just a Bunch Of Disks/Drives”. In Cassandra this feature is controlled by the data_file_directories setting in the cassandra.yaml file. In relation to this setting, Cassandra also allows its behaviour on disk failure to be controlled using the disk_failure_policy setting. For now I will leave the details of that setting alone, so we can focus exclusively on the data_file_directories setting.
Simple drives, simple pleasures
The data_file_directories feature is fairly straightforward: it allows Cassandra to use multiple directories to store data. To use it, just specify the list of directories you want Cassandra to use for data storage. For example:
data_file_directories:
- /var/lib/cassandra/data
- /var/lib/cassandra/data-extra
The feature has been around from day one of Cassandra’s life, and the way in which Cassandra uses multiple directories has mostly stayed the same. There are no special restrictions on the directories; they can be on the same volume/disk or on different ones. As far as Cassandra is concerned, the paths specified in the setting are just the directories it has available to read and write data.
At a high level, the feature works by having Cassandra try to evenly split data across the directories specified in the data_file_directories setting. No two directories will ever contain an identical SSTable file name. Below is an example of what you could expect to see when inspecting each data directory while using this feature. In this example the node is configured to use two directories: …/data0/ and …/data1/.
$ ls .../data0/music/playlists-3b90f8a0a50b11e881a5ad31ff0de720/
backups mc-5-big-Digest.crc32 mc-5-big-Statistics.db
mc-5-big-CompressionInfo.db mc-5-big-Filter.db mc-5-big-Summary.db
mc-5-big-Data.db mc-5-big-Index.db mc-5-big-TOC.txt
$ ls .../data1/music/playlists-3b90f8a0a50b11e881a5ad31ff0de720/
backups mc-6-big-Digest.crc32 mc-6-big-Statistics.db
mc-6-big-CompressionInfo.db mc-6-big-Filter.db mc-6-big-Summary.db
mc-6-big-Data.db mc-6-big-Index.db mc-6-big-TOC.txt
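The no-two-directories-share-a-file-name behaviour can be spot checked with a few lines of shell. Below is a minimal sketch, assuming the directories from cassandra.yaml are passed as arguments; the function name is mine, not part of Cassandra or ccm.

```shell
# Sketch: confirm no SSTable data file name appears in more than one
# data directory. Pass the data directories from cassandra.yaml as
# arguments.
no_duplicate_sstables() {
  local dir dups
  dups=$(for dir in "$@"; do
    # Only the base file name matters; the directory prefix differs.
    find "${dir}" -name '*-Data.db' -exec basename {} \;
  done | sort | uniq -d | wc -l)
  [ "${dups}" -eq 0 ]
}
```

On a healthy node this check should always pass, whichever directory each SSTable landed in.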
Data resurrection
One notable change to how Cassandra uses the data_file_directories setting was CASSANDRA-6696, implemented in Cassandra version 3.2. To explain the problem it fixed, consider the case where a node has two data directories on disks A and B. Prior to this change, the node could have the data for a specific token in an SSTable on disk A, and a tombstone associated with that token in another SSTable on disk B. If gc_grace_seconds passed and no compaction had run to reclaim the tombstoned data, a failure of disk B became a problem: the tombstone would be lost while the data on disk A was still present! Running a repair at this point would resurrect the data by propagating it to other replicas! To fix this issue, CASSANDRA-6696 changed Cassandra so that a token range is always stored on a single disk.
This change made Cassandra more robust when using the data_file_directories setting; however, it was no silver bullet and caution still needs to be taken when the setting is used. Most notably, consider the case where each data directory is mounted on a dedicated disk and the cluster schema results in wide partitions. In this scenario one of the disks could easily reach its maximum capacity due to the wide partitions, while the other disk still has plenty of storage capacity.
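One way to catch this kind of drift early is to compare per-directory usage. Below is a hedged sketch; the 2x ratio threshold and the helper name are arbitrary choices of mine, not anything Cassandra provides.

```shell
# Sketch: flag a node whose data directories have drifted badly out of
# balance. Sizes come from `du -sk`; the ratio threshold is the first
# argument.
dirs_balanced() {
  local max_ratio=$1; shift
  local dir size biggest=0 smallest=""
  for dir in "$@"; do
    size=$(du -sk "${dir}" | cut -f1)
    if [ "${size}" -gt "${biggest}" ]; then biggest=${size}; fi
    if [ -z "${smallest}" ] || [ "${size}" -lt "${smallest}" ]; then
      smallest=${size}
    fi
  done
  # Balanced when the biggest directory holds less than max_ratio times
  # the data of the smallest one.
  [ "${smallest}" -gt 0 ] && [ $(( biggest / smallest )) -lt "${max_ratio}" ]
}
```

Run periodically (e.g. `dirs_balanced 2 /var/lib/cassandra/data /var/lib/cassandra/data-extra || echo "imbalanced"`), this gives warning before one disk fills while the other sits half empty.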
How to lose a volume and influence a node
For a node running a Cassandra version earlier than 3.2 and using the data_file_directories setting, there are a number of vulnerabilities to watch out for. If each data directory is mounted on a dedicated disk, and one of the disks dies or a mount disappears, this can silently cause problems. To explain, consider the case where Cassandra is installed with its data located in /var/lib/cassandra/data. Say we want to add another directory to store data in, only this time the data will be on another volume. It makes sense to keep the data directories in the same location, so we create the directory /var/lib/cassandra/data-extra. We then mount our volume so that /var/lib/cassandra/data-extra points to it. If the disk backing /var/lib/cassandra/data-extra died, or we forgot to put the mount information in fstab and lost the mount on a restart, then we would effectively lose system table data. Cassandra will still start, because the directory /var/lib/cassandra/data-extra exists, however it will be empty.
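A pre-start guard can turn this silent failure into a loud one. Here is a rough sketch, assuming a Linux system with GNU stat; the function name is mine, and Cassandra itself performs no such check.

```shell
# Sketch: detect whether a directory is backed by its own mounted
# volume. A directory that is a mount point has a different device
# number (stat %d) from its parent directory.
is_mounted() {
  [ "$(stat -c %d "$1")" != "$(stat -c %d "$1/..")" ]
}
```

Something like `is_mounted /var/lib/cassandra/data-extra || exit 1` in the service start wrapper would stop Cassandra from coming up against an empty, unmounted directory.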
Similarly, I have seen cases where a data directory was manually added to a node that was managed by Chef. In this case the node was running out of disk space and there was no time to wait for a new node to bootstrap. To avoid the node going down, an additional volume was attached and mounted, and the data_file_directories setting in cassandra.yaml was modified to include the new data directory. Some time later Chef was executed on the node to deploy an update, and as a result it reset the cassandra.yaml configuration. This cleared the additional data directory that was listed under the data_file_directories setting. When the node was restarted, the Cassandra process never knew that there was another data directory it had to read from.
Either of these cases can lead to more problems in the cluster. Remember how earlier I mentioned that a complete SSTable file will always be stored in a single data directory when the data_file_directories setting is used? This behaviour applies to all data stored by Cassandra, including its system data! That means in the above two scenarios Cassandra could potentially lose system table data. This is a problem because the system tables store information about what data the node owns, what the schema is, and whether the node has bootstrapped. If the system tables are lost and the node is restarted, the node will think it is a new node, take on a new identity, and claim new token ranges. This results in a token range movement in the cluster. We have covered this topic in more detail in our auto bootstrapping blog post. The problem gets worse when a seed node loses its system tables and comes back as a new node, because seed nodes never stream data; if a cleanup is then run cluster wide, data is lost.
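A cheap safeguard against a silent identity change is to record each node's Host ID before maintenance and compare it afterwards. The helper below is a sketch; it assumes the `ID` line format that `nodetool info` prints, which may vary between versions.

```shell
# Sketch: pull the Host ID out of `nodetool info` output (read from
# stdin) so it can be stored before a restart and compared afterwards.
# The expected line looks like:
#   ID                     : 4682088e-4a3c-4fbc-8874-054408121f0a
host_id_from_info() {
  awk -F': ' '/^ID/ {print $2; exit}'
}
```

Capturing `nodetool info | host_id_from_info` before and after a restart, and alerting on a mismatch, would have flagged both failure scenarios above before repairs or cleanups made things worse.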
Testing the theory
We can test the above scenarios for different versions of Cassandra using ccm. I created the following script to set up a three node cluster in ccm, with each node configured to be a seed node and to use two data directories. We use seed nodes to show the worst case scenario that can occur when a node using multiple data directories loses one of them.
#!/bin/bash
#
# This script creates a three node CCM cluster to demo the data_file_directories
# feature in different versions of Cassandra.
set -e

CLUSTER_NAME="${1:-TestCluster}"
NUMBER_NODES="3"
CLUSTER_VERSION="${2:-3.0.15}"

echo "Cluster Name: ${CLUSTER_NAME}"
echo "Cluster Version: ${CLUSTER_VERSION}"
echo "Number nodes: ${NUMBER_NODES}"

ccm create ${CLUSTER_NAME} -v ${CLUSTER_VERSION}

# Modifies the configuration of a node in the CCM cluster.
function update_node_config {
  CASSANDRA_YAML_SETTINGS="cluster_name:${CLUSTER_NAME} \
    num_tokens:32 \
    endpoint_snitch:GossipingPropertyFileSnitch \
    seeds:127.0.0.1,127.0.0.2,127.0.0.3"

  for key_value_setting in ${CASSANDRA_YAML_SETTINGS}
  do
    setting_key=$(echo ${key_value_setting} | cut -d':' -f1)
    setting_val=$(echo ${key_value_setting} | cut -d':' -f2)
    sed -ie "s/${setting_key}\:\ .*/${setting_key}:\ ${setting_val}/g" \
      ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra.yaml
  done

  # Create and configure the additional data directory.
  extra_data_dir="/Users/anthony/.ccm/${CLUSTER_NAME}/node${1}/data1"

  sed -ie '/data_file_directories:/a\'$'\n'"- ${extra_data_dir}
" ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra.yaml

  mkdir ${extra_data_dir}

  sed -ie "s/dc=.*/dc=datacenter1/g" \
    ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra-rackdc.properties
  sed -ie "s/rack=.*/rack=rack${1}/g" \
    ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra-rackdc.properties

  # Tune Cassandra memory usage so we can run multiple nodes on the one machine.
  sed -ie 's/\#MAX_HEAP_SIZE=\"4G\"/MAX_HEAP_SIZE=\"500M\"/g' \
    ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra-env.sh
  sed -ie 's/\#HEAP_NEWSIZE=\"800M\"/HEAP_NEWSIZE=\"120M\"/g' \
    ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra-env.sh

  # Allow remote access to JMX without authentication. This is for
  # demo purposes only - never do this in production.
  sed -ie 's/LOCAL_JMX=yes/LOCAL_JMX=no/g' \
    ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra-env.sh
  sed -ie 's/com\.sun\.management\.jmxremote\.authenticate=true/com.sun.management.jmxremote.authenticate=false/g' \
    ~/.ccm/${CLUSTER_NAME}/node${1}/conf/cassandra-env.sh
}

for node_num in $(seq ${NUMBER_NODES})
do
  echo "Adding 'node${node_num}'"
  ccm add node${node_num} \
    -i 127.0.0.${node_num} \
    -j 7${node_num}00 \
    -r 0 \
    -b

  update_node_config ${node_num}

  # Set localhost aliases - Mac only.
  echo "ifconfig lo0 alias 127.0.0.${node_num} up"
  sudo ifconfig lo0 alias 127.0.0.${node_num} up
done

sed -ie 's/use_vnodes\:\ false/use_vnodes:\ true/g' \
  ~/.ccm/${CLUSTER_NAME}/cluster.conf
I first tested Cassandra version 2.1.20 using the following process.
Run the script and check the nodes were created.
$ ccm status
Cluster: 'mutli-dir-test'
-------------------------
node1: DOWN (Not initialized)
node3: DOWN (Not initialized)
node2: DOWN (Not initialized)
Start the cluster.
$ for i in $(seq 1 3); do echo "Starting node${i}"; ccm node${i} start; sleep 10; done
Starting node1
Starting node2
Starting node3
Check the cluster is up and note the Host IDs.
$ ccm node1 nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 47.3 KB 32 73.5% 4682088e-4a3c-4fbc-8874-054408121f0a rack1
UN 127.0.0.2 80.35 KB 32 71.7% b2411268-f168-485d-9abe-77874eef81ce rack2
UN 127.0.0.3 64.33 KB 32 54.8% 8b55a1c6-f971-4e01-a34b-bb37dd55bb89 rack3
Insert some test data into the cluster.
$ ccm node1 cqlsh
Connected to TLP-578-2120 at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.20 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
cqlsh> CREATE KEYSPACE music WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
cqlsh> CREATE TABLE music.playlists (
... id uuid,
... song_order int,
... song_id uuid,
... title text,
... artist text,
... PRIMARY KEY (id, song_id));
cqlsh> INSERT INTO music.playlists (id, song_order, song_id, artist, title)
... VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 1,
... a3e64f8f-bd44-4f28-b8d9-6938726e34d4, 'Of Monsters and Men', 'Little Talks');
cqlsh> INSERT INTO music.playlists (id, song_order, song_id, artist, title)
... VALUES (62c36092-82a1-3a00-93d1-46196ee77205, 2,
... 8a172618-b121-4136-bb10-f665cfc469eb, 'Birds of Tokyo', 'Plans');
cqlsh> INSERT INTO music.playlists (id, song_order, song_id, artist, title)
... VALUES (62c36092-82a1-3a00-93d1-46196ee77206, 3,
... 2b09185b-fb5a-4734-9b56-49077de9edbf, 'Lorde', 'Royals');
cqlsh> exit
Write the data to disk by running nodetool flush on all the nodes.
$ for i in $(seq 1 3); do echo "Flushing node${i}"; ccm node${i} nodetool flush; done
Flushing node1
Flushing node2
Flushing node3
Check we can retrieve data from each node.
$ for i in $(seq 1 3); do ccm node${i} cqlsh -e "SELECT id, song_order, song_id, artist, title FROM music.playlists"; done
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
Look for a node that has all of the system.local SSTable files in a single directory. In this particular test, there were no SSTable files in data directory data0 of node1.
$ ls .../node1/data0/system/local-7ad54392bcdd35a684174e047860b377/
$
$ ls .../node1/data1/system/local-7ad54392bcdd35a684174e047860b377/
system-local-ka-5-CompressionInfo.db system-local-ka-5-Summary.db system-local-ka-6-Index.db
system-local-ka-5-Data.db system-local-ka-5-TOC.txt system-local-ka-6-Statistics.db
system-local-ka-5-Digest.sha1 system-local-ka-6-CompressionInfo.db system-local-ka-6-Summary.db
system-local-ka-5-Filter.db system-local-ka-6-Data.db system-local-ka-6-TOC.txt
system-local-ka-5-Index.db system-local-ka-6-Digest.sha1
system-local-ka-5-Statistics.db system-local-ka-6-Filter.db
Stop node1 and simulate a disk or volume mount going missing by removing the data1 directory entry from the data_file_directories setting.
$ ccm node1 stop
Before the change the setting entry was:
data_file_directories:
- .../node1/data0
- .../node1/data1
After the change the setting entry was:
data_file_directories:
- .../node1/data0
Start node1 again and check the logs. From the logs we can see the messages where the node generated a new Host ID and took ownership of new tokens.
WARN [main] 2018-08-21 12:34:57,111 SystemKeyspace.java:765 - No host ID found, created c62c54bf-0b85-477d-bb06-1f5d696c7fef (Note: This should happen exactly once per node).
INFO [main] 2018-08-21 12:34:57,241 StorageService.java:844 - This node will not auto bootstrap because it is configured to be a seed node.
INFO [main] 2018-08-21 12:34:57,259 StorageService.java:959 - Generated random tokens. tokens are [659824738410799181, 501008491586443415, 4528158823720685640, 3784300856834360518, -5831879079690505989, 8070398544415493492, -2664141538712847743, -303308032601096386, -553368999545619698, 5062218903043253310, -8121235567420561418, 935133894667055035, -4956674896797302124, 5310003984496306717, -1155160853876320906, 3649796447443623633, 5380731976542355863, -3266423073206977005, 8935070979529248350, -4101583270850253496, -7026448307529793184, 1728717941810513773, -1920969318367938065, -8219407330606302354, -795338012034994277, -374574523137341910, 4551450772185963221, -1628731017981278455, -7164926827237876166, -5127513414993962202, -4267906379878550578, -619944134428784565]
Check the cluster status again. From the output we can see that the Host ID for node1 changed from 4682088e-4a3c-4fbc-8874-054408121f0a to c62c54bf-0b85-477d-bb06-1f5d696c7fef.
$ ccm node2 nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 89.87 KB 32 100.0% c62c54bf-0b85-477d-bb06-1f5d696c7fef rack1
UN 127.0.0.2 88.69 KB 32 100.0% b2411268-f168-485d-9abe-77874eef81ce rack2
UN 127.0.0.3 106.66 KB 32 100.0% 8b55a1c6-f971-4e01-a34b-bb37dd55bb89 rack3
Check we can retrieve data from each node again.
$ for i in $(seq 1 3); do ccm node${i} cqlsh -e "SELECT id, song_order, song_id, artist, title FROM music.playlists"; done
Cassandra 2.1.20 Results
When we run the above test against a cluster using Apache Cassandra version 2.1.20 and remove the additional data directory data1 from node1, we can see that our cql statement fails when retrieving data from node1. The error produced shows that the song_order column is unknown to the node.
$ for i in $(seq 1 3); do ccm node${i} cqlsh -e "SELECT id, song_order, song_id, artist, title FROM music.playlists"; done
<stdin>:1:InvalidRequest: code=2200 [Invalid query] message="Undefined name song_order in selection clause"
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
An interesting side note: if nodetool drain is run on node1 before it is shut down, then the above error never occurs. Instead, the following output appears when we run our cql statement to retrieve data from the nodes. As we can see below, the query that previously failed now returns no rows of data.
$ for i in $(seq 1 3); do ccm node${i} cqlsh -e "SELECT id, song_order, song_id, artist, title FROM music.playlists"; done
id | song_order | song_id | artist | title
----+------------+---------+--------+-------
(0 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
Cassandra 2.2.13 Results
When we run the above test against a cluster using Apache Cassandra version 2.2.13 and remove the additional data directory data1 from node1, we can see that the cql statement fails when retrieving data from node1. The error produced is similar to the one in version 2.1.20, except here the id column is unknown.
$ for i in $(seq 1 3); do ccm node${i} cqlsh -e "SELECT id, song_order, song_id, artist, title FROM music.playlists"; done
<stdin>:1:InvalidRequest: Error from server: code=2200 [Invalid query] message="Undefined name id in selection clause"
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
Unlike Cassandra version 2.1.20, node1 never generated a new Host ID or calculated new tokens. This is because it replayed the commitlog and recovered most of the writes that would otherwise have gone missing.
INFO [main] ... CommitLog.java:160 - Replaying .../node1/commitlogs/CommitLog-5-1534865605274.log, .../node1/commitlogs/CommitLog-5-1534865605275.log
WARN [main] ... CommitLogReplayer.java:149 - Skipped 1 mutations from unknown (probably removed) CF with id 5bc52802-de25-35ed-aeab-188eecebb090
...
INFO [main] ... StorageService.java:900 - Using saved tokens [-1986809544993962272, -2017257854152001541, -2774742649301489556, -5900361272205350008, -5936695922885734332, -6173514731003460783, -617557464401852062, -6189389450302492227, -6817507707445347788, -70447736800638133, -7273401985294399499, -728761291814198629, -7345403624129882802, -7886058735316403116, -8499251126507277693, -8617790371363874293, -9121351096630699623, 1551379122095324544, 1690042196927667551, 2403633816924000878, 337128813788730861, 3467690847534201577, 419697483451380975, 4497811278884749943, 4783163087653371572, 5213928983621160828, 5337698449614992094, 5502889505586834056, 6549477164138282393, 7486747913914976739, 8078241138082605830, 8729237452859546461]
Cassandra 3.0.15 Results
When we run the above test against a cluster using Apache Cassandra version 3.0.15 and remove the additional data directory data1 from node1, we can see that the cql statement returns no data from node1.
$ for i in $(seq 1 3); do ccm node${i} cqlsh -e "SELECT id, song_order, song_id, artist, title FROM music.playlists"; done
id | song_order | song_id | artist | title
----+------------+---------+--------+-------
(0 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
id | song_order | song_id | artist | title
-------------+------------+-------------+---------------------+--------------
62c36092... | 2 | 8a172618... | Birds of Tokyo | Plans
62c36092... | 3 | 2b09185b... | Lorde | Royals
62c36092... | 1 | a3e64f8f... | Of Monsters and Men | Little Talks
(3 rows)
Cassandra 3.11.3 Results
When we run the above test against a cluster using Apache Cassandra version 3.11.3 and remove the additional data directory data1 from node1, the node fails to start and we can see the following error message in the logs.
ERROR [main] 2018-08-21 16:30:53,489 CassandraDaemon.java:708 - Exception encountered during startup
java.lang.RuntimeException: A node with address /127.0.0.1 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:558) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:804) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:664) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:613) ~[apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:379) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:602) [apache-cassandra-3.11.3.jar:3.11.3]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:691) [apache-cassandra-3.11.3.jar:3.11.3]
In this case, the cluster reports node1 as down and still shows its original Host ID.
$ ccm node2 nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
DN 127.0.0.1 191.96 KiB 32 100.0% 35a3c8ff-fa20-4f10-81cd-7284caeb00bd rack1
UN 127.0.0.2 191.82 KiB 32 100.0% 2ebe4f0b-dc8f-4f46-93cd-37c410174a49 rack2
UN 127.0.0.3 170.46 KiB 32 100.0% 0384793e-7f59-40aa-a487-97f410dded4b rack3
After inspecting the SSTables in both data directories we can see that a new Host ID 7d910c98-f69b-41b4-988a-f432b2e54b38 has been assigned to the node, even though it failed to start.
$ ./tools/bin/sstabledump .../node1/data0/system/local-7ad54392bcdd35a684174e047860b377/mc-12-big-Data.db | grep host_id
{ "name" : "host_id", "value" : "35a3c8ff-fa20-4f10-81cd-7284caeb00bd", "tstamp" : "2018-08-21T06:24:08.106Z" },
$ ./tools/bin/sstabledump .../node1/data1/system/local-7ad54392bcdd35a684174e047860b377/mc-10-big-Data.db | grep host_id
{ "name" : "host_id", "value" : "7d910c98-f69b-41b4-988a-f432b2e54b38" },
Take away messages
As we have seen from testing, there are potential dangers in using multiple data directories in Cassandra. Simply removing one of the data directories from the setting can cause a node to come back as a brand new node and affect the rest of the cluster. The JBOD feature can be useful in emergencies where disk space is urgently needed, however its usage in that case should be temporary.
I feel the use of multiple disks in a Cassandra node is better handled at the OS or hardware layer. Systems like LVM and RAID were designed to combine multiple disks into a single volume. Using LVM or RAID rather than Cassandra’s JBOD feature reduces the complexity of the Cassandra configuration and the number of moving parts on the Cassandra side that can go wrong. The JBOD feature subtly increases operational complexity and reduces the node’s ability to fail fast. In most cases, I feel it is more useful for a node to fail outright rather than limp on and potentially impact the cluster in a negative way.
As a final thought, I think one handy feature that could be added to Apache Cassandra to help prevent the issues associated with JBOD is the ability to check whether the data, commitlog, saved_caches and hints directories are all empty prior to bootstrapping. If they are all empty, then the node proceeds as normal. If some of them contain data, then perhaps the node could fail to start and print an error message in the logs.
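A sketch of what such a check might look like follows; the function name and the all-or-nothing rule are my interpretation of the idea, not an actual Cassandra feature.

```shell
# Sketch: a node's directories should be either all empty (a genuinely
# new node) or all populated. A mix of the two suggests a volume has
# silently gone missing, so startup should be refused.
dirs_all_or_nothing() {
  local dir empty=0 nonempty=0
  for dir in "$@"; do
    if [ -n "$(ls -A "${dir}" 2>/dev/null)" ]; then
      nonempty=$((nonempty + 1))
    else
      empty=$((empty + 1))
    fi
  done
  # Consistent when everything is empty or nothing is.
  [ "${empty}" -eq 0 ] || [ "${nonempty}" -eq 0 ]
}
```

Applied to the data directories plus commitlog, saved_caches and hints, this would let a fresh node bootstrap normally while stopping a node with a missing mount from quietly assuming a new identity.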