Best Cassandra interview questions for experienced candidates
Cassandra is an open-source data storage system becoming more popular each day. Therefore, there are high chances of a related job opening coming up often. If you see such an opportunity, it would be best to grab it. However, you won’t be the only one eyeing the job. Under such circumstances, you have no choice but to prove you are the best. These Cassandra interview questions for experienced candidates increase your chances of getting the job. They help you prepare well for the interview. You also become confident in knowing what to expect and how to respond. Read on to learn the interview questions and the corresponding answers.
Which values does a Cassandra column store?
It stores three values:
- Timestamp
- Column name
- Value
What does the abbreviation ACID stand for in Cassandra?
The four letters represent the following:
- Atomicity: A transaction has only two possible outcomes. It either commits in full or fails in full.
- Consistency: Data must remain valid and consistent before and after a transaction. The exact definition depends on the context, so it differs from one application or database to another.
- Isolation: Concurrent transactions must not interfere with one another, so each one behaves as if it ran alone.
- Durability: Once the database has accepted and committed data, that data must not be lost, even if the database fails afterwards.
Differentiate a Super Column and a Column
A Cassandra column has three values: a timestamp, a name, and a value. A super column, on the other hand, has a name and a value but no timestamp. Equally important, the value of a column is a string, whereas the value of a super column is a map of columns that can hold various data types.
Tell us about the classification of Primary Keys in Cassandra
Cassandra has three types of primary keys:
- Single Primary Key: The primary key consists of a single column, which also serves as the partition key. Cassandra uses the value of that column to decide how data is partitioned and spread across nodes.
- Compound Primary Key: In this case, data is not only partitioned but also clustered. The key therefore has two parts: the partition key and one or more clustering columns. Clustering means sorting the rows within a partition.
- Composite Partitioning Key: The partition key itself consists of several columns. It is suitable when a single partition would otherwise hold a lot of data, because the extra columns split that data into several smaller partitions that are easier to handle.
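The difference between the three key types can be sketched in a few lines of Python. This is only an in-memory illustration, not how Cassandra stores data: the `store_rows` helper and the sensor readings are made up for the example.

```python
from collections import defaultdict

def store_rows(rows, partition_cols, clustering_cols=()):
    """Group rows by partition key, then sort each partition by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partition_key = tuple(row[c] for c in partition_cols)
        partitions[partition_key].append(row)
    for rows_in_partition in partitions.values():
        rows_in_partition.sort(key=lambda r: tuple(r[c] for c in clustering_cols))
    return dict(partitions)

# Hypothetical sample data
readings = [
    {"sensor": "s1", "day": "2024-01-01", "hour": 9, "temp": 20.5},
    {"sensor": "s1", "day": "2024-01-01", "hour": 7, "temp": 18.0},
    {"sensor": "s1", "day": "2024-01-02", "hour": 8, "temp": 19.1},
]

# Single primary key: one partition column, no clustering
single = store_rows(readings, ["sensor"])
# Compound primary key: one partition column plus clustering columns
compound = store_rows(readings, ["sensor"], ["day", "hour"])
# Composite partitioning key: two partition columns plus a clustering column
composite = store_rows(readings, ["sensor", "day"], ["hour"])
```

With the composite key, the three readings of sensor s1 land in two partitions (one per day) instead of one, which is the effect the third key type is meant to achieve.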
Discuss Cassandra Query Language (CQL) Collections
CQL offers three collection types:
- MAP: This data type stores pairs of elements, each made up of a key and a value
- SET: It stores a group of unique elements and returns them in sorted order
- LIST: It is ideal when data has to remain in a certain order. Unlike a set, a list can contain the same value more than once
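As a rough analogy, the three collection types behave much like Python's built-in `dict`, `set`, and `list`. The variable names and values below are invented for illustration. A CQL set holds unique elements and returns them sorted, while a CQL list keeps its order and may repeat values.

```python
# MAP: key-value pairs
phone_numbers = {"home": "555-0100", "work": "555-0199"}

# SET: unique elements; CQL returns them sorted, so we sort explicitly here
tags = {"nosql", "database", "nosql"}
sorted_tags = sorted(tags)

# LIST: keeps insertion order and allows duplicates
events = ["login", "view", "login"]
```

Here `sorted_tags` contains two entries because the duplicate "nosql" is stored once, whereas `events` keeps both "login" entries in their original positions.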
How does Cassandra delete data?
SSTables are immutable, so Cassandra cannot simply remove a row from them in place. Instead, a delete writes a tombstone, which is a special marker stored in place of the column value to show that the data has been deleted. A later compaction removes the tombstone together with the data it hides.
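The idea can be sketched in Python as an append-only store in which a delete is just another write. This is a simplified model under stated assumptions: the function names are hypothetical, timestamps are passed in by hand, and the waiting period that real Cassandra applies before purging tombstones is left out.

```python
TOMBSTONE = object()  # marker meaning "this value was deleted"

def write(table, key, value, timestamp):
    """Append a new version; existing versions are never modified."""
    table.setdefault(key, []).append((timestamp, value))

def delete(table, key, timestamp):
    """A delete is a write of the tombstone marker."""
    write(table, key, TOMBSTONE, timestamp)

def read(table, key):
    """Return the newest version, or None if it is missing or deleted."""
    versions = table.get(key)
    if not versions:
        return None
    _, value = max(versions, key=lambda v: v[0])
    return None if value is TOMBSTONE else value

def compact(table):
    """Keep only the newest version per key and drop deleted keys."""
    for key in list(table):
        newest = max(table[key], key=lambda v: v[0])
        if newest[1] is TOMBSTONE:
            del table[key]
        else:
            table[key] = [newest]
```

Because the newest timestamp wins, an older write that arrives after the delete still reads as deleted, which is why the marker has to be kept until compaction.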
Explain the Gossip Protocol
It is a peer-to-peer communication protocol in Cassandra. Each node periodically chooses other nodes with which to exchange state information. The nodes share information not only about themselves but also about the nodes they have already gossiped with, so every node learns about the rest of the cluster quickly. It works in a few steps:
- One node randomly selects another from the list of nodes it knows
- If node A has selected node B, A sends B a message containing A's state data
- Upon receiving the message, node B updates its own data based on the new information
- Node B then sends its data back to node A
- In the same vein, node A updates its data based on the information it receives from node B
Gossip also helps detect failure through message acknowledgments. Only a healthy node sends and receives messages and acknowledges them. So, if a node stops acknowledging, the others can tell that it is either failing or already down.
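The exchange described above can be sketched as a small Python simulation. It is a toy model, not Cassandra's implementation: the `Node` class and `gossip_round` function are hypothetical, state is reduced to a heartbeat counter per node, and each node is assumed to pick its peer from the full cluster list.

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        # Known state: node name -> highest heartbeat version seen
        self.state = {name: 1}

    def merge(self, remote_state):
        """Keep the newer version of every entry the peer knows about."""
        for node_name, version in remote_state.items():
            if version > self.state.get(node_name, 0):
                self.state[node_name] = version

def gossip_round(nodes, rng):
    """Every node bumps its heartbeat and gossips with one random peer."""
    for a in nodes:
        a.state[a.name] += 1
        b = rng.choice([n for n in nodes if n is not a])
        b.merge(a.state)  # A sends its state, B updates
        a.merge(b.state)  # B replies with its state, A updates

cluster = [Node("a"), Node("b")]
gossip_round(cluster, random.Random(0))
```

After one round in this two-node cluster, both nodes hold the same view of each other. In a larger cluster the information spreads over several rounds, because each node passes on what it learned from earlier peers.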
What do the terms replication factor and replication strategy mean in Cassandra?
Cassandra replicates data across nodes for fault tolerance. Each piece of data is copied to several nodes, and the number of these copies is known as the replication factor. Changing it on a live cluster is possible, but you will need to run a repair afterwards so that the existing data matches the new setting.
On the other hand, a replication strategy is the technique applied when placing replicas in a cluster. There are two strategies: SimpleStrategy and NetworkTopologyStrategy.
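As an illustration of how a replication strategy uses the replication factor, here is a minimal sketch of SimpleStrategy-style placement: the first replica goes to the node that owns the token, and the remaining replicas go to the next nodes clockwise on the ring. The ring layout, token values, and the `replicas_simple` helper are assumptions made for the example.

```python
import bisect

def replicas_simple(ring, token, replication_factor):
    """ring: list of (token, node) sorted by token.
    Each node owns the tokens up to and including its own token."""
    tokens = [t for t, _ in ring]
    distinct_nodes = len({node for _, node in ring})
    wanted = min(replication_factor, distinct_nodes)
    index = bisect.bisect_left(tokens, token) % len(ring)
    replicas = []
    while len(replicas) < wanted:
        node = ring[index % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        index += 1  # walk clockwise around the ring
    return replicas

# Hypothetical four-node ring
ring = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]
owners = replicas_simple(ring, 30, 3)
```

For token 30 and a replication factor of 3, `owners` is `["n3", "n4", "n1"]`. NetworkTopologyStrategy differs in that it also takes data centers and racks into account, which this sketch does not model.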
What are the uses of Alter Keyspace?
It can change various properties of a keyspace, including:
- The durable_writes setting of a keyspace
- The number of replicas (the replication factor)
Define the Cassandra cqlsh
It is a command-line shell that lets users communicate with a Cassandra database through CQL. Some of the things you can do in cqlsh include the following:
- Defining a schema
- Executing a query
- Inserting data
What are the roles of the Capture shell command and its Consistency counterpart?
Capture and Consistency are two of the cqlsh shell commands. As the name suggests, Capture captures the output of a command and adds it to a file. Consistency, on the other hand, shows the current consistency level or sets a new one.
As you monitor Cassandra, which tools would you consider?
Despite its built-in fault tolerance features, monitoring Cassandra is important if you want consistently good performance. The following tools help you monitor the database efficiently:
- Dynatrace
- Instaclustr
- SolarWinds Server & Application Monitor
- ManageEngine Applications Manager
- AppDynamics
- Instana
Define CQL and state its main roles
Gone are the days when Cassandra relied on an API for basic tasks such as get, insert, and delete. These basic operations are now expressed in the Cassandra Query Language (CQL). It provides built-in data types and also allows applications to define custom data types.
Developers can assign different roles to users based on their requirements, which improves the security of the database. The main role operations are:
- Create a role
- Alter a role
- Drop a role
- Grant a role
- Revoke a role
- List a role
List the various CRUD operations in a Cassandra database
CRUD operations are the basic ways of creating, reading, and changing data. They include:
- Delete or drop operation
- Update operation
- Read operation
- Create operation
Define a keyspace in Cassandra
A keyspace is the part of a cluster that controls data replication in a Cassandra database. A cluster can hold several keyspaces, although it generally has one keyspace per application.
Tell us about the column family and its characteristics in Cassandra.
A column family is a container for an ordered collection of rows, and it represents stored structured data. Every column family is stored in a keyspace. A column family can have several attributes, such as:
- Rows cached
- Key cached
- Preload row cache
What are the different keyspace operations in Cassandra?
There are three operations on a Cassandra keyspace:
- Alter keyspace
- Create keyspace
- Drop keyspace
How would you iterate all rows in a particular Column Family?
If you want to iterate over all the rows in a certain Column Family, use get_range_slices. Begin the first iteration with an empty string as the start key. In every later iteration, the last key read in the previous iteration becomes the start key.
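The paging pattern can be sketched in Python as follows. The `get_range_slices` function below is a hypothetical in-memory stand-in, not the Thrift call itself: it is assumed to return rows in key order and to treat the start key as inclusive, which is why every page after the first skips its first row.

```python
def get_range_slices(store, start_key, count):
    """Stand-in: up to `count` (key, row) pairs with key >= start_key."""
    keys = sorted(k for k in store if k >= start_key)
    return [(k, store[k]) for k in keys[:count]]

def iterate_all_rows(store, page_size=3):
    """Yield every row once; page_size must be at least 2."""
    start_key = ""  # the empty string starts from the beginning
    first_page = True
    while True:
        page = get_range_slices(store, start_key, page_size)
        if not first_page:
            page = page[1:]  # the start key was already returned last time
        if not page:
            return
        for key, row in page:
            yield key, row
        start_key = page[-1][0]  # last key read becomes the next start key
        first_page = False

# Hypothetical column family with seven rows
rows = {k: {"value": k.upper()} for k in "abcdefg"}
all_keys = [key for key, _ in iterate_all_rows(rows)]
```

With a page size of 3, the seven rows are read in three pages and `all_keys` ends up listing every key exactly once.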
Which ports does Cassandra use?
By default, Cassandra uses the following ports:
- 7000 port – Cluster management (communication between nodes)
- 8080 port – JMX (releases since 0.8 use port 7199 instead)
- 9160 port – Thrift clients
All of these are TCP ports, and you can change the defaults in the configuration file or, for JVM options such as the JMX port, in bin/cassandra.in.sh.
Explain the various types of partitioners in Cassandra
The three types are:
- Murmur3Partitioner: It is the default partitioner. It is faster than the RandomPartitioner and distributes data uniformly based on the MurmurHash function. It creates a 64-bit hash value of the partition key, ranging from -2^63 to 2^63-1.
- RandomPartitioner: Before the release of Cassandra 1.2, RandomPartitioner served as the default partitioner. It also distributes data uniformly and can be used with vnodes. The partitioner uses MD5 hash values ranging from 0 to 2^127-1.
- ByteOrderedPartitioner: As the name suggests, it is suitable for ordered partitioning. It orders rows by their key bytes, so it works like a conventional index.
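The contrast between hashed and ordered partitioning can be shown with a short Python sketch. The token function is a simplified stand-in that maps an MD5 digest into the range 0 to 2^127-1; it is an assumption for illustration and is not guaranteed to produce the same tokens as Cassandra's RandomPartitioner.

```python
import hashlib

# Token range of the Murmur3Partitioner as stated above
MURMUR3_MIN, MURMUR3_MAX = -2**63, 2**63 - 1

def md5_token(key: bytes) -> int:
    """Simplified MD5-based token in the range 0 .. 2**127 - 1."""
    digest = int.from_bytes(hashlib.md5(key).digest(), "big")
    return digest % (2**127)

def hash_order(keys):
    """Order keys by token, as a hashing partitioner would place them."""
    return sorted(keys, key=md5_token)

def byte_order(keys):
    """Order keys by their raw bytes, as an ordered partitioner would."""
    return sorted(keys)

keys = [b"user:3", b"user:1", b"user:2"]
ordered = byte_order(keys)
hashed = hash_order(keys)
```

Byte ordering keeps neighbouring keys next to each other, which makes range scans over keys possible. Hashing scatters them, which is what gives the uniform distribution across nodes.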
How is Truncate different from Drop in cqlsh?
The DROP TABLE command removes the table itself, together with all its data, from the keyspace. The TRUNCATE command, on the other hand, permanently deletes all rows of the specified table but keeps the table and its schema.
Which are the various logging levels used in Cassandra?
They include:
- OFF: It turns logging off and has the highest possible rank
- ERROR: It designates error events that might still allow the application to continue running
- WARN: It designates potentially harmful situations
- INFO: It designates informational messages that highlight the progress of the application at a coarse-grained level
- DEBUG: It designates fine-grained informational events that are most useful when debugging an application
- TRACE: It is similar to the DEBUG level, but its informational events are even finer-grained
- ALL: It turns on all levels, including custom levels
Discuss the various types of repairs
They include:
- Anti Entropy: This repair compares the data in every replica, identifies the latest version, and brings the replicas up to date with the help of Merkle trees. It requires a manual trigger. The process has two phases. First, a Merkle tree is built for every replica. Then the Merkle trees are compared to detect any differences. It is advisable to run anti-entropy repair periodically to ensure that data stays in sync.
- Read Repair: During a read request, Cassandra may find that the replica nodes hold inconsistent data. This type of repair fixes them. It compares the responses, identifies the replicas that returned outdated data, and updates only those replicas with the latest version.
- Nodetool Repair: This repair focuses on a range of tokens. The range is repaired depending on the specified options.
- Full Repair: It works on all the data in a token range
- Incremental Repair: It focuses only on data written since the last incremental repair.
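The two phases of anti-entropy repair, building a Merkle tree per replica and then comparing the trees, can be sketched in Python. This is a minimal illustration with assumptions: the function names are hypothetical, SHA-256 is used as the hash, each leaf stands for the data of one token sub-range, and the number of leaves is a power of two.

```python
import hashlib

def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle(leaves):
    """Phase 1: return the tree as a list of levels, leaf hashes first."""
    level = [_hash(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_ranges(tree_a, tree_b):
    """Phase 2: walk down from the root and return mismatching leaf indexes."""
    def walk(level, index):
        if tree_a[level][index] == tree_b[level][index]:
            return []  # identical subtree, nothing to repair below it
        if level == 0:
            return [index]
        return walk(level - 1, 2 * index) + walk(level - 1, 2 * index + 1)
    return walk(len(tree_a) - 1, 0)

# Hypothetical replicas, each holding four token sub-ranges
replica_1 = [b"range0", b"range1", b"range2", b"range3"]
replica_2 = [b"range0", b"range1", b"stale!", b"range3"]
out_of_sync = differing_ranges(build_merkle(replica_1), build_merkle(replica_2))
```

Only the sub-range at index 2 is reported, so only that part of the data has to be exchanged between the replicas. Matching subtrees are skipped after a single hash comparison.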
Which are the various consistency levels for Cassandra’s read and write operations?
They include the following. Most descriptions are phrased for writes. For a read, the level sets how many replicas must respond instead.
- SERIAL: It provides linearizable consistency for lightweight transactions by preventing unconditional updates
- LOCAL_SERIAL: It operates like the SERIAL consistency level but only applies to the local data center
- ANY: The write must reach at least one node, and a stored hint counts. This level applies to writes only and offers the weakest consistency
- LOCAL_ONE: The write must be acknowledged by at least one replica node in the local data center
- ONE: The write must be written to the commit log and memtable of at least one replica node
- TWO: The write must be written to the commit log and memtable of at least two replica nodes
- THREE: The write must be written to the commit log and memtable of at least three replica nodes
- LOCAL_QUORUM: The write must be written to the commit log and memtable on a quorum of replica nodes in the local data center
- EACH_QUORUM: The write must be written to the commit log and memtable on a quorum of replica nodes in every data center
- ALL: The write must be written to the commit log and memtable on all replica nodes in the cluster for that partition. It offers the strongest consistency
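A quorum is a majority of the replicas, that is, the replication factor divided by two (rounded down) plus one. The following Python sketch works out how many replicas must acknowledge an operation at the quorum-based levels and at ALL. The function names and the data center layout are hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def local_quorum(rf_by_dc, local_dc):
    """LOCAL_QUORUM: a quorum in the coordinator's own data center."""
    return quorum(rf_by_dc[local_dc])

def each_quorum(rf_by_dc):
    """EACH_QUORUM: a quorum in every data center."""
    return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}

def all_replicas(rf_by_dc):
    """ALL: every replica in every data center."""
    return sum(rf_by_dc.values())

# Hypothetical keyspace replicated to two data centers
rf_by_dc = {"dc1": 3, "dc2": 5}
needed_locally = local_quorum(rf_by_dc, "dc1")
needed_everywhere = each_quorum(rf_by_dc)
```

In this layout, LOCAL_QUORUM in dc1 needs 2 acknowledgments, EACH_QUORUM needs 2 in dc1 and 3 in dc2, and ALL needs all 8 replicas, which is why ALL is the most consistent but least available level.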
Which are the primary components of a data model in Cassandra?
They include:
- Column family
- Column
- Keyspace
- Cluster
What about the other components?
The list contains the following:
- Bloom Filter
- SSTable
- Mem-table
- Commit log
- Cluster
- Data Centre
- Node