🗄️ Cassandra Interview Questions & Answers (2025)
Basic Level Questions
What is Apache Cassandra?
Apache Cassandra is a distributed NoSQL database designed to handle large volumes of data with high availability and no single point of failure.
What type of database is Cassandra?
Cassandra is a wide-column store NoSQL database.
Who originally developed Cassandra?
It was originally developed by Facebook to power its Inbox search feature and later open-sourced.
What is a keyspace?
A keyspace is the top-level data container in Cassandra, similar to a database in an RDBMS. It defines the replication strategy and replication factor for the tables it contains.
What is a primary key?
A primary key uniquely identifies a row in a table. It consists of a partition key (one or more columns) followed by optional clustering columns.
What is a column family?
Column family is the older name for what CQL calls a table. It stores data in rows and columns, but unlike a relational table, rows are grouped into partitions and a row need not have a value for every column.
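Does Cassandra support ACID transactions?
Not in the full relational sense. Writes to a single partition are atomic and isolated, and durability comes from the commit log. Lightweight transactions add compare-and-set operations, but there are no general multi-row ACID transactions. Consistency is tunable per query.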
What is a node in Cassandra?
A node is an individual instance of Cassandra running in a cluster, storing part of the data.
What is replication in Cassandra?
Replication is storing copies of data on multiple nodes to ensure reliability and high availability.
What language is used to query Cassandra?
Cassandra uses CQL (Cassandra Query Language), similar in syntax to SQL but without joins.
Intermediate Level Questions
What is a cluster?
A cluster is a collection of nodes arranged to distribute data evenly and provide redundancy.
What is the consistency level in Cassandra?
It defines how many replica nodes must acknowledge a read/write operation before it’s considered successful.
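As a rough illustration, the number of acknowledgements a level requires can be derived from the replication factor. The sketch below is plain Python, not driver code, and the function name required_acks is hypothetical.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a consistency level (illustrative only)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    level = level.upper()
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")
```

With a replication factor of 3, QUORUM needs 2 acknowledgements, so the operation still succeeds with one replica down.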
Explain partition key and clustering key.
The partition key determines which nodes store a row, because its hash (token) maps to a position on the ring. Clustering columns determine the sort order of rows within a partition.
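The two roles can be pictured with a toy in-memory model: a dictionary keyed by partition key, where each partition keeps its rows sorted by clustering key. The Table class and the sensor example are hypothetical and only mimic the layout, not Cassandra's storage engine.

```python
from bisect import insort
from collections import defaultdict


class Table:
    """Toy model: partitions keyed by partition key, rows sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, value)
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        rows = self.partitions[partition_key]
        # Writes are upserts: replace any row with the same clustering key.
        rows[:] = [row for row in rows if row[0] != clustering_key]
        insort(rows, (clustering_key, value))

    def select(self, partition_key):
        """Return all rows of one partition, already in clustering order."""
        return list(self.partitions.get(partition_key, []))


readings = Table()
readings.insert("sensor-1", 3, "21.5C")
readings.insert("sensor-1", 1, "20.9C")
readings.insert("sensor-2", 2, "18.0C")
```

Reading "sensor-1" returns its rows ordered by the clustering key (1, then 3) without touching the "sensor-2" partition.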
What is an SSTable?
An SSTable (Sorted Strings Table) is an immutable on-disk data file that Cassandra writes when a memtable is flushed.
What is a memtable?
A memtable is an in-memory structure where writes are buffered, after being appended to the commit log, before being flushed to disk as an SSTable.
Explain compaction in Cassandra.
Compaction merges SSTables, keeps the most recent version of each cell, drops overwritten data and expired tombstones, and so reduces the number of files a read has to check.
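A minimal sketch of the merge step, assuming each SSTable is modeled as a dictionary of key to (timestamp, value). The compact function is hypothetical and ignores tombstones and compaction strategies.

```python
def compact(*sstables):
    """Merge SSTables, keeping the newest (timestamp, value) cell per key."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Real SSTables are sorted by key; mirror that in the result.
    return dict(sorted(merged.items()))


older = {"b": (1, "y"), "a": (1, "x")}
newer = {"a": (2, "z")}
result = compact(older, newer)
```

After the merge, a read for "a" finds the single newest cell in one file instead of checking both.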
What is hinted handoff?
When a replica is down or unresponsive, the coordinator stores a hint for each missed write and replays the hints once the replica recovers.
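The idea can be sketched as a small simulation. The Replica and Coordinator classes below are hypothetical and keep hints in memory, whereas Cassandra persists them on disk.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target replica name, key, value)

    def write(self, key, value):
        """Write to live replicas; store a hint for each replica that is down."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                self.hints.append((replica.name, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that are back up; keep the rest."""
        by_name = {replica.name: replica for replica in self.replicas}
        remaining = []
        for target, key, value in self.hints:
            if by_name[target].up:
                by_name[target].data[key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining
```

A write issued while one replica is down is acknowledged by the others, and the missed write reaches the third replica when replay_hints runs after it recovers.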
What is read repair?
A mechanism that runs as part of a read: the coordinator compares the responses from the replicas it contacted and sends the most recent data to any replica that returned a stale value.
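A simplified sketch of the idea, assuming each replica is a dictionary of key to (timestamp, value). The function read_with_repair is hypothetical; a real coordinator compares digests first and only contacts the replicas required by the consistency level.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and update replicas that are stale or missing it."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None
    # The cell with the highest write timestamp wins.
    newest = max(present, key=lambda cell: cell[0])
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest
    return newest[1]


r1 = {"k": (1, "old")}
r2 = {"k": (2, "new")}
r3 = {}
value = read_with_repair([r1, r2, r3], "k")
```

After the call, all three replicas hold the cell written at timestamp 2.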
What is a coordinator node?
The node that receives a client request and forwards it to appropriate replica nodes for action.
Describe gossip protocol.
A peer-to-peer communication method used by Cassandra nodes to share information about cluster state.
Can Cassandra run across data centers?
Yes, it supports multi-data-center replication, typically configured with NetworkTopologyStrategy and a replication factor per data center.
Explain the role of snitches.
Snitches determine network topology to route requests and place replicas efficiently.
What is tunable consistency?
It allows clients to choose a consistency level per query, trading latency and availability against how up to date the returned data is guaranteed to be.
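A common rule of thumb is that reads see the latest acknowledged write when the read and write replica counts overlap, that is, when R + W > RF. The helper below is a hypothetical sketch of that check.

```python
def reads_see_latest_write(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_acks + write_acks > replication_factor
```

With a replication factor of 3, QUORUM reads and QUORUM writes (2 + 2) overlap, while ONE and ONE (1 + 1) do not, which is the faster but weaker choice.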
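What query limitations exist in Cassandra?
CQL has no joins and no subqueries, and most queries must restrict the partition key. Filtering on other columns needs a secondary index, a materialized view, or ALLOW FILTERING, which can be expensive.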
What is eventual consistency?
A guarantee that, in the absence of further writes, all replicas will eventually converge to the same value.
How is data modeled in Cassandra?
Modeling is query-driven: tables are designed around the queries the application runs, and data is denormalized (often duplicated across tables) to suit those access patterns.
What is a materialized view?
A table that Cassandra maintains automatically from a base table, holding the same data under a different primary key. Recent releases mark materialized views as experimental.
How do you monitor Cassandra?
Using JMX metrics and nodetool, with tools such as OpsCenter or Prometheus and Grafana for dashboards.
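Can Cassandra encrypt data?
Yes. It supports client-to-node and node-to-node encryption over TLS. At-rest encryption is more limited in open-source Cassandra and is often handled by commercial distributions or by disk-level encryption.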
What is the tunable read consistency level?
It is the number of replicas that must respond for a read to succeed. Examples are ONE, QUORUM, and ALL.
Advanced Level Questions
How does Cassandra handle node failures?
Through replication, hinted handoff, read repair, and anti-entropy repair processes to ensure consistency.
Describe the write path in Cassandra.
A write is appended to the commit log and applied to the memtable, then acknowledged. Later, the memtable is flushed to disk as an SSTable.
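The sequence can be sketched with a toy node. The Node class and its memtable_limit parameter are hypothetical; real flushes are triggered by memory thresholds rather than a row count.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log used for durability
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, sorted files (oldest first)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. apply to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the memtable is full

    def flush(self):
        # Persist the memtable as a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Log entries for flushed data are no longer needed.
        self.commit_log.clear()
```

Writing two keys to a node with memtable_limit=2 leaves one sorted SSTable, an empty memtable, and an empty commit log.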
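Explain the read path in Cassandra.
The coordinator sends the read to enough replicas to satisfy the consistency level. Each replica checks the row cache (if enabled), then merges data from the memtable and the relevant SSTables. The coordinator returns the most recent data based on write timestamps.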
What is a vnode (virtual node)?
Vnodes let each node own many small token ranges instead of one large range, which spreads load more evenly and speeds up rebalancing when nodes join or leave.
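A minimal sketch of a token ring with vnodes. The function names are hypothetical, and MD5 is used only as a convenient stand-in hash; Cassandra's default partitioner uses Murmur3.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Stand-in token function (assumption: MD5 instead of Murmur3)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def build_ring(nodes, vnodes_per_node=8):
    """Give each node several tokens, so it owns several ranges on the ring."""
    return sorted(
        (token(f"{node}#{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )


def owner(ring, partition_key: str) -> str:
    """The owner is the node holding the first token >= the key's token, wrapping around."""
    tokens = [t for t, _ in ring]
    index = bisect_left(tokens, token(partition_key)) % len(ring)
    return ring[index][1]


ring = build_ring(["node-a", "node-b", "node-c"])
```

Because each node's tokens are scattered around the ring, adding a fourth node takes small slices from all existing nodes rather than splitting a single neighbor's range.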
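How does Cassandra scale horizontally?
New nodes can be added without downtime: a joining node takes over part of the token ring and streams the corresponding data from existing nodes. Running nodetool cleanup afterwards removes data the existing nodes no longer own.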
What is anti-entropy repair?
A process, run with nodetool repair, that compares data between replicas using Merkle trees and streams the differences to fix inconsistencies.
Explain lightweight transactions.
They use the Paxos consensus protocol to provide compare-and-set operations for conditional updates, such as INSERT ... IF NOT EXISTS.
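The compare-and-set semantics, though not the Paxos rounds behind them, can be sketched in a single process. The Register class below is hypothetical and uses a lock where Cassandra would run consensus among replicas.

```python
import threading


class Register:
    """Single-process stand-in for conditional writes; a lock replaces Paxos."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def insert_if_not_exists(self, key, value):
        """Return (applied, current value), similar in spirit to INSERT ... IF NOT EXISTS."""
        with self._lock:
            if key in self._rows:
                return False, self._rows[key]
            self._rows[key] = value
            return True, value

    def update_if(self, key, expected, new_value):
        """Apply the update only if the current value equals the expected one."""
        with self._lock:
            current = self._rows.get(key)
            if current != expected:
                return False, current
            self._rows[key] = new_value
            return True, new_value
```

A second insert_if_not_exists for the same key is rejected and reports the existing value, which is the behavior a conditional write relies on to avoid lost updates.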
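What is hinted handoff delay?
Hints are only collected for a limited window after a node goes down (the max hint window, three hours by default). If the node stays down longer, no further hints are stored for it, and a repair is needed to bring it back in sync.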
How to choose partition keys effectively?
Choose keys with high cardinality that spread data and traffic evenly across nodes to avoid hotspots, while still letting each query read from one or a few partitions.
Describe speculative retry in Cassandra.
A performance feature in which the coordinator sends the read to an additional replica if the original replica is slow to respond.
What is row cache?
The row cache stores entire rows in memory to speed up reads of frequently accessed data, at the cost of higher memory usage.
How does multi-DC replication work?
With NetworkTopologyStrategy, the keyspace specifies a replication factor for each data center, and Cassandra keeps that number of copies in each one.
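As a small worked example, the per-data-center replication factors determine how many acknowledgements LOCAL_QUORUM and EACH_QUORUM need. The data center names and helper functions below are hypothetical.

```python
# Replication factor per data center, as a keyspace might declare it (example values).
replication = {"dc_east": 3, "dc_west": 2}


def local_quorum(replication: dict, local_dc: str) -> int:
    """Acknowledgements needed from the coordinator's own data center."""
    return replication[local_dc] // 2 + 1


def each_quorum(replication: dict) -> dict:
    """Acknowledgements needed from every data center."""
    return {dc: rf // 2 + 1 for dc, rf in replication.items()}
```

LOCAL_QUORUM avoids waiting on cross-data-center round trips, while EACH_QUORUM requires a majority in every data center.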
What’s the role of the commit log in crash recovery?
It’s used to replay unflushed writes during restart after a crash to prevent data loss.
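Replay amounts to re-applying the logged writes in order to rebuild the lost memtable. The function below is a hypothetical sketch that models the log as a list of (key, value) pairs.

```python
def replay_commit_log(commit_log):
    """Rebuild the in-memory memtable from writes that were logged but not yet flushed."""
    memtable = {}
    for key, value in commit_log:
        # Later entries overwrite earlier ones, matching the original write order.
        memtable[key] = value
    return memtable


recovered = replay_commit_log([("a", 1), ("b", 2), ("a", 3)])
```

The recovered memtable holds the last value written for each key, as it did before the crash.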
Explain tombstones in Cassandra.
Tombstones are markers for deleted data. They remain until gc_grace_seconds (10 days by default) has passed and a compaction then permanently removes them.
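A minimal sketch of that lifecycle, assuming a table modeled as a dictionary of cells with a timestamp in seconds. The function names are hypothetical; 864000 seconds is the default gc_grace_seconds.

```python
GC_GRACE_SECONDS = 864_000  # 10 days


def mark_deleted(table, key, now):
    """A delete writes a tombstone instead of removing the data in place."""
    table[key] = {"value": None, "ts": now, "tombstone": True}


def purge_expired_tombstones(table, now, gc_grace=GC_GRACE_SECONDS):
    """Compaction step: drop tombstones older than the grace period, keep everything else."""
    return {
        key: cell
        for key, cell in table.items()
        if not (cell["tombstone"] and now - cell["ts"] >= gc_grace)
    }


table = {"a": {"value": 1, "ts": 0, "tombstone": False}}
mark_deleted(table, "a", now=100)
```

The grace period gives replicas that missed the delete time to learn about it through repair before the marker disappears.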
How can you secure a Cassandra cluster?
Enable authentication, encryption, set proper file permissions, and limit network access.
What’s the role of seed nodes?
Seed nodes are contact points for new or restarted nodes to join the cluster’s gossip ring.
How do you back up and restore data in Cassandra?
Use snapshots (nodetool snapshot) and incremental backups; restore by copying the SSTables back into place or loading them with sstableloader.