🗄️ Cassandra Interview Questions & Answers (2025)
Basic Level Questions
What is Apache Cassandra?
Apache Cassandra is a distributed NoSQL database designed to handle large volumes of data with high availability and no single point of failure.
What type of database is Cassandra?
Cassandra is a wide-column store NoSQL database.
Who originally developed Cassandra?
It was originally developed by Facebook to power its Inbox Search feature and was later open-sourced.
What is a keyspace?
A keyspace is the top-level container for tables in Cassandra, similar to a database in an RDBMS. It defines the replication settings (replication strategy and replication factor) for the tables it contains.
What is a primary key?
A primary key uniquely identifies a row in a table. It consists of a partition key (one or more columns) followed by zero or more clustering columns.
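What is a column family?
"Column family" is the original (pre-CQL) name for what CQL now calls a table: a set of rows identified by a primary key. Rows are sparse, so each row stores only the columns that actually have values. This makes the schema more flexible than a typical relational table.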
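Does Cassandra support ACID transactions?
Not fully. Cassandra provides the following guarantees:
- Writes to a single partition are atomic.
- The commit log provides durability.
- Consistency is tunable per query.
- Lightweight transactions offer compare-and-set on a single partition.
- Logged batches guarantee that all of their statements are eventually applied, but without isolation.

There are no full ACID transactions spanning multiple partitions or tables.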
What is a node in Cassandra?
A node is an individual instance of Cassandra running in a cluster, storing part of the data.
What is replication in Cassandra?
Replication stores copies of data on multiple nodes to ensure reliability and high availability. The number of copies is the replication factor, which is set per keyspace.
What language is used to query Cassandra?
Cassandra uses CQL (Cassandra Query Language), which is similar in syntax to SQL but has no joins.
Intermediate Level Questions
What is a cluster?
A cluster is a collection of nodes arranged to distribute data evenly and provide redundancy.
What is the consistency level in Cassandra?
It defines how many replica nodes must acknowledge a read or write operation before it is considered successful.
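Consistency levels translate into concrete replica counts. The minimal Python sketch below computes those counts from a keyspace's per-data-center replication factors. It uses the quorum formula floor(RF / 2) + 1, where QUORUM applies it to the sum of all replication factors. The function names and the `rf_by_dc` mapping (data center name to replication factor) are hypothetical. They model only the arithmetic, not any driver API.

```python
def quorum(rf: int) -> int:
    """A quorum is a strict majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def required_acks(level: str, rf_by_dc: dict, local_dc: str) -> int:
    """Return how many replicas must acknowledge an operation at `level`.

    rf_by_dc maps each data center name to the keyspace's replication factor there.
    """
    total_rf = sum(rf_by_dc.values())
    level = level.upper()
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(total_rf)                # majority of all replicas, across DCs
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])      # majority in the coordinator's DC only
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_by_dc.values())  # a majority in every DC
    if level == "ALL":
        return total_rf
    raise ValueError(f"level not modeled in this sketch: {level}")


rf = {"dc1": 3, "dc2": 3}
for cl in ["ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"]:
    print(cl, required_acks(cl, rf, local_dc="dc1"))
```

With two data centers of RF 3 each, this prints 1, 2, 4, 4 and 6 acknowledgements respectively.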
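Explain the partition key and the clustering key.
The partition key is hashed to a token that determines which nodes store the row, so it controls data distribution. The clustering columns determine the sort order of rows within a partition.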
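The split between the two parts of the key can be sketched with plain Python data structures. In this toy model (not Cassandra's storage format), the partition key is hashed to a token, and each partition keeps its rows sorted by clustering key. MD5 stands in for Cassandra's default Murmur3 partitioner only because it ships with the standard library. The table, column and class names are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 here; Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


class Partition:
    """All rows sharing one partition key, kept sorted by clustering key."""

    def __init__(self) -> None:
        self.clustering_keys: list = []
        self.rows: dict = {}

    def upsert(self, clustering_key, row: dict) -> None:
        if clustering_key not in self.rows:
            bisect.insort(self.clustering_keys, clustering_key)  # keep sort order on insert
        self.rows[clustering_key] = row  # same full primary key: overwrite (upsert)

    def scan(self):
        """Yield rows in clustering order, as a read of one partition would."""
        for ck in self.clustering_keys:
            yield ck, self.rows[ck]


# Hypothetical table: PRIMARY KEY ((sensor_id), reading_time)
table: dict = {}
for sensor_id, reading_time, value in [("s1", 30, 7.1), ("s1", 10, 6.5), ("s2", 20, 9.0), ("s1", 20, 6.8)]:
    table.setdefault(sensor_id, Partition()).upsert(reading_time, {"value": value})

print(hex(token_for("s1")))                   # the token, not the key itself, decides placement
print([ck for ck, _ in table["s1"].scan()])   # [10, 20, 30]
```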
What is an SSTable?
An SSTable (Sorted String Table) is an immutable on-disk data file. Cassandra writes a new SSTable each time a memtable is flushed.
What is a memtable?
A memtable is an in-memory structure (one per table) that holds recent writes, sorted by key, until it is flushed to disk as an SSTable.
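Explain compaction in Cassandra.
Compaction merges several SSTables into fewer, larger ones. It keeps only the newest version of each cell and can purge tombstones older than gc_grace_seconds. It also reduces the number of SSTables a read must check, which improves read performance.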
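The core of compaction is a merge of key-sorted files. The toy Python sketch below models each SSTable as a key-sorted list of `(key, timestamp, value)` tuples and keeps only the newest version of each key. Real compaction strategies (such as size-tiered, leveled or time-window) decide which SSTables to merge, and tombstone purging is left out here. The function name is hypothetical.

```python
import heapq


def compact(*sstables):
    """Merge key-sorted SSTables; for each key keep only the cell with the highest timestamp."""
    merged = []
    # heapq.merge streams the inputs in key order without concatenating and re-sorting them
    for key, ts, value in heapq.merge(*sstables, key=lambda cell: cell[0]):
        if merged and merged[-1][0] == key:
            if ts > merged[-1][1]:
                merged[-1] = (key, ts, value)  # a newer write for the same key wins
        else:
            merged.append((key, ts, value))
    return merged


old = [("a", 1, "v1"), ("b", 1, "v1"), ("d", 1, "v1")]
new = [("a", 5, "v2"), ("c", 4, "v1")]
print(compact(old, new))  # [('a', 5, 'v2'), ('b', 1, 'v1'), ('c', 4, 'v1'), ('d', 1, 'v1')]
```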
What is hinted handoff?
When a replica is down during a write, the coordinator stores a hint containing the missed write. It replays that hint to the replica once the replica comes back online.
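A toy Python sketch of hinted handoff follows. It is a single-process model, not Cassandra's implementation. The `Coordinator` class, its method names, and the explicit `now` timestamps (in seconds) are hypothetical. The hint window mirrors Cassandra's default of 3 hours: once a replica has been down longer than that, no new hints are stored for it.

```python
class Coordinator:
    """Toy coordinator that keeps hints for down replicas and replays them on recovery."""

    def __init__(self, replicas, max_hint_window_s=3 * 60 * 60):
        self.data = {name: {} for name in replicas}     # replica name -> its stored rows
        self.down_since = {}                            # replica name -> time it went down
        self.hints = {name: [] for name in replicas}    # replica name -> missed writes
        self.max_hint_window_s = max_hint_window_s      # mirrors the 3-hour default hint window

    def mark_down(self, name, now):
        self.down_since[name] = now

    def write(self, key, value, now):
        for name, store in self.data.items():
            if name not in self.down_since:
                store[key] = value                        # replica is up: apply directly
            elif now - self.down_since[name] <= self.max_hint_window_s:
                self.hints[name].append((key, value))     # replica is down: store a hint
            # else: down longer than the window, so no hint; repair must fix it later

    def mark_up(self, name):
        del self.down_since[name]
        for key, value in self.hints[name]:
            self.data[name][key] = value                  # replay missed writes in order
        self.hints[name].clear()


c = Coordinator(["r1", "r2", "r3"])
c.mark_down("r3", now=0)
c.write("user:1", "alice", now=60)
print(c.data["r3"])   # {} while r3 is down
c.mark_up("r3")
print(c.data["r3"])   # {'user:1': 'alice'} after hint replay
```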
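What is read repair?
Read repair fixes inconsistencies discovered during a read. When the replicas contacted for a read return different versions, the coordinator writes the newest version back to the out-of-date replicas. Since Cassandra 4.0, this happens as part of the read itself; the older probabilistic background options (read_repair_chance and dclocal_read_repair_chance) were removed.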
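The coordinator-side logic of read repair can be sketched in Python as a simplified model. This sketch makes three assumptions:
- Every replica is contacted.
- Each replica stores `key -> (write timestamp, value)`.
- The function name is hypothetical.

Real Cassandra compares digests first and contacts only as many replicas as the consistency level requires.

```python
def read_with_repair(replicas: dict, key: str):
    """Read `key` from every contacted replica, return the newest value, and
    write that version back to any replica that returned something older or nothing.

    replicas maps replica name -> {key: (write_timestamp, value)}.
    """
    responses = {name: store.get(key) for name, store in replicas.items()}
    found = [r for r in responses.values() if r is not None]
    if not found:
        return None
    newest = max(found, key=lambda r: r[0])   # last write wins, by timestamp
    for name, response in responses.items():
        if response != newest:
            replicas[name][key] = newest       # repair the stale replica
    return newest[1]


replicas = {
    "r1": {"user:1": (200, "v2")},
    "r2": {"user:1": (100, "v1")},
    "r3": {},
}
print(read_with_repair(replicas, "user:1"))   # v2
print(replicas["r2"], replicas["r3"])         # both now hold (200, 'v2')
```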
What is a coordinator node?
The coordinator is the node that receives a client request, forwards it to the replicas that own the data, and returns the result to the client. Any node can act as coordinator.
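Describe the gossip protocol.
Gossip is a peer-to-peer protocol that Cassandra nodes use to exchange cluster state, such as which nodes are up, their tokens, load, and schema version. Each node gossips every second with up to three other nodes, so state spreads quickly through the cluster.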
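A toy Python simulation shows how peer-to-peer exchanges spread state. Each node holds a version number per endpoint. Once per round, it merges its view with one randomly chosen peer, keeping the highest version for each endpoint. This is a simplification: real gossip messages carry heartbeat generations and application state, and the number of peers contacted differs. The function names are hypothetical.

```python
import random


def gossip_round(views: dict, rng: random.Random) -> None:
    """Each node exchanges state with one random peer; both keep the highest
    version seen for every endpoint (a simplified push-pull exchange)."""
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        merged = dict(views[name])
        for endpoint, version in views[peer].items():
            if version > merged.get(endpoint, -1):
                merged[endpoint] = version
        views[name] = merged
        views[peer] = dict(merged)


def converged(views: dict) -> bool:
    first = next(iter(views.values()))
    return all(view == first for view in views.values())


rng = random.Random(7)                                 # fixed seed for repeatable runs
views = {f"n{i}": {f"n{i}": i + 1} for i in range(5)}  # each node starts knowing only itself
rounds = 0
while not converged(views):
    gossip_round(views, rng)
    rounds += 1
print(f"converged after {rounds} rounds:", views["n0"])
```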
Can Cassandra run across data centers?
Yes. It supports multi-data-center replication, configured per keyspace with NetworkTopologyStrategy.
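Explain the role of snitches.
A snitch tells Cassandra which data center and rack each node belongs to. Cassandra uses this information to route requests to nearby replicas and to place replicas on different racks. GossipingPropertyFileSnitch is the usual choice for production clusters.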
What is tunable consistency?
Tunable consistency lets clients choose the consistency level per query, trading latency and availability against consistency.
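A common way to reason about this trade-off is the rule R + W > RF. Suppose a read contacts R replicas and a write was acknowledged by W of the same RF replicas. By the pigeonhole principle, the two sets must share at least one replica whenever R + W > RF. The Python sketch below checks the formula against a brute-force overlap test. It assumes a single data center and ignores failed writes and clock skew. The function names are hypothetical.

```python
from itertools import combinations


def is_strongly_consistent(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """R + W > RF means every read set must overlap every write set."""
    return read_replicas + write_replicas > rf


def overlap_always(rf: int, r: int, w: int) -> bool:
    """Brute force: does every r-subset intersect every w-subset of rf replicas?"""
    replicas = range(rf)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


# RF = 3: QUORUM (2) reads + QUORUM (2) writes overlap; ONE + ONE does not
print(is_strongly_consistent(3, 2, 2), overlap_always(3, 2, 2))  # True True
print(is_strongly_consistent(3, 1, 1), overlap_always(3, 1, 1))  # False False
```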
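What query limitations exist in Cassandra?
CQL has no joins or subqueries, and most queries must specify the full partition key. Filtering on other columns requires a secondary index, a materialized view, or ALLOW FILTERING, which can scan large amounts of data.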
What is eventual consistency?
It is a guarantee that, in the absence of further writes, all replicas eventually converge to the same value.
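How is data modeled in Cassandra?
Modeling is query-driven: start from the application's queries and design a table for each one. Data is denormalized (duplicated) so that each query can be answered from a single partition.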
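As a toy illustration in Python, the sketch below serves two lookups from two denormalized tables, with dictionaries standing in for tables. All table, column and function names are hypothetical. The cost of denormalization shows up in `save_user`: the application, not the database, keeps the copies consistent.

```python
# Hypothetical query-driven model with one denormalized "table" per query:
#   Q1: look up a user by id     -> users_by_id    (partition key: user_id)
#   Q2: look up a user by email  -> users_by_email (partition key: email)
users_by_id: dict = {}
users_by_email: dict = {}


def save_user(user_id: str, email: str, name: str) -> None:
    """Write the same data to both tables (in CQL this would typically be a logged batch)."""
    previous = users_by_id.get(user_id)
    if previous is not None and previous["email"] != email:
        del users_by_email[previous["email"]]  # email changed: remove the stale copy
    row = {"user_id": user_id, "email": email, "name": name}
    users_by_id[user_id] = dict(row)
    users_by_email[email] = dict(row)


def get_user_by_email(email: str):
    """Single-partition lookup: no join or secondary index needed."""
    return users_by_email.get(email)


save_user("u1", "ada@example.com", "Ada")
save_user("u1", "ada@example.org", "Ada")       # the application keeps both copies in sync
print(get_user_by_email("ada@example.org"))
print(get_user_by_email("ada@example.com"))     # None
```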
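What is a materialized view?
A materialized view is a table that Cassandra maintains automatically from a base table, using a different primary key so the same data can be queried by another key. Materialized views are considered experimental and are disabled by default in recent releases.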
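How do you monitor Cassandra?
Through JMX metrics, nodetool commands (such as status, tpstats and tablestats), and external tools such as OpsCenter or Prometheus with Grafana dashboards.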
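Can Cassandra encrypt data?
Yes for data in transit: Apache Cassandra supports TLS for client-to-node and node-to-node traffic. Open-source Cassandra can also encrypt commit logs and hints, but it does not encrypt SSTables at rest. At-rest encryption is usually provided by disk or file system encryption, or by commercial distributions.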
What is the tunable read consistency level?
It is the number of replicas that must respond before a read succeeds, for example ONE, QUORUM, LOCAL_QUORUM, or ALL.
Advanced Level Questions
How does Cassandra handle node failures?
Nodes detect failures through gossip, and replication keeps data available on the remaining replicas. Hinted handoff, read repair, and anti-entropy repair then bring the recovered node back in sync.
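Describe the write path in Cassandra.
A write is first appended to the commit log for durability and then written to the memtable, after which it is acknowledged. When the memtable fills up, it is flushed to disk as an immutable SSTable, and compaction later merges SSTables.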
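The toy Python model below is a sketch rather than Cassandra's code. It walks through the write path for a single table:
1. Append to a commit log.
2. Update the memtable.
3. Flush to a sorted, immutable SSTable when a hypothetical size limit is reached.
4. Replay the commit log after a simulated crash.

Class, method and parameter names are illustrative.

```python
class ToyNode:
    """Toy single-table LSM write path: commit log -> memtable -> immutable SSTables."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: list = []    # append-only; stands in for the on-disk commit log
        self.memtable: dict = {}      # in-memory, lost on crash
        self.sstables: list = []      # immutable, key-sorted tuples "on disk"
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))  # 1. append to the commit log for durability
        self.memtable[key] = value            # 2. update the memtable; the write can now be acknowledged
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # 3. Persist the memtable as a sorted, immutable SSTable; the commit log
        #    entries it covered are no longer needed.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []

    def crash_and_restart(self) -> None:
        # The memtable is lost; SSTables and the commit log survive on disk,
        # so unflushed writes are rebuilt by replaying the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


node = ToyNode()
for i in range(4):
    node.write(f"k{i}", i)   # k0..k2 trigger a flush; k3 stays in the memtable
node.crash_and_restart()
print(node.sstables)   # [(('k0', 0), ('k1', 1), ('k2', 2))]
print(node.memtable)   # {'k3': 3}, recovered from the commit log
```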
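Explain the read path in Cassandra.
The read path has three stages:
- The coordinator sends the read to as many replicas as the consistency level requires: a full data request to one replica and digest requests to the others.
- Each replica checks the row cache (if enabled), then merges data from the memtable and SSTables. Bloom filters and partition indexes let it skip SSTables that cannot contain the partition.
- The coordinator returns the newest data by write timestamp and triggers read repair if the replicas disagree.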
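The per-replica part of the read path can be sketched in Python. A small Bloom filter per SSTable lets the read skip files that definitely do not contain the key, and the remaining versions are merged by write timestamp. This is a simplified model: the Bloom filter parameters, the SHA-256-based hashing, and all names are assumptions. Caches, partition indexes and compression are left out.

```python
import hashlib


class BloomFilter:
    """Answers 'maybe present' or 'definitely absent'; never gives false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, cells: dict) -> None:
        self.cells = dict(cells)  # key -> (write_timestamp, value)
        self.bloom = BloomFilter()
        for key in self.cells:
            self.bloom.add(key)


def read(key: str, memtable: dict, sstables: list):
    """Return (newest value or None, number of SSTables actually consulted)."""
    versions = [memtable[key]] if key in memtable else []
    consulted = 0
    for sst in sstables:
        if not sst.bloom.might_contain(key):
            continue  # definitely absent: skip this file entirely
        consulted += 1
        if key in sst.cells:
            versions.append(sst.cells[key])
    if not versions:
        return None, consulted
    return max(versions, key=lambda v: v[0])[1], consulted  # newest timestamp wins


sstables = [SSTable({"a": (1, "old"), "b": (2, "x")}), SSTable({"c": (3, "y")})]
memtable = {"a": (5, "new")}
print(read("a", memtable, sstables))
```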
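What is a vnode (virtual node)?
With vnodes, each node owns many small token ranges instead of one large range. The count is set by num_tokens in cassandra.yaml; the default has been 16 since Cassandra 4.0. Vnodes spread load more evenly and let data be streamed from many nodes at once when nodes are added or replaced.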
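A toy Python token ring with vnodes follows. Each node hashes `num_tokens` labels onto the ring. A key's replicas are found by walking clockwise from its token and collecting distinct nodes, roughly like SimpleStrategy. MD5 stands in for Murmur3, and token positions are random hashes rather than Cassandra's token allocation algorithm. The class and function names are hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def token(s: str) -> int:
    """Stand-in hash; Cassandra's default partitioner uses Murmur3."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class TokenRing:
    def __init__(self, nodes, num_tokens: int = 16) -> None:
        # Each physical node owns num_tokens positions (vnodes) on the ring.
        self.ring = sorted((token(f"{node}#{i}"), node) for node in nodes for i in range(num_tokens))
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str, rf: int = 3) -> list:
        """Walk clockwise from the key's token and collect rf distinct nodes."""
        # The first ring token >= the key's token owns it; wrap around past the end.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        owners = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == rf:
                    break
        return owners


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))
# Primary owners of 10,000 keys: many vnodes per node keep the counts roughly balanced
print(Counter(ring.replicas(f"user:{i}", rf=1)[0] for i in range(10_000)))
```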
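How does Cassandra scale horizontally?
By adding nodes. A new node bootstraps into the ring, takes over token ranges, and streams the corresponding data from existing nodes, all without downtime. Afterwards, run nodetool cleanup on the existing nodes to remove data they no longer own.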
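What is anti-entropy repair?
Anti-entropy repair, run with nodetool repair, compares data between replicas using Merkle trees and streams only the ranges that differ, so all replicas converge. Run it regularly, within gc_grace_seconds, so that deleted data is not resurrected.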
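Cassandra's repair compares replicas using Merkle trees. The comparison can be sketched in Python: each leaf hashes the data in one token range. Two replicas compare hashes from the root down and descend only into subtrees that differ, so identical ranges are never examined in detail. This toy version uses SHA-256 and assumes a power-of-two number of ranges; its function names are hypothetical. Real repair builds the trees during a validation compaction and then streams the mismatched ranges.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(leaves: list) -> list:
    """Return the tree as a list of levels: levels[0] holds leaf hashes, levels[-1] the root."""
    n = len(leaves)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch needs a power-of-two number of ranges")
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_ranges(tree_a: list, tree_b: list) -> list:
    """Descend from the root, following only mismatched subtrees; return differing leaf indices."""
    top = len(tree_a) - 1
    mismatched = [0] if tree_a[top][0] != tree_b[top][0] else []
    for level in range(top - 1, -1, -1):
        mismatched = [child
                      for parent in mismatched
                      for child in (2 * parent, 2 * parent + 1)
                      if tree_a[level][child] != tree_b[level][child]]
    return mismatched


replica_a = [f"range-{i}: rows v1".encode() for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = b"range-5: rows with a missed update"
print(differing_ranges(build_merkle(replica_a), build_merkle(replica_b)))  # [5]
```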
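Explain lightweight transactions.
Lightweight transactions (for example INSERT ... IF NOT EXISTS or UPDATE ... IF column = value) use the Paxos consensus protocol to provide linearizable compare-and-set operations on a single partition. They need several extra round trips, so they are noticeably slower than regular writes.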
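The compare-and-set semantics (not the Paxos protocol itself) can be modeled in a few lines of Python. Here a lock stands in for consensus. Each call returns an `(applied, current_row)` pair, loosely mirroring the `[applied]` column that CQL returns for conditional statements. The class and method names are hypothetical.

```python
import threading


class ToyPartition:
    """Models only the compare-and-set semantics of lightweight transactions.
    A lock stands in for Paxos; no distributed consensus is simulated."""

    def __init__(self) -> None:
        self._rows: dict = {}
        self._lock = threading.Lock()

    def _snapshot(self, key):
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def insert_if_not_exists(self, key, row: dict):
        """Like INSERT ... IF NOT EXISTS: returns (applied, existing_row)."""
        with self._lock:
            if key in self._rows:
                return False, self._snapshot(key)
            self._rows[key] = dict(row)
            return True, None

    def update_if(self, key, column, expected, new_value):
        """Like UPDATE ... SET column = new_value IF column = expected."""
        with self._lock:
            row = self._rows.get(key)
            if row is None or row.get(column) != expected:
                return False, self._snapshot(key)
            row[column] = new_value
            return True, None


p = ToyPartition()
print(p.insert_if_not_exists("alice", {"balance": 100}))            # (True, None)
print(p.insert_if_not_exists("alice", {"balance": 999}))            # (False, {'balance': 100})
print(p.update_if("alice", "balance", expected=100, new_value=80))  # (True, None)
print(p.update_if("alice", "balance", expected=100, new_value=50))  # (False, {'balance': 80})
```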
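What is hinted handoff delay?
This usually refers to the hint window, max_hint_window_in_ms (max_hint_window since Cassandra 4.1), which defaults to 3 hours. Once a node has been down longer than this, coordinators stop storing hints for it, and the node must be brought up to date with repair.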
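How do you choose partition keys effectively?
A good partition key meets three criteria:
- High cardinality, so data and traffic spread evenly across nodes and hot partitions are avoided.
- A match with the query patterns.
- Bounded partition sizes; for time-series data, for example, add a time bucket to the key.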
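For time-series data, a common way to bound partition size is to add a time bucket to the partition key. The Python sketch below counts rows per partition for one sensor writing once a minute for three days, with and without a hypothetical day bucket. The key layout `(sensor_id, day)` is an illustrative assumption.

```python
from collections import Counter
from datetime import datetime, timedelta


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Hypothetical composite partition key: (sensor_id, day bucket)."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


start = datetime(2025, 1, 1)
readings = [("sensor-1", start + timedelta(minutes=i)) for i in range(3 * 24 * 60)]

unbucketed = Counter(sensor_id for sensor_id, _ in readings)
bucketed = Counter(partition_key(sensor_id, ts) for sensor_id, ts in readings)
print(unbucketed)                  # one partition that keeps growing: 4,320 rows so far
print(sorted(bucketed.values()))   # [1440, 1440, 1440]: one bounded partition per day
```

The trade-off is that a query spanning several days must read several partitions, so the bucket size should match the typical query range.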
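Describe speculative retry in Cassandra.
Speculative retry is a latency optimization. If a replica has not responded within a threshold, the coordinator sends the read to another replica and uses whichever response arrives first. The threshold is set by the table's speculative_retry option, which defaults to 99p (the 99th percentile of read latency).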
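A simplified asyncio sketch of speculative retry follows. It sends the read to one replica; if that replica has not answered within the threshold, it sends the read to a second replica and takes whichever response arrives first. The replica latencies, the function names, and the fixed threshold are hypothetical; in Cassandra the threshold is usually derived from a latency percentile.

```python
import asyncio


async def query_replica(name: str, latency_s: float) -> str:
    """Stand-in for a network read from one replica with a given latency."""
    await asyncio.sleep(latency_s)
    return f"row from {name}"


async def read_with_speculative_retry(replicas: list, threshold_s: float) -> str:
    """Read from replicas[0]; if it is slower than threshold_s, also ask replicas[1]
    and return whichever response arrives first."""
    (first_name, first_latency), (backup_name, backup_latency) = replicas[0], replicas[1]
    first = asyncio.create_task(query_replica(first_name, first_latency))
    done, _ = await asyncio.wait({first}, timeout=threshold_s)
    if done:
        return first.result()  # fast path: no speculative request needed
    backup = asyncio.create_task(query_replica(backup_name, backup_latency))
    done, pending = await asyncio.wait({first, backup}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    return done.pop().result()


# r1 takes 1 s; after 100 ms the coordinator also asks r2, which answers in 10 ms
print(asyncio.run(read_with_speculative_retry([("r1", 1.0), ("r2", 0.01)], threshold_s=0.1)))  # row from r2
```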
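What is the row cache?
The row cache keeps frequently read partitions in memory to speed up reads; the table's caching option can limit it to the first N rows of each partition. It consumes memory and only helps read-heavy workloads with a small hot data set, so it is disabled by default.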
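How does multi-DC replication work?
With NetworkTopologyStrategy, each keyspace sets a replication factor per data center, for example {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2}. Writes are sent to replicas in every data center. Consistency levels such as LOCAL_QUORUM let clients wait only for replicas in their local data center.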
What’s the role of the commit log in crash recovery?
During restart after a crash, Cassandra replays the commit log to recover writes that had not yet been flushed, preventing data loss.
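Explain tombstones in Cassandra.
A tombstone is a marker written when data is deleted (or a TTL expires); it shadows older values. Tombstones are kept for at least gc_grace_seconds (10 days by default) so that the deletion reaches every replica, after which compaction can purge them. Large numbers of tombstones slow down reads.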
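A toy Python model of shadowing and purging follows. Each version of a cell is a `(write_time_seconds, value)` pair, and a sentinel object marks the deletion. The constant mirrors Cassandra's default `gc_grace_seconds` of 864000 (10 days). Real Cassandra stores write timestamps in microseconds and purges a tombstone only when compaction can also see the data it shadows; this sketch ignores both. The function names are hypothetical.

```python
GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds: 10 days
TOMBSTONE = object()        # sentinel standing in for a deletion marker


def visible_value(versions):
    """Return what a read sees: the newest version, or None if it is a tombstone."""
    _, value = max(versions, key=lambda v: v[0])
    return None if value is TOMBSTONE else value


def compact_cell(versions, now, gc_grace=GC_GRACE_SECONDS):
    """Keep only the newest version; drop it entirely if it is a tombstone
    older than gc_grace, by which time every replica should have seen the delete."""
    newest = max(versions, key=lambda v: v[0])
    if newest[1] is TOMBSTONE and now - newest[0] > gc_grace:
        return []
    return [newest]


versions = [(100, "v1"), (200, TOMBSTONE)]   # written at t=100, deleted at t=200
print(visible_value(versions))                                        # None: the tombstone shadows v1
print(compact_cell(versions, now=200 + 3600) == [(200, TOMBSTONE)])   # True: too early to purge
print(compact_cell(versions, now=200 + GC_GRACE_SECONDS + 1))         # []: purged
```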
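How can you secure a Cassandra cluster?
- Enable authentication (PasswordAuthenticator) and authorization (CassandraAuthorizer) with role-based access control.
- Encrypt client-to-node and node-to-node traffic with TLS.
- Set proper file permissions.
- Limit network access to the cluster's ports.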
What’s the role of seed nodes?
Seed nodes are contact points that new or restarting nodes use to discover the cluster through gossip. Each data center should list at least one seed.
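How do you back up and restore data in Cassandra?
Take snapshots with nodetool snapshot (hard links to the current SSTables) and optionally enable incremental backups. To restore, either copy the SSTables back into the table's data directory and run nodetool refresh, or stream them into the cluster with sstableloader.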