Kafka — Data Engineer Interviews
- Tejas Agrawal
- Nov 29
- 2 min read
1. Kafka Architecture
Core Architecture
Producer → Topic → Partitions → Broker → Consumer Group
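A topic is a named stream of records split into partitions; each partition is an append-only log.
Each partition replica is stored on a broker, and a topic's partitions are spread across the brokers in the cluster.
Every partition has:
Leader (handles reads and writes by default)
Followers (replicas that copy the leader's log for fault tolerance)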
Key Design Concepts
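Distributed → enables horizontal scaling by adding brokers
Append-only log → keeps disk I/O sequential, which makes writes fast
Partitioning → supports parallel processing
Replication → protects against data loss when a broker fails
Offset → a record's position within a partition; each consumer group tracks the next offset it will read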
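
To make the append-only log and offset ideas concrete, here is a minimal in-memory sketch. It is a toy model, not Kafka's storage engine: the PartitionLog class and the event names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy in-memory stand-in for one partition's append-only log."""
    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        # New records always go at the end; the offset is the record's index.
        self.records.append(value)
        return len(self.records) - 1

    def read_from(self, offset: int, max_records: int = 10) -> list:
        # Reading never removes data; a consumer only advances its own offset.
        return self.records[offset:offset + max_records]


log = PartitionLog()
for event in ["ride_requested", "ride_accepted", "ride_completed"]:
    log.append(event)

consumer_offset = 1  # hypothetical consumer that already processed offset 0
batch = log.read_from(consumer_offset)
consumer_offset += len(batch)
print(batch, consumer_offset)  # ['ride_accepted', 'ride_completed'] 3
```

Because the log is never modified in place, a second consumer could start from offset 0 and read the same records independently.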

Why Companies Use Kafka
Real-time dashboards
Fraud detection
IoT pipelines for smart cities
Ride events (e.g., Careem)
E-commerce tracking (e.g., Noon, Talabat)
2. Top 8 Smart Kafka Interview Questions
Q1. Explain Kafka architecture in one minute.
Kafka is a distributed commit log. Producers write records to topics, which are divided into partitions and spread across multiple brokers. Consumers in consumer groups read these partitions in parallel. Replication provides durability, and offsets track each consumer group's position in every partition.
Q2. What is a Partition and why is it important?
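A partition is an ordered, append-only slice of a topic. Ordering is guaranteed only within a partition, not across the whole topic. Partitions provide: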
Scalability (parallelism)
High throughput
Load balancing
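A quick sketch of how keyed records can be spread across partitions while keeping each key's events in order. This assumes a simple hash-modulo scheme; zlib.crc32 is a stand-in chosen for the example (Kafka's Java client hashes keys with murmur2), and the keys and partition count are made up.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    # crc32 is a stand-in hash for illustration, not the hash Kafka uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 6  # hypothetical topic size
events = [
    ("rider-17", "requested"),
    ("rider-42", "requested"),
    ("rider-17", "completed"),
]

for key, value in events:
    # The same key always maps to the same partition, so per-key order is kept.
    print(key, value, "-> partition", choose_partition(key, NUM_PARTITIONS))
```

Note that changing the partition count changes the mapping, which is one reason to size partitions carefully up front.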
Q3. Leader vs Follower replicas?
Leader handles reads/writes
Followers replicate the data
If a leader fails, an in-sync follower is elected as the new leader
This ensures fault tolerance.
Q4. What is ISR (In-Sync Replicas)?
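The set of replicas (the leader plus any followers) that are caught up with the leader's log. A follower that falls too far behind is removed from the ISR. With acks=all, a write is acknowledged only after every ISR member has it, so a shrinking ISR means fewer copies of each acknowledged record and a higher risk of data loss; min.insync.replicas sets the smallest ISR that still accepts such writes.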
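The ISR idea and leader failover can be sketched as a toy model. This is an assumption-heavy simplification: the Replica class, the 30-second threshold, and the "pick the first survivor" rule are hypothetical, loosely modeled on the time-based lag check that the broker setting replica.lag.time.max.ms controls.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    broker_id: int
    seconds_since_caught_up: float  # time since this replica last matched the leader


def in_sync_replicas(leader_id, replicas, max_lag_seconds=30.0):
    # The leader is always in its own ISR; followers stay in while recently caught up.
    return [
        r.broker_id
        for r in replicas
        if r.broker_id == leader_id or r.seconds_since_caught_up <= max_lag_seconds
    ]


def elect_leader(failed_leader_id, isr):
    # Clean election: only a surviving ISR member may take over.
    candidates = [b for b in isr if b != failed_leader_id]
    return candidates[0] if candidates else None


replicas = [Replica(1, 0.0), Replica(2, 2.5), Replica(3, 95.0)]
isr = in_sync_replicas(leader_id=1, replicas=replicas)
new_leader = elect_leader(failed_leader_id=1, isr=isr)
print(isr, new_leader)  # [1, 2] 2
```

Broker 3 is excluded because it lagged too long, so it is not eligible in this clean election. If the ISR contained only the failed leader, the function returns None, which is the situation where availability and durability trade off.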
Q5. How does Kafka ensure high performance?
Sequential disk writes backed by the OS page cache, zero-copy transfer from disk to network, partitioning for parallelism, batching with compression, and asynchronous producer sends.
Q6. What is a Consumer Group?
A group of consumers that share the work of reading a topic. Kafka guarantees:
Each partition is assigned to exactly one consumer within a group at a time, so consumers beyond the partition count sit idle.
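A minimal sketch of the one-partition-one-consumer rule, assuming a simple round-robin style split. The assign_partitions function and the consumer names are hypothetical; Kafka's real assignors are configurable and more involved.

```python
def assign_partitions(partitions, consumers):
    """Give every partition exactly one owner, spreading them round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


# Six partitions shared by four consumers: everyone gets work.
print(assign_partitions(range(6), ["c1", "c2", "c3", "c4"]))
# {'c1': [0, 4], 'c2': [1, 5], 'c3': [2], 'c4': [3]}

# Two partitions, three consumers: one consumer is left idle.
print(assign_partitions(range(2), ["c1", "c2", "c3"]))
# {'c1': [0], 'c2': [1], 'c3': []}
```

The second call shows why adding consumers beyond the partition count does not add parallelism.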
Q7. What is Kafka’s delivery guarantee?
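At-most-once: offsets are committed before processing, so a record can be lost but is never reprocessed
At-least-once (the usual default behavior): offsets are committed after processing, so a record can be reprocessed but is not lost
Exactly-once (with idempotent producer and transactions, for read-process-write pipelines inside Kafka)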
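The difference between at-most-once and at-least-once comes down to whether the offset is committed before or after processing. The simulation below is a sketch under that assumption; the consume function and the injected crash are hypothetical and do not use any Kafka client API.

```python
def consume(records, committed, commit_first, crash_at=None):
    """Process records from the committed offset; optionally crash at one offset.

    Returns (processed_records, committed_offset_at_exit).
    """
    processed = []
    offset = committed
    while offset < len(records):
        if commit_first:
            committed = offset + 1  # at-most-once: commit, then process
            if offset == crash_at:
                return processed, committed  # crashed before processing
            processed.append(records[offset])
        else:
            processed.append(records[offset])  # at-least-once: process, then commit
            if offset == crash_at:
                return processed, committed  # crashed before committing
            committed = offset + 1
        offset += 1
    return processed, committed


records = ["a", "b", "c"]


def run_with_crash(commit_first):
    first, committed = consume(records, 0, commit_first, crash_at=1)
    second, _ = consume(records, committed, commit_first)  # restart after the crash
    return first + second


print(run_with_crash(commit_first=True))   # ['a', 'c'] -> "b" is lost
print(run_with_crash(commit_first=False))  # ['a', 'b', 'b', 'c'] -> "b" is duplicated
```

With at-least-once, downstream processing should be idempotent so that the duplicate "b" is harmless.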
Q8. How do you design a topic for high throughput?
Increase partitions
Keep replication factor at 3 for durability (replication protects data; it does not add throughput)
Use batch writes
Enable compression (e.g., snappy, lz4)
Distribute partitions across brokers
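To see why batching and compression work well together, the sketch below compares compressing each record on its own with compressing one batch. It is illustrative only: zlib stands in for snappy or lz4 (which are not in the Python standard library), and the order events are made-up sample data.

```python
import json
import zlib

# Hypothetical, repetitive event payloads, similar in shape to tracking events.
events = [
    {"order_id": i, "status": "placed", "store": "example-store"}
    for i in range(200)
]

# Compress every record separately, then sum the sizes.
per_record = sum(len(zlib.compress(json.dumps(e).encode("utf-8"))) for e in events)

# Compress all records together as a single batch.
batch_payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
batched = len(zlib.compress(batch_payload))

print("per-record bytes:", per_record)
print("batched bytes:", batched)
```

The batch compresses far better because the compressor can exploit repetition across records. In a real producer this is the effect that settings such as batch.size, linger.ms, and compression.type aim for.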
3. Final 20-Second Architecture
Kafka is a distributed event streaming platform where producers write to partitioned topics stored across brokers, replicated for fault tolerance. Consumers in groups read data with offset tracking. Partitioning facilitates massively parallel, real-time pipelines, establishing Kafka as the backbone of modern data platforms.


