
🎯 K8s Interview Series – Part 3: etcd – The Brain of Kubernetes

If you’re preparing for Kubernetes interviews, one control-plane component you must understand deeply is etcd.

It’s often called the brain or memory of the cluster — and for good reason. 💾


1️⃣ Where does etcd fit into the Kubernetes architecture?

Answer: etcd is part of the control plane, running on the master (control-plane) nodes.

It is the primary data store that maintains the entire cluster state.


2️⃣ What is the main function of etcd?

Answer: etcd is a distributed key-value store that keeps all the configuration data and object states of the cluster — Pods, Deployments, Secrets, ConfigMaps, Nodes, and more.


3️⃣ Which component directly communicates with etcd?

Answer: Only the API Server talks directly to etcd.

All other components (Scheduler, Controller Manager, Kubelet) communicate through the API Server, never directly with etcd.
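
On a kubeadm-based cluster you can see this wiring directly: the API Server's static-pod manifest carries the etcd connection and TLS flags (paths below are kubeadm defaults; other installers differ).

# Show the etcd-related flags of the API Server
grep etcd /etc/kubernetes/manifests/kube-apiserver.yaml

# Typical output (values vary per cluster):
# --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
# --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
# --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
# --etcd-servers=https://127.0.0.1:2379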


4️⃣ What kind of data does etcd store?

Answer: Everything about your cluster — resource definitions, configuration objects, RBAC rules, and status information — stored as key-value pairs.

Example structure:

/registry/pods/default/nginx-pod
/registry/deployments/production/web-app
/registry/secrets/kube-system/admin-token
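
You can browse these keys yourself with etcdctl (keys only; the values are stored as binary protobuf and are not human-readable). A read-only check, using the same certificate paths as the rest of this post:

# List all Pod keys under /registry (no values)
ETCDCTL_API=3 etcdctl get /registry/pods --prefix --keys-only \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key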

5️⃣ Is etcd a SQL database?

Answer: No. etcd is not a relational database.

It’s a lightweight, distributed key-value store designed for reliability, speed, and consistency.


6️⃣ What ensures data consistency in etcd?

Answer: etcd uses the Raft Consensus Algorithm to maintain consistent data across multiple etcd nodes (leader and followers).

How it works:

  • One node is the leader (handles all writes)
  • Other nodes are followers (replicate data)
  • Leader sends heartbeats to followers
  • Writes require majority approval (quorum)

Example:

3-node cluster:
- Leader: etcd-1 (writes)
- Follower: etcd-2 (replicates)
- Follower: etcd-3 (replicates)

Write accepted when 2 out of 3 nodes agree (quorum)
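
You can check which member currently holds the leader role by querying all endpoints at once; the IS LEADER column in the table output marks it (the endpoint addresses below are illustrative):

# Ask every member for its status; IS LEADER shows the current leader
ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=https://10.0.1.10:2379,https://10.0.1.11:2379,https://10.0.1.12:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key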

7️⃣ What happens if etcd fails?

Answer:

If etcd goes down:

  • ❌ No new Pods or configuration changes can be created, because the API Server has nowhere to persist them
  • ✅ Existing Pods continue running (the kubelet manages them on each node)
  • ❌ Corruption or loss of etcd data means loss of the cluster state

Backup is critical!


8️⃣ Why are etcd backups so important in Kubernetes?

Answer: Because etcd stores the entire cluster configuration and state, losing it means losing:

  • All Deployments, Services, ConfigMaps, Secrets, and RBAC data
  • Cluster metadata (Nodes, roles, namespaces)

Regular automated etcd snapshots are the only way to restore a failed control plane without rebuilding everything from scratch.


9️⃣ Where should etcd be located in production?

Answer:

  • ✅ Run etcd on dedicated control-plane nodes
  • ✅ Use SSD storage for high I/O performance
  • ✅ Enable TLS encryption for all communication
  • ✅ Isolate etcd from application workloads for reliability

🔟 What are the best practices for securing and maintaining etcd?

Answer:

  • ✅ Enable TLS encryption between API Server ↔ etcd
  • ✅ Take regular encrypted backups
  • ✅ Use dedicated volumes (SSD)
  • ✅ Enable authentication and RBAC for access
  • ✅ Run odd-numbered etcd clusters (3 or 5) for quorum stability

🚀 Advanced Real-World Scenarios


1️⃣1️⃣ How do you backup and restore etcd in a production cluster?

Answer:

Backup Process:

# Create snapshot
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify backup
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db

Restore Process:

# Stop the API Server first
# (on kubeadm clusters the API Server runs as a static pod: temporarily move
#  /etc/kubernetes/manifests/kube-apiserver.yaml out of that directory instead)
sudo systemctl stop kube-apiserver

# Restore from snapshot
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restore \
  --initial-cluster=etcd-1=https://10.0.1.10:2380 \
  --initial-advertise-peer-urls=https://10.0.1.10:2380

# Update etcd to use new data directory
# Edit /etc/kubernetes/manifests/etcd.yaml
# Change: --data-dir=/var/lib/etcd-restore

# Restart etcd (the static pod will restart automatically)
# Then start the API Server again
# (kubeadm: move kube-apiserver.yaml back into /etc/kubernetes/manifests)
sudo systemctl start kube-apiserver

Production Best Practice:

# Automated daily backups with cronjob
0 2 * * * /usr/local/bin/backup-etcd.sh
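
The contents of backup-etcd.sh are not shown above; a minimal sketch of what such a script could look like follows (backup directory, retention period, and certificate paths are assumptions based on the commands in this post):

#!/bin/bash
# Hypothetical backup-etcd.sh: timestamped snapshot plus simple retention
set -euo pipefail

BACKUP_DIR=/backup                 # assumed backup location
RETENTION_DAYS=7                   # assumed retention period
SNAPSHOT="${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"

# Take the snapshot (same flags as the manual backup above)
ETCDCTL_API=3 etcdctl snapshot save "${SNAPSHOT}" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot is readable
ETCDCTL_API=3 etcdctl snapshot status "${SNAPSHOT}"

# Delete snapshots older than the retention period
find "${BACKUP_DIR}" -name 'etcd-snapshot-*.db' -mtime +${RETENTION_DAYS} -delete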


1️⃣2️⃣ Why must etcd run in odd numbers (3, 5, 7) and not even numbers?

Answer:

Quorum Requirement: etcd needs a majority (more than 50%) of its members to be healthy in order to function.

Failure Tolerance:

3-node cluster:

Total nodes: 3
Quorum needed: 2 (majority)
Can tolerate: 1 failure

4-node cluster:

Total nodes: 4
Quorum needed: 3 (majority)
Can tolerate: 1 failure  ← Same as 3 nodes!

5-node cluster:

Total nodes: 5
Quorum needed: 3 (majority)
Can tolerate: 2 failures

Why NOT even numbers?

  • 4 nodes = Can tolerate only 1 failure (same as 3 nodes, but more overhead)
  • Wastes resources without improving availability
  • Odd numbers are mathematically optimal

Formula:

Quorum = floor(N / 2) + 1
Tolerated failures = floor((N - 1) / 2) = N - Quorum
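
A quick shell check of the formula for common cluster sizes (bash integer division):

# Print quorum and failure tolerance for typical etcd cluster sizes
for n in 3 4 5 7; do
  echo "N=$n  quorum=$(( n / 2 + 1 ))  tolerated_failures=$(( (n - 1) / 2 ))"
done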

Recommendation:

  • Small clusters: 3 etcd nodes
  • Large clusters: 5 etcd nodes
  • Very large: 7 etcd nodes (rare, usually not needed)

1️⃣3️⃣ What is etcd compaction and defragmentation, and why are they needed?

Answer:

The Problem: etcd keeps every historical revision of each key until compaction runs. Over time this history grows the database and slows performance.

Compaction: Removes old versions of keys, keeping only recent history.

# Check the current revision (the JSON output includes it; the table output does not)
rev=$(ETCDCTL_API=3 etcdctl endpoint status --write-out=json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
echo $rev

# Compact up to specific revision
ETCDCTL_API=3 etcdctl compact <revision-number> \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Defragmentation: After compaction, reclaim disk space.

# Defragment etcd database
ETCDCTL_API=3 etcdctl defrag \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Real-World Scenario:

# Before compaction
etcd database size: 8 GB

# After compaction + defrag
etcd database size: 2 GB

Auto-Compaction (Recommended):

# API Server flag
--etcd-compaction-interval=5m  # Compact every 5 minutes

When to do manual defrag:

  • After large-scale deletions
  • Database size unexpectedly large
  • Performance degradation
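
To judge whether a defrag is worth running, compare the space allocated on disk with the space actually in use; the JSON status output exposes both fields in etcd 3.4+ (add the usual endpoint and certificate flags):

# dbSize = space allocated on disk, dbSizeInUse = space holding live data
ETCDCTL_API=3 etcdctl endpoint status --write-out=json | grep -o '"dbSize[^,]*'

# Example (illustrative numbers):
# "dbSize":8589934592
# "dbSizeInUse":2147483648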

1️⃣4️⃣ Your cluster is slow, and you suspect etcd. How do you troubleshoot?

Answer:

Step 1: Check etcd Health

ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Output should show:
# 127.0.0.1:2379 is healthy: successfully committed proposal: took = 2.5ms

Step 2: Check Database Size

ETCDCTL_API=3 etcdctl endpoint status --write-out=table

# If DB size > 2GB, consider compaction

Step 3: Monitor Latency

# Check API Server to etcd latency
# Look for this metric in the API Server's /metrics endpoint:
kubectl get --raw /metrics | grep etcd_request_duration_seconds

# High latency indicators:
# - P99 > 100ms (warning)
# - P99 > 500ms (critical)

Step 4: Check Disk I/O

# etcd requires fast disk
iostat -x 1

# Look for:
# - High await (disk latency)
# - %util near 100% (disk saturated)

# Solution: Move etcd to SSD

Step 5: Check Network Between etcd Nodes

# Test latency between etcd nodes
ping -c 10 etcd-node-2

# Should be < 5ms for same datacenter

Step 6: Review etcd Logs

# Check for errors
journalctl -u etcd -f

# Common issues:
# - "took too long" warnings
# - Leader election failures
# - Disk sync duration high

Quick Fixes:

Issue: Database too large

# Compact and defragment
etcdctl compact <revision>
etcdctl defrag

Issue: Slow disk

# Verify disk is SSD
lsblk -d -o name,rota
# rota=0 means SSD, rota=1 means HDD

# Move etcd to SSD if needed

Issue: High memory usage

# Check etcd memory
top -p $(pgrep etcd)

# Cap the etcd backend database size (this flag sets the DB quota, not a cache)
# Edit etcd config:
--quota-backend-bytes=2147483648  # 2 GB quota (etcd default; 8 GB is the suggested maximum)


1️⃣5️⃣ What happens during an etcd leader election, and when does it occur?

Answer:

When Leader Election Happens:

  1. Leader node fails (hardware/network issue)
  2. Leader loses quorum (can’t reach majority of nodes)
  3. Network partition splits the cluster
  4. etcd process restarts on leader node

Election Process:

Step 1: Leader Failure Detected

Followers stop receiving heartbeats from leader
After election timeout (default: 1000ms), followers start election

Step 2: Candidate Requests Votes

Follower promotes itself to "Candidate"
Sends RequestVote to all other nodes

Step 3: Voting

Each node votes for one candidate
Candidate needs majority votes to become leader

Step 4: New Leader Elected

Node with majority votes becomes new leader
Starts sending heartbeats to followers
Resumes handling client requests

Real-World Scenario:

3-node cluster:

Before:
- etcd-1: Leader
- etcd-2: Follower  
- etcd-3: Follower

etcd-1 crashes

After election (~1-2 seconds):
- etcd-2: Leader (new)
- etcd-3: Follower
- etcd-1: Down

What You’ll See in Logs:

# On followers:
etcd: leader lost, starting election
etcd: became candidate at term 5
etcd: received vote from etcd-3
etcd: became leader at term 5

# On API Server:
Failed to reach etcd, retrying...
Successfully connected to new leader

Impact:

  • Brief interruption (~1-3 seconds) during election
  • Writes blocked during election
  • Reads may continue (stale reads possible)
  • After election: cluster resumes normal operation

Production Tip:

# Monitor for frequent elections (sign of instability)
# Check metric: etcd_server_leader_changes_seen_total

# Healthy cluster: Very rare leader changes
# Problem: Multiple changes per day
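
One way to read that counter on a kubeadm-style node: etcd exposes Prometheus metrics on its client port, using the same certificates as the other commands in this post.

# Leader changes observed by this member since it started
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  https://127.0.0.1:2379/metrics | grep etcd_server_leader_changes_seen_total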

Preventing Unnecessary Elections:

  • Use reliable networking
  • Ensure etcd nodes have adequate resources
  • Keep latency between nodes < 10ms
  • Use dedicated nodes for etcd

🧠 Key Takeaways

Core Concepts:

  • ✅ etcd = Cluster’s database & memory
  • ✅ Belongs to the Control Plane
  • ✅ API Server is the only component that talks to etcd directly
  • ✅ Uses Raft consensus for consistency
  • ✅ Backup and encryption are non-negotiable in production

Operations:

  • Backup: Use etcdctl snapshot save regularly
  • Restore: Stop API Server → restore → update data dir
  • Compact: Remove old versions to save space
  • Defrag: Reclaim disk space after compaction
  • Monitor: Watch latency, DB size, and health

Architecture:

  • ✅ Run odd numbers (3, 5, 7) for quorum
  • ✅ Needs majority to function (quorum)
  • ✅ Can tolerate (N-1)/2 failures
  • ✅ Use SSD storage for performance
  • ✅ Enable TLS for security

Troubleshooting:

  • ✅ Check health with etcdctl endpoint health
  • ✅ Monitor DB size and compact if > 2GB
  • ✅ Verify disk is SSD with good I/O
  • ✅ Check latency between etcd nodes
  • ✅ Watch for frequent leader elections

Real-World:

  • ✅ Automate backups with cronjobs
  • ✅ Test restore procedure regularly
  • ✅ Monitor metrics (latency, size, elections)
  • ✅ Plan for failure scenarios
  • ✅ Keep etcd isolated from workloads

📝 Quick Command Reference

# Health check
etcdctl endpoint health

# Backup
etcdctl snapshot save /backup/snapshot.db

# Restore
etcdctl snapshot restore /backup/snapshot.db

# Check status
etcdctl endpoint status --write-out=table

# Compact
etcdctl compact <revision>

# Defragment
etcdctl defrag

# Check members
etcdctl member list

Remember: All commands need certificates:

--endpoints=https://127.0.0.1:2379
--cacert=/etc/kubernetes/pki/etcd/ca.crt
--cert=/etc/kubernetes/pki/etcd/server.crt
--key=/etc/kubernetes/pki/etcd/server.key


Master etcd, master Kubernetes! 💪

#Kubernetes #K8s #etcd #DevOps #SRE #Interview #CloudNative #DistributedSystems
