K8s Interview Series
🎯 K8s Interview Series - Part 4: Controller Manager - The Automation Engine of Kubernetes
In Kubernetes interviews, when asked: 💬 “Who ensures the cluster always matches its desired state?”
👉 The answer is the Controller Manager ⚙️
Let’s simplify this in an interview-focused Q&A format 👇
1️⃣ Where does the Controller Manager fit in the architecture?
📍 It’s a Control Plane component, running on the control plane (master) node alongside the API Server, Scheduler, and etcd.
Its job? To continuously reconcile what you want (desired state) with what’s actually running (current state).
2️⃣ What does it actually do?
It runs multiple controller loops that monitor cluster objects via the API Server and take corrective actions automatically.
For example:
- If a Pod fails → the ReplicaSet Controller creates a new one
- If a Node goes down → the Node Controller marks it NotReady and its Pods get rescheduled
3️⃣ Key Built-in Controllers
- 🖥️ Node Controller → Monitors node health
- 🔁 ReplicaSet Controller → Ensures correct Pod count
- 🚀 Deployment Controller → Manages rollouts & rollbacks
- 🔗 Endpoint Controller → Updates Service endpoints
- 📦 Namespace Controller → Handles namespace creation/cleanup
- 🔄 DaemonSet Controller, StatefulSet Controller, Job Controller, etc.
4️⃣ Does it talk to etcd directly?
❌ No. It interacts only with the API Server, which in turn talks to etcd.
All reads/writes go through the API layer.
5️⃣ Where does it run?
🧩 As the kube-controller-manager process on the control plane node.
In managed clusters like EKS, GKE, and AKS, it’s managed by the provider; users can’t access it directly.
6️⃣ How does it maintain the desired state?
Each controller continuously loops:
1️⃣ Watches the current state (via the API Server)
2️⃣ Compares it with the desired state (from YAML manifests)
3️⃣ Corrects any mismatch (creates/deletes resources)
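That loop can be sketched in a few lines of Python (illustrative only; real controllers are written in Go using client-go informers and work queues, and `create_pod`/`delete_pod` here are hypothetical stand-ins for API Server writes):

```python
def reconcile(desired_replicas, current_pods, create_pod, delete_pod):
    """One pass of a simplified ReplicaSet-style control loop.

    desired_replicas: int from the object's spec (desired state)
    current_pods: list of live Pod names observed via the API Server (current state)
    create_pod/delete_pod: callbacks standing in for API Server writes
    """
    diff = desired_replicas - len(current_pods)
    if diff > 0:
        for _ in range(diff):              # too few Pods: create the missing ones
            create_pod()
    elif diff < 0:
        for name in current_pods[:-diff]:  # too many Pods: delete the excess
            delete_pod(name)
    return diff
```

A real controller never runs this just once: it re-runs the comparison on every watch event until the diff reaches zero.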
7️⃣ Can you create custom controllers?
✅ Yes! You can extend Kubernetes using custom controllers or Operators (built with Kubebuilder or client-go), ideal for automating app-specific logic.
8️⃣ What if the Controller Manager fails?
🔥 Self-healing stops temporarily.
- ❌ No new Pods/resources will be created automatically
- ✅ However, existing workloads keep running; only automation halts
9️⃣ How is high availability achieved?
Multiple instances run, but only one active leader at a time (leader election).
If it fails, another instance seamlessly takes over.
🔟 Common Responsibilities
- ✅ Maintain desired Pod replicas
- ✅ Manage Node lifecycles
- ✅ Handle rollouts/rollbacks
- ✅ Update service endpoints
- ✅ Manage namespaces, tokens & service accounts
🚀 Advanced Real-World Scenarios
1️⃣1️⃣ Your Deployment isn’t rolling out new Pods. How do you troubleshoot the Controller Manager?
Answer:
Step 1: Check Controller Manager is Running
# Check if controller manager pod is running
kubectl get pods -n kube-system | grep controller-manager
# Should show:
kube-controller-manager-master-1 1/1 Running
Step 2: Check Controller Manager Logs
# View logs
kubectl logs -n kube-system kube-controller-manager-master-1
# Look for errors related to deployment controller
kubectl logs -n kube-system kube-controller-manager-master-1 | grep -i deployment
Step 3: Check if Deployment Controller is Enabled
# Verify deployment controller is in the enabled list
kubectl logs -n kube-system kube-controller-manager-master-1 | grep "Started controller"
# Should see:
Started deployment controller
Started replicaset controller
Step 4: Check Deployment Events
# See what controller observed
kubectl describe deployment <deployment-name>
# Events section will show controller actions:
# Normal ScalingReplicaSet ReplicaSet scaled up to 3
Step 5: Check ReplicaSet Status
# Deployment creates ReplicaSet, which creates Pods
kubectl get rs
# Check ReplicaSet events
kubectl describe rs <replicaset-name>
Common Issues & Fixes:
Issue 1: Controller Manager Crashed
# kube-controller-manager usually runs as a static Pod; deleting its mirror Pod
# won't restart the container, so move the manifest out and back instead:
sudo mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sudo mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
# Or on systemd-managed installs:
sudo systemctl restart kube-controller-manager
Issue 2: Resource Quotas
# Check if namespace has quota issues
kubectl describe quota -n <namespace>
# If quota exceeded, increase it or delete unused resources
Issue 3: Image Pull Errors
# Controller creates Pod, but kubelet can't pull image
kubectl describe pod <pod-name>
# Events:
Failed to pull image "myapp:latest": ImagePullBackOff
Issue 4: Deployment Paused
# Check if deployment is paused
kubectl get deployment <name> -o yaml | grep paused
# Resume if paused
kubectl rollout resume deployment/<name>
1️⃣2️⃣ Explain the reconciliation loop: how exactly do controllers watch and react?
Answer:
The Reconciliation Loop (Control Loop):
┌───────────────────────────────────────┐
│      Controller Manager Process       │
│                                       │
│  ┌────────────────────────────────┐   │
│  │     ReplicaSet Controller      │   │
│  │                                │   │
│  │  while true:                   │   │
│  │    1. Watch API Server         │   │
│  │    2. Get current state        │   │
│  │    3. Get desired state        │   │
│  │    4. Compare                  │   │
│  │    5. Take action if needed    │   │
│  │    6. Sleep/Wait for events    │   │
│  └────────────────────────────────┘   │
└───────────────────────────────────────┘
                  ↕️
           ┌─────────────┐
           │  API Server │
           └─────────────┘
Detailed Example: ReplicaSet Controller
Scenario: You create a Deployment with replicas: 3
Step 1: Watch for Events
# Controller watches ReplicaSets via API Server
Watch: /apis/apps/v1/replicasets
Step 2: Detect Change
# New ReplicaSet created
Event: ADDED
Object: ReplicaSet "nginx-rs" with replicas: 3
Step 3: Get Current State
# Controller queries API Server
Current Pods with label app=nginx: 0
Step 4: Get Desired State
# Read from ReplicaSet spec
Desired Pods: 3
Step 5: Calculate Diff
Desired: 3
Current: 0
Action needed: Create 3 Pods
Step 6: Take Action
# Controller calls API Server to create Pods
POST /api/v1/namespaces/default/pods (3 times)
Step 7: Wait and Watch
# Controller watches for the next event
# If a Pod is deleted → gets DELETED event → creates a new Pod
# If the replica count changes → gets MODIFIED event → adjusts Pods
Illustrative Timing (orders of magnitude, not guarantees):
T+0ms: kubectl apply -f deployment.yaml
T+~10ms: Deployment controller creates the ReplicaSet
T+~20ms: ReplicaSet controller sees the new ReplicaSet
T+~30ms: ReplicaSet controller creates 3 Pods
Seconds later: Pods scheduled, images pulled, containers running
Watch Mechanism:
# Controllers use watch API (HTTP long-polling)
GET /apis/apps/v1/replicasets?watch=true
# Server keeps connection open
# Sends events as they happen:
{"type":"ADDED","object":{...}}
{"type":"MODIFIED","object":{...}}
{"type":"DELETED","object":{...}}
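Routing those streamed events to per-type handlers is the heart of the loop. A minimal Python sketch (the `handlers` dict is a hypothetical stand-in; real controllers use client-go informers rather than hand-rolled dispatch):

```python
import json

def dispatch(event_line, handlers):
    """Route one JSON watch event (one line of the API Server's stream) to a handler.

    handlers maps an event type (ADDED / MODIFIED / DELETED) to a callback
    that receives the object and performs the reconcile action.
    """
    event = json.loads(event_line)
    handler = handlers.get(event["type"])
    return handler(event["object"]) if handler else None
```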
Key Characteristics:
- ✅ Event-driven: Reacts to changes immediately
- ✅ Declarative: Define the desired state, the controller handles how
- ✅ Self-healing: Continuously tries to reach the desired state
- ✅ Eventually consistent: May take time, but will converge
1️⃣3️⃣ How do you check which controllers are running and view their individual performance?
Answer:
Check Running Controllers:
# View controller manager logs
kubectl logs -n kube-system kube-controller-manager-master-1
# Look for "Started controller" lines
kubectl logs -n kube-system kube-controller-manager-master-1 | grep "Started controller"
# Output shows all active controllers:
Started deployment controller
Started replicaset controller
Started daemonset controller
Started statefulset controller
Started job controller
Started node controller
Started service controller
Started endpoint controller
...
Check Controller Metrics:
# Port-forward to the controller manager's secure port
kubectl port-forward -n kube-system kube-controller-manager-master-1 10257:10257
# Get metrics (in another terminal; 10257 serves HTTPS and needs a token with RBAC access)
curl -k -H "Authorization: Bearer $TOKEN" https://localhost:10257/metrics | grep workqueue
# Key metrics per controller:
workqueue_depth{name="replicaset"} # Items waiting to be processed
workqueue_adds_total{name="replicaset"} # Total items added to the queue
workqueue_retries_total{name="replicaset"} # Reconciliations that had to be retried
Monitor Controller Performance:
# Deployment controller processing rate
workqueue_work_duration_seconds{name="deployment"}
# ReplicaSet controller queue depth
workqueue_depth{name="replicaset"}
# If queue depth keeps growing → the controller is overwhelmed
Check Controller Health:
# Health check endpoint (HTTPS on the secure port)
curl -k https://localhost:10257/healthz
# Should return: ok
# Detailed check
curl -k "https://localhost:10257/healthz?verbose"
# Shows the status of each check
Common Metrics to Monitor:
# Items in queue (should be near 0 normally)
workqueue_depth
# Processing time
workqueue_work_duration_seconds
# Retry rate (high = issues)
workqueue_retries_total
# Latency distribution (histogram buckets)
workqueue_work_duration_seconds_bucket
Identify Slow Controllers:
# Find controllers with high processing time
curl -k -H "Authorization: Bearer $TOKEN" https://localhost:10257/metrics | grep work_duration_seconds | sort -k2 -n
# Example output:
workqueue_work_duration_seconds{name="deployment"} 0.005
workqueue_work_duration_seconds{name="node"} 0.152 ← Slow!
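If you'd rather not eyeball grep output, a small parser makes the comparison explicit. This is a sketch assuming the standard Prometheus text exposition format; `slowest_controllers` is a made-up helper name:

```python
import re

def slowest_controllers(metrics_text, metric="workqueue_work_duration_seconds"):
    """Parse Prometheus text output; return (controller, seconds) pairs, slowest first."""
    pattern = re.compile(metric + r'\{name="([^"]+)"\}\s+([0-9.eE+-]+)')
    samples = [(m.group(1), float(m.group(2))) for m in pattern.finditer(metrics_text)]
    return sorted(samples, key=lambda kv: kv[1], reverse=True)
```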
Production Monitoring (Prometheus):
# Alert when controller queue depth is high
- alert: ControllerQueueDepthHigh
expr: workqueue_depth{name="replicaset"} > 100
for: 5m
annotations:
summary: "ReplicaSet controller queue depth is high"
1️⃣4️⃣ How does leader election work in the Controller Manager and why is it needed?
Answer:
Why Leader Election?
In HA clusters, you run multiple Controller Manager instances, but only one should be active to avoid conflicts.
Example Problem Without Leader Election:
Controller Manager 1: Sees 2 Pods, creates 1 more (wants 3)
Controller Manager 2: Sees 2 Pods, creates 1 more (wants 3)
Result: 4 Pods instead of 3! ❌
How Leader Election Works:
Step 1: Create Lease Object
# Lease stored in kube-system namespace
kubectl get lease -n kube-system
NAME HOLDER AGE
kube-controller-manager master-1_abc123def456 5d
Step 2: Acquire Lease
# Controller Manager tries to acquire lease
# First one to acquire becomes leader
# Lease has TTL (default: 15 seconds)
Step 3: Leader Renews Lease
# Leader keeps renewing the lease (retry period: 2s, renew deadline: 10s)
# If the leader fails to renew in time, the lease expires
Step 4: Leader Failure
# Standby instances detect lease expired
# Race to acquire lease
# Winner becomes new leader
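Steps 1–4 can be simulated with a toy in-memory lease (illustrative only; the real logic lives in client-go's leaderelection package and stores a Lease object via the API Server):

```python
import time

class Lease:
    """A toy Lease: whoever holds it, and keeps renewing it, is the leader."""

    def __init__(self, duration=15.0):
        self.duration = duration   # lease TTL in seconds
        self.holder = None
        self.renew_time = 0.0

    def try_acquire(self, candidate, now=None):
        """Acquire the lease if it is free/expired, or renew it if we hold it."""
        now = time.monotonic() if now is None else now
        expired = now - self.renew_time > self.duration
        if self.holder is None or expired or self.holder == candidate:
            self.holder = candidate
            self.renew_time = now
            return True
        return False  # someone else holds a live lease
```

Standbys call `try_acquire` on their retry period; the first caller after expiry wins, which is exactly the failover race described above.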
Real-World Scenario:
# Initial state
Controller Manager 1 (master-1): LEADER ✅
Controller Manager 2 (master-2): STANDBY (watching)
Controller Manager 3 (master-3): STANDBY (watching)
# master-1 fails
Controller Manager 1: DOWN
Controller Manager 2: Detects lease expired → Acquires lease → LEADER ✅
Controller Manager 3: Failed to acquire (too slow) → STANDBY
# Failover time: ~15 seconds
Check Current Leader:
# View lease
kubectl get lease -n kube-system kube-controller-manager -o yaml
# Output:
spec:
holderIdentity: master-1_abc123def456
leaseDurationSeconds: 15
renewTime: "2025-01-01T10:00:00Z"
View Leader Election in Logs:
kubectl logs -n kube-system kube-controller-manager-master-1 | grep leader
# Successful election:
successfully acquired lease kube-system/kube-controller-manager
# Lost leadership:
failed to renew lease, stepping down
Configuration:
# Controller Manager flags
--leader-elect=true # Enable leader election (default)
--leader-elect-lease-duration=15s # Lease duration
--leader-elect-renew-deadline=10s # Renewal deadline
--leader-elect-retry-period=2s # Retry interval
Impact of Leader Election:
During Normal Operation:
- ✅ Only the leader processes events
- ✅ Standbys do nothing (just watch the lease)
- ✅ No duplicate actions
During Failover:
- ⏸️ Brief pause (~15 seconds max)
- ⏸️ No reconciliation during the transition
- ✅ New leader takes over
- ✅ Resumes processing from where the old leader stopped
Multiple Components Use Leader Election:
- Controller Manager ✅
- Scheduler ✅
- Cloud Controller Manager ✅
1️⃣5️⃣ A Pod is stuck in Terminating state. Which controller is responsible and how do you fix it?
Answer:
Which Controller Handles This?
The ReplicaSet Controller initiated the deletion, but the Pod is stuck because of the kubelet or finalizers.
Common Causes:
Cause 1: Finalizers Blocking Deletion
# Check if pod has finalizers
kubectl get pod <pod-name> -o yaml | grep finalizers
# Example output:
finalizers:
- kubernetes.io/pv-protection # Waiting for volume cleanup
Fix:
# Remove finalizer to force deletion
kubectl patch pod <pod-name> -p '{"metadata":{"finalizers":null}}'
# Or edit directly
kubectl edit pod <pod-name>
# Remove the finalizers section
Cause 2: Node NotReady
# Check node status
kubectl get nodes
# If node is NotReady, pods can't terminate properly
kubectl describe node <node-name>
Fix:
# Delete pod forcefully
kubectl delete pod <pod-name> --grace-period=0 --force
# Controller will create new pod on healthy node
Cause 3: Storage Volume Unmount Hanging
# Check pod events
kubectl describe pod <pod-name>
# Events:
Warning FailedUnmount Volume unmount failed: device is busy
Fix:
# SSH to node and kill process holding volume
ssh node-1
lsof | grep <volume-path>
kill -9 <pid>
# Then pod will terminate
Cause 4: Container Process Won’t Stop
# Container ignoring SIGTERM
# Waiting for grace period to expire (default: 30s)
Fix:
# Wait for grace period, or force delete
kubectl delete pod <pod-name> --grace-period=0 --force
Complete Troubleshooting Flow:
# 1. Check pod status
kubectl get pod <pod-name> -o yaml
# 2. Check deletion timestamp
deletionTimestamp: "2025-01-01T10:00:00Z" # Pod marked for deletion
deletionGracePeriodSeconds: 30
# 3. Check finalizers
finalizers:
- kubernetes.io/pv-protection
# 4. Check events
kubectl describe pod <pod-name>
# 5. Check node status
kubectl get nodes
# 6. Force delete if necessary
kubectl delete pod <pod-name> --grace-period=0 --force
How Controller Manager is Involved:
# Deployment controller: Scales down replicas
Deployment: replicas 3 → 2
# ReplicaSet controller: Deletes Pod
ReplicaSet controller: Delete excess Pod
# API Server: Marks pod for deletion
Pod status: deletionTimestamp set
# Kubelet: Terminates container
Kubelet: Send SIGTERM to container
# If kubelet fails/node down:
Pod stays in Terminating forever
# Solution: Force delete
kubectl delete pod --force --grace-period=0
Prevention:
# 1. Set appropriate grace periods
spec:
terminationGracePeriodSeconds: 30 # Default
# 2. Handle SIGTERM in application
# Gracefully shutdown on SIGTERM
# 3. Use preStop hooks
spec:
containers:
- name: app
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "sleep 15"]
🧠 Key Takeaways
Core Concepts:
- 🧩 Location: Control Plane
- ⚙️ Core Role: Maintains desired vs actual state
- 🔗 Talks To: API Server (not etcd)
- 🧠 Contains: Multiple internal controllers
- 🧰 Supports: Custom Controllers / Operators
Reconciliation:
- ✅ Watch → Get events from the API Server
- ✅ Compare → Current vs Desired state
- ✅ Act → Create/Update/Delete resources
- ✅ Repeat → Continuous loop
Operations:
- ✅ Check logs: kubectl logs -n kube-system kube-controller-manager-xxx
- ✅ View metrics: port-forward to :10257/metrics
- ✅ Monitor queues: workqueue_depth
- ✅ Check leader: kubectl get lease -n kube-system
- ✅ Troubleshoot: Check events, logs, and resource states
High Availability:
- ✅ Multiple instances run (usually 3)
- ✅ Only one active leader (via lease)
- ✅ Failover time: ~15 seconds
- ✅ Standbys watch and wait
Troubleshooting:
- ❌ Pods not creating → Check the ReplicaSet controller
- ❌ Deployments not rolling out → Check Deployment controller logs
- ❌ Pods stuck Terminating → Check finalizers and force delete
- ❌ Controllers slow → Monitor workqueue metrics
- ❌ Leader election issues → Check lease status
📌 Quick Command Reference
# Check controller manager
kubectl get pods -n kube-system | grep controller-manager
# View logs
kubectl logs -n kube-system kube-controller-manager-xxx
# Check running controllers
kubectl logs -n kube-system kube-controller-manager-xxx | grep "Started controller"
# View metrics
kubectl port-forward -n kube-system kube-controller-manager-xxx 10257:10257
curl -k -H "Authorization: Bearer $TOKEN" https://localhost:10257/metrics
# Check leader
kubectl get lease -n kube-system kube-controller-manager -o yaml
# Force delete stuck pod
kubectl delete pod <name> --grace-period=0 --force
# Remove finalizers
kubectl patch pod <name> -p '{"metadata":{"finalizers":null}}'
Master the Controller Manager, and you master Kubernetes automation! 🚀
#Kubernetes #K8s #ControllerManager #DevOps #SRE #Interview #CloudNative #Automation