Kubernetes Interview Series
Kubernetes Interview Series – Part 1
🎯 Master the Kubernetes Scheduler: 15 Must-Know Questions
From fundamentals to production scenarios – test your scheduler expertise!
📚 Fundamentals (Questions 1-5)
1️⃣ Where does the Kubernetes Scheduler fit into the Kubernetes architecture?
Answer: The Scheduler is a core control plane component that runs on master/control-plane nodes alongside:
- API Server
- Controller Manager
- etcd
🔑 Key Point: It does NOT run on worker nodes. Worker nodes run kubelet, kube-proxy, and container runtime.
Pro Tip: In HA setups, multiple scheduler instances run, but only one is active (leader election).
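For a concrete picture, here is a hedged sketch of how that looks in a kubeadm-style control plane, where kube-scheduler runs as a static Pod on each control-plane node (the file path, image tag, and flag set are illustrative and trimmed to the relevant parts):
# /etc/kubernetes/manifests/kube-scheduler.yaml (kubeadm default location; trimmed sketch)
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
    - name: kube-scheduler
      image: registry.k8s.io/kube-scheduler:v1.29.0   # example version
      command:
        - kube-scheduler
        - --kubeconfig=/etc/kubernetes/scheduler.conf
        - --leader-elect=true   # with several control-plane nodes, only the lease holder schedules Pods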
2️⃣ What is the main function of the Kubernetes Scheduler?
Answer: Its primary job is Pod-to-Node assignment.
The Process:
- 👀 Watches for unscheduled Pods (no nodeName set)
- 🔍 Evaluates all nodes based on:
- Resource availability (CPU, memory)
- Constraints (affinity, taints, tolerations)
- Policies and priorities
- ✅ Selects the best-fit node
- 📝 Updates the Pod spec with nodeName
Not the Scheduler’s Job: Actually running containers (that’s kubelet’s responsibility)
3️⃣ How does the Scheduler know which Pods need scheduling?
Answer: The Scheduler uses a watch mechanism on the API Server to monitor Pods with:
- spec.nodeName = empty/unset
- status.phase = Pending
Example Pod in need of scheduling:
apiVersion: v1
kind: Pod
metadata:
name: my-pod
spec:
nodeName: "" # Empty = needs scheduling
containers:
- name: nginx
image: nginx
After Scheduling:
spec:
nodeName: "worker-node-1" # Scheduler assigns this
4️⃣ What happens after the Scheduler selects a node for a Pod?
Answer: The Handoff Process:
- Scheduler → Updates Pod object via API Server:
  spec:
    nodeName: "selected-node"
- API Server → Persists to etcd
- Kubelet (on selected node) → Sees the assignment:
- Pulls container image
- Creates containers via container runtime
- Manages Pod lifecycle
- Pod starts running → Status updated to Running
Analogy: Scheduler is like a dispatcher assigning taxis to customers. The driver (kubelet) actually picks them up!
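For the curious, the "update" in step 1 goes through the pods/binding subresource: the scheduler POSTs a Binding object rather than patching the Pod directly. A minimal sketch (Pod and node names are illustrative):
apiVersion: v1
kind: Binding
metadata:
  name: my-pod            # the Pod being scheduled
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: worker-node-1     # the node the scheduler picked
The API server then records the nodeName on the Pod, which is what the kubelet reacts to.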
5️⃣ What are the two main phases the Scheduler uses to pick a node?
Answer:
Phase 1: Filtering (Predicates) 🔍 Eliminates nodes that can’t run the Pod:
- ❌ Insufficient CPU/memory
- ❌ Taints without matching tolerations
- ❌ Node selector mismatch
- ❌ Pod affinity/anti-affinity violations
- ❌ Volume binding conflicts
Phase 2: Scoring (Priorities) 📊 Ranks remaining nodes (0-100 score):
- ⚖️ Balanced resource allocation
- 📦 Pod spreading across zones
- 🏷️ Image locality (already pulled)
- 🎯 Priority class weights
Example:
- 100 nodes in cluster
- After filtering: 20 feasible nodes
- After scoring: Node with score 95 is selected
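To make the filter inputs concrete, here is a hedged sketch of a Pod that only passes filtering on nodes with enough free CPU/memory, a disktype=ssd label, and a tolerated gpu taint (all names, labels, and sizes are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: "500m"          # NodeResourcesFit: nodes without this much free CPU are filtered out
          memory: 1Gi
  nodeSelector:
    disktype: ssd              # NodeAffinity: node must carry this label
  tolerations:
    - key: gpu                 # TaintToleration: tolerates a gpu=true:NoSchedule taint
      operator: Equal
      value: "true"
      effect: NoSchedule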
🔧 Deep Dive (Questions 6-10)
6️⃣ Does the Scheduler directly run Pods on nodes?
Answer: ❌ Absolutely NOT!
What Scheduler Does:
- ✅ Decides WHERE a Pod should run
- ✅ Updates the Pod spec with nodeName
What Kubelet Does:
- ✅ Actually creates containers
- ✅ Pulls images
- ✅ Manages container lifecycle
- ✅ Reports back to API Server
Real-World Analogy:
- Scheduler = Airport control tower (decides which gate)
- Kubelet = Ground crew (actually parks the plane)
Common Interview Trap: “Does the Scheduler run containers?” → NO!
7️⃣ What are the key sub-components or plugins inside the Scheduler?
Answer:
Scheduling Framework Components:
1. Scheduling Queue 📥
- Stores Pods waiting to be scheduled
- Priority queue (higher priority Pods scheduled first)
- BackoffQ for failed scheduling attempts
2. Filter Plugins 🔍
- NodeResourcesFit: Checks CPU/memory
- NodeAffinity: Evaluates node selectors and affinity rules
- TaintToleration: Checks taints/tolerations
- PodTopologySpread: Distributes Pods evenly
- VolumeBinding: Ensures PVC availability
3. Score Plugins 📊
- NodeResourcesBalancedAllocation: Prefers balanced usage
- ImageLocality: Prefers nodes with the image already cached
- InterPodAffinity: Considers Pod affinity rules
- NodeResourcesLeastAllocated: Spreads Pods across nodes (in recent releases this is the LeastAllocated scoring strategy of NodeResourcesFit)
4. Bind Plugin 🔗
- Final step: binds Pod to selected node
- Updates API Server
Extension Points: PreFilter, Filter, PostFilter, PreScore, Score, Reserve, Permit, PreBind, Bind, PostBind
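These plugins can be enabled, disabled, or re-weighted per extension point through a KubeSchedulerConfiguration file passed to the scheduler with --config; a minimal hedged sketch (the weight value is illustrative):
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
          - name: ImageLocality
            weight: 5    # illustrative: favor nodes that already have the image cached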
8️⃣ How does the Scheduler interact with other control plane components?
Answer:
Component Interactions:
🔷 With API Server (Primary Interface):
- Watches for unscheduled Pods
- Reads Node resource info
- Updates Pod bindings
- All communication goes through API Server
🔷 With etcd (Indirect via API Server):
- All scheduling decisions persisted in etcd
- Reads cluster state
🔷 With Controller Manager:
- Controllers create Pods (ReplicaSet, Deployment)
- Scheduler assigns them to nodes
- Controllers handle Pod lifecycle
🔷 With Kubelet (Indirect):
- Scheduler writes nodeName
- Kubelet reads and acts on the assignment
Data Flow Example:
Deployment → Controller Manager → Creates Pods → API Server →
Scheduler watches → Assigns node → API Server → Kubelet → Runs Pod
9️⃣ Can a cluster have more than one Scheduler?
Answer: ✅ YES! Multiple schedulers are supported and commonly used.
Use Cases:
1. Custom Scheduling Logic
apiVersion: v1
kind: Pod
metadata:
name: gpu-pod
spec:
schedulerName: gpu-scheduler # Custom scheduler
containers:
- name: ml-app
image: tensorflow/tensorflow:latest-gpu
2. Default Scheduler (if not specified)
spec:
schedulerName: default-scheduler # Implicit if omitted
3. Multiple Schedulers Running Simultaneously
- Default Kubernetes Scheduler
- Custom GPU scheduler
- Custom batch job scheduler
- Third-party schedulers (Volcano, YuniKorn)
How It Works:
- Each scheduler watches for Pods with a matching schedulerName
- Only one scheduler processes each Pod
- Leader election for multiple instances of same scheduler
Real-World Example: ML team uses custom scheduler for GPU workloads while web apps use default scheduler.
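As a sketch of how that is wired up, a second scheduler is usually just another Deployment of the kube-scheduler binary in kube-system, pointed at its own KubeSchedulerConfiguration whose profile sets schedulerName: gpu-scheduler. The image tag, RBAC, and ConfigMap wiring below are illustrative and follow the pattern from the official "configure multiple schedulers" guide:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      component: gpu-scheduler
  template:
    metadata:
      labels:
        component: gpu-scheduler
    spec:
      serviceAccountName: gpu-scheduler          # needs RBAC similar to system:kube-scheduler
      containers:
        - name: kube-scheduler
          image: registry.k8s.io/kube-scheduler:v1.29.0   # example version
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/gpu-scheduler/config.yaml
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes/gpu-scheduler
      volumes:
        - name: config
          configMap:
            name: gpu-scheduler-config           # holds a KubeSchedulerConfiguration whose
                                                 # profile sets schedulerName: gpu-scheduler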
🔟 What would happen if the Scheduler goes down?
Answer:
Impact Analysis:
✅ What STILL Works:
- Existing Pods keep running (kubelet manages them)
- Services, Ingress, ConfigMaps continue functioning
- Controllers keep monitoring existing resources
- kubectl commands for existing resources work
❌ What BREAKS:
- New Pods remain in Pending state indefinitely
- Scaling operations get stuck (new replicas won't schedule)
- Deployments/StatefulSets can’t create new Pods
- Pod rescheduling after node failures won’t work
Real-World Scenario:
# Before Scheduler fails
$ kubectl get pods
NAME STATUS RESTARTS AGE
app-1 Running 0 5m
# Scheduler goes down
$ kubectl scale deployment app --replicas=3
$ kubectl get pods
NAME STATUS RESTARTS AGE
app-1 Running 0 5m
app-2 Pending 0 30s # ⚠️ Stuck!
app-3 Pending 0 30s # ⚠️ Stuck!
# Check events
$ kubectl describe pod app-2
Events:            <none>    # with the scheduler down, no FailedScheduling events are recorded at all
Recovery:
- In HA setups: Another scheduler instance takes over (leader election)
- Manual restart: kubectl -n kube-system delete pod kube-scheduler-xxx
- Pending Pods are automatically scheduled once the Scheduler is back
Pro Tip: Always run multiple scheduler replicas in production!
🚀 Advanced Scenarios (Questions 11-15)
1️⃣1️⃣ How does Priority and Preemption work in the Scheduler?
Answer:
Priority Scheduling allows critical Pods to be scheduled before others, and even evict lower-priority Pods if needed.
Setting Up Priority:
1. Create PriorityClass
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000 # Higher = more important
globalDefault: false
description: "Critical production workloads"
2. Assign to Pod
apiVersion: v1
kind: Pod
metadata:
name: critical-app
spec:
priorityClassName: high-priority
containers:
- name: app
image: my-app
Preemption Process:
Scenario: Cluster at capacity, high-priority Pod arrives
- Scheduler finds no available nodes (all resources used)
- Identifies preemption candidates:
- Lower priority Pods on suitable nodes
- Calculates minimum Pods to evict
- Evicts lower-priority Pods (graceful termination)
- Schedules high-priority Pod once resources free up
Example:
# Before: Cluster full with low-priority Pods
$ kubectl get pods
NAME PRIORITY STATUS
web-1 100 Running
web-2 100 Running
web-3 100 Running
# High-priority Pod arrives
$ kubectl create -f critical-pod.yaml
# After: Low-priority Pod evicted
$ kubectl get pods
NAME PRIORITY STATUS
critical-app 1000000 Running
web-1 100 Running
web-2 100 Terminating # ⚠️ Preempted
web-3 100 Running
Real-World Use Cases:
- Database backups (lower priority) vs live queries (higher priority)
- Batch jobs (can be preempted) vs API services (critical)
- Development workloads vs production workloads
Best Practice: Set PodDisruptionBudgets so the scheduler avoids preempting too many replicas of the same workload at once!
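A hedged example of such a budget for the web Pods above (label and threshold are illustrative); note that during preemption the scheduler honors PDBs on a best-effort basis, so this limits rather than fully prevents disruption:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2          # keep at least 2 web replicas running at any time
  selector:
    matchLabels:
      app: web             # illustrative: must match the Pods you want to protect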
1️⃣2️⃣ You have a Pod stuck in Pending state with “0/5 nodes are available: Insufficient cpu”. How do you troubleshoot?
Answer:
Step-by-Step Troubleshooting:
1. Check Pod Resource Requests
$ kubectl describe pod stuck-pod
...
Requests:
cpu: 4000m # Requesting 4 CPUs
memory: 8Gi
Events:
Warning FailedScheduling 0/5 nodes are available: 5 Insufficient cpu.
2. Check Node Allocatable Resources
$ kubectl describe nodes | grep -A 5 "Allocated resources"
Allocated resources:
Resource Requests Limits
cpu 3800m (95%) 4000m (100%)
memory 7.5Gi (94%) 8Gi (100%)
3. Identify Resource Hogs
$ kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
worker-node-1 3850m 96% 7680Mi 96%
worker-node-2 3900m 97% 7890Mi 98%
4. Check for Taints/Tolerations
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
NAME TAINTS
worker-node-1 [map[effect:NoSchedule key:node.kubernetes.io/disk-pressure]]
Solutions:
Option 1: Reduce Pod Resource Requests
spec:
containers:
- name: app
resources:
requests:
cpu: 2000m # Reduced from 4000m
memory: 4Gi # Reduced from 8Gi
Option 2: Scale Down Other Pods
$ kubectl scale deployment low-priority-app --replicas=0
Option 3: Add More Nodes
# In cloud environments
$ eksctl scale nodegroup --cluster=my-cluster --name=ng-1 --nodes=3
Option 4: Use the Cluster Autoscaler
Automatically adds nodes when pending Pods cannot fit on the existing ones
Diagnostic Commands:
# See all pending Pods and reasons
$ kubectl get events --field-selector involvedObject.kind=Pod --sort-by='.lastTimestamp'
# Check scheduler logs
$ kubectl logs -n kube-system kube-scheduler-xxx
# Simulate scheduling
$ kubectl describe node worker-node-1 | grep -A 10 "Allocated resources"
Real-World Tip: Size requests from observed usage rather than worst-case guesses, and use limits as a safety ceiling so a single Pod can't starve the node.
1️⃣3️⃣ What is Pod Topology Spread Constraints and when would you use it?
Answer:
Pod Topology Spread Constraints ensure Pods are evenly distributed across failure domains (zones, nodes, racks) for high availability.
The Problem It Solves:
Without constraints, scheduler might place all replicas on same zone/node:
Zone A: 5 Pods ⚠️ All eggs in one basket!
Zone B: 0 Pods
Zone C: 0 Pods
With Topology Spread:
Zone A: 2 Pods ✅ Distributed
Zone B: 2 Pods ✅ Distributed
Zone C: 1 Pod ✅ Distributed
Configuration Example:
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
spec:
replicas: 6
template:
spec:
topologySpreadConstraints:
- maxSkew: 1 # Max difference between zones
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule # or ScheduleAnyway
labelSelector:
matchLabels:
app: web
containers:
- name: nginx
image: nginx
Key Parameters:
1. maxSkew
- Maximum allowed difference in Pod count
- maxSkew: 1 → Zones can differ by at most 1 Pod
- Lower = more even distribution
2. topologyKey
- Node label to use as topology domain
- Common keys:
- topology.kubernetes.io/zone (availability zones)
- kubernetes.io/hostname (individual nodes)
- topology.kubernetes.io/region
3. whenUnsatisfiable
- DoNotSchedule: Strict (Pod stays Pending if it can't spread)
- ScheduleAnyway: Soft (tries to spread but schedules anyway)
Real-World Scenarios:
Scenario 1: High Availability Across AZs
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
# Ensures no single AZ failure takes down all Pods
Scenario 2: Even Load Across Nodes
topologySpreadConstraints:
- maxSkew: 2
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
# Spreads across nodes but doesn't block scheduling
Scenario 3: Multi-Region Distribution
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/region
whenUnsatisfiable: DoNotSchedule
# For global applications
Comparison with Pod Anti-Affinity:
| Feature | Topology Spread | Pod Anti-Affinity |
|---|---|---|
| Granularity | Fine control (maxSkew) | Binary (yes/no) |
| Use Case | Even distribution | Avoid co-location |
| Flexibility | More options | Less flexible |
| Performance | Better at scale | Can be slow |
Best Practice for Production:
topologySpreadConstraints:
# Zone-level spreading (primary)
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
# Node-level spreading (secondary)
- maxSkew: 2
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
Debugging:
# Check Pod distribution
$ kubectl get pods -o wide -l app=web --sort-by=.spec.nodeName
# See why Pod didn't spread
$ kubectl describe pod my-pod | grep -A 10 "Events"
1️⃣4️⃣ How would you implement custom scheduling logic without writing a full custom scheduler?
Answer:
Three Approaches:
Option 1: Scheduler Extender (Webhook-Based) 🔌
Extends default scheduler with HTTP webhooks.
How It Works:
- Default scheduler runs Filter/Score phases
- Calls your extender webhook at specific points
- Your service returns additional filtering/scoring
- Scheduler combines results
Example Extender Configuration:
apiVersion: v1
kind: ConfigMap
metadata:
name: scheduler-config
namespace: kube-system
data:
scheduler-config.yaml: |
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
- urlPrefix: "http://my-extender:8080"
filterVerb: "filter"
prioritizeVerb: "prioritize"
weight: 1
enableHTTPS: false
nodeCacheCapable: true
ignorable: false
Extender Service (Python Example):
from flask import Flask, request, jsonify

app = Flask(__name__)

def get_available_memory(node):
    # Rough parse of the node's allocatable memory (e.g. "8060524Ki") into bytes
    mem = node['status']['allocatable'].get('memory', '0')
    units = {'Ki': 1024, 'Mi': 1024 ** 2, 'Gi': 1024 ** 3}
    for suffix, factor in units.items():
        if mem.endswith(suffix):
            return int(mem[:-2]) * factor
    return int(mem)

@app.route('/filter', methods=['POST'])
def filter_nodes():
    data = request.json
    pod = data['Pod']
    nodes = data['Nodes']['items']
    # Custom filtering logic
    # Example: only keep nodes that advertise a GPU resource
    filtered_nodes = [
        node for node in nodes
        if 'nvidia.com/gpu' in node['status']['allocatable']
    ]
    return jsonify({
        'Nodes': {'items': filtered_nodes},
        'FailedNodes': {},
        'Error': ''
    })

@app.route('/prioritize', methods=['POST'])
def prioritize_nodes():
    data = request.json
    nodes = data['Nodes']['items']
    # Custom scoring logic
    # Example: prefer nodes with more available memory
    # (kube-scheduler expects extender priorities in the 0-10 range)
    scores = []
    for node in nodes:
        available_gib = get_available_memory(node) / (1024 ** 3)
        score = min(int(available_gib), 10)
        scores.append({'host': node['metadata']['name'], 'score': score})
    return jsonify(scores)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)   # matches the urlPrefix in the extender config
Use Cases:
- GPU-aware scheduling
- License-based placement
- Custom cost optimization
- Integration with external systems
Option 2: Scheduling Profiles 📋
Configure different scheduling behaviors without code.
Example:
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Profile 1: Default (balanced spreading)
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesBalancedAllocation
            weight: 1
  # Profile 2: Bin Packing (pack Pods tightly onto fewer nodes)
  # Note: the old NodeResourcesMostAllocated/LeastAllocated plugins were folded into
  # NodeResourcesFit, so bin packing is now expressed as a scoringStrategy.
  - schedulerName: bin-packing-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated      # Opposite of the default LeastAllocated!
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
  # Profile 3: GPU Optimized (packs GPU nodes to avoid fragmenting GPUs)
  - schedulerName: gpu-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
              - name: nvidia.com/gpu
                weight: 5
Using Different Profiles:
# Batch job - use bin packing
apiVersion: batch/v1
kind: Job
metadata:
name: data-processing
spec:
template:
spec:
schedulerName: bin-packing-scheduler
containers:
- name: processor
image: data-processor
# ML training - use GPU scheduler
apiVersion: v1
kind: Pod
metadata:
name: ml-training
spec:
schedulerName: gpu-scheduler
containers:
- name: trainer
image: tensorflow/tensorflow:latest-gpu
Option 3: Admission Controllers + Mutating Webhooks 🎯
Modify Pod specs before scheduling.
Example Use Case: Auto-add node selectors based on namespace
Mutating Webhook:
// Fragment of the webhook's admission logic; assumes corev1 "k8s.io/api/core/v1" is imported
func mutatePod(pod *corev1.Pod) {
// Add node selector for production namespace
if pod.Namespace == "production" {
if pod.Spec.NodeSelector == nil {
pod.Spec.NodeSelector = make(map[string]string)
}
pod.Spec.NodeSelector["tier"] = "production"
pod.Spec.NodeSelector["ssd"] = "true"
}
// Add toleration for batch jobs
if pod.Labels["workload"] == "batch" {
pod.Spec.Tolerations = append(pod.Spec.Tolerations,
corev1.Toleration{
Key: "batch",
Operator: corev1.TolerationOpEqual,
Value: "true",
Effect: corev1.TaintEffectNoSchedule,
})
}
}
Comparison Table:
| Approach | Complexity | Flexibility | Use Case |
|---|---|---|---|
| Scheduler Extender | Medium | High | Custom logic, external integration |
| Scheduling Profiles | Low | Medium | Different policies per workload |
| Admission Webhooks | Medium | High | Pre-scheduling Pod modifications |
Recommendation:
- Simple needs → Scheduling Profiles
- External integration → Scheduler Extender
- Pod mutation → Admission Webhooks
- Complex custom logic → Full custom scheduler
1️⃣5️⃣ In a multi-tenant cluster, how would you ensure fair resource allocation and prevent one team from starving others?
Answer:
Multi-Layered Approach:
Layer 1: ResourceQuotas (Namespace-Level Limits) 🎯
Prevent teams from consuming more than their fair share.
apiVersion: v1
kind: ResourceQuota
metadata:
name: team-alpha-quota
namespace: team-alpha
spec:
hard:
requests.cpu: "100" # Total CPU across all Pods
requests.memory: 200Gi # Total memory
limits.cpu: "200" # Max CPU with limits
limits.memory: 400Gi # Max memory
persistentvolumeclaims: "10"
pods: "50" # Max Pod count
services.loadbalancers: "3"
Check Quota Usage:
$ kubectl describe resourcequota -n team-alpha
Name: team-alpha-quota
Resource Used Hard
-------- ---- ----
requests.cpu 85 100 # ⚠️ 85% used!
requests.memory 180Gi 200Gi
pods 45 50
Layer 2: LimitRanges (Pod-Level Defaults) 📏
Prevent individual Pods from being too large.
apiVersion: v1
kind: LimitRange
metadata:
name: pod-limits
namespace: team-alpha
spec:
limits:
# Container limits
- type: Container
max:
cpu: "4" # No single container > 4 CPU
memory: 8Gi # No single container > 8Gi
min:
cpu: 100m # Minimum request
memory: 128Mi
default:
cpu: 500m # Default if not specified
memory: 512Mi
defaultRequest:
cpu: 250m
memory: 256Mi
# Pod limits
- type: Pod
max:
cpu: "8" # No Pod > 8 CPU total
memory: 16Gi
Effect:
# Pod without requests/limits
apiVersion: v1
kind: Pod
metadata:
name: my-app
spec:
containers:
- name: app
image: my-image
# No resources specified
# After LimitRange mutation:
spec:
containers:
- name: app
resources:
requests:
cpu: 250m # Auto-added
memory: 256Mi
limits:
cpu: 500m # Auto-added
memory: 512Mi
Layer 3: Priority Classes (Workload Importance) 🏆
Ensure critical workloads get scheduled first.
# Production workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: production-high
value: 1000000
globalDefault: false
description: "Production critical services"
---
# Development workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: development-low
value: 100
preemptionPolicy: Never # Can't preempt others
description: "Development and testing"
Assign to Pods:
# Production Pod
apiVersion: v1
kind: Pod
metadata:
name: api-server
namespace: team-alpha
spec:
priorityClassName: production-high
containers:
- name: api
image: api-server
# Dev Pod
apiVersion: v1
kind: Pod
metadata:
name: test-pod
namespace: team-beta
spec:
priorityClassName: development-low
containers:
- name: test
image: test-app
Layer 4: Node Pools / Taints (Physical Isolation) 🏭
Dedicate nodes to specific teams.
Setup:
# Label nodes for each team
$ kubectl label nodes node-1 node-2 node-3 team=alpha
$ kubectl label nodes node-4 node-5 node-6 team=beta
# Add taints to prevent other teams
$ kubectl taint nodes node-1 node-2 node-3 team=alpha:NoSchedule
$ kubectl taint nodes node-4 node-5 node-6 team=beta:NoSchedule
Team-A Pods (with toleration):
apiVersion: v1
kind: Pod
metadata:
name: team-a-pod
spec:
nodeSelector:
team: alpha
tolerations:
- key: team
operator: Equal
value: alpha
effect: NoSchedule
containers:
- name: app
image: team-a-image
Result:
- Team A Pods → Only run on nodes 1-3
- Team B Pods → Only run on nodes 4-6
- No cross-team interference
Layer 5: Cluster Autoscaler Configuration ⚙️
Fair autoscaling per team.
# Separate node groups per team with autoscaling
# In cloud provider (AWS EKS example):
# Team Alpha node group
eksctl create nodegroup \
--cluster=my-cluster \
--name=team-alpha-ng \
--node-labels=team=alpha \
--node-taints=team=alpha:NoSchedule \
--nodes-min=3 \
--nodes-max=10
# Team Beta node group
eksctl create nodegroup \
--cluster=my-cluster \
--name=team-beta-ng \
--node-labels=team=beta \
--node-taints=team=beta:NoSchedule \
--nodes-min=3 \
--nodes-max=10
Complete Multi-Tenant Setup Example:
# 1. Create namespace per team
apiVersion: v1
kind: Namespace
metadata:
name: team-alpha
labels:
team: alpha
environment: production
---
# 2. ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
name: team-alpha-quota
namespace: team-alpha
spec:
hard:
requests.cpu: "100"
requests.memory: 200Gi
pods: "50"
---
# 3. LimitRange
apiVersion: v1
kind: LimitRange
metadata:
name: team-alpha-limits
namespace: team-alpha
spec:
limits:
- type: Container
max:
cpu: "4"
memory: 8Gi
defaultRequest:
cpu: 250m
memory: 256Mi
---
# 4. NetworkPolicy (bonus: network isolation)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: team-alpha-isolation
namespace: team-alpha
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
team: alpha # Only allow traffic from same team
Monitoring Fair Allocation:
# Check quota usage across namespaces
$ kubectl get resourcequota --all-namespaces
# Check which team is using most resources
$ kubectl top pods --all-namespaces --sort-by=cpu | head -20
# See pending Pods (potential starvation)
$ kubectl get pods --all-namespaces --field-selector=status.phase=Pending
# Audit events for quota exceeded
$ kubectl get events --all-namespaces | grep "exceeded quota"
Alert When Team Hits Limits:
# Prometheus alert rule
- alert: NamespaceQuotaNearLimit
  expr: |
    kube_resourcequota{type="used"}
      / ignoring(type)
    kube_resourcequota{type="hard"} > 0.9
  for: 5m
  labels:
    severity: warning
  annotations:
    description: "Team {{ $labels.namespace }} is above 90% of its {{ $labels.resource }} quota"
Best Practices:
- ✅ Start with generous quotas, adjust based on usage
- ✅ Use PriorityClasses to protect production workloads
- ✅ Monitor quota usage and set alerts
- ✅ Document team resource allocation policies
- ✅ Regular reviews and adjustments
- ✅ Consider cost allocation tools (Kubecost, OpenCost)
🎓 Bonus Tips for Interviews
Red Flags to Avoid ❌
- ❌ “Scheduler runs containers” (No, that’s kubelet!)
- ❌ “Only one scheduler per cluster” (Multiple are supported!)
- ❌ “Scheduler directly talks to kubelet” (All via API Server!)
- ❌ “Scheduler stores state” (It’s stateless, etcd has state!)
Pro Interview Answers ✅
- ✅ Mention “Filter and Score” phases
- ✅ Reference specific plugins (NodeResourcesFit, NodeAffinity)
- ✅ Discuss production scenarios (preemption, topology spread)
- ✅ Show troubleshooting knowledge
- ✅ Mention HA and leader election
Follow-Up Topics to Study 📚
- Custom Resource Definitions (CRDs) for scheduling
- Descheduler (rebalances Pods post-scheduling)
- Cluster Autoscaler integration
- Volcano / YuniKorn schedulers
- Scheduling latency optimization
- Multi-cluster scheduling
📊 Quick Reference Cheat Sheet
SCHEDULER WORKFLOW
==================
1. Watch API Server for Pods with nodeName = ""
2. FILTER: Remove incompatible nodes
3. SCORE: Rank remaining nodes (0-100)
4. BIND: Update Pod with selected nodeName
5. Kubelet takes over → Runs container
KEY PLUGINS
===========
Filter:
• NodeResourcesFit - CPU/memory check
• NodeAffinity - Node selectors
• TaintToleration - Taints/tolerations
• PodTopologySpread - Even distribution
Score:
• NodeResourcesBalancedAllocation - Balanced usage
• ImageLocality - Cached images
• InterPodAffinity - Pod affinity
• NodeResourcesLeastAllocated - Spread Pods
TROUBLESHOOTING COMMANDS
========================
kubectl describe pod <pod-name>
kubectl get events --sort-by=.lastTimestamp
kubectl logs -n kube-system kube-scheduler-xxx
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated"
🚀 What’s Next?
Practice Scenarios:
- Set up Priority Classes in a test cluster
- Implement Pod Topology Spread Constraints
- Configure ResourceQuotas for multi-tenancy
- Write a simple scheduler extender
- Troubleshoot real Pending Pods
Found this helpful? Share your scheduler war stories in the comments! 👇
Next in series: Part 2 – Kubernetes Controllers Deep Dive
#Kubernetes #K8s #DevOps #SRE #Interview #CloudNative #Scheduler #TechInterview #Learning
💡 Pro Tip: Star this for your next interview prep! These questions cover 90% of scheduler-related interview topics.