Kubernetes Interview Series – Part 1

🎯 Master the Kubernetes Scheduler: 15 Must-Know Questions

From fundamentals to production scenarios – test your scheduler expertise!


📚 Fundamentals (Questions 1-5)

1️⃣ Where does the Kubernetes Scheduler fit into the Kubernetes architecture?

Answer: The Scheduler is a core control plane component that runs on master/control-plane nodes alongside:

  • API Server
  • Controller Manager
  • etcd

🔑 Key Point: It does NOT run on worker nodes. Worker nodes run kubelet, kube-proxy, and container runtime.

Pro Tip: In HA setups, multiple scheduler instances run, but only one is active (leader election).
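A sketch of how leader election is enabled in the scheduler's own config file (field names come from KubeSchedulerConfiguration; the durations shown are illustrative defaults):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true               # only the current lease holder schedules
  resourceNamespace: kube-system  # where the Lease object lives
  resourceName: kube-scheduler
  leaseDuration: 15s
  renewDeadline: 10s
  retryPeriod: 2s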


2️⃣ What is the main function of the Kubernetes Scheduler?

Answer: Its primary job is Pod-to-Node assignment.

The Process:

  1. 👀 Watches for unscheduled Pods (no nodeName set)
  2. 🔍 Evaluates all nodes based on:
    • Resource availability (CPU, memory)
    • Constraints (affinity, taints, tolerations)
    • Policies and priorities
  3. ✅ Selects the best-fit node
  4. 📝 Updates Pod spec with nodeName

Not the Scheduler’s Job: Actually running containers (that’s kubelet’s responsibility)
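To make step 2 concrete, here is a hypothetical Pod exercising each class of input the Scheduler evaluates (the disktype label and dedicated taint are invented for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:          # input: resource availability (CPU, memory)
        cpu: 500m
        memory: 256Mi
  nodeSelector:          # input: constraint on a hypothetical node label
    disktype: ssd
  tolerations:           # input: tolerate a hypothetical dedicated-node taint
  - key: dedicated
    operator: Equal
    value: web
    effect: NoSchedule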


3️⃣ How does the Scheduler know which Pods need scheduling?

Answer: The Scheduler uses a watch mechanism on the API Server to monitor Pods with:

  • spec.nodeName = empty/unset
  • status.phase = Pending

Example Pod in need of scheduling:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  nodeName: ""  # Empty = needs scheduling
  containers:
  - name: nginx
    image: nginx

After Scheduling:

spec:
  nodeName: "worker-node-1"  # Scheduler assigns this

4️⃣ What happens after the Scheduler selects a node for a Pod?

Answer: The Handoff Process:

  1. Scheduler → Posts a Binding via the API Server, which sets spec: nodeName: "selected-node"
  2. API Server → Persists to etcd
  3. Kubelet (on selected node) → Sees the assignment:
    • Pulls container image
    • Creates containers via container runtime
    • Manages Pod lifecycle
  4. Pod starts running → Status updated to Running

Analogy: Scheduler is like a dispatcher assigning taxis to customers. The driver (kubelet) actually picks them up!
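Under the hood, step 1 is not a plain Pod update: the Scheduler POSTs a Binding object to the Pod's binding subresource, and the API Server translates it into spec.nodeName. Roughly what that object looks like:

apiVersion: v1
kind: Binding
metadata:
  name: my-pod          # the Pod being bound
target:
  apiVersion: v1
  kind: Node
  name: worker-node-1   # the node the Scheduler picked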


5️⃣ What are the two main phases the Scheduler uses to pick a node?

Answer:

Phase 1: Filtering (Predicates) 🔍 Eliminates nodes that can’t run the Pod:

  • ❌ Insufficient CPU/memory
  • ❌ Taints without matching tolerations
  • ❌ Node selector mismatch
  • ❌ Pod affinity/anti-affinity violations
  • ❌ Volume binding conflicts

Phase 2: Scoring (Priorities) 📊 Ranks remaining nodes (0-100 score):

  • ⚖️ Balanced resource allocation
  • 📦 Pod spreading across zones
  • 🏷️ Image locality (already pulled)
  • 🎯 Priority class weights

Example:

  • 100 nodes in cluster
  • After filtering: 20 feasible nodes
  • After scoring: Node with score 95 is selected

🔧 Deep Dive (Questions 6-10)

6️⃣ Does the Scheduler directly run Pods on nodes?

Answer: Absolutely NOT!

What Scheduler Does:

  • ✅ Decides WHERE a Pod should run
  • ✅ Updates Pod spec with nodeName

What Kubelet Does:

  • ✅ Actually creates containers
  • ✅ Pulls images
  • ✅ Manages container lifecycle
  • ✅ Reports back to API Server

Real-World Analogy:

  • Scheduler = Airport control tower (decides which gate)
  • Kubelet = Ground crew (actually parks the plane)

Common Interview Trap: “Does the Scheduler run containers?” → NO!


7️⃣ What are the key sub-components or plugins inside the Scheduler?

Answer:

Scheduling Framework Components:

1. Scheduling Queue 📥

  • Stores Pods waiting to be scheduled
  • Priority queue (higher priority Pods scheduled first)
  • BackoffQ for failed scheduling attempts

2. Filter Plugins 🔍

  • NodeResourcesFit: Checks CPU/memory
  • NodeAffinity: Evaluates node selectors
  • TaintToleration: Checks taints/tolerations
  • PodTopologySpread: Distributes Pods evenly
  • VolumeBinding: Ensures PVC availability

3. Score Plugins 📊

  • NodeResourcesBalancedAllocation: Prefers balanced usage
  • ImageLocality: Prefers nodes with images cached
  • InterPodAffinity: Considers Pod affinity rules
  • NodeResourcesLeastAllocated: Spreads Pods across nodes (in recent releases this lives on as the LeastAllocated scoring strategy of NodeResourcesFit)

4. Bind Plugin 🔗

  • Final step: binds Pod to selected node
  • Updates API Server

Extension Points: PreFilter, Filter, PostFilter, PreScore, Score, Reserve, Permit, PreBind, Bind, PostBind
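These plugins are toggled and weighted through KubeSchedulerConfiguration. A minimal sketch that raises the weight of ImageLocality for the default profile (the weight value is illustrative):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      disabled:
      - name: ImageLocality   # drop the default registration...
      enabled:
      - name: ImageLocality   # ...and re-enable it with a higher weight
        weight: 2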


8️⃣ How does the Scheduler interact with other control plane components?

Answer:

Component Interactions:

🔷 With API Server (Primary Interface):

  • Watches for unscheduled Pods
  • Reads Node resource info
  • Updates Pod bindings
  • All communication goes through API Server

🔷 With etcd (Indirect via API Server):

  • All scheduling decisions persisted in etcd
  • Reads cluster state

🔷 With Controller Manager:

  • Controllers create Pods (ReplicaSet, Deployment)
  • Scheduler assigns them to nodes
  • Controllers handle Pod lifecycle

🔷 With Kubelet (Indirect):

  • Scheduler writes nodeName
  • Kubelet reads and acts on assignment

Data Flow Example:

Deployment → Controller Manager → Creates Pods → API Server → 
Scheduler watches → Assigns node → API Server → Kubelet → Runs Pod


9️⃣ Can a cluster have more than one Scheduler?

Answer: YES! Multiple schedulers are supported and commonly used.

Use Cases:

1. Custom Scheduling Logic

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  schedulerName: gpu-scheduler  # Custom scheduler
  containers:
  - name: ml-app
    image: tensorflow/tensorflow:latest-gpu

2. Default Scheduler (if not specified)

spec:
  schedulerName: default-scheduler  # Implicit if omitted

3. Multiple Schedulers Running Simultaneously

  • Default Kubernetes Scheduler
  • Custom GPU scheduler
  • Custom batch job scheduler
  • Third-party schedulers (Volcano, YuniKorn)

How It Works:

  • Each scheduler watches for Pods with matching schedulerName
  • Only one scheduler processes each Pod
  • Leader election for multiple instances of same scheduler

Real-World Example: ML team uses custom scheduler for GPU workloads while web apps use default scheduler.
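Running a second scheduler is mostly ordinary Kubernetes plumbing. A rough sketch of deploying one (the image tag, ServiceAccount, and ConfigMap names are assumptions, and the RBAC mirroring system:kube-scheduler is omitted):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-scheduler
  template:
    metadata:
      labels:
        app: gpu-scheduler
    spec:
      serviceAccountName: gpu-scheduler     # assumed SA with scheduler-equivalent RBAC
      containers:
      - name: kube-scheduler
        image: registry.k8s.io/kube-scheduler:v1.29.0   # match your control plane version
        command:
        - kube-scheduler
        - --config=/etc/kubernetes/gpu-scheduler.yaml   # config sets schedulerName: gpu-scheduler
        volumeMounts:
        - name: config
          mountPath: /etc/kubernetes
      volumes:
      - name: config
        configMap:
          name: gpu-scheduler-config        # assumed ConfigMap containing the file above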


🔟 What would happen if the Scheduler goes down?

Answer:

Impact Analysis:

✅ What STILL Works:

  • Existing Pods keep running (kubelet manages them)
  • Services, Ingress, ConfigMaps continue functioning
  • Controllers keep monitoring existing resources
  • kubectl commands for existing resources work

❌ What BREAKS:

  • New Pods remain in Pending state indefinitely
  • Scaling operations stuck (new replicas won’t schedule)
  • Deployments/StatefulSets can’t create new Pods
  • Pod rescheduling after node failures won’t work

Real-World Scenario:

# Before Scheduler fails
$ kubectl get pods
NAME          STATUS    RESTARTS   AGE
app-1         Running   0          5m

# Scheduler goes down

$ kubectl scale deployment app --replicas=3
$ kubectl get pods
NAME          STATUS    RESTARTS   AGE
app-1         Running   0          5m
app-2         Pending   0          30s  # ⚠️ Stuck!
app-3         Pending   0          30s  # ⚠️ Stuck!

# Check events — nothing is reported, because the component that
# emits FailedScheduling events (the scheduler) is down
$ kubectl describe pod app-2
Events:          <none>

Recovery:

  • In HA setups: Another scheduler instance takes over (leader election)
  • Manual restart: on kubeadm clusters the scheduler is a static Pod, so move /etc/kubernetes/manifests/kube-scheduler.yaml out and back (or restart the kubelet); deleting the mirror Pod via kubectl only recreates the API object
  • Pending Pods automatically scheduled once Scheduler is back

Pro Tip: Always run multiple scheduler replicas in production!


🚀 Advanced Scenarios (Questions 11-15)

1️⃣1️⃣ How does Priority and Preemption work in the Scheduler?

Answer:

Priority Scheduling allows critical Pods to be scheduled before others, and even evict lower-priority Pods if needed.

Setting Up Priority:

1. Create PriorityClass

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000  # Higher = more important
globalDefault: false
description: "Critical production workloads"

2. Assign to Pod

apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: my-app

Preemption Process:

Scenario: Cluster at capacity, high-priority Pod arrives

  1. Scheduler finds no available nodes (all resources used)
  2. Identifies preemption candidates:
    • Lower priority Pods on suitable nodes
    • Calculates minimum Pods to evict
  3. Evicts lower-priority Pods (graceful termination)
  4. Schedules high-priority Pod once resources free up

Example:

# Before: Cluster full with low-priority Pods
$ kubectl get pods
NAME          PRIORITY    STATUS
web-1         100         Running
web-2         100         Running
web-3         100         Running

# High-priority Pod arrives
$ kubectl create -f critical-pod.yaml

# After: Low-priority Pod evicted
$ kubectl get pods
NAME          PRIORITY    STATUS
critical-app  1000000     Running
web-1         100         Running
web-2         100         Terminating  # ⚠️ Preempted
web-3         100         Running

Real-World Use Cases:

  • Database backups (lower priority) vs live queries (higher priority)
  • Batch jobs (can be preempted) vs API services (critical)
  • Development workloads vs production workloads

Best Practice: Always set PodDisruptionBudgets so preemption doesn't evict too many Pods at once (the scheduler honors PDBs during preemption on a best-effort basis); a minimal example is below.
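A minimal PodDisruptionBudget covering the web Pods from the example above (the app: web label is an assumption):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # preemption will try to keep at least 2 web Pods running
  selector:
    matchLabels:
      app: web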


1️⃣2️⃣ You have a Pod stuck in Pending state with “0/5 nodes are available: Insufficient cpu”. How do you troubleshoot?

Answer:

Step-by-Step Troubleshooting:

1. Check Pod Resource Requests

$ kubectl describe pod stuck-pod
...
Requests:
  cpu:     4000m  # Requesting 4 CPUs
  memory:  8Gi
Events:
  Warning  FailedScheduling  0/5 nodes are available: 5 Insufficient cpu.

2. Check Node Allocatable Resources

$ kubectl describe nodes | grep -A 5 "Allocated resources"
Allocated resources:
  Resource           Requests    Limits
  cpu                3800m (95%)  4000m (100%)
  memory             7.5Gi (94%)  8Gi (100%)

3. Identify Resource Hogs

$ kubectl top nodes
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
worker-node-1  3850m        96%    7680Mi          96%
worker-node-2  3900m        97%    7890Mi          98%

4. Check for Taints/Tolerations

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
NAME           TAINTS
worker-node-1  [map[effect:NoSchedule key:node.kubernetes.io/disk-pressure]]

Solutions:

Option 1: Reduce Pod Resource Requests

spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: 2000m     # Reduced from 4000m
        memory: 4Gi    # Reduced from 8Gi

Option 2: Scale Down Other Pods

$ kubectl scale deployment low-priority-app --replicas=0

Option 3: Add More Nodes

# In cloud environments
$ eksctl scale nodegroup --cluster=my-cluster --name=ng-1 --nodes=3

Option 4: Use the Cluster Autoscaler, which automatically adds nodes when Pods can't be scheduled

Diagnostic Commands:

# See all pending Pods and reasons
$ kubectl get events --field-selector involvedObject.kind=Pod --sort-by='.lastTimestamp'

# Check scheduler logs
$ kubectl logs -n kube-system kube-scheduler-xxx

# Check a specific node's remaining headroom
$ kubectl describe node worker-node-1 | grep -A 10 "Allocated resources"

Real-World Tip: Set requests close to typical usage so scheduling decisions reflect reality, and use limits as a safety cap!


1️⃣3️⃣ What is Pod Topology Spread Constraints and when would you use it?

Answer:

Pod Topology Spread Constraints ensure Pods are evenly distributed across failure domains (zones, nodes, racks) for high availability.

The Problem It Solves:

Without constraints, the scheduler might place all replicas in the same zone or on the same node:

Zone A: 5 Pods  ⚠️ All eggs in one basket!
Zone B: 0 Pods
Zone C: 0 Pods

With Topology Spread:

Zone A: 2 Pods  ✅ Distributed
Zone B: 2 Pods  ✅ Distributed
Zone C: 1 Pod   ✅ Distributed

Configuration Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                    # Max difference between zones
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule  # or ScheduleAnyway
        labelSelector:
          matchLabels:
            app: web
      containers:
      - name: nginx
        image: nginx

Key Parameters:

1. maxSkew

  • Maximum allowed difference in Pod count
  • maxSkew: 1 → Zones can differ by at most 1 Pod
  • Lower = more even distribution

2. topologyKey

  • Node label to use as topology domain
  • Common keys:
    • topology.kubernetes.io/zone (AZs)
    • kubernetes.io/hostname (individual nodes)
    • topology.kubernetes.io/region

3. whenUnsatisfiable

  • DoNotSchedule: Strict (Pod stays Pending if it can't be spread)
  • ScheduleAnyway: Soft (tries to spread, but schedules anyway)

Real-World Scenarios:

Scenario 1: High Availability Across AZs

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  # Ensures no single AZ failure takes down all Pods

Scenario 2: Even Load Across Nodes

topologySpreadConstraints:
- maxSkew: 2
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  # Spreads across nodes but doesn't block scheduling

Scenario 3: Multi-Region Distribution

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/region
  whenUnsatisfiable: DoNotSchedule
  # For global applications

Comparison with Pod Anti-Affinity:

Feature      | Topology Spread        | Pod Anti-Affinity
-------------|------------------------|-------------------
Granularity  | Fine control (maxSkew) | Binary (yes/no)
Use Case     | Even distribution      | Avoid co-location
Flexibility  | More options           | Less flexible
Performance  | Better at scale        | Can be slow
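For contrast, the anti-affinity way to say "never co-locate web Pods on the same node" is binary, with no maxSkew-style tuning:

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname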

Best Practice for Production:

topologySpreadConstraints:
# Zone-level spreading (primary)
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
# Node-level spreading (secondary)
- maxSkew: 2
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway

Debugging:

# Check Pod distribution
$ kubectl get pods -o wide -l app=web --sort-by=.spec.nodeName

# See why Pod didn't spread
$ kubectl describe pod my-pod | grep -A 10 "Events"

1️⃣4️⃣ How would you implement custom scheduling logic without writing a full custom scheduler?

Answer:

Three Approaches:


Option 1: Scheduler Extender (Webhook-Based) 🔌

Extends default scheduler with HTTP webhooks.

How It Works:

  1. Default scheduler runs Filter/Score phases
  2. Calls your extender webhook at specific points
  3. Your service returns additional filtering/scoring
  4. Scheduler combines results

Example Extender Configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    extenders:
    - urlPrefix: "http://my-extender:8080"
      filterVerb: "filter"
      prioritizeVerb: "prioritize"
      weight: 1
      enableHTTPS: false
      nodeCacheCapable: false  # the /filter handler below expects full Node objects
      ignorable: false

Extender Service (Python Example):

from flask import Flask, request, jsonify

app = Flask(__name__)

def get_available_memory(node):
    # Simplified: use allocatable memory (e.g. "8151552Ki") as a stand-in
    # for "available"; parse the Kubernetes quantity into bytes
    mem = node['status']['allocatable']['memory']
    units = {'Ki': 1024, 'Mi': 1024**2, 'Gi': 1024**3}
    for suffix, factor in units.items():
        if mem.endswith(suffix):
            return int(mem[:-len(suffix)]) * factor
    return int(mem)

@app.route('/filter', methods=['POST'])
def filter_nodes():
    # Request body is ExtenderArgs; its JSON keys are capitalized
    # ("Pod", "Nodes") because the upstream Go types carry no json tags
    data = request.json
    nodes = data['Nodes']['items']

    # Custom filtering logic
    # Example: only keep nodes that advertise a GPU
    filtered_nodes = [
        node for node in nodes
        if 'nvidia.com/gpu' in node['status']['allocatable']
    ]

    return jsonify({
        'Nodes': {'items': filtered_nodes},
        'FailedNodes': {},
        'Error': ''
    })

@app.route('/prioritize', methods=['POST'])
def prioritize_nodes():
    data = request.json
    nodes = data['Nodes']['items']

    # Custom scoring logic
    # Example: prefer nodes with more available memory
    # Extender scores must be in the 0-10 range (MaxExtenderPriority)
    scores = []
    for node in nodes:
        available_mem = get_available_memory(node)
        score = min(available_mem // (1024**3), 10)  # ~1 point per GiB, capped at 10
        scores.append({'Host': node['metadata']['name'], 'Score': score})

    return jsonify(scores)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

Use Cases:

  • GPU-aware scheduling
  • License-based placement
  • Custom cost optimization
  • Integration with external systems

Option 2: Scheduling Profiles 📋

Configure different scheduling behaviors without code.

Example:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
# Profile 1: Default (spreads Pods via the LeastAllocated strategy)
- schedulerName: default-scheduler
  plugins:
    score:
      enabled:
      - name: NodeResourcesBalancedAllocation
        weight: 1

# Profile 2: Bin Packing (pack tightly)
# The old NodeResourcesMostAllocated plugin was folded into
# NodeResourcesFit; bin packing is now its MostAllocated strategy.
- schedulerName: bin-packing-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated   # Opposite of the default!
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1

# Profile 3: GPU Optimized (bin-pack scarce GPU capacity)
- schedulerName: gpu-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        resources:
        - name: nvidia.com/gpu
          weight: 5
        - name: cpu
          weight: 1

Using Different Profiles:

# Batch job - use bin packing
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing
spec:
  template:
    spec:
      schedulerName: bin-packing-scheduler
      containers:
      - name: processor
        image: data-processor

# ML training - use GPU scheduler
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  schedulerName: gpu-scheduler
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu

Option 3: Admission Controllers + Mutating Webhooks 🎯

Modify Pod specs before scheduling.

Example Use Case: Auto-add node selectors based on namespace

Mutating Webhook:

// Package mutation holds the Pod-mutating logic; the webhook server
// wiring (TLS, AdmissionReview decoding) is omitted for brevity.
package mutation

import corev1 "k8s.io/api/core/v1"

// mutatePod adjusts a Pod's scheduling fields before the scheduler sees it.
func mutatePod(pod *corev1.Pod) {
    // Add node selector for production namespace
    if pod.Namespace == "production" {
        if pod.Spec.NodeSelector == nil {
            pod.Spec.NodeSelector = make(map[string]string)
        }
        pod.Spec.NodeSelector["tier"] = "production"
        pod.Spec.NodeSelector["ssd"] = "true"
    }

    // Add toleration for batch jobs
    if pod.Labels["workload"] == "batch" {
        pod.Spec.Tolerations = append(pod.Spec.Tolerations,
            corev1.Toleration{
                Key:      "batch",
                Operator: corev1.TolerationOpEqual,
                Value:    "true",
                Effect:   corev1.TaintEffectNoSchedule,
            })
    }
}
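The function above covers only the mutation itself; to have the API Server call it, you register the webhook. A hedged sketch (the webhook/Service names and path are placeholders, and TLS/caBundle wiring is omitted):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: pod-scheduling-defaults
webhooks:
- name: pod-defaults.example.com      # hypothetical
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: pod-mutator               # hypothetical Service fronting the code above
      namespace: kube-system
      path: /mutate
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]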

Comparison Table:

Approach            | Complexity | Flexibility | Use Case
--------------------|------------|-------------|------------------------------------
Scheduler Extender  | Medium     | High        | Custom logic, external integration
Scheduling Profiles | Low        | Medium      | Different policies per workload
Admission Webhooks  | Medium     | High        | Pre-scheduling Pod modifications

Recommendation:

  • Simple needs → Scheduling Profiles
  • External integration → Scheduler Extender
  • Pod mutation → Admission Webhooks
  • Complex custom logic → Full custom scheduler

1️⃣5️⃣ In a multi-tenant cluster, how would you ensure fair resource allocation and prevent one team from starving others?

Answer:

Multi-Layered Approach:


Layer 1: ResourceQuotas (Namespace-Level Limits) 🎯

Prevent teams from consuming more than their fair share.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "100"        # Total CPU across all Pods
    requests.memory: 200Gi     # Total memory
    limits.cpu: "200"          # Max CPU with limits
    limits.memory: 400Gi       # Max memory
    persistentvolumeclaims: "10"
    pods: "50"                 # Max Pod count
    services.loadbalancers: "3"

Check Quota Usage:

$ kubectl describe resourcequota -n team-alpha
Name:                   team-alpha-quota
Resource                Used   Hard
--------                ----   ----
requests.cpu            85     100    # ⚠️ 85% used!
requests.memory         180Gi  200Gi
pods                    45     50

Layer 2: LimitRanges (Pod-Level Defaults) 📏

Prevent individual Pods from being too large.

apiVersion: v1
kind: LimitRange
metadata:
  name: pod-limits
  namespace: team-alpha
spec:
  limits:
  # Container limits
  - type: Container
    max:
      cpu: "4"            # No single container > 4 CPU
      memory: 8Gi         # No single container > 8Gi
    min:
      cpu: 100m           # Minimum request
      memory: 128Mi
    default:
      cpu: 500m           # Default if not specified
      memory: 512Mi
    defaultRequest:
      cpu: 250m
      memory: 256Mi
  
  # Pod limits
  - type: Pod
    max:
      cpu: "8"            # No Pod > 8 CPU total
      memory: 16Gi

Effect:

# Pod without requests/limits
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-image
    # No resources specified

# After LimitRange mutation:
spec:
  containers:
  - name: app
    resources:
      requests:
        cpu: 250m      # Auto-added
        memory: 256Mi
      limits:
        cpu: 500m      # Auto-added
        memory: 512Mi

Layer 3: Priority Classes (Workload Importance) 🏆

Ensure critical workloads get scheduled first.

# Production workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-high
value: 1000000
globalDefault: false
description: "Production critical services"

---
# Development workloads
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: development-low
value: 100
preemptionPolicy: Never  # Can't preempt others
description: "Development and testing"

Assign to Pods:

# Production Pod
apiVersion: v1
kind: Pod
metadata:
  name: api-server
  namespace: team-alpha
spec:
  priorityClassName: production-high
  containers:
  - name: api
    image: api-server

# Dev Pod  
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: team-beta
spec:
  priorityClassName: development-low
  containers:
  - name: test
    image: test-app

Layer 4: Node Pools / Taints (Physical Isolation) 🏭

Dedicate nodes to specific teams.

Setup:

# Label nodes for each team
$ kubectl label nodes node-1 node-2 node-3 team=alpha
$ kubectl label nodes node-4 node-5 node-6 team=beta

# Add taints to prevent other teams
$ kubectl taint nodes node-1 node-2 node-3 team=alpha:NoSchedule
$ kubectl taint nodes node-4 node-5 node-6 team=beta:NoSchedule

Team-A Pods (with toleration):

apiVersion: v1
kind: Pod
metadata:
  name: team-a-pod
spec:
  nodeSelector:
    team: alpha
  tolerations:
  - key: team
    operator: Equal
    value: alpha
    effect: NoSchedule
  containers:
  - name: app
    image: team-a-image

Result:

  • Team A Pods → Only run on nodes 1-3
  • Team B Pods → Only run on nodes 4-6
  • No cross-team interference

Layer 5: Cluster Autoscaler Configuration ⚙️

Fair autoscaling per team.

# Separate node groups per team with autoscaling
# In cloud provider (AWS EKS example):

# Team Alpha node group
eksctl create nodegroup \
  --cluster=my-cluster \
  --name=team-alpha-ng \
  --node-labels=team=alpha \
  --node-taints=team=alpha:NoSchedule \
  --nodes-min=3 \
  --nodes-max=10

# Team Beta node group
eksctl create nodegroup \
  --cluster=my-cluster \
  --name=team-beta-ng \
  --node-labels=team=beta \
  --node-taints=team=beta:NoSchedule \
  --nodes-min=3 \
  --nodes-max=10

Complete Multi-Tenant Setup Example:

# 1. Create namespace per team
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha
    environment: production

---
# 2. ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    pods: "50"

---
# 3. LimitRange
apiVersion: v1
kind: LimitRange
metadata:
  name: team-alpha-limits
  namespace: team-alpha
spec:
  limits:
  - type: Container
    max:
      cpu: "4"
      memory: 8Gi
    defaultRequest:
      cpu: 250m
      memory: 256Mi

---
# 4. NetworkPolicy (bonus: network isolation)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: team-alpha-isolation
  namespace: team-alpha
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: alpha  # Only allow traffic from same team

Monitoring Fair Allocation:

# Check quota usage across namespaces
$ kubectl get resourcequota --all-namespaces

# Check which team is using most resources
$ kubectl top pods --all-namespaces --sort-by=cpu | head -20

# See pending Pods (potential starvation)
$ kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Audit events for quota exceeded
$ kubectl get events --all-namespaces | grep "exceeded quota"

Alert When Team Hits Limits:

# Prometheus alert rule (kube-state-metrics; note ignoring(type)
# so the "used" and "hard" series actually join)
- alert: NamespaceQuotaNearLimit
  expr: |
    kube_resourcequota{type="used"}
      / ignoring(type)
    kube_resourcequota{type="hard"} > 0.9
  for: 5m
  labels:
    severity: warning
  annotations:
    description: "Team {{ $labels.namespace }} is above 90% of quota for {{ $labels.resource }}"

Best Practices:

  1. ✅ Start with generous quotas, adjust based on usage
  2. ✅ Use PriorityClasses to protect production workloads
  3. ✅ Monitor quota usage and set alerts
  4. ✅ Document team resource allocation policies
  5. ✅ Regular reviews and adjustments
  6. ✅ Consider cost allocation tools (Kubecost, OpenCost)

🎓 Bonus Tips for Interviews

Red Flags to Avoid ❌

  • ❌ “Scheduler runs containers” (No, that’s kubelet!)
  • ❌ “Only one scheduler per cluster” (Multiple are supported!)
  • ❌ “Scheduler directly talks to kubelet” (All via API Server!)
  • ❌ “Scheduler stores state” (It’s stateless, etcd has state!)

Pro Interview Answers ✅

  • ✅ Mention “Filter and Score” phases
  • ✅ Reference specific plugins (NodeResourcesFit, NodeAffinity)
  • ✅ Discuss production scenarios (preemption, topology spread)
  • ✅ Show troubleshooting knowledge
  • ✅ Mention HA and leader election

Follow-Up Topics to Study 📚

  • Custom Resource Definitions (CRDs) for scheduling
  • Descheduler (rebalances Pods post-scheduling)
  • Cluster Autoscaler integration
  • Volcano / YuniKorn schedulers
  • Scheduling latency optimization
  • Multi-cluster scheduling

📊 Quick Reference Cheat Sheet

SCHEDULER WORKFLOW
==================
1. Watch API Server for Pods with nodeName = ""
2. FILTER: Remove incompatible nodes
3. SCORE: Rank remaining nodes (0-100)
4. BIND: Update Pod with selected nodeName
5. Kubelet takes over → Runs container

KEY PLUGINS
===========
Filter:
• NodeResourcesFit - CPU/memory check
• NodeAffinity - Node selectors
• TaintToleration - Taints/tolerations
• PodTopologySpread - Even distribution

Score:
• NodeResourcesBalancedAllocation - Balanced usage
• ImageLocality - Cached images
• InterPodAffinity - Pod affinity
• NodeResourcesLeastAllocated - Spread Pods

TROUBLESHOOTING COMMANDS
========================
kubectl describe pod <pod-name>
kubectl get events --sort-by=.lastTimestamp
kubectl logs -n kube-system kube-scheduler-xxx
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated"


🚀 What’s Next?

Practice Scenarios:

  1. Set up Priority Classes in a test cluster
  2. Implement Pod Topology Spread Constraints
  3. Configure ResourceQuotas for multi-tenancy
  4. Write a simple scheduler extender
  5. Troubleshoot real Pending Pods


Found this helpful? Share your scheduler war stories in the comments! 👇

Next in series: Part 2 – Kubernetes Controllers Deep Dive

#Kubernetes #K8s #DevOps #SRE #Interview #CloudNative #Scheduler #TechInterview #Learning


💡 Pro Tip: Star this for your next interview prep! These questions cover 90% of scheduler-related interview topics.
