Cordoning Nodes and Draining Pods

Follow this SOP in the following scenario:
  • Maintenance is scheduled to be carried out on an OpenShift node.

Steps

  1. Connect to the os-control01 host associated with this ENV and become root: sudo su -

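  2. Mark the nodes as unschedulable. The loop below cordons every node returned by oc get nodes; to cordon a single node, run oc adm cordon <node1> instead: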

    nodes=$(oc get nodes -o name | sed -E "s/node\///")
    echo $nodes

    # $nodes is left unquoted on purpose so that it splits into one name per iteration
    for node in $nodes; do oc adm cordon "$node"; done
    node/<node> cordoned

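  3. Check that the node status includes SchedulingDisabled:

    oc get node <node1>
    NAME        STATUS                     ROLES     AGE       VERSION
    <node1>     Ready,SchedulingDisabled   worker    1d        v1.18.3

    Note: Cordoning only blocks new pods from being scheduled, so a healthy node normally reports Ready,SchedulingDisabled and its existing pods keep running. The status changes to NotReady only once the node itself becomes unavailable, for example during the maintenance.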

  4. Evacuate the pods from the worker node. The following command drains node <node1>, deletes any local emptyDir data, ignores DaemonSet-managed pods, and gives each pod a grace period of 15 seconds to terminate:

    oc adm drain <node1> --delete-emptydir-data=true --ignore-daemonsets=true --grace-period=15

  5. Perform the scheduled maintenance on the node. Do whatever is required within the scheduled maintenance window.

  6. Once the node is ready to be added back into the cluster, uncordon it so that it is marked as schedulable once more. The loop below uncordons every node returned by oc get nodes:

    nodes=$(oc get nodes -o name | sed -E "s/node\///")
    echo $nodes

    # $nodes is left unquoted on purpose so that it splits into one name per iteration
    for node in $nodes; do oc adm uncordon "$node"; done
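
Per-node variant (sketch)

The loops in steps 2 and 6 act on every node, while the drain in step 4 targets a single node. If only one node is under maintenance, the same commands could be wrapped per node, roughly as sketched below. This assumes bash. The function names (node_name, prepare_node, restore_node) and the OC override variable are hypothetical helpers introduced for illustration and are not part of the original procedure; the oc subcommands and flags are the ones already used in the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: cordon and drain one node, then uncordon it after maintenance.
# OC is a hypothetical override; set OC="echo oc" to preview the commands
# without touching the cluster.
OC="${OC:-oc}"

# Strip the "node/" prefix that `oc get nodes -o name` puts on each name.
node_name() {
  printf '%s\n' "${1#node/}"
}

# Cordon the node, then evict its pods with the same flags as step 4.
# The drain is skipped if the cordon fails.
prepare_node() {
  local node
  node="$(node_name "$1")"
  $OC adm cordon "$node" &&
    $OC adm drain "$node" --delete-emptydir-data=true --ignore-daemonsets=true --grace-period=15
}

# Mark the node as schedulable again once maintenance is finished.
restore_node() {
  $OC adm uncordon "$(node_name "$1")"
}
```

After sourcing these functions, run prepare_node <node1> before the maintenance window and restore_node <node1> once the node is ready to rejoin the cluster. Setting OC="echo oc" first prints the commands instead of running them, which can be useful for checking the node name before anything is evicted.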