Upgrade OCP4 Cluster

Please see the official documentation for more information [1][3]; this SOP can be used as a rough guide.

Prerequisites

  • In case an upgrade fails, it is wise to first take an etcd backup. To do so, follow the SOP [2].

  • Ensure that all installed Operators are at the latest versions for their channel [5].

  • Ensure that the latest oc client RPM is available at /srv/web/infra/bigfiles/openshiftboot/oc-client/ on the batcave01 server. Retrieve the RPM from [6] (choose the OpenShift Clients Binary RPM) and rename it to oc-client.rpm.

  • Run the sudo rbac-playbook manual/ocp4-sysadmin-openshift.yml -t "upgrade-rpm" playbook to install this updated oc client RPM (verification commands are sketched after this list).
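The prerequisites above can be double-checked from the command line. This is a minimal sketch, assuming cluster-admin access and that the Operators are installed via OLM:

    # Confirm the oc client installed by the playbook is the expected version
    oc version --client

    # List installed Operators (OLM ClusterServiceVersions) and their versions
    oc get clusterserviceversions --all-namespaces

    # Review Operator subscriptions and the channels they follow
    oc get subscriptions --all-namespaces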

Upgrade OCP

At the time of writing the version installed on the cluster is 4.8.11 and the upgrade channel is set to stable-4.8. It is easiest to update the cluster via the web console (a CLI alternative is sketched after the steps below). Go to:

  • Administration

  • Cluster Settings

  • To upgrade between z or patch versions (x.y.z), click the Update button when one is available.

  • When moving between y or minor versions, you must first switch the upgrade channel, for example to fast-4.9. You should also be on the very latest z/patch version of the current release before upgrading.

  • When the upgrade has finished, switch the upgrade channel back to stable.
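The same upgrade can also be driven with the oc client. This is a rough sketch only, assuming cluster-admin access; the channel and version numbers are examples:

    # Show the current version, channel and available updates
    oc adm upgrade

    # For a minor (y) upgrade, switch the channel first, e.g. to fast-4.9
    oc patch clusterversion version --type merge -p '{"spec":{"channel":"fast-4.9"}}'

    # Start the upgrade to the latest version available in the channel
    oc adm upgrade --to-latest=true

    # Watch progress
    oc get clusterversion -w

When the upgrade has completed, the same oc patch command can be used to switch the channel back to the stable channel.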

Upgrade failures

In the worst case scenario we may have to restore etcd from the backup taken at the start [4], or reinstall a node entirely.

Troubleshooting

There are many possible ways an upgrade can fail midway through.

  • Check the monitoring alerts currently firing; these can often hint at the problem.

  • Often individual nodes fail to apply the new MachineConfig changes; these will show up when examining the MachineConfigPool status (see the example commands after this list).

  • This might require a manual reboot of the affected node.

  • This might require killing pods on the affected node.
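The following commands are typically useful when investigating a stuck upgrade. This is a hedged sketch: <node>, <pod> and <namespace> are placeholders, and the worker MachineConfigPool name assumes the default pools:

    # Overall upgrade and cluster operator health
    oc get clusterversion
    oc get clusteroperators

    # Nodes stuck applying the new MachineConfig show up as degraded/updating here
    oc get machineconfigpools
    oc describe mcp worker

    # Manually reboot a stuck node: drain it, reboot it (e.g. over ssh), then uncordon
    oc adm drain <node> --ignore-daemonsets --delete-emptydir-data
    oc adm uncordon <node>

    # Or delete stuck pods on that node so they get rescheduled
    oc get pods --all-namespaces --field-selector spec.nodeName=<node>
    oc delete pod <pod> -n <namespace>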