Lessons to take from CPD upgrade 3.5 to 4.0.4

Sanjit Chakraborty
5 min read · Feb 14, 2022

A Cloud Pak for Data (CPD) upgrade would be a whole lot easier if it involved just a few simple steps. Unfortunately, the chance of that happening is slim; maybe we will have better luck in the distant future. This is especially true when you upgrade from CPD v3.5 to v4.0.x: because of changes in the underlying architecture, the upgrade needs much more hands-on involvement and the process itself is long-winded. But there is never a dull moment during an upgrade. It is always an exciting and eventful experience. 😉

We recently performed a few CPD upgrades successfully, and this post shares the lessons learned from them.

Environment details

  • Current OpenShift version: 4.8.28
  • Current CPD version: 3.5.6
  • Target CPD version: 4.0.4
  • Cluster type: air-gapped
  • Services installed: Watson Knowledge Catalog (WKC), Data Virtualization (DV), Data Management Console (DMC)

Order of service upgrade: Start the upgrade with Cloud Pak for Data itself. This automatically deploys the IBM Cloud Pak for Data platform and the IBM Cloud Pak foundational services. If you need to upgrade WKC along with other CPD services, always upgrade WKC before any other service to avoid further complications.

Secondly, if service B depends on service A, upgrade service A before service B. For the same reason, upgrade DMC before DV.

Upgrading CPD directly from 3.5.6 to 4.0.x: The CPD documentation says: “If you are running an earlier refresh of Cloud Pak for Data Version 3.5, you must upgrade to Refresh 9 of Version 3.5 before you can upgrade to Refresh 4 of Version 4.0”.

In my experience, you can upgrade CPD 3.5.6 directly to version 4.0.4. Of course, I ran into a problem at first. By default, CPD v4.0.4 corresponds to version 1.10.1 of the IBM Cloud Pak foundational services CASE package. But while upgrading the platform operator, I noticed that a db2u image was missing, which caused the upgrade to fail. To overcome this, I downloaded and mirrored all of the IBM Cloud Pak foundational services CASE package versions that correspond to CPD versions 4.0.1 through 4.0.4.
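Saving several CASE versions by hand gets repetitive, so here is a minimal sketch of how it could be looped. It is an illustration, not the exact commands from this upgrade: the function name save_case_versions is made up, CASE_REPO_PATH and OFFLINEDIR are assumed to be exported already, and only version 1.10.1 is named in this post, so the other versions have to be looked up in the CPD documentation for each 4.0.x refresh.

```shell
#!/usr/bin/env bash
# Sketch only: save several foundational services CASE versions so that
# their images can be mirrored to the private registry afterwards.
# Assumptions: CASE_REPO_PATH and OFFLINEDIR are already exported, and the
# caller supplies the version list (only 1.10.1 is named in this post).

save_case_versions() {
  local version
  for version in "$@"; do
    # One output directory per version keeps the saved files apart.
    mkdir -p "${OFFLINEDIR}/${version}" || return 1
    cloudctl case save \
      --repo "${CASE_REPO_PATH}" \
      --case ibm-cp-common-services \
      --version "${version}" \
      --outputdir "${OFFLINEDIR}/${version}" || return 1
  done
}

# Example call; replace the placeholder with the versions you looked up:
#   save_case_versions <earlier-version> 1.10.1
```

Each saved directory then still has to be mirrored to the private registry with your usual air-gapped procedure.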

NFS provisioner version: If you use an NFS storage class with CPD 3.5, the existing NFS provisioner does not prevent the CPD upgrade. But creating a new NFS storage class will fail, because a recent version of the NFS provisioner is needed on OpenShift 4.x. Otherwise the NFS provisioner pod fails with the error “Unable to attach or mount volumes: unmounted volumes”.

You must download a recent NFS provisioner that uses the nfs-subdir-external-provisioner:v4.0.2 image or higher.
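A quick way to check whether a provisioner image meets that minimum is to compare its tag against v4.0.2. The sketch below is one possible approach: the function names, the default namespace, and the deployment name nfs-client-provisioner are assumptions, so substitute whatever your cluster uses. It relies on sort -V for version ordering and does not handle images pinned by digest.

```shell
#!/usr/bin/env bash
# Succeeds when the tag of IMAGE is at least MINIMUM (version-aware compare).
# Assumes the image reference ends in ":<tag>"; digest references are not handled.
tag_at_least() {
  local image="$1" minimum="$2"
  local tag="${image##*:}"   # text after the last colon
  [ "$(printf '%s\n%s\n' "$minimum" "$tag" | sort -V | head -n 1)" = "$minimum" ]
}

# Prints the image of the first container in the provisioner deployment.
# Namespace and deployment name are hypothetical defaults.
provisioner_image() {
  oc get deployment "${2:-nfs-client-provisioner}" -n "${1:-default}" \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Example:
#   tag_at_least "$(provisioner_image my-namespace)" v4.0.2 || echo "provisioner too old"
```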

Failed to start ‘wdp-rabbitmq’ pod: In rare cases, while the WKC custom resource is being applied, the wdp-rabbitmq pod fails with the error “Readiness probe failed connect: connection refused”. This is a known issue when you upgrade CPD from 3.5.0 to 4.0: the rabbitmq volume might become corrupted because of an ungraceful termination. To resolve the issue, recreate the rabbitmq volumes and run the unquiesce operation again.

oc scale sts/rabbitmq-ha --replicas 0
oc delete pvc data-rabbitmq-ha-0 data-rabbitmq-ha-1 data-rabbitmq-ha-2
oc scale sts/rabbitmq-ha --replicas 3

Db2u pods in pending state: During the WKC custom resource upgrade (CCS), the c-db2oltp-iis-db2u-0 and c-db2oltp-wkc-db2u-0 pods get stuck in the Pending state with the message “forbidden sysctl: kernel.shmmni not whitelisted”.

You need to create a kubeletconfig that allows Db2U to make unsafe sysctl calls so that Db2 can manage the required memory settings. However, if a kubeletconfig is already associated with the worker machine config pool (mcp) and you create a second one (db2u-kubelet), you encounter this problem. OpenShift supports only one kubeletconfig per mcp, so you need to merge both kubeletconfigs into a single one.
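To make the merge concrete, here is a rough sketch of what a single combined kubeletconfig could look like. It is not the exact resource from this cluster: the function name merged_kubeletconfig, the db2u-kubelet=sysctl pool label, and the commented-out maxPods setting are illustrative assumptions, and the settings from your existing kubeletconfig have to be copied in by hand. The allowedUnsafeSysctls entries are the ones Db2U asks for.

```shell
#!/usr/bin/env bash
# Sketch only: print ONE KubeletConfig that carries the Db2U sysctl allow-list
# together with whatever the previously existing kubeletconfig contained.
# The name, the pool label, and the example setting are assumptions.
merged_kubeletconfig() {
  local name="${1:-db2u-kubelet}"
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: ${name}
spec:
  machineConfigPoolSelector:
    matchLabels:
      db2u-kubelet: sysctl
  kubeletConfig:
    # Copy the settings of the existing kubeletconfig here, for example:
    # maxPods: 250
    allowedUnsafeSysctls:
      - "kernel.msg*"
      - "kernel.shm*"
      - "kernel.sem"
EOF
}

# Example: review the output first, then apply it.
#   oc get kubeletconfig
#   merged_kubeletconfig | oc apply -f -
```

The worker mcp must carry the label used in machineConfigPoolSelector, and the nodes restart one by one while the new configuration rolls out.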

Prevent operators from upgrading automatically: When you create a CPD software operator subscription, you can specify whether OpenShift automatically loads newer versions of the operator or whether it creates an update request that must be approved by a cluster administrator. However, any operators that are automatically installed by another CPD software operator are created with the automatic install plan.

It is good practice to use the manual install plan so that all parts of the software stay at the same version until you are ready to upgrade. To do this, edit the ZenService custom resource to add or update the installPlanApproval parameter and set it to manual. This parameter ensures that any dependent operator subscriptions that are created by the Operand Deployment Lifecycle Manager are created with the specified install plan.
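Before and after changing the install plan, it helps to see which approval mode each operator subscription actually has. The following is a small sketch for that check; the function name is made up and the ibm-common-services default is an assumption, so pass your own project if the operators live elsewhere.

```shell
#!/usr/bin/env bash
# List every operator subscription in a project with its approval mode.
# The default project is an assumption; pass another one as the first argument.
list_install_plan_approval() {
  oc get subscriptions.operators.coreos.com -n "${1:-ibm-common-services}" \
    -o custom-columns=NAME:.metadata.name,APPROVAL:.spec.installPlanApproval
}

# Example:
#   list_install_plan_approval
#   list_install_plan_approval my-operators-project
```

Any subscription that still shows Automatic in the APPROVAL column can upgrade without waiting for approval.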

To make sure that the Cloud Pak for Data control plane is upgraded only manually when you install a newer version of IBM Cloud Pak foundational services, pin the installation to a specific version. With the manual install plan set, patch the ZenService custom resource to add version 4.3.2:

oc patch ZenService lite-cr \
--namespace <cpd-instance> \
--type=merge \
--patch '{"spec": {"version":"4.3.2"}}'

Scale up IIS failed: As a post-upgrade task, you patch the IIS custom resource to scale up the IIS resources, but the IIS operator management pod log reports an error at TASK [Wait for Cassandra to roll out]. At the same time, the job that builds the Cassandra database is failing.

Find and delete the job associated with Cassandra. Then delete the IIS operator management pod from the ibm-common-services project to trigger reconciliation.
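These two steps could be scripted roughly as follows. Treat it as a sketch under assumptions: the helper names are invented, matching jobs by the word "cassandra" in their name is a guess about how the job is named on your cluster, and the IIS operator pod name must be looked up first (for example with oc get pods -n ibm-common-services).

```shell
#!/usr/bin/env bash
# Print the jobs in a project whose name contains "cassandra".
# Matching on the name is an assumption; check the list before deleting.
find_cassandra_jobs() {
  oc get jobs -n "$1" -o name | grep -i cassandra
}

# Delete every job reported by find_cassandra_jobs in the given project.
delete_cassandra_jobs() {
  local job
  for job in $(find_cassandra_jobs "$1"); do
    oc delete "$job" -n "$1"
  done
}

# Delete the operator pod so that it is recreated and reconciles again.
# The pod name is a placeholder that you supply.
restart_operator_pod() {
  oc delete pod "$1" -n ibm-common-services
}

# Example:
#   find_cassandra_jobs <cpd-instance>
#   delete_cassandra_jobs <cpd-instance>
#   restart_operator_pod <iis-operator-pod-name>
```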

CPD project failing to communicate with ibm-common-services: By default, all pods in a project are accessible from other pods and network endpoints. You might have a NetworkPolicy that isolates pods and network endpoints from other projects. This causes incoming connections between pods in the CPD user project and the ibm-common-services project to fail.

If the CPD user project has a network policy that limits communication sources, it needs to be updated to allow traffic with the ibm-common-services project. For example, if the CPD user project has a network policy that allows communication with projects labelled atlas-vast=custom_pods, add the same label to the ibm-common-services project.

oc label namespace ibm-common-services atlas-vast=custom_pods
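To find out which label a policy expects, and to confirm that the label landed on the project, two read-only checks like these can help. The function names are made up for this sketch, and the project names are whatever you pass in.

```shell
#!/usr/bin/env bash
# Dump the network policies of a project, including their namespaceSelector
# labels, so you can see which project labels are allowed in.
show_network_policies() {
  oc get networkpolicy -n "$1" -o yaml
}

# Show the labels that are currently set on a project.
show_namespace_labels() {
  oc get namespace "$1" --show-labels
}

# Example:
#   show_network_policies <cpd-instance>
#   show_namespace_labels ibm-common-services
```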

------------------------------

The following is the runbook used for this upgrade. Some of the lessons above are already reflected in the runbook.

The problems mentioned in this blog are very specific to a particular environment. You could have a different experience depending on your cluster environment and settings. The steps listed in the runbook are for upgrading CPD from 3.5.6 to 4.0.4. Upgrading to a different CPD version could need additional consideration. Always consult the CPD documentation before performing an upgrade.

Sanjit Chakraborty

Sanjit enjoys building solutions that incorporate business intelligence, predictive and optimization components to solve complex real-world problems.