Configuring CPD multi-tenancy with node selector

Sanjit Chakraborty
Jan 31, 2022

I wrote a blog earlier on configuring a multitenant cluster with Cloud Pak for Data (CPD), but the product has gone through a major transformation since then. You can now run CPD on a Red Hat OpenShift cluster, whether it sits behind your firewall or in the cloud. If you have an OpenShift deployment on IBM Cloud, AWS, Microsoft Azure, or Google Cloud, you can deploy CPD on that cluster. If you prefer to keep your deployment behind a firewall, you can run CPD on a private, on-premises cluster.

CPD supports different installation and deployment mechanisms for achieving multitenancy; the details are covered in the CPD documentation. The recommended approach is to install each instance of CPD in a separate OpenShift project. This provides logical isolation between CPD instances with limited physical isolation. To add more physical isolation, you can constrain an instance so that it runs only on a particular set of nodes. This gives you more control over which node a pod lands on, for example to ensure that a pod ends up on a machine with an SSD attached, to co-locate pods from two services that communicate heavily in the same availability zone, or to ensure that every CPD instance gets its fair share of resources.

By default, the OpenShift scheduler picks a suitable node by checking the node’s capacity for CPU and RAM and comparing it to the pod’s resource requests. The scheduler ensures that, for each of these resource types, the sum of the resource requests of all the pod's containers is less than the node's capacity. This mechanism ensures that pods end up on nodes with spare resources. But if, for whatever reason, you want to allocate dedicated nodes to individual CPD instances, the simplest solution is the node selection constraint (NodeSelector).
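For reference, here is a minimal sketch of the resource requests the scheduler compares against node capacity; the pod name and image are placeholders, not part of any CPD deployment:

apiVersion: v1
kind: Pod
metadata:
  name: request-demo                # hypothetical pod name
spec:
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image
    resources:
      requests:
        cpu: 500m                   # the scheduler only considers nodes with 500m CPU to spare
        memory: 1Gi                 # and 1Gi of memory to spare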

A NodeSelector specifies a map of key-value pairs. The rules are defined using custom labels on nodes and selectors specified in pods or projects. For a pod in the project to be eligible to run on a node, the node must carry each of the indicated key-value pairs as labels.
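As a minimal sketch of the same idea at the pod level (again, the pod name and image are placeholders), a nodeSelector restricts the pod to nodes carrying the matching label:

apiVersion: v1
kind: Pod
metadata:
  name: selector-demo               # hypothetical pod name
spec:
  nodeSelector:
    region: mdm                     # schedule only on nodes labeled region=mdm
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image

Labeling a project (namespace) instead, as in the steps below, applies the same constraint to every pod created in that project without touching individual pod specs.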

Let’s walk through an example of this multitenancy scenario. You need to deploy two CPD services, 1) Watson Knowledge Catalog (WKC) and 2) Master Data Management (MDM), on separate CPD instances with dedicated nodes. Imagine you have six nodes in the OpenShift cluster. Nodes 1-3 need to be dedicated to WKC, and nodes 4-6 to MDM.

Example of multi-tenant Cloud Pak for Data environment
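Before labeling anything, you can list the cluster nodes to confirm the exact node names the labeling commands below expect; the NAME column supplies the <node name> value used in step 1 of each configuration:

oc get nodes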

To achieve this, you can create two projects, “wkc” and “mdm”. Using NodeSelector, pair nodes 1-3 with project “wkc” and nodes 4-6 with project “mdm”. Once the key-value pairs are established, deploy two CPD instances: one CPD instance for MDM installed in the “mdm” project and the other for WKC in the “wkc” project. It is important that you configure the key-value pair for both projects to implement complete isolation. Separate storage classes are not considered in this scenario, but ideally the two CPD instances should have dedicated storage; you only need to specify these storage classes during the CPD deployment. The following steps create the key-value pairs:

Key-value Pair Configuration Steps for MDM

1. Add a label to all nodes dedicated to MDM
oc label node <node name> region=mdm
2. Verify the label was added
oc get nodes -l region=mdm
3. Create a new project
oc new-project mdm
4. Add the label to the project
oc label namespace mdm "region=mdm"
5. Add the "node-selector" annotation alongside the existing annotations and save
oc edit namespace mdm
Example of step 5
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/node-selector: region=mdm ## Annotation added
    openshift.io/requester: kube:admin
6. Use the project name "mdm" while installing the first instance of CPD and MDM to restrict this service to the specified project and nodes. Check the product documentation for CPD installation instructions.
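Once the MDM instance is deployed, a quick sanity check is to compare pod placement against the labeled nodes; every pod in the project should land on a node labeled region=mdm:

oc get pods -n mdm -o wide
oc get nodes -l region=mdm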

Key-value Pair Configuration Steps for WKC

1. Add a label to all nodes dedicated to WKC
oc label node <node name> region=wkc
2. Verify the label was added
oc get nodes -l region=wkc
3. Create a new project
oc new-project wkc
4. Add the label to the project
oc label namespace wkc "region=wkc"
5. Add the "node-selector" annotation alongside the existing annotations and save
oc edit namespace wkc
Example of step 5
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/node-selector: region=wkc ## Annotation added
    openshift.io/requester: kube:admin
6. Use the project name "wkc" while installing the second instance of CPD and WKC to restrict this service to the specified project and nodes. Check the product documentation for CPD installation instructions.
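As an aside, steps 3 and 5 of each section can be collapsed into a single command: oc adm new-project accepts a node selector at project-creation time and sets the openshift.io/node-selector annotation for you. Treat this as an optional shortcut for new projects rather than a replacement for the documented steps:

oc adm new-project mdm --node-selector='region=mdm'
oc adm new-project wkc --node-selector='region=wkc'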

In this scenario, I used the NodeSelector scheduling feature, which schedules a project's pods onto nodes whose labels match the nodeSelector labels. Other features, such as node affinity and inter-pod affinity, can also help you with pod scheduling.
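As a rough illustration of the node affinity alternative (the pod name and image below are placeholders, not part of a CPD deployment), the same region=mdm constraint could be expressed as a required node affinity rule, which also supports richer match operators:

apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo               # hypothetical pod name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: region
            operator: In
            values:
            - mdm                   # schedule only on nodes labeled region=mdm
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image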
