Externalize Database from Information Server using NodePort

Sanjit Chakraborty
4 min readNov 3, 2021

Under Cloud Pak for Data (CPD), Watson Knowledge Catalog (WKC) combines InfoSphere Information Governance Catalog and InfoSphere Information Analyzer (part of InfoSphere Information Server) into a single application to deliver Governance, Quality, Catalog capabilities on a single platform.

A high-level architecture of IIS

InfoSphere Information Server (IIS) is a product family that provides a unified data integration platform so that companies can understand, cleanse, transform, and deliver trustworthy and context-rich information to critical business initiatives. IIS offers various product modules and components that provide integrated end-to-end information integration solutions. It’s consisting of three components: 1) A Java Platform, Enterprise Edition (J2EE) application server, 2) A parallel processing runtime engine, and 3) A database repository.

There may be certain requirements, where you need to access directly one of the databases under the IIS Engine tier: Information Analyzer (IADB) or QualityStage Match DB. As you may know, Information Analyzer plays a prominent role in the understanding phase of an information integration project. Trusted information is at the core of many business initiatives. It is one of the foundations for decision-making processes. Information Analyzer helps organizations assess and monitor data quality, identify data quality concerns and demonstrate compliance. Imagine you need to visualize some data quality information from IADB using Microsoft Power BI. In this post, I will explain how you can access IADB that resides within a WKC pod.

OpenShift provides multiple methods for communicating from outside the cluster with services running in the cluster. A cluster administrator can expose a service endpoint that external traffic can reach, by assigning a unique external IP address to that service from a range of external IP addresses. In this case, we will use NodePorts to expose the service nodePort on all nodes in the cluster. The node port exposes the c-db2oltp-iis-db2u service on a static port on the node IP address, which let us access the IADB database running on c-db2oltp-iis-db2u-0 pod. By default, NodePorts are in the range of 30000–32767, which means a NodePort is unlikely to match a service’s intended port (for example, 50000 may be exposed as 31104).

Accessing IADB

  1. Login to the CPD cluster using the OC command
# oc login <OpenShift_URL:port>

2. Select the CPD project

# oc project <CDP project name>

3. Find the “c-db2oltp-iis-db2u” service. Typically, this service would be listed as “clusterIP” indicating that it is only visible inside the cluster.

# oc get svc | grep c-db2oltp-iis-db2uc-db2oltp-iis-db2u      ClusterIP   172.30.148.63    <none>        50000/TCP,50001/TCP,25000/TCP,25001/TCP,25002/TCP,25003/TCP,25004/TCP,25005/TCP   107d

4. Change the “clusterIP” to “NodePort” to expose the “c-db2oltp-iis-db2u” service. Edit the service and search for the word ClusterIP, then change the second occurrence of ClusterIP to NodePort. Save the file and exit.

# oc edit svc c-db2oltp-iis-db2u

5. List the services again and you will see the type changed to NodePort and the port 50000 has been mapped with an additional port 31104 or some other number. In this example: 31104 is the port that is mapped externally and will be the port you will use to access the database in CPD.

# oc get svc | grep c-db2oltp-iis-db2uc-db2oltp-iis-db2u             NodePort    172.30.148.63    <none>        50000:31104/TCP,50001:32361/TCP,25000:32284/TCP,25001:31112/TCP,25002:32541/TCP,25003:31287/TCP,25004:32469/TCP,25005:32688/TCP   107d

6. Expose the c-db2oltp-iis-db2u service as a route

# oc expose service c-db2oltp-iis-db2u

7. Verify service is exposed as a route

# oc get routeNAME                 HOST/PORT                                                     PATH   SERVICES             PORT                   TERMINATION            WILDCARDc-db2oltp-iis-db2u   c-db2oltp-iis-db2u-cp4d.apps.cpdv4-services.cp.fyre.ibm.com          c-db2oltp-iis-db2u   db2-server                                    None

8. Update the HAProxy configuration file located at /etc/haproxy/haproxy.cfg. Configure the iadb (db2) NodePort with an ingress controller. Add a similar entry as follows to the haproxy.cfg file. Make sure the master hosts name, IP address, and port reflect your configuration. Restart the haproxy service (systemctl restart haproxy).

9. Get the name of the pod that contains the IADB repository

# oc get pods | grep iis-servicesiis-services-7b5fdd4fbc-mwlbt        1/1     Running     0      106d

10. Retrieve password for iauser from iis-services pod in CPD.

# oc rsh iis-services-7b5fdd4fbc-mwlbt env | grep IAUSER_PASSIAUSER_PASSWORD=7ZXqDgY3IY

11. Just for testing create a Db2-type platform connection to access IADB. Use connection details as per your environment to access the IADB database.

Database = iadbHostname = c-db2oltp-iis-db2u-cp4d.apps.cpdv4-services.cp.fyre.ibm.com (Step 7)Port = 31104 (Step 5)Username = iauserPassword = 7ZXqDgY3IY (step 10)

12. Create a JDBC connection from the Microsoft Power BI using the same connection details as above to get access to IADB. You can use the Microsoft documentation for details on connecting the Db2 database from Power Query Desktop.

I used IADB as an example, but a similar approach can be followed to externalize any databases on the CPD platform.

--

--

Sanjit Chakraborty

Sanjit enjoys building solutions that incorporate business intelligence, predictive and optimization components to solve complex real-world problems.