In this blog post, I will quickly cover my recent experience in troubleshooting a DB2/LDAP authentication problem in a DB2 DPF database.
In a DB2 (9.7 FixPack 7) DPF data-warehouse database, a connection attempt by Datastage to any node other than the coordinator node was failing. Connection attempt to coordinator node succeeds. The same user id and password were being used in both the attempts.
I was working on a proof of concept that would allow ETL (DataStage) jobs to directly connect to the data nodes to allow parallel loads (directly into each partition) into a data-warehouse database. User ID used by Datastage was system-level id (non-LDAP). The user id was created on all the data nodes and coordinator node. The test ETL job was aborting with a familiar error message. All evidence indicated that this could be a password problem.
Approach to solution:
For the test ELT job, user id and password were saved and were supplied as parameter(s). That eliminated the chance of different password (incorrect one) being used to connect to data nodes.
I tried to isolate the problem to a specific user id. However, I found that ETL jobs failed even when instance owner’s credentials were used. To remove Datastage from the equation, I did an explicit connection to DB2 as Instance owner from the command prompt on one of the data nodes. To my surprise, this failed !! To me, this indicated a bigger problem. However, an implicit connection was successful.
db2inst1@hostdata01:~> id uid=608(db2inst1) gid=608(bcuigrp) groups=608(bcuigrp) db2inst1@hostdata01:~> db2 connect to edwdv <<<<----- Successful Implicit connection. Database Connection Information Database server = DB2/LINUXX8664 9.7.7 SQL authorization ID = DB2INST1 Local database alias = EDWDV
Here was the error message when an explicit connection attempt was made.
$db2 connect to edwdv user db2inst1 <<<<<--------- This works just fine on the coordinator node. Enter current password for db2inst1: SQL30082N Security processing failed with reason "24" ("USERNAME AND/OR PASSWORD INVALID"). SQLSTATE=08001
Messages in db2diag.log
db2diag.log had a message that indicated password problem.
Password validation for user db2inst1 failed with rc = -2146500507
1) The user ID was not locked.
2) The password that was being supplied was the right one.
3) There was no recent fixpack that was applied that could have messed up things.
4) Instance owner’s password-less ‘ssh’ between DPF nodes was working just fine. (This is actually a pre-requisite in DB2 DPF).
5) db2set parameter for DB2 and transparent LDAP authentication (DB2AUTH=OSAUTHDB) was set on all the nodes.
Errors in /var/log/messages file
I noticed that an error message was being written to /var/log/messages file (this was SUSE Linux) every time an explicit connection attempt was made.
Oct 7 10:28:39 hostdata01 db2ckpwd 5: pam_warn(db2:auth): function= [pam_sm_authenticate] service=[db2] terminal=[<unknown>] user=[db2inst1] ruser=[<unknown>] rhost=[<unknown>]
The key words for me were “pam_warn”, “db2:auth”, “pam_sm_authenticate”. Google search lead me to Ember Crook’s blog post on DB2 and Transparent LDAP. That is where I read about the file /etc/pam.d/db2.
To my surprise, I found this file only on the coordinator node.
db2inst1@hostadm01:/etc/pam.d> ls -ltr /etc/pam.d/db2 <<<<--------- This is on the co-ordinator node -rw-r--r-- 1 root root 383 2014-10-08 16:15 db2 db2inst1@hostdata01:/etc/pam.d> ls -ltr /etc/pam.d/db2 <<<<--------- This is on the data1 node /bin/ls: /etc/pam.d/db2: No such file or directory db2inst1@hostdata02:/etc/pam.d> ls -ltr /etc/pam.d/db2 <<<<--------- This is on the data2 node /bin/ls: /etc/pam.d/db2: No such file or directory db2inst1@hostdata03:/etc/pam.d> ls -ltr /etc/pam.d/db2 <<<<--------- This is on the data3 node /bin/ls: /etc/pam.d/db2: No such file or directory
The problem was that DB2 expected the file /etc/pam.d/db2 to be on all the nodes in the DPF database. However, this file was only on the coordinator node. The error message SQL30082N was misleading. It indicated that the problem could be with the user id’s credentials.
After the file /etc/pam.d/db2 file was copied onto the data nodes, explicit connection attempt worked as expected. No instance restart was required. This experience is a reminder that each node in a DPF database needs to be configured exactly the same way. Minor differences might hide the problems for some time but it is only a matter of time that problems surface.
Contents of /etc/pam.d/db2 file
Below was how our /etc/pam.d/db2 file looked like. I am not a PAM (Programmable Access Module) expert. However, after some research, I now understand that the authentication process (for DB2) is top-down as outlined in the file /etc/pam.d/db2.
# The PAM configuration file for DB2 auth sufficient pam_ldap.so use_first_pass auth required pam_unix2.so account sufficient pam_ldap.so account required pam_unix2.so password required pam_pwcheck.so password sufficient pam_ldap.so use_first_pass password required pam_unix2.so use_authtok use_first_pass session required pam_unix2.so
PAM is flexible and it supports both local and LDAP users. The above PAM configuration supports system user ids via
pam_unix2.so and LDAP users via pam_ldap.so.
pam_ldap.so — As this is in the 1st line, DB2 first tries to authenticate via LDAP. If authentication succeeds, the process exits (with a success) dictated by key word ‘sufficient’ (as in ‘necessary and sufficient’ condition).
pam_unix2.so — If the user id is NOT found in LDAP or if LDAP authentication fails, DB2 then relies on operating system (LINUX in this case) to authenticate the user. “use_first_pass” in the 1st line passes on the password to 2nd authentication attempt. User is not prompted for the password for the second time. This authentication step is a ‘required’ one. If authentication fails in this step, an error is returned to the user.
Hope this helps. I would appreciate anyone sharing your experiences with PAM in AIX or LINUX.