DB2 DPF Tablespace Recovery Scenarios — Part 1 — Recover Tablespace(s) on Coordinator Node

In this blog post I am going to share with you about something exciting that I recently worked on. You don’t get a chance to work on recovering database objects very often and that too in a DPF environment.. !! I will walk you through the scenario of recovering these tables in a series of blog posts. This is part 1 (out of 3 I assume) of such series.. This time we will only cover recovering tables that are on Coordinator Node.

This actually happened in our Enterprise Data Warehouse Production environment.  A Datastage (ETL) job was inadvertently run a day earlier and this resulted in modifying ~40+ tables. I was asked if I could recover to a point-in-time to the previous night ~8:30 PM (which happens to be just after our full online database backup).

Out of 40 tables that needed to be recovered, ~10 were on the coordinator node (Partition 0 in our case). Remaining ~30 were spread across multiple database partitions. For simplicity, I will use the number as 2 (tables instead of 10) in this blog post as a recovery scenario.

Here is the problem Statement in a summary.

“2 tables that are on the coordinator node (usually partition 0) need to be recovered to a point-in-time from an online backup.”

Steps Performed:

1) Find out which tablespaces (both Data and Index) need to be recovered.
2) Find out which partition group do tablespaces belong to. In this case we assume that they are only on coordinator node. But I will show how to check this using system catalog tables.
3) Run restore with options depending on which partition group/partitions do tablespaces belong to.

Lets call 2 tables that need to be recovered as TABLE1 and TABLE2.

1) Recovering tables actually involves recovering tablespaces.

To get the list of tablespaces, here is the SQL:

db2 “select char(TBSPACE,20) from syscat.tables where tabname in (‘TABLE1′,’TABLE2’)
union
select char(INDEX_TBSPACE,20) from syscat.tables where tabname in (‘TABLE1′,’TABLE2’) 
with ur”;

1
——————–
DTBSP1
DTBSP2
ITBSP1
ITBSP2

4 record(s) selected

So recovering 2 tables in this case actually means recovering 4 tablespaces (2 data and 2 index)

2) To determine which database partitions do these tables belong to, use the following SQL:

db2 “select char(tbsp_name, 20) tablespace, char(DBPGNAME, 20) nodegroup, DBPARTITIONNUM from sysibmadm.tbsp_utilization where tbsp_name in (‘DTBSP1′,’ITBSP1′,’DTBSP2′,’ITBSP2’) order by tbsp_name, DBPARTITIONNUM with ur”

TABLESPACE NODEGROUP DBPARTITIONNUM
——————– ——————– ————–
DTBSP1 SDPG 0
DTBSP2 SDPG 0
ITBSP1 SDPG 0
ITBSP2 SDPG 0

Next steps:

Login to the coordinator node;

Run “db2 export DB2NODE=0” and then run db2 (db2_all not needed in this case as we are only recovering from 1 partition (#0)command shown below:

db2 “RESTORE DATABASE TESTDB TABLESPACE (DTBSP1, DTBSP2, ITBSP1, ITBSP2) ONLINE USE TSM TAKEN AT 20140227143006 REPLACE EXISTING WITHOUT PROMPTING”

I would usually put these commands in a shell script and run in a nohup mode (especially when I am logged in remote using a VPN) to circumvent network related hiccups.

Once the restore is complete, below is the status of these tablespaces.

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘SDPG'”

1 2
——————– —————————————-
DTBSP1 ROLLFORWARD_PENDING
DTBSP2 ROLLFORWARD_PENDING
ITBSP1 ROLLFORWARD_PENDING
ITBSP2 ROLLFORWARD_PENDING

4 record(s) selected.

Next step is to rollforward logs.
In this situation, we determined that the best point to rollforward is an immediate point-in-time after database backup.

Here is my rollforward command:

db2 “ROLLFORWARD DATABASE TESTDB TO END OF BACKUP TABLESPACE (DTBSP1, DTBSP2, ITBSP1, ITBSP2) ONLINE”
Rollforward Status

Input database alias = TESTDB
Number of nodes have returned status = 1

Node number = 0
Rollforward status = TBS working
Next log file to be read = S0285597.LOG
Log files processed = –
Last committed transaction = 2014-02-02-18.01.07.000000 UTC

DB20000I The ROLLFORWARD command completed successfully.

Here is the status of the tablespaces after the above rollforward activity.

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘SDPG'”

1 2
——————– —————————————-
DTBSP1 ROLLFORWARD_IN_PROGRESS
DTBSP2 ROLLFORWARD_IN_PROGRESS
ITBSP1 ROLLFORWARD_IN_PROGRESS
ITBSP2 ROLLFORWARD_IN_PROGRESS

4 record(s) selected.

Once I found that the timestamp shown as the “Last committed transaction” in the output of rollforward command (above) is satisfactory, I stopped the rollforward activity. Here is how you do it:

db2 “ROLLFORWARD DATABASE TESTDB STOP TABLESPACE (DTBSP1, DTBSP2, ITBSP1, ITBSP2) ONLINE”

Rollforward Status

Input database alias = TESTDB
Number of nodes have returned status = 1

Node number = 0
Rollforward status = not pending
Next log file to be read =
Log files processed = –
Last committed transaction = 2014-02-02-18.01.07.000000 UTC

DB20000I The ROLLFORWARD command completed successfully.
Check for the status:

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘SDPG'”

1 2
——————– —————————————-
DTBSP1 BACKUP_PENDING
DTBSP2 BACKUP_PENDING
ITBSP1 BACKUP_PENDING
ITBSP2 BACKUP_PENDING

4 record(s) selected.

A quick way to take tablespaces out of Backup-Pending mode is to run backup to /dev/null. If you have time, it is highly recommended that you back up to disk / TSM for future recovery purposes.
db2 “BACKUP DATABASE TESTDB ON ALL DBPARTITIONNUMS TABLESPACE ((DTBSP1, DTBSP2, ITBSP1, ITBSP2) ONLINE”

Part Result
—- ————————————————————————
0000 DB20000I The BACKUP DATABASE command completed successfully.
0001 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2”
0002 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0003 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0004 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0005 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0006 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0007 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0008 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0009 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0010 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0011 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2
0012 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “DTBSP1,DTBSP2,ITBSP1,ITBSP2

Backup successful. The timestamp for this backup image is : 20140301193534

Above warnings are expected as the tablespaces of interest are NOT on partitions 1 through 12. I could easily prevented these warnings by letting db2 backup command to run only on partition 0.

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘SDPG'”

1 2
——————– —————————————-

0 record(s) selected.

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL'”

1 2
——————– —————————————-

0 record(s) selected.

I ran a simple select against these 2 tables to check if I could access them:

db2 “select count(*) from SCHEMA1.TABLE1 with ur”

1
———–
23254

1 record(s) selected.

db2 “select count(*) from SCHEMA1.TABLE2 with ur”

1
———–
5632

1 record(s) selected.

I asked the user to validate and found out he is happy with the data in these tables (after recovery).

Hope this blog post was helpful. In future posts, I will share how to recover tablespaces on other partitions.

Appreciate you sharing your feedback and questions.

Until then..have a good time.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s