DB2 DPF Tablespace Recovery Scenarios — Part 2 — Recover Tablespace(s) on Data Partitions

In this blog post, I will share how to recover tablespaces that are on database partitions other than the catalog partition. This blog post is 2nd out of 3 parts. The topic is “DB2 DPF Tablespace Recovery Scenarios”. Part 1 covered recovering tablespaces on the Catalog Partition.

For example if there are a total of 13 (logical) partitions in a DPF environment and partition “0” is  catalog partition, partitions 1 to 12 are called Data Partitions. In this blog, we sill see how to recover tablespaces that are on partitions 1 to 12.

To keep things simple, assume that only 2 tables need to be recovered. Here are the steps that would be done from a high level:

1) Find out which tablespaces (both Data and Index) need to be recovered.

2) Find out which partition group do tablespaces belong to.

3) Run restore with options depending on which partition group/partitions do tablespaces belong to.

Lets call 2 tables that need to be recovered as TABLE3 and TABLE4.

1) As discussed in part 1, recovering tables actually involves recovering tablespaces.

To get the list of tablespaces, here is the SQL:

db2 “select char(TBSPACE,20) from syscat.tables where tabname in (‘TABLE3′,’TABLE4′)
union
select char(INDEX_TBSPACE,20) from syscat.tables where tabname in (‘TABLE3′,’TABLE4′) 
with ur”;

1
——————–
TBSP3
TBSP4
TBSP3_IX
TBSP4_IX

4 record(s) selected

So recovering 2 tables in this case actually means recovering 4 tablespaces (2 data and 2 index).

2) To determine which database partitions do these tables belong to, use the following SQL:

db2 “select char(tbsp_name, 20) tablespace, char(DBPGNAME, 20) nodegroup, DBPARTITIONNUM from sysibmadm.tbsp_utilization where tbsp_name in (‘TBSP3′,’TBSP3_IX′,’TBSP4′,’TBSP4_IX′) order by tbsp_name, DBPARTITIONNUM with ur”

TABLESPACE NODEGROUP DBPARTITIONNUM
——————– ——————– ————–
TBSP3 PDPG 1
TBSP3 PDPG 2
TBSP3 PDPG 3
TBSP3 PDPG 4
TBSP3 PDPG 5
TBSP3 PDPG 6
TBSP3 PDPG 7
TBSP3 PDPG 8
TBSP3 PDPG 9
TBSP3 PDPG 10
TBSP3 PDPG 11
TBSP3 PDPG 12
TBSP3_IX PDPG 1
TBSP3_IX PDPG 2
TBSP3_IX PDPG 3
TBSP3_IX PDPG 4
TBSP3_IX PDPG 5
TBSP3_IX PDPG 6
TBSP3_IX PDPG 7
TBSP3_IX PDPG 8
TBSP3_IX PDPG 9
TBSP3_IX PDPG 10
TBSP3_IX PDPG 11
TBSP3_IX PDPG 12
TBSP4 PDPG 1
TBSP4 PDPG 2
TBSP4 PDPG 3
TBSP4 PDPG 4
TBSP4 PDPG 5
TBSP4 PDPG 6
TBSP4 PDPG 7
TBSP4 PDPG 8
TBSP4 PDPG 9
TBSP4 PDPG 10
TBSP4 PDPG 11
TBSP4 PDPG 12
TBSP4_IX PDPG 1
TBSP4_IX PDPG 2
TBSP4_IX PDPG 3
TBSP4_IX PDPG 4
TBSP4_IX PDPG 5
TBSP4_IX PDPG 6
TBSP4_IX PDPG 7
TBSP4_IX PDPG 8
TBSP4_IX PDPG 9
TBSP4_IX PDPG 10
TBSP4_IX PDPG 11
TBSP4_IX PDPG 12

48 record(s) selected.

As you see from the output above, 4 tablespaces (TBSP3, TBSP3_IX, TBSP4, TBSP4_IX) each are on partitions 1 to 12 which is why you see 4 (tablespaces) times 12 partitions = 48 rows as the output of the select statement.

Next step is to kick off the restore.. here is the command (You could run this command from any node)

db2_all “||<<-0< db2 RESTORE DATABASE TESTDB TABLESPACE \(TBSP3, TBSP3_IX, TBSP4, TBSP4_IX\) ONLINE USE TSM TAKEN AT 20140227143006 REPLACE EXISTING WITHOUT PROMPTING”

Couple of things to notice in the above command:

1) db2_all –> This prefix is required when the restore command needs to be run on multiple partitions (in this case 1 to 12)

2) || –> || executes command in parallel on all the partitions

3) -0 –> By using “-0”, you are asking DB2 to skip partition 0 and run it on all partitions (1 to 12)

4) Escape character “\” in the tablespace list is to escape the character “(” and “)”

Here is the output of the above command (You would see a response from each of the 12 partitions):

rah: omitting logical node 0

rah: primary monitoring process for db2 is 25191

hostdata03: DB20000I The RESTORE DATABASE command completed successfully.
hostdata03: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 14805

hostdata01: DB20000I The RESTORE DATABASE command completed successfully.
hostdata01: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 27090

hostdata01: DB20000I The RESTORE DATABASE command completed successfully.
hostdata01: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 31356

hostdata01: DB20000I The RESTORE DATABASE command completed successfully.
hostdata01: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 31818

hostdata03: DB20000I The RESTORE DATABASE command completed successfully.
hostdata03: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 2395

hostdata03: DB20000I The RESTORE DATABASE command completed successfully.
hostdata03: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 6651

hostdata01: DB20000I The RESTORE DATABASE command completed successfully.
hostdata01: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 20813

hostdata02: DB20000I The RESTORE DATABASE command completed successfully.
hostdata02: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 10010

hostdata03: DB20000I The RESTORE DATABASE command completed successfully.
hostdata03: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 19250

hostdata02: DB20000I The RESTORE DATABASE command completed successfully.
hostdata02: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 19372

hostdata02: DB20000I The RESTORE DATABASE command completed successfully.
hostdata02: db2 RESTORE DATABASE … completed ok

rah: primary monitoring process for db2 is 30299

hostdata02: DB20000I The RESTORE DATABASE command completed successfully.
hostdata02: db2 RESTORE DATABASE … completed ok

I would usually put these commands in a shell script and run in a nohup mode (especially when I am logged in remote using a VPN) to circumvent network related hiccups.

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘PDPG’”

1 2
——————– —————————————-
TBSP3 ROLLFORWARD_PENDING
TBSP4 ROLLFORWARD_PENDING
TBSP3_IX ROLLFORWARD_PENDING
TBSP4_IX ROLLFORWARD_PENDING

4 record(s) selected.

Next step is to rollforward logs.

In this situation, we determined that the best point to rollforward is an immediate point-in-time after database backup.

Here is my rollforward command: (You can only run rollforward command on the catalog partition).

db2 “ROLLFORWARD DATABASE TESTDB TO END OF BACKUP TABLESPACE (TBSP3, TBSP3_IX, TBSP4, TBSP4_IX) ONLINE”

I want this tablespace to be rolled forward on all the partitions it is on and so I did not use the “except” clause.

Also, with “end of backup” clause, the default is for rollforward to run on all database partitions as evident from the syntax from IBM’s Information Center. Here is the notes from IBM about the scope of the rollforward command:

“In a partitioned database environment, this command can only be invoked from the catalog partition. A database or table space rollforward operation to a specified point in time affects all database partitions that are listed in the db2nodes.cfg file. A database or table space rollforward operation to the end of logs affects the database partitions that are specified. If no database partitions are specified, it affects all database partitions that are listed in the db2nodes.cfg file; if rollforward recovery is not needed on a particular partition, that partition is ignored.”

Here is the output of the above rollforward command:

Rollforward Status

Input database alias = TESTDB
Number of nodes have returned status = 12

Node number Rollforward Next log Log files processed Last committed transaction
status to be read
———– ————————– ——————- ————————- ————————–
1 TBS working S0084149.LOG – 2014-01-30-08.15.05.000000 UTC
2 TBS working S0084889.LOG – 2014-02-02-18.01.14.000000 UTC
3 TBS working S0085834.LOG – 2014-01-30-08.12.04.000000 UTC
4 TBS working S0084328.LOG – 2014-01-30-08.12.04.000000 UTC
5 TBS working S0084100.LOG – 2014-02-02-18.01.12.000000 UTC
6 TBS working S0084341.LOG – 2014-01-22-19.00.14.000000 UTC
7 TBS working S0083916.LOG – 2014-02-02-18.01.15.000000 UTC
8 TBS working S0083326.LOG – 2014-02-02-18.01.14.000000 UTC
9 TBS working S0083083.LOG – 2014-02-02-18.01.16.000000 UTC
10 TBS working S0083708.LOG – 2014-02-02-18.01.14.000000 UTC
11 TBS working S0084195.LOG – 2014-02-02-18.01.10.000000 UTC
12 TBS working S0090578.LOG – 2014-02-02-18.01.11.000000 UTC

DB20000I The ROLLFORWARD command completed successfully.

Here is the status of these tablespaces after the above rollforward:

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘PDPG’”

1 2
——————– —————————————-
TBSP3 ROLLFORWARD_IN_PROGRESS
TBSP4 ROLLFORWARD_IN_PROGRESS
TBSP3_IX ROLLFORWARD_IN_PROGRESS
TBSP4_IX ROLLFORWARD_IN_PROGRESS

4 record(s) selected.

Once I found that the timestamp shown as the “Last committed transaction” in the output of rollforward command (above) is satisfactory, I stopped the rollforward activity. Here is how you do it:

db2 “ROLLFORWARD DATABASE TESTDB STOP TABLESPACE (TBSP3, TBSP3_IX, TBSP4, TBSP4_IX) ONLINE”

Rollforward Status

Input database alias = TESTDB
Number of nodes have returned status = 12

Node number Rollforward Next log Log files processed Last committed transaction
status to be read
———– ————————– ——————- ————————- ————————–
1 not pending – 2014-01-30-08.15.05.000000 UTC
2 not pending – 2014-02-02-18.01.14.000000 UTC
3 not pending – 2014-01-30-08.12.04.000000 UTC
4 not pending – 2014-01-30-08.12.04.000000 UTC
5 not pending – 2014-02-02-18.01.12.000000 UTC
6 not pending – 2014-01-22-19.00.14.000000 UTC
7 not pending – 2014-02-02-18.01.15.000000 UTC
8 not pending – 2014-02-02-18.01.14.000000 UTC
9 not pending – 2014-02-02-18.01.16.000000 UTC
10 not pending – 2014-02-02-18.01.14.000000 UTC
11 not pending – 2014-02-02-18.01.10.000000 UTC
12 not pending – 2014-02-02-18.01.11.000000 UTC

DB20000I The ROLLFORWARD command completed successfully.

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘PDPG’”

1 2
——————– —————————————-
TBSP3 BACKUP_PENDING
TBSP4 BACKUP_PENDING
TBSP3_IX BACKUP_PENDING
TBSP4_IX BACKUP_PENDING

4 record(s) selected.

A quick way to take tablespaces out of Backup-Pending mode is to run backup to /dev/null. If you have time, it is highly recommended that you back up to disk / TSM for future recovery purposes.

db2 “BACKUP DATABASE TESTDB ON ALL DBPARTITIONNUMS TABLESPACE (TBSP3, TBSP4, TBSP3_IX, TBSP4_IX) ONLINE”

Part Result
—- ————————————————————————
0000 SQL2430W The database backup succeeded, but the following table spaces do not
exist on this database partition: “TBSP3, TBSP4, TBSP3_IX, TBSP4_IX”.
0001 DB20000I The BACKUP DATABASE command completed successfully.
0002 DB20000I The BACKUP DATABASE command completed successfully.
0003 DB20000I The BACKUP DATABASE command completed successfully.
0004 DB20000I The BACKUP DATABASE command completed successfully.
0005 DB20000I The BACKUP DATABASE command completed successfully.
0006 DB20000I The BACKUP DATABASE command completed successfully.
0007 DB20000I The BACKUP DATABASE command completed successfully.
0008 DB20000I The BACKUP DATABASE command completed successfully.
0009 DB20000I The BACKUP DATABASE command completed successfully.
0010 DB20000I The BACKUP DATABASE command completed successfully.
0011 DB20000I The BACKUP DATABASE command completed successfully.
0012 DB20000I The BACKUP DATABASE command completed successfully.

Backup successful. The timestamp for this backup image is : 20140301191628

Above warning SQL2340W is expected as the tablespaces of interest are NOT on partitions 0. I could have easily prevented this warning by letting db2 backup command to run only on partition 1 to 12 (Check backup syntax for how to do this).

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’ AND DBPGNAME = ‘PDPG’”

1 2
——————– —————————————-

0 record(s) selected.

db2 “select distinct(char(tbsp_name,20)), char(tbsp_state , 40) from SYSIBMADM.TBSP_UTILIZATION where TBSP_STATE <> ‘NORMAL’”

1 2
——————– —————————————-

0 record(s) selected.

I ran a simple select against these 2 tables to check if I could access them:

db2 “select count(*) from SCHEMA3.TABLE3 with ur”

1
———–
2325433

1 record(s) selected.

db2 “select count(*) from SCHEMA4.TABLE4 with ur”

1
———–
563234

1 record(s) selected.

I asked application folks if the data in these tables looks good. They validated and indicated it all looked good.

Hope this blog post was helpful. Appreciate you posting your questions/comments. Thanks for reading. Have a good time.

Happy Restoring times to you 🙂

Pavan Kristipati.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s