Tablespace backups in a DB2 LUW DPF database

In this blog post, I will share about few gotchas to look out for when backing up tablespaces in a DB2 DPF database. What is DPF? I wrote a detailed blog post on db2commerce.com.

Why Tablespace backups?

Why should we look into backing up tablespaces instead of the entire database? Below are couple of instances when we want to backup tablespaces:

  • You just created a new tablespace; Before taking an incremental/delta backup, the newly created tablespace needs to be backed-up.
  • Tablespace is in BACKUP PENDING state. This can happen when we perform a load operation for a recoverable database and specify the COPY NO parameter. We can remove the tablespace from the BACKUP PENDING state by backing up the tablespace.
  • Is it practical to do database backups? A database backup for a multi-terabyte database (think Data Warehouse) is not practical all the time. It consumes too many resources (CPU, storage, lock contention etc.). If the recovery effort is well thought out, we can get away with performing tablespace level backups.

Now that we know why we need to look at tablespace level backups, let us look at a problem I recently encountered.

Problem:

SQL2066N encountered when backing up a tablespace in a DB2 DPF database. The backup was being run from a catalog node.

$ db2 "backup database TESTDB tablespace (TBSP1) online to /backups"
SQL2066N  The command failed because the specified table space either does not
exist or is not supported with the specified command.  Specified table space
name: "TBSP1".

Troubleshooting SQL2066N:

One of the quickest ways to decipher what a SQL error code means is to prefix it with a db2 ? as shown below.

 $ db2 "? SQL2066N"
SQL2066N  The command failed because the specified table space either
      does not exist or is not supported with the specified command.
      Specified table space name: "".

Explanation:
The table space name specified is syntactically correct, but it does not
exist in the database or cannot be used in the specified utility
operation.

A table space might be disallowed for many reasons, including the
following examples:

*  If the utility in use is a backup operation, the table space might be
   disallowed because it is a system or user temporary table space or is
   in an inconsistent state.
*  If the utility in use is a restore operation, the table space might
   be disallowed because another restore operation or a rollforward
   operation of the table space is already in progress.
*  If the utility is a rollforward operation, the table space might be
   disallowed because a restore of the table space is already in
   progress.

User response:

Resubmit the utility command, specifying a valid table space.

Initial findings:

From the above, it looks like we might be passing an incorrect tablespace name. Messages in db2diag.log also lead us to believe that incorrect tablespace name could be the cause of the problem.

2015-09-09-06.52.51.712287-240 E6241A1073           LEVEL: Severe
PID     : 15991648             TID : 48053          PROC : db2sysc 0
INSTANCE: db2inst1               NODE : 000           DB   : TESTDB
APPHDL  : 0-2914               APPID: *N0.db2inst1.150909105252
AUTHID  : db2inst1               HOSTNAME: dvserver1
EDUID   : 48053                EDUNAME: db2agent (TESTDB) 0
FUNCTION: DB2 UDB, database utilities, sqlubBuildAppTbsp, probe:1815
MESSAGE : SQL2066N  The command failed because the specified table space either
          does not exist or is not supported with the specified command.
          Specified table space name: "".
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
 sqlcaid : SQLCA     sqlcabc: 136   sqlcode: -2066   sqlerrml: 10
 sqlerrmc: TBSP1
 sqlerrp : sqlubBui
 sqlerrd : (1) 0x00000000      (2) 0x00000000      (3) 0x00000000
           (4) 0x00000000      (5) 0x00000000      (6) 0x00000000
 sqlwarn : (1)      (2)      (3)      (4)        (5)       (6)
           (7)      (8)      (9)      (10)        (11)
 sqlstate:

At this point, I wanted to find out if the tablespace actually exists in the database and if it does, which partitions is it created on? Here is the SQL to find out.

SELECT CHAR(TBSP_NAME, 20) TABLESPACE,        
DBPARTITIONNUM 
FROM   SYSIBMADM.TBSP_UTILIZATION 
WHERE TBSP_NAME LIKE '%TBSP1%' 
ORDER  BY TBSP_NAME,          
          DBPARTITIONNUM 
WITH UR;  

TABLESPACE           DBPARTITIONNUM
-------------------- --------------
TBSP1                             1
TBSP1                             2
TBSP1                             3
TBSP1                             4
TBSP1                             5
  5 record(s) selected.

From the above output, tablespace TBSP1 does exist in the database and is created on partitions 1 through 5. Note that it is NOT created on partition 0 (which is the catalog node). However, as stated at the beginning of the blog post, we are running the backup command from partition 0. Can this be the reason? Are we running into problems because we are running backup of a tablespace (that is NOT created on catalog partition) from the catalog partition? It turns out that this is the problem. How do we handle this?

Solution:

When we look at backup command’s syntax, we find the following from IBM’s Knowledge Center – DB2 Backup Command.

Scope

In a partitioned database environment, if no database partitions are specified, this command affects only the database partition on which it is executed.

What this means is that we cannot take a backup of tablespace TBSP1 from catalog node (partition 0) as the tablespace does not exist on this partition, at least without making changes to the backup command.

How to get around this? It appears like we have couple of solutions.. but, when we dig further, we’ll find out that we only have one solution.

Explicitly connect to one of the partitions:

We realized that we were on partition 0 when we were backing up a tablespace that does not exist on partition 0. What if we explicitly connect to a partition (example partition #1) on which the tablespace exists? Would that work? Let us see.

$export DB2NODE=1
$ db2 terminate
DB20000I  The TERMINATE command completed successfully.
 $ echo $DB2NODE
1
$ db2 "backup database TESTDB tablespace (TBSP1) online to /backups"
Backup successful. The timestamp for this backup image is : 20150909070438

How do we know if we backed up tablespace on all partitions or on a single partition? Well, for starters, we only see one backup image when we list the backup images. If we find only one backup image, it means that the backup was run only on a single partition. When a (tablespace/database) backup is run for multiple partitions, multiple images are generated.. one per database partition.

 $ ls -ltr TESTDB*
-rw-------    1 db2inst1   db2iadm1  4044328960 Sep 09 07:12 TESTDB.3.db2inst1.DBPART001.20150909071246.001

Also, messages in db2diag.log point to successful completion of backup only on partition #1.

2015-09-09-07.04.46.351196-240 E27460A508           LEVEL: Info
PID     : 5505716              TID : 14968          PROC : db2sysc 1
INSTANCE: db2inst1               NODE : 001           DB   : TESTDB
APPHDL  : 1-30285              APPID: *N1.db2inst1.150909110438
AUTHID  : db2inst1               HOSTNAME: dvserver1
EDUID   : 14968                EDUNAME: db2agent (TESTDB) 1
FUNCTION: DB2 UDB, database utilities, sqlubSetupJobControl, probe:1897
MESSAGE : Starting an online tablespace TBSP1... backup.

2015-09-09-07.04.56.977959-240 E32815A461           LEVEL: Info
PID     : 5505716              TID : 14968          PROC : db2sysc 1
INSTANCE: db2inst1               NODE : 001           DB   : TESTDB
APPHDL  : 1-30285              APPID: *N1.db2inst1.150909110438
AUTHID  : db2inst1               HOSTNAME: dvserver1
EDUID   : 14968                EDUNAME: db2agent (TESTDB) 1
FUNCTION: DB2 UDB, database utilities, sqlubcka, probe:1070
MESSAGE : Backup complete.

Why is this so? This is because when a backup is run, it only affects the database partition on which the backup is run. What if we have multiple tablespaces that we want to backup them up? What if each of them is on a subset of database partitions? Is there an easy way to handle such complex tablespace and partition mapping? Yes there is..

ALL DBPARTITIONNUMS clause to rescue:

The backup command has a clause ‘ALL DBPARTITIONNUMS’ that would allow the backup to be run on all the database partitions. What if a tablespace does not exist on all the database partitions? Well, there is no problem because the backup utility simply prints a warning message indicating that the tablespace does not exist on a specific partition. See below for an example:


$ db2 "backup database TESTDB ON ALL DBPARTITIONNUMS tablespace (TBSP1) online to /backups"
Part  Result
----  ------------------------------------------------------------------------
0000  SQL2430W  The database backup succeeded, but the following table spaces do not exist on this database partition: "TBSP1".
0001  DB20000I  The BACKUP DATABASE command completed successfully.
0002  DB20000I  The BACKUP DATABASE command completed successfully.
0003  DB20000I  The BACKUP DATABASE command completed successfully.
0004  DB20000I  The BACKUP DATABASE command completed successfully.
0005  DB20000I  The BACKUP DATABASE command completed successfully.

Backup successful. The timestamp for this backup image is : 20150906182257

In the above output, we see a warning (SQL2430W) message about the tablespace TBSP1 not existing on the catalog partition. Look at the prefix ‘000’ before the warning message for partition number. This message could be safely ignored and does not impact backup of the tablespace on other partitions.

Conclusion:

  • When backing up a tablespace, we need to pay attention to how the tablespace is spread across multiple database partitions.
  • Alternatively, include the ALL DBPARTITIONNUMS clause in the backup command as this would take care backups on all database partitions on which the tablespace exists. Also, this is the best practice to ensure recoverability of database objects on all the partitions.

What other gotchas have you encountered when dealing with backups in a single or multi-partitioned DB2 LUW database? Please share in comments. Thank you for reading.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s