In this blog post, I will share about few gotchas to look out for when backing up tablespaces in a DB2 DPF database. What is DPF? I wrote a detailed blog post on db2commerce.com.
Why Tablespace backups?
Why should we look into backing up tablespaces instead of the entire database? Below are couple of instances when we want to backup tablespaces:
- You just created a new tablespace; Before taking an incremental/delta backup, the newly created tablespace needs to be backed-up.
- Tablespace is in BACKUP PENDING state. This can happen when we perform a load operation for a recoverable database and specify the COPY NO parameter. We can remove the tablespace from the BACKUP PENDING state by backing up the tablespace.
- Is it practical to do database backups? A database backup for a multi-terabyte database (think Data Warehouse) is not practical all the time. It consumes too many resources (CPU, storage, lock contention etc.). If the recovery effort is well thought out, we can get away with performing tablespace level backups.
Now that we know why we need to look at tablespace level backups, let us look at a problem I recently encountered.
SQL2066N encountered when backing up a tablespace in a DB2 DPF database. The backup was being run from a catalog node.
$ db2 "backup database TESTDB tablespace (TBSP1) online to /backups" SQL2066N The command failed because the specified table space either does not exist or is not supported with the specified command. Specified table space name: "TBSP1".
One of the quickest ways to decipher what a SQL error code means is to prefix it with a
db2 ? as shown below.
$ db2 "? SQL2066N" SQL2066N The command failed because the specified table space either does not exist or is not supported with the specified command. Specified table space name: "". Explanation: The table space name specified is syntactically correct, but it does not exist in the database or cannot be used in the specified utility operation. A table space might be disallowed for many reasons, including the following examples: * If the utility in use is a backup operation, the table space might be disallowed because it is a system or user temporary table space or is in an inconsistent state. * If the utility in use is a restore operation, the table space might be disallowed because another restore operation or a rollforward operation of the table space is already in progress. * If the utility is a rollforward operation, the table space might be disallowed because a restore of the table space is already in progress. User response: Resubmit the utility command, specifying a valid table space.
From the above, it looks like we might be passing an incorrect tablespace name. Messages in db2diag.log also lead us to believe that incorrect tablespace name could be the cause of the problem.
2015-09-09-06.52.51.712287-240 E6241A1073 LEVEL: Severe PID : 15991648 TID : 48053 PROC : db2sysc 0 INSTANCE: db2inst1 NODE : 000 DB : TESTDB APPHDL : 0-2914 APPID: *N0.db2inst1.150909105252 AUTHID : db2inst1 HOSTNAME: dvserver1 EDUID : 48053 EDUNAME: db2agent (TESTDB) 0 FUNCTION: DB2 UDB, database utilities, sqlubBuildAppTbsp, probe:1815 MESSAGE : SQL2066N The command failed because the specified table space either does not exist or is not supported with the specified command. Specified table space name: "". DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes sqlcaid : SQLCA sqlcabc: 136 sqlcode: -2066 sqlerrml: 10 sqlerrmc: TBSP1 sqlerrp : sqlubBui sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000 (4) 0x00000000 (5) 0x00000000 (6) 0x00000000 sqlwarn : (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) sqlstate:
At this point, I wanted to find out if the tablespace actually exists in the database and if it does, which partitions is it created on? Here is the SQL to find out.
SELECT CHAR(TBSP_NAME, 20) TABLESPACE, DBPARTITIONNUM FROM SYSIBMADM.TBSP_UTILIZATION WHERE TBSP_NAME LIKE '%TBSP1%' ORDER BY TBSP_NAME, DBPARTITIONNUM WITH UR; TABLESPACE DBPARTITIONNUM -------------------- -------------- TBSP1 1 TBSP1 2 TBSP1 3 TBSP1 4 TBSP1 5 5 record(s) selected.
From the above output, tablespace TBSP1 does exist in the database and is created on partitions 1 through 5. Note that it is NOT created on partition 0 (which is the catalog node). However, as stated at the beginning of the blog post, we are running the backup command from partition 0. Can this be the reason? Are we running into problems because we are running backup of a tablespace (that is NOT created on catalog partition) from the catalog partition? It turns out that this is the problem. How do we handle this?
When we look at
backup command’s syntax, we find the following from IBM’s Knowledge Center – DB2 Backup Command.
In a partitioned database environment, if no database partitions are specified, this command affects only the database partition on which it is executed.
What this means is that we cannot take a backup of tablespace TBSP1 from catalog node (partition 0) as the tablespace does not exist on this partition, at least without making changes to the backup command.
How to get around this? It appears like we have couple of solutions.. but, when we dig further, we’ll find out that we only have one solution.
Explicitly connect to one of the partitions:
We realized that we were on partition 0 when we were backing up a tablespace that does not exist on partition 0. What if we explicitly connect to a partition (example partition #1) on which the tablespace exists? Would that work? Let us see.
$export DB2NODE=1 $ db2 terminate DB20000I The TERMINATE command completed successfully. $ echo $DB2NODE 1 $ db2 "backup database TESTDB tablespace (TBSP1) online to /backups" Backup successful. The timestamp for this backup image is : 20150909070438
How do we know if we backed up tablespace on all partitions or on a single partition? Well, for starters, we only see one backup image when we list the backup images. If we find only one backup image, it means that the backup was run only on a single partition. When a (tablespace/database) backup is run for multiple partitions, multiple images are generated.. one per database partition.
$ ls -ltr TESTDB* -rw------- 1 db2inst1 db2iadm1 4044328960 Sep 09 07:12 TESTDB.3.db2inst1.DBPART001.20150909071246.001
Also, messages in db2diag.log point to successful completion of backup only on partition #1.
2015-09-09-07.04.46.351196-240 E27460A508 LEVEL: Info PID : 5505716 TID : 14968 PROC : db2sysc 1 INSTANCE: db2inst1 NODE : 001 DB : TESTDB APPHDL : 1-30285 APPID: *N1.db2inst1.150909110438 AUTHID : db2inst1 HOSTNAME: dvserver1 EDUID : 14968 EDUNAME: db2agent (TESTDB) 1 FUNCTION: DB2 UDB, database utilities, sqlubSetupJobControl, probe:1897 MESSAGE : Starting an online tablespace TBSP1... backup. 2015-09-09-07.04.56.977959-240 E32815A461 LEVEL: Info PID : 5505716 TID : 14968 PROC : db2sysc 1 INSTANCE: db2inst1 NODE : 001 DB : TESTDB APPHDL : 1-30285 APPID: *N1.db2inst1.150909110438 AUTHID : db2inst1 HOSTNAME: dvserver1 EDUID : 14968 EDUNAME: db2agent (TESTDB) 1 FUNCTION: DB2 UDB, database utilities, sqlubcka, probe:1070 MESSAGE : Backup complete.
Why is this so? This is because when a backup is run, it only affects the database partition on which the backup is run. What if we have multiple tablespaces that we want to backup them up? What if each of them is on a subset of database partitions? Is there an easy way to handle such complex tablespace and partition mapping? Yes there is..
ALL DBPARTITIONNUMS clause to rescue:
The backup command has a clause ‘ALL DBPARTITIONNUMS’ that would allow the backup to be run on all the database partitions. What if a tablespace does not exist on all the database partitions? Well, there is no problem because the backup utility simply prints a warning message indicating that the tablespace does not exist on a specific partition. See below for an example:
$ db2 "backup database TESTDB ON ALL DBPARTITIONNUMS tablespace (TBSP1) online to /backups" Part Result ---- ------------------------------------------------------------------------ 0000 SQL2430W The database backup succeeded, but the following table spaces do not exist on this database partition: "TBSP1". 0001 DB20000I The BACKUP DATABASE command completed successfully. 0002 DB20000I The BACKUP DATABASE command completed successfully. 0003 DB20000I The BACKUP DATABASE command completed successfully. 0004 DB20000I The BACKUP DATABASE command completed successfully. 0005 DB20000I The BACKUP DATABASE command completed successfully. Backup successful. The timestamp for this backup image is : 20150906182257
In the above output, we see a warning (SQL2430W) message about the tablespace TBSP1 not existing on the catalog partition. Look at the prefix ‘000’ before the warning message for partition number. This message could be safely ignored and does not impact backup of the tablespace on other partitions.
- When backing up a tablespace, we need to pay attention to how the tablespace is spread across multiple database partitions.
- Alternatively, include the
ALL DBPARTITIONNUMSclause in the backup command as this would take care backups on all database partitions on which the tablespace exists. Also, this is the best practice to ensure recoverability of database objects on all the partitions.
What other gotchas have you encountered when dealing with backups in a single or multi-partitioned DB2 LUW database? Please share in comments. Thank you for reading.