Quick primer on DB2 Tables and Tablespaces in a multi-partitioned (DPF) database

When I was starting out to work in a DB2 DPF database, I had many questions related to database objects. Some of them were related to gaining understanding of how tables and tablespaces were created in a DPF database, how to spread out (hash distribution) a table across partitions etc.

In this blog post, I will cover how tables and tablespaces are created, how these objects are spread out across multiple partitions in a DPF database. My goal is to be such that “friend” of that DBA who is experienced in a single partitioned DB2 environment but is just getting started in a DPF database. Here we go..

Which database partitions is the table on?

At first, it might sound like a simple question and it may look like a question whose answer could be found in the table’s create table statement.. but it is not quite that way. The answer is hidden in the DB2’s catalog tables and is not explicit!

There are 3 layers of abstraction between tables and partitions in a DPF database. These are tablespace containers, tablespaces and the partition groups and are largely invisible to the end user. These layers are explained further below but first lets look the SQL that queries DB2’s catalogs to find out which partitions does a table belong to…

SELECT CHAR(TABSCHEMA, 20) as SCHEMA,
 CHAR(TABNAME, 40) AS TABLE,
 CHAR(TABLESPACES.TBSPACE, 20) AS TABLESPACE,
 CHAR(PGS.DBPGNAME,20) AS PARTITION_GROUP,
 LISTAGG(DBPARTITIONNUM, ',') WITHIN GROUP (ORDER BY DBPARTITIONNUM) AS PARTITIONS
FROM
 SYSCAT.TABLES TABLES ,
 SYSCAT.TABLESPACES TABLESPACES,
 SYSCAT.DBPARTITIONGROUPDEF PGS
WHERE
 TABSCHEMA = 'EDWSDV'
 AND TABNAME = 'EMPLOYEE'
AND TABLESPACES.TBSPACE = TABLES.TBSPACE
AND TABLESPACES.DBPGNAME = PGS.DBPGNAME
GROUP BY PGS.DBPGNAME,
 TABNAME,
 TABSCHEMA,
 TABLESPACES.TBSPACE
 WITH UR;

The result would be like shown below:

SCHEMA     TABLE    TABLESPACE       PARTITION_GROUP   PARTITIONS
-------- ------------- ----------- --------------  ------------------
EDWSDV    EMPLOYEE  DV20140902114851   PDPG     1,2,3,4,5,6,7,8,9,10,11,12

From the above output, it is evident that the table EDWSDV.EMPLOYEE is on the partitions 1 through 12.

The columns TABLESPACE and PARTITION_GROUP were included to demonstrate that layers of abstraction between DB2’s tables and the partitions in a DPF database. The hierarchy is as below:

Tables belong to a particular tablespace, tablespaces belongs to a specific partiton group and a partition group contains a specific set of partitions that are defined in $HOME/sqllib/db2nodes.cfg where $HOME is the DB2 Instance Owner’s home directory.

Which database partitions is the tablespace on?

The SQL mentioned above answers this question for us. Just remove the schema and table from the above SQL and join DB2’s catalog tables SYSCAT.TABLESPACES and SYSCAT.DBPARTITIONGROUPDEF to find out the partition numbers on which the tablespace in question is created on. Here is the SQL that does the job:

SELECT CHAR(TBSPACE, 20) TABLESPACE,
CHAR(PGS.DBPGNAME,20) AS PARTITION_GROUP,
LISTAGG(DBPARTITIONNUM, ',') WITHIN GROUP (ORDER BY DBPARTITIONNUM) AS PARTITIONS
FROM SYSCAT.DBPARTITIONGROUPDEF PGS , SYSCAT.TABLESPACES TABLESPACES
WHERE TABLESPACES.NGNAME = PGS.DBPGNAME
AND TABLESPACES.TBSPACE = 'DVPARTY'
GROUP BY CHAR(PGS.DBPGNAME,20)
, TBSPACE WITH UR;
TABLESPACE   PARTITION_GROUP   PARTITIONS
----------- --------------  ------------------
DV20140902114851   PDPG     1,2,3,4,5,6,7,8,9,10,11,12

How to create a table on more than 1 partition?

As discussed above, there is no option in the “create table .. ” command to specify which partitions the table is to be created on.. If that is the case, how do you specify which partitions does the table need to belong to?

You do this in the following steps.

Step 1) Create a “Partition Group”

The Partition Group “PDPG” (that is shown in the output of the SQL above) could be created as shown below:

CREATE DATABASE PARTITION GROUP "PDPG" ON DBPARTITIONNUMS (1,2,3,4,5,6,7,8,9,10,11,12);

You could also specify a subset of partitions as shown below to create other partition groups. The partition group name has to be unique in a DPF database.

CREATE DATABASE PARTITION GROUP SDPG ON DBPARTITIONNUMS (0); 
CREATE DATABASE PARTITION GROUP D1PG ON DBPARTITIONNUMS (1,2,3,4); 
CREATE DATABASE PARTITION GROUP D2PG ON DBPARTITIONNUMS (5,6,7,8);
CREATE DATABASE PARTITION GROUP D3PG ON DBPARTITIONNUMS (9,10,11,12);

Step 2) Create a “Tablespace” in partition group created in step 1

As you would notice below, the tablespace dv20140902114851 will be created in the partition group “PDPG”.

Tablespace’s containers are specified in the “USING FILE….” clause. Each container is local and is specific to a database partition.

CREATE LARGE TABLESPACE dv20140902114851 IN DATABASE PARTITION GROUP PDPG PAGESIZE 16384 MANAGED BY DATABASE
USING (FILE '/db2fs/db2inst1/NODE0001/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (1)
USING (FILE '/db2fs/db2inst1/NODE0002/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (2)
USING (FILE '/db2fs/db2inst1/NODE0003/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (3)
USING (FILE '/db2fs/db2inst1/NODE0004/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (4)
USING (FILE '/db2fs/db2inst1/NODE0005/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (5)
USING (FILE '/db2fs/db2inst1/NODE0006/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (6)
USING (FILE '/db2fs/db2inst1/NODE0007/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (7)
USING (FILE '/db2fs/db2inst1/NODE0008/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (8)
USING (FILE '/db2fs/db2inst1/NODE0009/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (9)
USING (FILE '/db2fs/db2inst1/NODE0010/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (10)
USING (FILE '/db2fs/db2inst1/NODE0011/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (11)
USING (FILE '/db2fs/db2inst1/NODE0012/EDWDB/dv20140902114851' 200) ON DBPARTITIONNUMS (12)
EXTENTSIZE 32
PREFETCHSIZE AUTOMATIC
BUFFERPOOL BP_16K
OVERHEAD 3.630000
TRANSFERRATE 0.070000
AUTORESIZE YES
INCREASESIZE 10 PERCENT
MAXSIZE NONE
NO FILE SYSTEM CACHING
DROPPED TABLE RECOVERY ON ;

Once the above tablespace is created, it spans across 12 partitions. This is because the partition group in which the tablespace is created has 12 partitions in it. Further in this example, the tablespace dv20140902114851 will have 12 containers, the size of each one being 200 pages.

Step 3) Create a table in the tablespace created in step 2

This is the final step in creating a table across multiple partitions.

CREATE TABLE EDWSDV.EMPLOYEE  (
                  EMPLOYEE_ID VARCHAR(50) NOT NULL ,
                  EMPLOYEE_FIRST_NAME VARCHAR(50) NOT NULL ,
                  EMPLOYEE_LAST_NAME VARCHAR(50) SMALLINT ,
                  ROW_INSERT_TMSTMP TIMESTAMP)
                  DISTRIBUTE BY HASH (EMPLOYEE_ID) 
IN dv20140902114851 INDEX IN dv20140902114851ix;

In the above example, the table EDWSDV.EMPLOYEE will be hashed on the column EMPLOYEE_ID. i.e., EMPLOYEE_ID is the hash distribution key for this table. For every row that is inserted/updated, DB2 applies a hashing algorithm based on the value of the hash key (in this case EMPLOYEE_ID). DB2 then decides to send that specific row to a specific partition based on the result of the hashing algorithm.

As you would notice from the “create table …” statement, from a high level, there are only 2 differences in the “create table .. ” statement itself between a single partitioned and a DPF database. They are:

1) DISTRIBUTE BY clause — In this clause, you include the hash keys for the table. You could include multiple columns (separate by a comma) as hash key for a table.

If the “create table .. ” statement does not have “DISTRIBUTE BY..” clause and the tablespace (in the create table .. statement) belongs to multiple-partitions, then:

  • If a primary key is defined, the first column of the primary key becomes the distribution key
  • Otherwise, the first column whose data type is valid for a distribution key becomes the distribution key

2) “IN $tablespace” clause — In this clause, you specify the tablespace that the table needs to be created in. For a table that is spread across multiple partitons, you would need to specify that tablespace that belongs to those partitons that you are interested in spreading the table across.

Advertisements

One thought on “Quick primer on DB2 Tables and Tablespaces in a multi-partitioned (DPF) database

  1. Pingback: Choosing the right distribution keys in a DB2 DPF database

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s