Creating and loading column-organized and row-organized tables in DB2 BLU

I am currently exploring the columnar technology in DB2 BLU 10.5 Fix Pack 4 (a.k.a. the Cancun release) and its other features. In this blog post, I will share my initial experience of creating and loading column-organized and row-organized tables in DB2 BLU Acceleration (referred to as DB2 BLU from now on). Further, for a simple SQL statement, we will look at run-time comparisons between a column-organized table and a row-organized table.

What is a Column-Organized Table?

Traditionally, DB2 LUW’s tables were organized by rows: a single page stored values from multiple columns of each row. Column-organized tables are different. In columnar format, a single page stores the values of just a single column. With columnar storage, DB2 can read from disk just the columns that are needed, thus reducing the amount of I/O. There is a lot of material online that covers the basics of column-organized tables; I will list such material at the bottom of this blog post for further reference.
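The I/O saving can be put in rough numbers with a toy model. This is only a back-of-the-envelope sketch under an unrealistic equal-width-columns assumption, not how DB2 actually accounts for pages:

```python
# Toy model of columnar I/O savings: with column-organized storage, a query
# that touches k of a table's n (assumed equal-width) columns reads roughly
# k/n of the table's pages instead of all of them.
n_columns = 26   # the TRANS_DETAIL test table used later in this post has 26 columns
k_touched = 2    # e.g. an aggregate over one column grouped by another

io_fraction = k_touched / n_columns
print(f"~{io_fraction:.0%} of the pages read")  # ~8% of the pages read
```

Real savings depend on column widths, encoding, and compression, but the proportionality is the core idea.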

What is BLU?

A quick read on BLU can be found on Wikipedia. I would like to point out that BLU is not an acronym.

System Configuration

Below is the system configuration that was used:

64 bit AIX 7.1.3 TL03 / 8 cores / 64 GB Memory / DB2 10.5 FP4

High level steps

1) Set DB2_WORKLOAD registry variable

2) Create Database

3) Create tablespaces (optional)

4) Create column-organized and row-organized tables

5) Load test data

6) Run test workload for runtime comparison

Set DB2_WORKLOAD registry variable – The BLU switch

After installing the DB2 software and creating an instance (these steps are no different from a non-BLU DB2 installation), my next step was to switch on the parameter that tells DB2 to treat the instance as a ‘BLU’ instance. As the instance owner, run the command db2set DB2_WORKLOAD=ANALYTICS and stop/start the instance.

Create Database

The create database command was not anything special.

CREATE DATABASE BLUDBFP4 ON 
'/db2inst1/BLUDBFP4/data1', 
'/db2inst1/BLUDBFP4/data2', 
'/db2inst1/BLUDBFP4/data3' 
dbpath on '/db2inst1/BLUDBFP4'
DB20000I  The CREATE DATABASE command completed successfully.

Because DB2_WORKLOAD was set to ANALYTICS, the page size and extent size are automatically set to 32 KB and 4 pages respectively.
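In concrete terms, those two defaults together mean each extent covers 128 KB of storage:

```python
# Defaults applied when DB2_WORKLOAD=ANALYTICS is set before CREATE DATABASE
page_size_kb = 32   # default page size, in KB
extent_pages = 4    # default extent size, in pages

extent_kb = page_size_kb * extent_pages
print(extent_kb)    # 128 -> 128 KB allocated per extent
```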

Create tablespaces (optional)

If you do not create tablespaces, tables are created in USERSPACE1, the default user tablespace that is created at database creation time.

CREATE TABLESPACE DDA; 
DB20000I  The SQL command completed successfully.

CREATE TABLESPACE DDA_IX; 
DB20000I  The SQL command completed successfully.

Create column-organized table

The column-organized table type is the default once the db2set variable DB2_WORKLOAD has been set to ANALYTICS. An important point to note is that column-organized tables are NOT the default type for all DB2 10.5 databases. As indicated earlier, columnar storage is the default only if this db2set parameter has been set.

The syntax of the “create table” statement is not much different from that of a row-organized table. Here is a test table that I created.

CREATE TABLE EDM.TRANS_DETAIL_BLU_COLUMN  (
                  TRANS_DETAIL_GK BIGINT NOT NULL ,
                  ACCT_GK INTEGER NOT NULL ,
                  DIM_ACCT_GK INTEGER NOT NULL ,
                  TRANS_PROD_TYPE_CDLKP SMALLINT NOT NULL ,
                  TRANS_POST_DTLKP INTEGER NOT NULL ,
                  TRANS_POST_SEQ_NUM SMALLINT NOT NULL ,
                  DIM_TRANS_CODE_GK INTEGER NOT NULL ,
                  TRANS_POST_DATE DATE NOT NULL ,
                  DIM_PROD_GK INTEGER ,
                  PROD_GK INTEGER ,
                  DIM_ORGZTN_UNIT_GK INTEGER ,
                  ORGZTN_UNIT_GK INTEGER ,
                  TRANS_EFF_DTLKP INTEGER NOT NULL ,
                  TRANS_DTLKP INTEGER ,
                  TRANS_SRVC_PROVDR_CDLKP INTEGER ,
                  TRANS_SRVC_PROVDR_NON_STND_TEXT VARCHAR(36) ,
                  TRANS_SITE_CODE VARCHAR(2) ,
                  TRANS_DIN_TEXT VARCHAR(9) ,
                  DEBIT_CREDIT_TYPE_CDLKP INTEGER NOT NULL ,
                  TRANS_AMT DECIMAL(15,2) NOT NULL ,
                  TRANS_STMT_DTLKP INTEGER ,
                  ROW_INSERT_TMSTMP TIMESTAMP NOT NULL WITH DEFAULT  ,
                  ROW_INSERT_USER_ID VARCHAR(128) NOT NULL WITH DEFAULT USER ,
                  ROW_UPDATE_TMSTMP TIMESTAMP NOT NULL WITH DEFAULT  ,
                  ROW_UPDATE_USER_ID VARCHAR(128) NOT NULL WITH DEFAULT USER ,
                  JOB_BATCH_GK INTEGER NOT NULL )
IN DDA INDEX IN DDA_IX;
DB20000I  The SQL command completed successfully.
ALTER TABLE EDM.TRANS_DETAIL_BLU_COLUMN
        ADD CONSTRAINT PK_TRANS_DETAIL_BLU_COLUMN PRIMARY KEY
                (TRANS_DETAIL_GK,
                 ACCT_GK);
DB20000I  The SQL command completed successfully.

For every column-organized table, a synopsis table (which is also column-organized) is created in the SYSIBM schema. The column ‘TABLEORG’ in syscat.tables identifies whether a table is column-organized (C) or row-organized (R).

The synopsis table’s name has the prefix “SYN”, followed by a timestamp, an underscore, and the base table’s name (in this case, _TRANS_DETAIL_BLU_COLUMN). Breaking down SYN140919161109299065: SYN = synopsis table; 14 = 2014 (year); 0919 = September 19th; the rest of the name encodes the time at which the table was created.
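Out of curiosity, that naming scheme can be decoded mechanically. A small sketch, with the parsing logic inferred from this single example rather than from any documented format:

```python
def decode_synopsis_name(name: str) -> dict:
    """Split a SYSIBM synopsis table name into its apparent parts.

    Pattern inferred from SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN:
    'SYN' + YY MM DD HH MM SS + a sub-second suffix, then '_' + base table name.
    """
    assert name.startswith("SYN"), "not a synopsis table name"
    stamp, _, base_table = name[3:].partition("_")
    return {
        "year": 2000 + int(stamp[0:2]),  # '14' -> 2014
        "month": int(stamp[2:4]),        # '09' -> September
        "day": int(stamp[4:6]),          # '19'
        "hhmmss": stamp[6:12],           # '161109' -> 16:11:09
        "base_table": base_table,
    }

parts = decode_synopsis_name("SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN")
print(parts["year"], parts["month"], parts["day"], parts["base_table"])
# 2014 9 19 TRANS_DETAIL_BLU_COLUMN
```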

SELECT CHAR(TABSCHEMA, 10) AS SCHEMA, 
       CHAR(TABNAME, 70)   AS TABLE, 
       TABLEORG
FROM   SYSCAT.TABLES
WHERE  TABNAME LIKE '%TRANS_DETAIL%' 
WITH UR;

SCHEMA     TABLE                                         TABLEORG
---------- --------------------------------------------- -------
EDM        TRANS_DETAIL_BLU_COLUMN                          C
SYSIBM     SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN    C

  4 record(s) selected.

IBM’s documentation indicates that the synopsis table helps with data organization of the base table and with data skipping during query processing. At this point, I am still working on gaining a deeper understanding of how DB2 uses the synopsis table.

I then ran the db2look command (db2look -d bludbfp4 -e -z edm -t TRANS_DETAIL_BLU_COLUMN) to extract the DDL for the table that was just created. One thing that stood out was that DB2 appended the clause “ORGANIZE BY COLUMN” at the end of the “create table” statement. The DDL from db2look is shown below.

CREATE TABLE EDM.TRANS_DETAIL_BLU_COLUMN  (
                  TRANS_DETAIL_GK BIGINT NOT NULL ,
.......................
                  JOB_BATCH_GK INTEGER NOT NULL )
IN DDA INDEX IN DDA_IX
    ORGANIZE BY COLUMN;

Create row-organized table

A row-organized table is created by simply appending “ORGANIZE BY ROW” to the DDL. Here is the snippet:

CREATE TABLE EDM.TRANS_DETAIL_BLU_ROW  (
                  TRANS_DETAIL_GK BIGINT NOT NULL ,
.......................
                  JOB_BATCH_GK INTEGER NOT NULL )
IN DDA INDEX IN DDA_IX
    ORGANIZE BY ROW;
DB20000I  The SQL command completed successfully.

As you have probably noticed, I created the row-organized table in the same tablespace as the column-organized table. In other words, the two types of tables can co-exist not only in the same database but also in the same tablespace.

Can I re-use the name?

My next thought was: “Can I create a row-organized table with the same name (and in the same schema) as the column-organized table?” You probably guessed it; the answer is “No”. Here is what I tried. As expected, I got an error indicating that an identical table exists.

db2 "CREATE TABLE EDM.TRANS_DETAIL_BLU_COLUMN 
(COL1 INTEGER) 
ORGANIZE BY ROW"

DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0601N  The name of the object to be created is identical to the existing
name "EDM.TRANS_DETAIL_BLU_COLUMN" of type "TABLE".  SQLSTATE=42710

Load data

I chose a sample data set of 100 million rows for this exercise. I used “load from cursor” from a table on a remote database. The SQL script is shown below.

connect to BLUDBFP4 ;
declare dda cursor database edwdv user db2inst1 using password for select * from EDM.TRANS_DETAIL fetch first 100000000 rows only;

LOAD FROM dda OF cursor messages dda.msgfile.column insert INTO EDM.TRANS_DETAIL_BLU_COLUMN NONRECOVERABLE ;

Below are load times for row table and column table.

            Row Table     Column Table
Load Time   41 minutes    140 minutes

I re-ran the load into the column-organized table after increasing UTIL_HEAP_SZ to 1M pages, as recommended in one of the articles on BLU; earlier, it was set to AUTOMATIC (179903). There was no difference in the load times. I plan to spend more time in the future understanding the tuning knobs for faster loads into DB2 BLU. But for now, the load time for this data set and table structure is a lot slower for the column-organized table than for the row-organized table.

‘CARD’ (in syscat.tables) for the column-organized table was populated right after the load; there was no need for an extra manual runstats step. As expected, this was not the case for the _ROW table: its CARD value was -1, indicating that stats were not collected.

select char (tabschema, 20) as tabschema, 
substr (tabname,1,70 ) as tabname, 
card
from syscat.tables
where tabname like '%TRANS_DETAIL_BLU%' with ur;

TABSCHEMA            TABNAME                                             CARD
-------------------- --------------------------------------------- ----------
EDM                  TRANS_DETAIL_BLU_COLUMN                        100000000
EDM                  TRANS_DETAIL_BLU_ROW                                  -1
SYSIBM               SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN         -1

  3 record(s) selected.

Below is the output after runstats on the row table:

TABSCHEMA            TABNAME                                             CARD
-------------------- --------------------------------------------- ----------
EDM                  TRANS_DETAIL_BLU_COLUMN                        100000000
EDM                  TRANS_DETAIL_BLU_ROW                           100000000
SYSIBM               SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN         -1

I tried running runstats on the synopsis table (the TYPE column in syscat.tables for a synopsis table is ‘T’) but got an error.

db2 "runstats on table SYSIBM.SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN"
 SQL20288N  Statistics could not be updated for the object. Object name:
 "SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN". Object type: "ASM
 TABLE". Reason code: "1".  SQLSTATE=428DY

db2 "? SQL20288N" simply indicated that stats are not supported for the target object, so I deduced that runstats is not supported for synopsis tables.

How big is the Synopsis table?

db2 "select count(*) from SYSIBM.SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN"
1
-----------
      97663
  1 record(s) selected.

I remember reading somewhere that for every 1024 rows in the base table, there is 1 row in the synopsis table. Apparently, that is true! (100000000 / 97663 ≈ 1024)
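The arithmetic behind that check, as a quick sketch (the exact chunking behavior is DB2-internal; I have not verified it in the documentation):

```python
import math

base_rows = 100_000_000   # rows loaded into the base table
synopsis_rows = 97_663    # rows counted in the synopsis table

# If each synopsis row summarizes a fixed-size chunk of base rows,
# the implied chunk size is:
rows_per_synopsis_row = base_rows / synopsis_rows
print(round(rows_per_synopsis_row))   # 1024

# Conversely, a strict 1024-row chunking would predict this many
# synopsis rows (close to, but slightly under, the observed 97663,
# presumably because partially filled chunks also get a row):
expected = math.ceil(base_rows / 1024)
print(expected)   # 97657
```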

Run test workload for run time comparison

Next, I ran a simple group-by statement (below) to see whether there is a big difference in run times.

SELECT  TRANS_POST_DTLKP, count(*) 
FROM 
EDM.TRANS_DETAIL_BLU_ROW 
group by TRANS_POST_DTLKP 
order by TRANS_POST_DTLKP desc 
with ur;
.......
241 record(s) selected.
real    2m5.934s
user    0m0.005s
sys     0m0.008s
SELECT  TRANS_POST_DTLKP, count(*) 
FROM 
EDM.TRANS_DETAIL_BLU_COLUMN 
group by TRANS_POST_DTLKP 
order by TRANS_POST_DTLKP desc 
with ur;
.........
  241 record(s) selected.
real    0m2.536s
user    0m0.005s
sys     0m0.008s

Wow! This is impressive! The run time against the row-organized table was over 2 minutes, while the one against the column-organized table was about 2 seconds. In other words, the query’s run time was 98.4% shorter on the column-organized table (i.e., 1.6% of the run time on the row-organized table)!

                       Row Table     Column Table
Sample query runtime   125 seconds   2 seconds
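The percentages quoted above come straight from the two wall-clock times:

```python
row_seconds = 125   # ~2m5s wall clock against the row-organized table
col_seconds = 2     # ~2.5s wall clock against the column-organized table

speedup = row_seconds / col_seconds
reduction_pct = (1 - col_seconds / row_seconds) * 100

print(f"{speedup:.1f}x faster; run time {reduction_pct:.1f}% shorter")
# 62.5x faster; run time 98.4% shorter
```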

I will share my findings on compression ratios, storage savings and workload run time details on column-organized tables in a future blog post.

Observations:

1) BLU is largely transparent to the end user and even to the DBA. Identity columns and triggers are not supported yet (as of DB2 10.5 FP4), and secondary indexes are not supported.

2) Row and column organized tables can co-exist in the same tablespace in a database.

3) Analytical queries run a lot faster on BLU compared to row-organized tables.

4) Inserts into column-organized tables are slow. There may be tuning knobs that speed up inserts; this needs to be explored.

5) Runstats need not be run on columnar tables. So far, it looks like statistics collection is automated.

Side Notes:

1) Column-organized tables cannot be created in DMS tablespaces. They can only be created in automatic storage tablespaces.

2) With column-organized tables, there are no options when it comes to compression. There is no “COMPRESS YES/NO” option as there is with row-organized tables; the only option to pick is “ORGANIZE BY COLUMN”.

3) I did NOT sort the data before loading it into the tables. In one of the deep-dive talks on DB2 BLU, I remember hearing that compression, data skipping, and SQL performance are much more efficient on sorted data.

4) Starting in FP4 (of DB2 10.5), ‘alter table’ functionality is supported; this was not allowed through FP3/FP3a.

5) When I was just getting started with this proof of concept, memory on the database server was only 16 GB. When I tried using “load from cursor”, I encountered the following error.

SQL0969N  There is no message text corresponding to SQL error "-2062614516" in the message file on this workstation.  The error was returned from module "SQLUMIO " with original tokens "".

I initially attributed this error to a bug in DB2 BLU, but I was wrong. I did not want to spend too much time troubleshooting it, so I instead modified the load command by adding a ‘where’ clause. This cut the result set down from 100M rows to ~10M rows at a time, and I ran multiple loads this way to build my desired 100M-row data set in the DB2 BLU tables.
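That workaround is easy to script. Here is a sketch that just generates the chunked statement pairs; the split on TRANS_DETAIL_GK ranges and the message-file names are hypothetical (in my run I used a ‘where’ clause appropriate to my data):

```python
def chunked_load_statements(total_rows, chunk_rows, source_table, target_table):
    """Generate one 'declare cursor' + LOAD statement pair per chunk.

    Assumes a hypothetical dense, monotonically increasing key
    (TRANS_DETAIL_GK here); in practice any selective 'where' clause
    that partitions the source table works.
    """
    stmts = []
    for start in range(1, total_rows + 1, chunk_rows):
        end = min(start + chunk_rows - 1, total_rows)
        stmts.append(
            "declare dda cursor database edwdv user db2inst1 using password "
            f"for select * from {source_table} "
            f"where TRANS_DETAIL_GK between {start} and {end};\n"
            f"LOAD FROM dda OF cursor messages msg_{start}.txt "
            f"insert INTO {target_table} NONRECOVERABLE;"
        )
    return stmts

stmts = chunked_load_statements(100_000_000, 10_000_000,
                                "EDM.TRANS_DETAIL",
                                "EDM.TRANS_DETAIL_BLU_COLUMN")
print(len(stmts))   # 10 -> ten loads of ~10M rows each
```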

We later added 48 GB of memory to the database server, bringing it to 64 GB. When I re-ran the ‘load from cursor’, I did not get the error. In fact, IBM suggests having 64 GB of RAM with 8 cores; please refer to this document for further information.

Further reading resources:

1) How BLU Acceleration really works by Guy Lohman

2) Less caffeine, more compression by Iqbal Goralwalla

3) The DB2Night Show #140: DB2 LUW V10.5 CANCUN with George Baklarz


3 thoughts on “Creating and loading column-organized and row-organized tables in DB2 BLU”

  1. Hi Pavan, I’m an IM consultant in Venezuela. I saw your steps and I would like to see the values of these parameters, if of course you are willing to share them with me.

    DFT_DEGREE
    SHEAPTHRES_SHR
    SORTHEAP
    CATALOGCACHE_SZ
    UTIL_HEAP_SZ
    DFT_EXTENT_SZ
    AUTOMATIC_REORG
    DFT_TABLE_ORG
    Thank you.

  2. Hi Nicholas — Appreciate you stopping by db2talk.com. I have no problem sharing this info; here are the values. I would love to know how you are tuning your configs on columnar storage.

    Degree of parallelism (DFT_DEGREE) = ANY
    Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR) = 294803
    Sort list heap (4KB) (SORTHEAP) = 32768
    Catalog cache size (4KB) (CATALOGCACHE_SZ) = 360
    Default tablespace extentsize (pages) (DFT_EXTENT_SZ) = 32
    Automatic reorganization (AUTO_REORG) = ON
    Default table organization (DFT_TABLE_ORG) = COLUMN
    Utilities heap size (4KB) (UTIL_HEAP_SZ) = 1000000

    • Hi Pavan, please use these steps (from my co-worker Jorge Anguiano, IBM IM Consultant in Mexico):

      1) update dbm cfg
      SHEAPTHRES 0
      UTIL_HEAP_SZ 4000000 (not automatic)
      (only the first time, when loading the table)

      2)
      update db cfg using
      blocknonlogged yes

      3)
      Use load without the nonrecoverable option:
      LOAD FROM dda OF cursor messages dda.msgfile.column insert INTO EDM.TRANS_DETAIL_BLU_COLUMN
