I am currently exploring columnar technology in DB2 BLU 10.5 Fix Pack 4 (a.k.a. the Cancun release) and its other features. In this blog post, I will share my initial experience creating and loading column-organized and row-organized tables in DB2 BLU Acceleration (referred to as DB2 BLU from now on). Then, for a simple SQL statement, we will look at run-time comparisons between a column-organized table and a row-organized table.
What is a Column-Organized Table?
Traditionally, DB2 LUW’s tables were organized by rows. This meant a single page stored the values of multiple columns of a row. Column-organized tables are different: in columnar format, a single page stores the values of just a single column. With columnar storage, DB2 can read from disk just the columns that are needed, thus reducing the amount of I/O. There is a lot of material online that covers the basics of column-organized tables. I will list such material at the bottom of this blog for further reference.
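To make the I/O argument concrete, here is a back-of-the-envelope sketch in Python. The page size matches the BLU default, but the row and column widths are my own illustrative assumptions, not DB2's actual on-disk format; the point is only the relative page counts for a query touching 2 of 26 columns.

```python
# Illustrative sketch only -- not DB2's actual page format.
# Compare pages read for a query that needs 2 of 26 columns.

ROWS = 100_000_000
PAGE_SIZE = 32 * 1024          # 32 KB pages, the BLU default
ROW_WIDTH = 130                # assumed average row width in bytes
COL_WIDTH = 5                  # assumed average column width in bytes

# Row-organized: every page holds whole rows, so a scan reads all pages.
rows_per_page = PAGE_SIZE // ROW_WIDTH
row_pages = -(-ROWS // rows_per_page)           # ceiling division

# Column-organized: only the pages of the 2 referenced columns are read.
values_per_page = PAGE_SIZE // COL_WIDTH
col_pages = 2 * -(-ROWS // values_per_page)

print(f"row-organized pages read:    {row_pages:>10,}")
print(f"column-organized pages read: {col_pages:>10,}")
```

Under these assumptions the columnar scan touches roughly an order of magnitude fewer pages, which is the intuition behind the reduced I/O.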
What is BLU?
A quick read on BLU can be found on Wikipedia. I would like to point out that BLU is not an acronym.
Below is the system configuration that was used:
64 bit AIX 7.1.3 TL03 / 8 cores / 64 GB Memory / DB2 10.5 FP4
High level steps
1) Set DB2_WORKLOAD registry variable
2) Create Database
3) Create tablespaces (optional)
4) Create column organized and row organized table
5) Load test data
6) Run test workload for runtime comparison
Set DB2_WORKLOAD registry variable – The BLU switch
After installing the DB2 software and creating an instance (these steps are no different from a non-BLU DB2 installation), my next step was to switch on the parameter that tells DB2 to treat the instance as a ‘BLU’ instance. As the instance owner, run the command db2set DB2_WORKLOAD=ANALYTICS and stop/start the instance.
The create database command was not anything special.
CREATE DATABASE BLUDBFP4 ON '/db2inst1/BLUDBFP4/data1', '/db2inst1/BLUDBFP4/data2', '/db2inst1/BLUDBFP4/data3' DBPATH ON '/db2inst1/BLUDBFP4'
DB20000I  The CREATE DATABASE command completed successfully.
Because DB2_WORKLOAD was set to ANALYTICS, the page size and extent size are automatically set to 32 KB and 4, respectively.
Create tablespaces (optional)
If you do not create your own tablespaces, tables are created in USERSPACE1, the default user tablespace that is created at database creation time.
CREATE TABLESPACE DDA;
DB20000I  The SQL command completed successfully.
CREATE TABLESPACE DDA_IX;
DB20000I  The SQL command completed successfully.
Create column-organized table
Column-organized is the default table type once the registry variable DB2_WORKLOAD has been set to ANALYTICS. An important point to note is that column-organized tables are NOT the default type for all DB2 10.5 databases. As indicated earlier, columnar storage is the default only if that registry variable has been set.
The syntax of the “create table” statement is not much different from that of a row-organized table. Here is a test table that I created.
CREATE TABLE EDM.TRANS_DETAIL_BLU_COLUMN (
    TRANS_DETAIL_GK BIGINT NOT NULL ,
    ACCT_GK INTEGER NOT NULL ,
    DIM_ACCT_GK INTEGER NOT NULL ,
    TRANS_PROD_TYPE_CDLKP SMALLINT NOT NULL ,
    TRANS_POST_DTLKP INTEGER NOT NULL ,
    TRANS_POST_SEQ_NUM SMALLINT NOT NULL ,
    DIM_TRANS_CODE_GK INTEGER NOT NULL ,
    TRANS_POST_DATE DATE NOT NULL ,
    DIM_PROD_GK INTEGER ,
    PROD_GK INTEGER ,
    DIM_ORGZTN_UNIT_GK INTEGER ,
    ORGZTN_UNIT_GK INTEGER ,
    TRANS_EFF_DTLKP INTEGER NOT NULL ,
    TRANS_DTLKP INTEGER ,
    TRANS_SRVC_PROVDR_CDLKP INTEGER ,
    TRANS_SRVC_PROVDR_NON_STND_TEXT VARCHAR(36) ,
    TRANS_SITE_CODE VARCHAR(2) ,
    TRANS_DIN_TEXT VARCHAR(9) ,
    DEBIT_CREDIT_TYPE_CDLKP INTEGER NOT NULL ,
    TRANS_AMT DECIMAL(15,2) NOT NULL ,
    TRANS_STMT_DTLKP INTEGER ,
    ROW_INSERT_TMSTMP TIMESTAMP NOT NULL WITH DEFAULT ,
    ROW_INSERT_USER_ID VARCHAR(128) NOT NULL WITH DEFAULT USER ,
    ROW_UPDATE_TMSTMP TIMESTAMP NOT NULL WITH DEFAULT ,
    ROW_UPDATE_USER_ID VARCHAR(128) NOT NULL WITH DEFAULT USER ,
    JOB_BATCH_GK INTEGER NOT NULL
) IN DDA INDEX IN DDA_IX;
DB20000I  The SQL command completed successfully.

ALTER TABLE EDM.TRANS_DETAIL_BLU_COLUMN
    ADD CONSTRAINT PK_TRANS_DETAIL_BLU_COLUMN PRIMARY KEY (TRANS_DETAIL_GK, ACCT_GK);
DB20000I  The SQL command completed successfully.
For every column-organized table, there is a synopsis table (which is also column-organized) that is created in the schema SYSIBM. The column ‘TABLEORG’ in syscat.tables identifies whether a table is column-organized (C) or row-organized (R).
The synopsis table’s name is the prefix “SYN” plus a timestamp, followed by an underscore and the base table’s name (in this case, SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN). SYN = synopsis table; 14 = 2014 (year); 0919 indicates September 19th; the rest of the name encodes the time at which the table was created.
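Just to illustrate the convention, here is a small Python snippet (my own sketch of the decoding described above, not an official API) that picks the name apart:

```python
# Decode the SYNyyMMddHHmmss..._<base> naming pattern of a synopsis
# table (my own reading of the convention; not an official API).
name = "SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN"

prefix, base_table = name.split("_", 1)
stamp = prefix[3:]                       # drop the "SYN" prefix

year  = 2000 + int(stamp[0:2])           # 14 -> 2014
month = int(stamp[2:4])                  # 09 -> September
day   = int(stamp[4:6])                  # 19
time  = stamp[6:12]                      # HHmmss portion

print(year, month, day, time, base_table)
# -> 2014 9 19 161109 TRANS_DETAIL_BLU_COLUMN
```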
SELECT CHAR(TABSCHEMA, 10) AS SCHEMA, CHAR(TABNAME, 70) AS TABLE, TABLEORG
FROM SYSCAT.TABLES WHERE TABNAME LIKE '%TRANS_DETAIL%' WITH UR;

SCHEMA     TABLE                                         TABLEORG
---------- --------------------------------------------- --------
EDM        TRANS_DETAIL_BLU_COLUMN                       C
SYSIBM     SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN C

  4 record(s) selected.
IBM’s documentation indicates that the synopsis table helps with data organization of the base table and in data skipping during query processing. At this point, I am still working on gaining a deeper understanding of how DB2 would use a synopsis table.
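My working mental model of data skipping, which is a deliberate simplification and not DB2's actual implementation, is that the synopsis keeps min/max values per block of rows, so the engine can skip any block whose range cannot satisfy a predicate:

```python
# Toy model of data skipping: keep min/max per block of 1024 rows and
# skip any block whose range cannot satisfy the predicate.
# This is a conceptual sketch, not how DB2 actually implements it.

BLOCK = 1024
data = list(range(10_000))               # pretend column values

# "Synopsis": one (min, max) entry per block of the base column
synopsis = [(min(data[i:i + BLOCK]), max(data[i:i + BLOCK]))
            for i in range(0, len(data), BLOCK)]

def scan(predicate_lo, predicate_hi):
    """Scan only blocks whose [min, max] overlaps the predicate range."""
    hits, blocks_read = [], 0
    for b, (lo, hi) in enumerate(synopsis):
        if hi < predicate_lo or lo > predicate_hi:
            continue                     # data skipping: block cannot match
        blocks_read += 1
        hits.extend(v for v in data[b * BLOCK:(b + 1) * BLOCK]
                    if predicate_lo <= v <= predicate_hi)
    return hits, blocks_read

hits, blocks_read = scan(2000, 2100)
print(len(hits), "matches;", blocks_read, "of", len(synopsis), "blocks read")
```

On sorted or clustered data a selective predicate touches only a couple of blocks, which hints at why sorting before load is said to help (see the notes further down).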
I then ran the db2look command (db2look -d bludbfp4 -e -z edm -t TRANS_DETAIL_BLU_COLUMN) to extract the DDL for the table that was just created. One thing that popped out was that DB2 appended the phrase “ORGANIZE BY COLUMN” at the end of the “create table” statement. The DDL from db2look is shown below.
CREATE TABLE EDM.TRANS_DETAIL_BLU_COLUMN (
    TRANS_DETAIL_GK BIGINT NOT NULL ,
    .......................
    JOB_BATCH_GK INTEGER NOT NULL
) IN DDA INDEX IN DDA_IX
ORGANIZE BY COLUMN;
Create row-organized table
A row-organized table is created by simply appending “ORGANIZE BY ROW” to the DDL. Here is the snippet:
CREATE TABLE EDM.TRANS_DETAIL_BLU_ROW (
    TRANS_DETAIL_GK BIGINT NOT NULL ,
    .......................
    JOB_BATCH_GK INTEGER NOT NULL
) IN DDA INDEX IN DDA_IX
ORGANIZE BY ROW;
DB20000I  The SQL command completed successfully.
As you have probably noticed, I created the row-organized table in the same tablespace as the column-organized table. In other words, these two table types can co-exist not only in the same database but also in the same tablespace.
Can I re-use the name?
My next thought was: “Can I create a row-organized table with the same name (and in the same schema) as the column-organized table?” You probably guessed it; the answer is “no”. Here is what I tried. As expected, I got an error indicating that an identical table exists.
db2 "CREATE TABLE EDM.TRANS_DETAIL_BLU_COLUMN (COL1 INTEGER) ORGANIZE BY ROW"
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0601N  The name of the object to be created is identical to the existing
name "EDM.TRANS_DETAIL_BLU_COLUMN" of type "TABLE". SQLSTATE=42710
I chose a sample of 100 million rows for this exercise and used “load from cursor” from a table on a remote database. The SQL script is shown below.
connect to BLUDBFP4;
declare dda cursor database edwdv user db2inst1 using password for
  select * from EDM.TRANS_DETAIL fetch first 100000000 rows only;
LOAD FROM dda OF cursor messages dda.msgfile.column
  insert INTO EDM.TRANS_DETAIL_BLU_COLUMN NONRECOVERABLE;
Below are the load times for the row table and the column table.

|           | Row Table  | Column Table |
|-----------|------------|--------------|
| Load Time | 41 minutes | 140 minutes  |
I re-ran the load into the column-organized table after increasing UTIL_HEAP_SZ to 1M pages, as recommended in one of the articles on BLU. Earlier, it was set to AUTOMATIC (179903). There was no difference in the load times. I plan to spend more time in the future understanding the tuning knobs for faster loads into DB2 BLU. For now, though, the load time for this data set and table structure is a lot slower on the column-organized table than on the row-organized table.
‘CARD’ (in syscat.tables) for the column-organized table was populated right after the load; there was no need for an extra manual runstats step. As expected, this was not the case for the _ROW table: its CARD value was -1, indicating that stats had not been collected.
select char(tabschema, 20) as tabschema, substr(tabname, 1, 70) as tabname, card
from syscat.tables where tabname like '%TRANS_DETAIL_BLU%' with ur;

TABSCHEMA            TABNAME                                            CARD
-------------------- --------------------------------------------- ---------
EDM                  TRANS_DETAIL_BLU_COLUMN                       100000000
EDM                  TRANS_DETAIL_BLU_ROW                                 -1
SYSIBM               SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN        -1

  3 record(s) selected.
Below is the output after runstats on the row table:
TABSCHEMA            TABNAME                                            CARD
-------------------- --------------------------------------------- ---------
EDM                  TRANS_DETAIL_BLU_COLUMN                       100000000
EDM                  TRANS_DETAIL_BLU_ROW                          100000000
SYSIBM               SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN        -1
I tried running runstats on the synopsis table (the TYPE column in syscat.tables for synopsis table is ‘T’) but got an error.
db2 "runstats on table SYSIBM.SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN"
SQL20288N  Statistics could not be updated for the object. Object name:
"SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN". Object type: "ASM TABLE".
Reason code: "1". SQLSTATE=428DY
db2 "? SQL20288N" simply indicated that statistics are not supported for the target object, so I deduced that runstats is not supported on synopsis tables.
How big is the Synopsis table?
db2 "select count(*) from SYSIBM.SYN140919161109299065_TRANS_DETAIL_BLU_COLUMN"

1
-----------
      97663

  1 record(s) selected.
I remember reading somewhere that for every 1024 rows in the base table, there is 1 row in the synopsis table. Apparently, that is true! (100000000 / 97663 ≈ 1024)
Run test workload for run time comparison
Next, I ran a simple group-by statement (below) to see if there is a big difference in run times.
SELECT TRANS_POST_DTLKP, count(*) FROM EDM.TRANS_DETAIL_BLU_ROW
group by TRANS_POST_DTLKP order by TRANS_POST_DTLKP desc with ur;
.......
  241 record(s) selected.

real    2m5.934s
user    0m0.005s
sys     0m0.008s
SELECT TRANS_POST_DTLKP, count(*) FROM EDM.TRANS_DETAIL_BLU_COLUMN
group by TRANS_POST_DTLKP order by TRANS_POST_DTLKP desc with ur;
.........
  241 record(s) selected.

real    0m2.536s
user    0m0.005s
sys     0m0.008s
Wow! This is impressive! The run against the row-organized table took just over 2 minutes, while the run against the column-organized table took about 2.5 seconds. In other words, the query's run time on the column-organized table was roughly 98% shorter (about 2% of the run time on the row-organized table)!
|                      | Row Table   | Column Table |
|----------------------|-------------|--------------|
| Sample query runtime | 125 seconds | 2 seconds    |
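For the record, the speedup implied by the measured 'real' times works out as follows:

```python
# Speedup of the column-organized scan, from the timings measured above.
row_secs = 125.934      # real time, row-organized (2m5.934s)
col_secs = 2.536        # real time, column-organized

speedup = row_secs / col_secs
reduction = (1 - col_secs / row_secs) * 100
print(f"{speedup:.0f}x faster; run time reduced by {reduction:.1f}%")
```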
I will share my findings on compression ratios, storage savings and workload run time details on column-organized tables in a future blog post.
1) BLU is largely transparent to the end user and even to the DBA. Identity columns and triggers are not yet supported (as of DB2 10.5 FP4), and no secondary indexes are supported.
2) Row and column organized tables can co-exist in the same tablespace in a database.
3) Analytical queries run a lot faster on BLU than on row-organized tables.
4) Inserts into column-organized tables are slow. There may be tuning knobs that could speed up inserts; this needs to be explored.
5) Runstats need not be run on columnar tables; so far it looks like statistics collection is automated.
1) Column-organized tables cannot be created in DMS tablespaces. They can only be created in automatic storage tablespaces.
2) With column-organized tables, there are no compression options. There is no “COMPRESS YES/NO” clause as there is with row-organized tables; the only option to pick is “ORGANIZE BY COLUMN”.
3) I did NOT sort the data before loading it into the tables. In one of the deep-dive talks on DB2 BLU, I remember hearing that compression, data skipping and SQL performance are much more efficient on sorted data.
4) Starting with FP4 of DB2 10.5, ‘alter table’ functionality is supported on column-organized tables. This was not allowed up to FP3/FP3a.
5) When I was just getting started with this proof of concept, memory on the database server was only 16 GB. When I tried using “load from cursor”, I encountered the following error.
SQL0969N  There is no message text corresponding to SQL error "-2062614516" in the message file on this workstation. The error was returned from module "SQLUMIO" with original tokens "".
I initially attributed this error to a bug in DB2 BLU, but I was wrong. I did not want to spend too much time troubleshooting, so I instead modified the load command by adding a ‘where’ clause, which cut the result set down from 100M rows to ~10M rows at a time. I ran multiple loads this way to reach my desired 100M-row data set in the DB2 BLU tables.
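That chunking approach is easy to script. The sketch below generates one cursor declaration and LOAD per ~10M-row range; the key column, range boundaries and message-file names are illustrative assumptions on my part, not DB2 requirements:

```python
# Generate chunked LOAD FROM CURSOR statements, ~10M rows each, as a
# workaround for memory-constrained loads. The key column, range
# boundaries and message-file names here are illustrative assumptions.
CHUNK = 10_000_000
TOTAL = 100_000_000

for i, lo in enumerate(range(0, TOTAL, CHUNK), start=1):
    hi = lo + CHUNK
    print(f"""DECLARE dda CURSOR DATABASE edwdv USER db2inst1 USING password
  FOR SELECT * FROM EDM.TRANS_DETAIL
  WHERE TRANS_DETAIL_GK >= {lo} AND TRANS_DETAIL_GK < {hi};
LOAD FROM dda OF CURSOR MESSAGES dda.msgfile.part{i}
  INSERT INTO EDM.TRANS_DETAIL_BLU_COLUMN NONRECOVERABLE;""")
```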
We later added 48 GB of memory to the database server, bringing it to 64 GB. When I re-ran the ‘load from cursor’, I did not get the error. In fact, IBM suggests 64 GB of RAM with 8 cores. Please refer to this document for further information.
Further reading resources:
1) How BLU Acceleration really works by Guy Lohman
2) Less caffeine, more compression by Iqbal Goralwalla