How To Perform DB2 Loads in HADR Environment?

Problem Statement:
You need to run the DB2 LOAD utility in an HADR environment. However, the LOAD utility only logs certain kinds of transaction log records, which is not enough to replicate the loaded data to the HADR standby(s) via transaction logs. If you would like to learn more about which log records the LOAD utility writes, take a look at the LOAD documentation in the IBM Knowledge Center.
Since a LOAD operation is not completely logged, one needs to understand its implications before running the utility in an HADR environment. So how do you handle LOADs in an HADR environment?
I can think of a few solutions to get past this situation. Let's discuss them in detail in the following sections.
Solution 1:
I will start with the easy one first: simply reinitialize the HADR standby(s) with a backup image of the primary database taken after the LOAD completes. This is a straightforward solution, so I will only outline the sequence of steps one might perform to reinitialize HADR (a rough command sketch follows the list):
- STOP HADR on primary and standby(s).
- Perform LOAD operation on HADR primary database table(s).
- Take an online backup of primary database.
- Restore the backup image on Standby database server(s).
- Start HADR on standby database(s) and primary database in that sequence.
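For completeness, here is a minimal sketch of those steps. It reuses the TEST database, db2inst1 instance, 10.0.1.3 standby address, and /db2backup/db2inst1 path from the examples later in this post; the cursor load and the <timestamp> placeholder are illustrative, so adjust everything to your environment:

# Standby: deactivate and stop HADR
$ db2 deactivate db test
$ db2 "stop hadr on database test"

# Primary: stop HADR, then run the LOAD (NONRECOVERABLE is acceptable here
# because the standby is reinitialized from a post-LOAD backup)
$ db2 "stop hadr on database test"
$ db2 "declare c1 cursor for select * from test.test"
$ db2 "load from c1 of cursor messages load.msg insert into test.test_new nonrecoverable"

# Primary: take an online backup and ship the image to the standby
$ db2 "backup database test online to /db2backup/db2inst1"
$ scp /db2backup/db2inst1/TEST.0.db2inst1.DBPART000.<timestamp>.001 db2inst1@10.0.1.3:/db2backup/db2inst1

# Standby: restore the image (do not roll forward), then start HADR as standby
$ db2 "restore database test from /db2backup/db2inst1 taken at <timestamp>"
$ db2 "start hadr on database test as standby"

# Primary: start HADR as primary last
$ db2 "start hadr on database test as primary"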
Solution 2:
Here, assuming you have the IBM-recommended identical primary and standby servers with respect to hardware and software, we will work toward a solution without making use of a shared or network file system. In my case, I created a file system dedicated to database backups on my primary and standby virtual machines.
df -H | grep -i db2backup
/dev/mapper/db2vg-db2backup 74G 23G 48G 32% /db2backup
The following is the sequence of steps one might perform to successfully replicate a LOAD operation from the primary to the standby server(s) without the use of a shared file system. All the IP addresses and hostnames you see below are fictional or modified.
Primary:
- Create a test table for this exercise; its structure is identical to that of an existing table.
- Stop HADR on the primary database server so that you can perform the LOAD on the primary. This ensures that the standby won't keep looking for the LOAD copy file.
$ db2 -tvf guidmap_test_new.ddl

CREATE TABLE "TEST"."TEST_NEW" (
        "CAF_GUID" BIGINT NOT NULL ,
        "BACKEND_GUID" VARCHAR(200 OCTETS) ,
        "TS_ACCOUNT_ID" BIGINT NOT NULL ,
        "TYPE" INTEGER NOT NULL WITH DEFAULT 0 ,
        "FOLDER" VARCHAR(150 OCTETS) ,
        "PCOUNT" INTEGER WITH DEFAULT 0 ,
        "FCOUNT" INTEGER WITH DEFAULT 0 )
IN "TESTTS" ORGANIZE BY ROW
DB20000I  The SQL command completed successfully.

ALTER TABLE "TEST"."TEST_NEW" VOLATILE CARDINALITY
DB20000I  The SQL command completed successfully.

SET NLS_STRING_UNITS = 'SYSTEM'
DB20000I  The SQL command completed successfully.

CREATE INDEX "TEST"."IDX_TEST_1" ON "TEST"."TEST_NEW"
        ("TS_ACCOUNT_ID" ASC, "BACKEND_GUID" ASC, "CAF_GUID" ASC)
        COMPRESS NO INCLUDE NULL KEYS ALLOW REVERSE SCANS
DB20000I  The SQL command completed successfully.

SET NLS_STRING_UNITS = 'SYSTEM'
DB20000I  The SQL command completed successfully.

CREATE INDEX "TEST"."IDX_TEST_3" ON "TEST"."TEST_NEW"
        ("CAF_GUID" DESC)
        COMPRESS NO INCLUDE NULL KEYS ALLOW REVERSE SCANS
DB20000I  The SQL command completed successfully.

$ db2 "stop hadr on database test"
DB20000I  The STOP HADR ON DATABASE command completed successfully.
Standby:
- Deactivate the standby database.
- Stop HADR on the standby database so that you can perform the LOAD on the primary. This ensures that the standby won't keep looking for the LOAD copy file.
$ db2 deactivate db test
DB20000I  The DEACTIVATE DATABASE command completed successfully.

$ db2 "stop hadr on database test"
DB20000I  The STOP HADR ON DATABASE command completed successfully.
Primary:
- Declare a cursor to select all rows from the existing table that is identical to the table created in step 1.
- Load the data from the cursor, saving the load copy image to /db2backup/db2inst1.
- SCP/SFTP the LOAD copy image from the primary to the exact same location on the standby server.
$ db2 "declare c1 cursor for select * from test.test" DB20000I The SQL command completed successfully. $ db2 "load from c1 of cursor messages load_test_new.msg insert into test.test_new copy yes to /db2backup/db2inst1" Number of rows read = 21137741 Number of rows skipped = 0 Number of rows loaded = 21137741 Number of rows rejected = 0 Number of rows deleted = 0 Number of rows committed = 21137741 $ ll /db2backup/db2inst1 total 12551496 drwxr-xr-x. 3 db2inst1 db2iadm1 4096 Sep 24 2016 logs -rw-r-----. 1 db2inst1 db2iadm1 4577472512 Jan 22 19:41 TEST.4.db2inst1.DBPART000.20180122193803.001 $ scp /db2backup/db2inst1/TEST.4.db2inst1.DBPART000.20180122193803.001 db2inst1@10.0.1.3:/db2backup/db2inst1 db2inst1@10.0.1.3's password: TEST.4.db2inst1.DBPART000.20180122193803.001 100% 4365MB 63.3MB/s 01:09
Standby:
- Start HADR on standby database.
$ db2 "start hadr on database test as standby" DB20000I The START HADR ON DATABASE command completed successfully.
Primary:
- Start HADR on primary database.
- Verify that both the primary and the standby are operating in PEER state (a quick db2pd check is sketched below).
$ db2 "start hadr on database test as primary" DB20000I The START HADR ON DATABASE command completed successfully.
Standby:
- Ensure that the LOAD copy image was restored successfully on the HADR standby database. You should see something similar to the following in the db2diag.log.
2018-01-22-19.46.36.045862-480 I65276650E511        LEVEL: Warning
PID     : 68079                TID : 46912942106368 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27005              APPID: *LOCAL.DB2.180123034621
HOSTNAME:
EDUID   : 1068                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:549
DATA #1 : String, 68 bytes
Starting to restore a load copy.
TEST.TEST_NEW.20180122193803

2018-01-22-19.50.59.927064-480 I65277162E468        LEVEL: Warning
PID     : 68079                TID : 46912942106368 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27005              APPID: *LOCAL.DB2.180123034621
HOSTNAME:
EDUID   : 1068                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:1143
MESSAGE : Load copy restore completed successfully.
- Take over HADR on the standby database server.
- Verify that the TAKEOVER command completed successfully and that both databases are in PEER state.
- Finally, verify that the number of rows loaded on the primary equals the count on the new primary.
$ db2 "takeover hadr on database test" DB20000I The TAKEOVER HADR ON DATABASE command completed successfully. [db2inst1@db2lab_chicago db2inst1]$ db2pd -db test -HADR Database Member 0 -- Database TEST -- Active -- Up 0 days 00:09:04 -- Date 2018-01-22-19.55.24.838408 HADR_ROLE = PRIMARY REPLAY_TYPE = PHYSICAL HADR_SYNCMODE = NEARSYNC STANDBY_ID = 1 LOG_STREAM_ID = 0 HADR_STATE = PEER HADR_FLAGS = PRIMARY_MEMBER_HOST = 10.0.1.3 PRIMARY_INSTANCE = db2inst1 PRIMARY_MEMBER = 0 STANDBY_MEMBER_HOST = 10.0.1.4 STANDBY_INSTANCE = db2inst1 STANDBY_MEMBER = 0 HADR_CONNECT_STATUS = CONNECTED HADR_CONNECT_STATUS_TIME = 01/22/2018 19:46:31.586983 (1516679191) HEARTBEAT_INTERVAL(seconds) = 15 HEARTBEAT_MISSED = 0 HEARTBEAT_EXPECTED = 35 HADR_TIMEOUT(seconds) = 60 TIME_SINCE_LAST_RECV(seconds) = 14 PEER_WAIT_LIMIT(seconds) = 0 LOG_HADR_WAIT_CUR(seconds) = 0.000 LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000371 LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000 LOG_HADR_WAIT_COUNT = 1 SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384 SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380 PRIMARY_LOG_FILE,PAGE,POS = S0189512.LOG, 1050, 14572339324443 STANDBY_LOG_FILE,PAGE,POS = S0189512.LOG, 1050, 14572339324443 HADR_LOG_GAP(bytes) = 0 STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0189512.LOG, 1050, 14572339324443 STANDBY_RECV_REPLAY_GAP(bytes) = 0 PRIMARY_LOG_TIME = 01/22/2018 19:54:56.000000 (1516679696) STANDBY_LOG_TIME = 01/22/2018 19:54:56.000000 (1516679696) STANDBY_REPLAY_LOG_TIME = 01/22/2018 19:54:56.000000 (1516679696) STANDBY_RECV_BUF_SIZE(pages) = 16384 STANDBY_RECV_BUF_PERCENT = 0 STANDBY_SPOOL_LIMIT(pages) = 0 STANDBY_SPOOL_PERCENT = NULL STANDBY_ERROR_TIME = NULL PEER_WINDOW(seconds) = 120 PEER_WINDOW_END = 01/22/2018 19:57:11.000000 (1516679831) READS_ON_STANDBY_ENABLED = N $ db2 "select count(1) as count from TEST.TEST_NEW with ur" COUNT ----------- 21137741 1 record(s) selected.
Now you have seen two different ways to replicate a RECOVERABLE LOAD from the HADR primary database to the standby database.
Solution 3:
Now let's take a look at the final and most commonly used solution for replicating RECOVERABLE LOADs between the primary and standby servers. For this solution you need a shared file system or NFS (Network File System) between the primary and standby servers. The following are the preparation steps that I performed.
- On my iMac I used VMware Fusion Pro to build two virtual machines.
- These two virtual machines act as the primary and standby servers and are built on Thunderbolt SSD storage for fast boot times.
- I also shared a folder on my iMac's drive between the two virtual machines; it is mounted as /mnt/hgfs and visible from both VMs.
Let's verify that the shared drive is accessible from both the primary and the standby; a simple df command should help us determine that.
Primary:
$ df -H | grep mnt
.host:/         501G  139G  362G  28% /mnt/hgfs
Standby:
$ df -H | grep mnt
.host:/         501G  139G  362G  28% /mnt/hgfs
Create a directory on the shared file system from either the primary or the standby server. In my case, since I was already on the primary, I chose to do it from the primary VM.
Primary:
$ mkdir /mnt/hgfs/raghu/loads
$ cd /mnt/hgfs/raghu/loads
Primary:
$ db2 "declare test_c1 cursor for select * from test.test" DB20000I The SQL command completed successfully. $ db2 "load from test_c1 of cursor messages load_test_new.msg replace into test.test_new copy yes to /mnt/hgfs/raghu/loads" Number of rows read = 21137741 Number of rows skipped = 0 Number of rows loaded = 21137741 Number of rows rejected = 0 Number of rows deleted = 0 Number of rows committed = 21137741
Now verify that the load copy file is accessible from both the primary and standby VMs.
Primary:
$ pwd
/mnt/hgfs/raghu/loads
$ ls -ltr
total 4473328
-rw-r-----. 1 501 games 4577472512 Jan 29 21:29 TEST.4.db2inst1.DBPART000.20180129212636.001
-rw-r--r--. 1 501 games       1112 Jan 29 21:29 load_test_new.msg
Standby:
$ pwd
/mnt/hgfs/raghu/loads
$ ls -ltr
total 4473328
-rw-r-----. 1 501 games 4577472512 Jan 29 21:29 TEST.4.db2inst1.DBPART000.20180129212636.001
-rw-r--r--. 1 501 games       1112 Jan 29 21:29 load_test_new.msg
Now verify that the LOAD copy was restored on the standby. You can look for the following entries in the db2diag.log on the standby.
Standby:
2018-01-29-21.29.47.972153-480 I69653877E511        LEVEL: Warning
PID     : 68079                TID : 46912908551936 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27021              APPID: *LOCAL.DB2.180130051853
HOSTNAME: xxxxxxxxxxxx
EDUID   : 1537                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:549
DATA #1 : String, 68 bytes
Starting to restore a load copy.
TEST.TEST_NEW.20180129212636

2018-01-29-21.31.51.788936-480 I69654389E468        LEVEL: Warning
PID     : 68079                TID : 46912908551936 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27021              APPID: *LOCAL.DB2.180130051853
HOSTNAME: xxxxxxxxxxxx
EDUID   : 1537                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:1143
MESSAGE : Load copy restore completed successfully.
This solution is really simple. However, you need to remember that every project, ETL team, or anyone else who runs the DB2 LOAD utility in this HADR environment must use the shared or NFS file system as the destination for their LOAD COPY files. If not, you are in for a lot of surprises during failover scenarios.
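Two safeguards can help enforce that; a sketch under assumptions (the DB2_LOAD_COPY_NO_OVERRIDE registry variable and the SYSIBMADM.DB_HISTORY view are real DB2 facilities, but the /mnt/hgfs/raghu/loads path is just the one from the examples above):

# Force COPY NO loads to take a copy in the shared location instead
# (registry changes typically take effect after an instance restart)
$ db2set DB2_LOAD_COPY_NO_OVERRIDE="COPY YES TO /mnt/hgfs/raghu/loads"

# Report recent loads whose copy image landed outside the shared file system
$ db2 "select substr(tabschema,1,12) tabschema, substr(tabname,1,24) tabname,
       substr(location,1,60) copy_location, start_time
       from sysibmadm.db_history
       where operation = 'L' and location not like '/mnt/hgfs/%'
       order by start_time desc"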
Comments
Nice blog, and nice level of detail. I like the innovation of the second method for rare scenarios.
I opened an RFE with IBM to try to force the load copy location for the third scenario. Vote for it here:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=83961
If I ever have the third scenario in a database environment, I run a script or monitor to look at the load copy locations and find any that are not correct on an ongoing basis. They slip in if you’re not careful.
I also wrote a blog article on how to identify tables on the standby that were inoperative due to an incorrect load copy location: http://db2commerce.com/2017/10/19/avoiding-inoperative-tables-on-the-hadr-standby/
Hi Ember, thank you for the nice words.
I voted yes for your RFE, and in fact I read your blog post.
Also, if you don't mind, would you be able to read and test the following tool and let me know what you think should be improved?
https://www.raghu-on-tech.com/2017/12/19/how-to-deep-compare-two-db2-environments/
I have read multiple blog posts of yours on db2commerce.com and they are very informative. Now that I have started to blog, I can appreciate you even more, as I understand how much time and effort it takes on your part to blog.
Regards,
Raghu
Hi Raghu,
This is very informative. We recently enabled HADR for one of our environments and are using the third process you mentioned. I had a question here: since the LOAD utility generates a copy image of the loaded data, how long should this image be retained on the shared file system? Do we need to keep this LOAD copy image forever, or is there a way in DB2/the LOAD utility to delete it after the data has been loaded to the secondary database?
We are using DB2 LUW v10.5 FP8.
Thank you in advance.
J
James,
LOAD copy images should be part of your database recovery strategy. Whenever you restore a database and roll forward through transaction logs, DB2 looks for these LOAD copy files. If it can't find them, your table is left inoperative and typically has to be dropped and recreated. So you should keep these files as long as you keep your transaction log files.
You should clean them up along with your transaction log files.
You can even automate this via DB2; take a look at the two database configuration parameters below (a sketch of their use follows).
AUTO_DEL_REC_OBJ
REC_HIS_RETENTN
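A minimal sketch of how these fit together (the parameter names and PRUNE HISTORY command are real; the 60-day retention, TEST database name, and prune timestamp are illustrative): with AUTO_DEL_REC_OBJ set to ON, pruning the history file also deletes the physical recovery objects it references, including load copy images.

$ db2 "update db cfg for test using AUTO_DEL_REC_OBJ ON"
$ db2 "update db cfg for test using REC_HIS_RETENTN 60"

# Prune history entries older than the given timestamp; with AUTO_DEL_REC_OBJ
# on, AND DELETE also removes the corresponding backup images, archived log
# files, and load copy images
$ db2 "prune history 20180101 and delete"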
Great post, Raghu. Db2 on Cloud, the IBM-managed version of Db2 on the IBM Cloud, currently uses a variation of Option 3, where we leverage Cloud Object Storage for LOAD COPY files as well as transaction logs and backups.
Hi Raghu, great article.
I am loading my database and the load copy files are huge, around 40 GB each.
I am currently doing one load, copying the file to DR, starting HADR, and letting it catch up before I do the next. But this process is too slow, as I have a lot of loads to go through.
My question is: can I do multiple loads and copy the load copy files over one by one as the standby catches up? What will happen if a copy file is missing: will the DR wait for it or crash?
Thanks
Hi Raghu,
Good post
I use the third way, with NFS. Loading millions of records is fast. However, I see long elapsed times when LOAD rebuilds the indexes.
Any suggestions to improve LOAD performance?
Thanks
Greetings from Ecuador