How To Perform DB2 Loads in HADR Environment?

Problem Statement:
You need to run the DB2 LOAD utility in an HADR environment. However, the LOAD utility only logs certain kinds of transaction log records, which is not enough to replicate the loaded data to the HADR standby(s) via transaction logs. If you would like to learn more about which log records the LOAD utility writes, take a look at the LOAD documentation in the IBM Knowledge Center.
Since a LOAD operation is not completely logged, one needs to understand its implications before running the utility in an HADR environment. So how do you handle LOADs in an HADR environment?
I can think of a few solutions to get past this situation. Let's discuss them in detail in the following sections.
Solution 1:
I will start with the easy one first: simply reinitialize the HADR standby(s) with a backup image of the primary database taken after the LOAD completes. This is a straightforward solution, so I will only outline the sequence of steps one might perform to reinitialize HADR (a rough command sketch follows the list):
- STOP HADR on primary and standby(s).
- Perform LOAD operation on HADR primary database table(s).
- Take an online backup of primary database.
- Restore the backup image on Standby database server(s).
- Start HADR on standby database(s) and primary database in that sequence.
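For completeness, here is a minimal sketch of those steps. It reuses the TEST database, db2inst1 instance, 10.0.1.3 standby address, and /db2backup/db2inst1 path from the examples later in this post; the cursor load and the <timestamp> placeholder are illustrative, so adjust everything to your environment:

# Standby: deactivate and stop HADR
$ db2 deactivate db test
$ db2 "stop hadr on database test"

# Primary: stop HADR, then run the LOAD (NONRECOVERABLE is acceptable here
# because the standby is reinitialized from a post-LOAD backup)
$ db2 "stop hadr on database test"
$ db2 "declare c1 cursor for select * from test.test"
$ db2 "load from c1 of cursor messages load.msg insert into test.test_new nonrecoverable"

# Primary: take an online backup and ship the image to the standby
$ db2 "backup database test online to /db2backup/db2inst1"
$ scp /db2backup/db2inst1/TEST.0.db2inst1.DBPART000.<timestamp>.001 db2inst1@10.0.1.3:/db2backup/db2inst1

# Standby: restore the image (do not roll forward), then start HADR as standby
$ db2 "restore database test from /db2backup/db2inst1 taken at <timestamp>"
$ db2 "start hadr on database test as standby"

# Primary: start HADR as primary last
$ db2 "start hadr on database test as primary"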
Solution 2:
Here, assuming you have the IBM-recommended identical primary and standby servers with respect to hardware and software, we will work toward a solution without making use of a shared or network file system. In my case, I created a file system dedicated to database backups on my primary and standby virtual machines.
df -H | grep -i db2backup
/dev/mapper/db2vg-db2backup 74G 23G 48G 32% /db2backup
The following is the sequence of steps one might perform to successfully replicate a LOAD operation from the primary to the standby server(s) without the use of a shared file system. All the IP addresses and hostnames you see below are fictional or modified.
Primary:
- Create a test table for this exercise; its structure is identical to that of an existing table.
- Stop HADR on the primary database server so that you can perform the LOAD on the primary. This ensures that the standby won't keep looking for the LOAD copy file.
$ db2 -tvf guidmap_test_new.ddl

CREATE TABLE "TEST"."TEST_NEW" (
        "CAF_GUID" BIGINT NOT NULL ,
        "BACKEND_GUID" VARCHAR(200 OCTETS) ,
        "TS_ACCOUNT_ID" BIGINT NOT NULL ,
        "TYPE" INTEGER NOT NULL WITH DEFAULT 0 ,
        "FOLDER" VARCHAR(150 OCTETS) ,
        "PCOUNT" INTEGER WITH DEFAULT 0 ,
        "FCOUNT" INTEGER WITH DEFAULT 0 )
IN "TESTTS" ORGANIZE BY ROW
DB20000I  The SQL command completed successfully.

ALTER TABLE "TEST"."TEST_NEW" VOLATILE CARDINALITY
DB20000I  The SQL command completed successfully.

SET NLS_STRING_UNITS = 'SYSTEM'
DB20000I  The SQL command completed successfully.

CREATE INDEX "TEST"."IDX_TEST_1" ON "TEST"."TEST_NEW"
        ("TS_ACCOUNT_ID" ASC, "BACKEND_GUID" ASC, "CAF_GUID" ASC)
        COMPRESS NO INCLUDE NULL KEYS ALLOW REVERSE SCANS
DB20000I  The SQL command completed successfully.

SET NLS_STRING_UNITS = 'SYSTEM'
DB20000I  The SQL command completed successfully.

CREATE INDEX "TEST"."IDX_TEST_3" ON "TEST"."TEST_NEW"
        ("CAF_GUID" DESC)
        COMPRESS NO INCLUDE NULL KEYS ALLOW REVERSE SCANS
DB20000I  The SQL command completed successfully.

$ db2 "stop hadr on database test"
DB20000I  The STOP HADR ON DATABASE command completed successfully.
Standby:
- Deactivate the standby database.
- Stop HADR on the standby database so that you can perform the LOAD on the primary. This ensures that the standby won't keep looking for the LOAD copy file.
$ db2 deactivate db test
DB20000I  The DEACTIVATE DATABASE command completed successfully.

$ db2 "stop hadr on database test"
DB20000I  The STOP HADR ON DATABASE command completed successfully.
Primary:
- Declare a cursor to select all rows from the existing table that is identical to the table created in step 1.
- Load the data from the cursor, saving the load copy image to /db2backup/db2inst1.
- SCP/SFTP the LOAD copy image from the primary to the exact same location on the standby server.
$ db2 "declare c1 cursor for select * from test.test" DB20000I The SQL command completed successfully. $ db2 "load from c1 of cursor messages load_test_new.msg insert into test.test_new copy yes to /db2backup/db2inst1" Number of rows read = 21137741 Number of rows skipped = 0 Number of rows loaded = 21137741 Number of rows rejected = 0 Number of rows deleted = 0 Number of rows committed = 21137741 $ ll /db2backup/db2inst1 total 12551496 drwxr-xr-x. 3 db2inst1 db2iadm1 4096 Sep 24 2016 logs -rw-r-----. 1 db2inst1 db2iadm1 4577472512 Jan 22 19:41 TEST.4.db2inst1.DBPART000.20180122193803.001 $ scp /db2backup/db2inst1/TEST.4.db2inst1.DBPART000.20180122193803.001 db2inst1@10.0.1.3:/db2backup/db2inst1 db2inst1@10.0.1.3's password: TEST.4.db2inst1.DBPART000.20180122193803.001 100% 4365MB 63.3MB/s 01:09
Standby:
- Start HADR on standby database.
$ db2 "start hadr on database test as standby" DB20000I The START HADR ON DATABASE command completed successfully.
Primary:
- Start HADR on primary database.
- Verify that both the primary and the standby are operating in PEER state (a quick db2pd check is sketched below).
$ db2 "start hadr on database test as primary" DB20000I The START HADR ON DATABASE command completed successfully.
Standby:
- Ensure that the LOAD copy image was restored successfully on the HADR standby database. You should see something similar to the following in the db2diag.log.
2018-01-22-19.46.36.045862-480 I65276650E511        LEVEL: Warning
PID     : 68079                TID : 46912942106368 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27005              APPID: *LOCAL.DB2.180123034621
HOSTNAME:
EDUID   : 1068                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:549
DATA #1 : String, 68 bytes
Starting to restore a load copy.
TEST.TEST_NEW.20180122193803

2018-01-22-19.50.59.927064-480 I65277162E468        LEVEL: Warning
PID     : 68079                TID : 46912942106368 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27005              APPID: *LOCAL.DB2.180123034621
HOSTNAME:
EDUID   : 1068                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:1143
MESSAGE : Load copy restore completed successfully.
- Take over HADR on the standby database server.
- Verify that the TAKEOVER command completed successfully and that both databases are in PEER state.
- Finally, verify that the number of rows loaded on the primary equals the count on the new primary.
$ db2 "takeover hadr on database test" DB20000I The TAKEOVER HADR ON DATABASE command completed successfully. [db2inst1@db2lab_chicago db2inst1]$ db2pd -db test -HADR Database Member 0 -- Database TEST -- Active -- Up 0 days 00:09:04 -- Date 2018-01-22-19.55.24.838408 HADR_ROLE = PRIMARY REPLAY_TYPE = PHYSICAL HADR_SYNCMODE = NEARSYNC STANDBY_ID = 1 LOG_STREAM_ID = 0 HADR_STATE = PEER HADR_FLAGS = PRIMARY_MEMBER_HOST = 10.0.1.3 PRIMARY_INSTANCE = db2inst1 PRIMARY_MEMBER = 0 STANDBY_MEMBER_HOST = 10.0.1.4 STANDBY_INSTANCE = db2inst1 STANDBY_MEMBER = 0 HADR_CONNECT_STATUS = CONNECTED HADR_CONNECT_STATUS_TIME = 01/22/2018 19:46:31.586983 (1516679191) HEARTBEAT_INTERVAL(seconds) = 15 HEARTBEAT_MISSED = 0 HEARTBEAT_EXPECTED = 35 HADR_TIMEOUT(seconds) = 60 TIME_SINCE_LAST_RECV(seconds) = 14 PEER_WAIT_LIMIT(seconds) = 0 LOG_HADR_WAIT_CUR(seconds) = 0.000 LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000371 LOG_HADR_WAIT_ACCUMULATED(seconds) = 0.000 LOG_HADR_WAIT_COUNT = 1 SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384 SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380 PRIMARY_LOG_FILE,PAGE,POS = S0189512.LOG, 1050, 14572339324443 STANDBY_LOG_FILE,PAGE,POS = S0189512.LOG, 1050, 14572339324443 HADR_LOG_GAP(bytes) = 0 STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0189512.LOG, 1050, 14572339324443 STANDBY_RECV_REPLAY_GAP(bytes) = 0 PRIMARY_LOG_TIME = 01/22/2018 19:54:56.000000 (1516679696) STANDBY_LOG_TIME = 01/22/2018 19:54:56.000000 (1516679696) STANDBY_REPLAY_LOG_TIME = 01/22/2018 19:54:56.000000 (1516679696) STANDBY_RECV_BUF_SIZE(pages) = 16384 STANDBY_RECV_BUF_PERCENT = 0 STANDBY_SPOOL_LIMIT(pages) = 0 STANDBY_SPOOL_PERCENT = NULL STANDBY_ERROR_TIME = NULL PEER_WINDOW(seconds) = 120 PEER_WINDOW_END = 01/22/2018 19:57:11.000000 (1516679831) READS_ON_STANDBY_ENABLED = N $ db2 "select count(1) as count from TEST.TEST_NEW with ur" COUNT ----------- 21137741 1 record(s) selected.
Now you have seen two different ways to replicate a RECOVERABLE LOAD from the HADR primary database to the standby database.
Solution 3:
Now let's take a look at the final and most commonly used solution for replicating RECOVERABLE LOADs between the primary and standby servers. For this solution you need a shared file system or NFS (Network File System) between the primary and standby servers. The following are the preparation steps that I performed.
- On my iMac I used VMware Fusion Pro to build two virtual machines.
- These two virtual machines act as the primary and standby servers and are built on Thunderbolt SSD storage for fast boot times.
- I also shared a folder on my iMac's drive between the two virtual machines; it is mounted as /mnt/hgfs and visible from both VMs.
Let's verify that the shared drive is accessible from both the primary and the standby; a simple df command should help us determine that.
Primary:
$ df -H | grep mnt
.host:/         501G  139G  362G  28% /mnt/hgfs
Standby:
$ df -H | grep mnt
.host:/         501G  139G  362G  28% /mnt/hgfs
Create a directory on the shared file system from either the primary or the standby server. In my case, since I was already on the primary, I chose to do it from the primary VM.
Primary:
$ mkdir /mnt/hgfs/raghu/loads
$ cd /mnt/hgfs/raghu/loads
Primary:
$ db2 "declare test_c1 cursor for select * from test.test" DB20000I The SQL command completed successfully. $ db2 "load from test_c1 of cursor messages load_test_new.msg replace into test.test_new copy yes to /mnt/hgfs/raghu/loads" Number of rows read = 21137741 Number of rows skipped = 0 Number of rows loaded = 21137741 Number of rows rejected = 0 Number of rows deleted = 0 Number of rows committed = 21137741
Now verify that the load copy file is accessible from both the primary and standby VMs.
Primary:
$ pwd
/mnt/hgfs/raghu/loads
$ ls -ltr
total 4473328
-rw-r-----. 1 501 games 4577472512 Jan 29 21:29 TEST.4.db2inst1.DBPART000.20180129212636.001
-rw-r--r--. 1 501 games       1112 Jan 29 21:29 load_test_new.msg
Standby:
$ pwd
/mnt/hgfs/raghu/loads
$ ls -ltr
total 4473328
-rw-r-----. 1 501 games 4577472512 Jan 29 21:29 TEST.4.db2inst1.DBPART000.20180129212636.001
-rw-r--r--. 1 501 games       1112 Jan 29 21:29 load_test_new.msg
Now verify that the LOAD copy was restored on the standby. You can look for the following entries in the db2diag.log on the standby.
Standby:
2018-01-29-21.29.47.972153-480 I69653877E511        LEVEL: Warning
PID     : 68079                TID : 46912908551936 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27021              APPID: *LOCAL.DB2.180130051853
HOSTNAME: xxxxxxxxxxxx
EDUID   : 1537                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:549
DATA #1 : String, 68 bytes
Starting to restore a load copy.
TEST.TEST_NEW.20180129212636

2018-01-29-21.31.51.788936-480 I69654389E468        LEVEL: Warning
PID     : 68079                TID : 46912908551936 PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : TEST
APPHDL  : 0-27021              APPID: *LOCAL.DB2.180130051853
HOSTNAME: xxxxxxxxxxxx
EDUID   : 1537                 EDUNAME: db2agent (TEST) 0
FUNCTION: DB2 UDB, database utilities, sqludcpy, probe:1143
MESSAGE : Load copy restore completed successfully.
This solution is really simple. However, you need to remember that every project, ETL team, or anyone else who runs the DB2 LOAD utility in this HADR environment must use the shared or NFS file system as the destination for their LOAD COPY files. If not, you are in for a lot of surprises during failover scenarios.
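Two safeguards can help enforce that; a sketch under assumptions (the DB2_LOAD_COPY_NO_OVERRIDE registry variable and the SYSIBMADM.DB_HISTORY view are real DB2 facilities, but the /mnt/hgfs/raghu/loads path is just the one from the examples above):

# Force COPY NO loads to take a copy in the shared location instead
# (registry changes typically take effect after an instance restart)
$ db2set DB2_LOAD_COPY_NO_OVERRIDE="COPY YES TO /mnt/hgfs/raghu/loads"

# Report recent loads whose copy image landed outside the shared file system
$ db2 "select substr(tabschema,1,12) tabschema, substr(tabname,1,24) tabname,
       substr(location,1,60) copy_location, start_time
       from sysibmadm.db_history
       where operation = 'L' and location not like '/mnt/hgfs/%'
       order by start_time desc"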
Comments
Nice blog, and nice level of detail. I like the innovation of the second method for rare scenarios.
I opened an RFE with IBM to try to force the load copy location for the third scenario. Vote for it here:
http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=83961
If I ever have the third scenario in a database environment, I run a script or monitor to look at the load copy locations and find any that are not correct on an ongoing basis. They slip in if you’re not careful.
I also wrote a blog article on how to identify tables on the standby that were inoperative due to an incorrect load copy location: http://db2commerce.com/2017/10/19/avoiding-inoperative-tables-on-the-hadr-standby/
Hi Ember, thank you for the nice words.
I voted yes for your RFE, and in fact I read your blog post.
Also, if you don't mind, would you be able to read and test the following tool and let me know what you think should be improved?
https://www.raghu-on-tech.com/2017/12/19/how-to-deep-compare-two-db2-environments/
I have read multiple blog posts of yours on db2commerce.com and they are very informative. Now that I have started to blog, I can appreciate you even more, as I understand how much time and effort it takes on your part to blog.
Regards,
Raghu
Hi Raghu,
This is very informative. We recently enabled HADR for one of our environments and are using the third process you mentioned. I had a question here: since the LOAD utility generates a copy image of the loaded data, how long should this image be retained on the shared file system? Do we need to keep this LOAD copy image forever, or is there a way in DB2/the LOAD utility to delete it after the data has been loaded to the secondary database?
We are using DB2 LUW v10.5 FP8.
Thank you in advance.
J
James,
LOAD copy images should be part of your database recovery strategy. Whenever you restore a database and roll forward through transaction logs, DB2 looks for these LOAD copy files. If it can't find them, your table is left inoperative and typically has to be dropped and recreated. So you should keep these files as long as you keep your transaction log files.
You should clean them up along with your transaction log files.
You can even automate this via DB2; take a look at the two database configuration parameters below (a sketch of their use follows).
AUTO_DEL_REC_OBJ
REC_HIS_RETENTN
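A minimal sketch of how these fit together (the parameter names and PRUNE HISTORY command are real; the 60-day retention, TEST database name, and prune timestamp are illustrative): with AUTO_DEL_REC_OBJ set to ON, pruning the history file also deletes the physical recovery objects it references, including load copy images.

$ db2 "update db cfg for test using AUTO_DEL_REC_OBJ ON"
$ db2 "update db cfg for test using REC_HIS_RETENTN 60"

# Prune history entries older than the given timestamp; with AUTO_DEL_REC_OBJ
# on, AND DELETE also removes the corresponding backup images, archived log
# files, and load copy images
$ db2 "prune history 20180101 and delete"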
Great post, Raghu. Db2 on Cloud, the IBM-managed version of Db2 on the IBM Cloud, currently uses a variation of Option 3, where we leverage Cloud Object Storage for LOAD COPY files as well as transaction logs and backups.
Hi Raghu, great article.
I am loading my database and the load copy files are huge, around 40 GB each.
I am currently doing one load, copying the file to DR, starting HADR, and letting it catch up before I do the next. But this process is too slow, as I have a lot of loads to go through.
My question is: can I do multiple loads and copy the load copy files over one by one as the standby catches up? What will happen if a copy file is missing: will the DR wait for it or crash?
Thanks
Hi Raghu,
Good post
I use the third way, with NFS. Loading millions of records is fast. However, I see long elapsed times when LOAD rebuilds the indexes.
Any suggestions to improve LOAD performance?
Thanks
Greetings from Ecuador