DB2 pureScale Cluster Slowness During Index Creation

Problem:
I work at a place where the performance of our services is closely monitored. That was not the case at most other places I have worked in the past, at least not from an application standpoint. Our pureScale cluster hosts 5 databases, and whenever we created an index on any large table in ANY of the databases on the cluster, we observed the symptoms below.
- IO response times for the tablespaces spiked to about 250-350 ms, from about 2-5 ms under normal operating conditions (see the monitoring sketch after this list).
- Even though the index was being created on one database, we observed slowness on all the other databases across the cluster.
- Application teams consistently reported slowness on their side whenever we created an index.
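The tablespace response-time numbers above came from DB2's monitoring table functions. As a rough illustration only (not the exact query we ran), average physical read latency per tablespace across all members can be pulled from MON_GET_TABLESPACE along these lines:

# Rough average physical read latency (ms) per tablespace, across all members (-2).
# Illustrative sketch; the exact query we used may have differed.
db2 "SELECT varchar(tbsp_name, 20) AS tbsp_name,
            member,
            CASE WHEN pool_data_p_reads + pool_index_p_reads > 0
                 THEN pool_read_time / (pool_data_p_reads + pool_index_p_reads)
                 ELSE 0
            END AS avg_pool_read_ms
     FROM TABLE(MON_GET_TABLESPACE(NULL, -2)) AS t
     ORDER BY avg_pool_read_ms DESC"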
Analysis:
Like any DBA, I started by checking all the database/instance level wait metrics. Note that this is not a table-level locking issue, because we were seeing slowness across the cluster. Some of the things we checked are listed below.
- Stress on the agent pool, even though this did not make sense to me because each member on the cluster has its own pool and the index creation was only happening on one member. When you are trying to solve a problem you tend to look at everything, even when it does not make much sense.
- CPU, Memory and IO related metrics across the cluster.
- Db2 memory pools.
- Load on the CF (Cluster Caching Facility) and CF wait times (a sketch of how to check these follows this list). As it turns out, there is CF communication during index creation; more on this in another blog post.
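CF wait metrics are exposed as request metrics in pureScale. A minimal sketch, assuming the cf_waits/cf_wait_time monitor elements reported by MON_GET_WORKLOAD (again illustrative, not the exact query we used):

# CF waits and CF wait time per workload, across all members (-2).
# cf_waits / cf_wait_time are request metrics available in pureScale environments.
db2 "SELECT varchar(workload_name, 24) AS workload_name,
            member,
            cf_waits,
            cf_wait_time
     FROM TABLE(MON_GET_WORKLOAD(NULL, -2)) AS t
     ORDER BY cf_wait_time DESC"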
Then we checked with our system admins to see if they had found anything fishy on their end, but nothing turned up there either. After taking a step back and thinking about the issue, it looked very much like a threading or blocking issue on a common component of the pureScale cluster. My mind immediately went to GPFS. Like most DBAs I have a limited understanding of GPFS internals, but enough high-level understanding to look in the right places.
One of my favorite GPFS commands is mmfsadm. Below is one of the developerworks articles that I referred to while troubleshooting this issue.
GPFS Tuning Parameter — Developer Works
While I checked a lot of other things on GPFS, the command below is the one that helped me get to the root cause of the problem. While creating an index, I ran it from the member that was creating the index.
# while : ; do mmfsadm dump mb | grep Worker1 ; sleep 2 ; done
The output from the above command looked like below. BINGO!!!
The above output is interesting for a couple of reasons. First, GPFS is not using all of the available worker1Threads: there are 512 threads available in total, but only 248 were in use. WHY? Secondly, all those IO requests were waiting to get hold of a worker1Thread to service them.
First of all, worker1Threads service IO requests for GPFS. As you can see, the DB2 index creation algorithm is a highly multithreaded process. We did not see this kind of behavior during our other IO-intensive operations such as backups/restores or loads. Interestingly, we did not observe it even during the index rebuilds that happen as part of Loads and Reorgs. So apparently the algorithm used by the "CREATE INDEX" statement is different from the index rebuild path used by Loads and Reorgs.
IBM's GPFS resources mentioned that GPFS (Spectrum Scale) did not have a large enough pagepool to put additional worker1Threads to work. By default, a DB2 pureScale installation configures pagepool to 512 MB and worker1Threads to 512; however, each worker1Thread needs approximately 2.05 MB of pagepool memory to operate.
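That ratio explains the ~248 active threads we saw: a 512 MB pagepool divided by roughly 2.05 MB per thread only supports about 250 worker1Threads, no matter how many are configured. A quick back-of-the-envelope check:

# 512 MB pagepool / ~2.05 MB per worker1Thread (numbers from above)
echo "scale=1; 512 / 2.05" | bc      # prints 249.7, i.e. roughly 250 threads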
What is pagepool in GPFS anyway?
The pagepool is the file data and metadata cache for GPFS. Traditional non-distributed file systems use the operating system file cache for this purpose. An important thing to note is that the pagepool is pinned kernel memory: once it is allocated, it cannot be reclaimed by other applications. That could be why IBM went a little conservative with it, but it still does not explain the imbalance we are seeing in the pagepool/worker1Threads ratio.
An easy way to check your existing GPFS configuration is the mmdiag command. By running mmdiag as shown below, you should be able to see all of the currently active GPFS settings. Any setting prefixed with ! has been changed from its default. It is important to save any changes you make to these settings so that they can be reapplied after GPFS upgrades.
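A minimal example of what that invocation might look like (the --config option dumps the configuration the daemon is currently running with; the grep filter is just for readability):

# Dump the currently active GPFS configuration; settings changed from the
# default are prefixed with '!'.
mmdiag --config

# Or narrow it down to the two parameters discussed in this post:
mmdiag --config | egrep "pagepool|worker1Threads"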
Resolution:
We updated pagepool to 3 GB and worker1Threads to 1495 to resolve the bottleneck on GPFS. We did not see any application or DB2 slowness once the changes were made. The changes were made on each member and CF in the cluster, and we recycled one GPFS node at a time for them to take effect.
An important thing to note here is that the pagepool setting can be changed online; however, the worker1Threads parameter can only be changed online when decreasing it, not when increasing it.
We used mmchconfig to update the GPFS configuration parameters, then shut down DB2 on that member, stopped the GPFS daemon on that node, started the GPFS daemon on that node, and then started the DB2 member/CF back up, in that sequence. We repeated this for all the members and CFs.
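A rough sketch of that sequence on one member host is below. The hostname (node01) and member ID (0) are placeholders, and in a DB2-managed GPFS cluster you may prefer db2cluster -cfs -stop/-start -host over mmshutdown/mmstartup; treat this as an outline rather than the exact commands we ran:

# Update the GPFS parameters cluster-wide (values from this post).
# pagepool could be changed online with -i, but since worker1Threads was
# being increased, a daemon recycle was still required.
mmchconfig pagepool=3G,worker1Threads=1495

# Then, one host at a time:
db2stop member 0          # stop the DB2 member running on this host
mmshutdown -N node01      # stop the GPFS daemon on this host
mmstartup -N node01       # start the GPFS daemon again
db2start member 0         # bring the DB2 member back up
# Repeat for the remaining members and the CF hosts.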
You should be able to validate the GPFS settings currently in effect by running the command below.
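For example (same caveat as above), comparing what is stored in the cluster configuration against what the daemon is actually running with:

# Values stored in the cluster configuration
mmlsconfig pagepool
mmlsconfig worker1Threads

# Values currently in effect in the running daemon
mmdiag --config | egrep "pagepool|worker1Threads"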
Conclusion:
So, if you are running a pureScale environment, you may want to check these settings. Many environments would not even notice a 2-5 minute slowdown in the middle of the night, but my environment is not one of them; here we measure and troubleshoot every small dip in performance. I hope this blog post helps someone.
Comments:
The GPFS requirement for memory depends on the filesystem block size and the number of worker threads. One limitation is that you cannot use more than 1/2 of the pagepool for kernel thread IO; it is also needed to cache data blocks. So if you have 1000 threads and a 1 MB block size, you would need 1000 * 1 MB * 2 = 2 GB of pagepool to have 1000 threads doing IO.
Thank you for the response @Jim. I should have mentioned the dependency of pagepool size on the GPFS block size in my post. Ours is DB2's default of 1 MB. We have not played around with different block sizes to see how they affect performance.
It seems to me, though, that it's worth trying smaller block sizes, as some articles suggest they tend to work better with OLTP workloads, and pureScale is designed for OLTP systems.
I would appreciate any comments you have regarding the block size for DB2 OLTP systems, or better yet, if the IBM DB2 and GPFS teams came up with a good default GPFS block size for DB2 pureScale.
The hard part, though, is that changing the block size requires a rebuild of the whole filesystem once GPFS is created.
I wouldn't recommend going lower than 256 KB, which is the default GPFS block size. The GPFS wiki https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General Parallel File System (GPFS)/page/IBM Spectrum Scale Wiki is a good reference.