Raghu On TechPro tips on managing data at scale
Db2 SQL And SQL Tuning
Its been a couple of months since I posted a blog, I have been busy with holiday season and preparing for my Db2 Performance Tips And Tricks presentation on Scott Hayes (Data View Show). I promise to be more regular and consistent in 2020. In this blog post I will talk about SQL language and techniques to performance tune SQL.
What is SQL ?
- SQL is the language of RDBMS, but its not a complete programming language. It abstracts out lots of complexities of regular programming language, that’s why tuning might get challenging.
- Its declarative (what) but not imperative (how).
- E.g. of Declarative: SELECT ID, NAME FROM TAB1 A, TAB2 B WHERE A.ID=B.ID AND A.NAME=‘Anvith’
- E.g. of Imperative: Write code to read file A, read file B, filter file A on 3rd field for Name=‘Anvith’, build B-Tree indexes on file A and File B on ID and Name, then JOIN them on ID using a Nested Loop, SORT the results using advanced Radix sort then give me the results.
You may use some of the below techniques to optimize a slow performing query.
- Runstats help optimizer determine how to get data fast by using well researched data structures and algorithms that are part of Db2 engine. So making sure the statistics are current is vital for the optimizer to generate optimal access plan for a given query.
- Apart from Runstats, there are other ways to help optimizer by writing better SQL.
- Remove unnecessary joins and columns not needed by the application. Hibernate, Cognos and other tools are known to cause issues in this area. Every additional join (especially joins with complex predicates) can introduce additional complexity to the optimizer.
- Look for objects that are causing the most Logical IO in the plan, try to fix them.
- Watch out for cardinality under/over estimations at various operators.
- If there is too much under/over estimation of cardinalities, try to fix them by making sure Runstats are current. You may use db2caem to troubleshoot the cardinality misestimations.
- Cardinality underestimations could cause large intermediary result sets be part of INNER side of the HASH join. This could cause the INNER HASH table to spill to disk, causing the query to perform miserably. When a HASH spills to disk one could argue they perform worse than a NESTED LOOP.
- If the cardinalities are messed up on a JOIN operator explore the use of Column Group Statistics (CGS). CGS made wonders for us and improved access plans for a number of queries. Below blog post will talk about CGS in detail.
- Explore options such as moving predicates closer to the table, when you have a large SQL. I have seen them make a difference in the past.
- Explore options such as using CGTT to denormalize some of the joins to help the optimizer.
- If nothing appears to work, look at using OPTGUIDELINES to influence the optimizer.
- Make sure to run explain on the SQL the way its being run from the application, i.e. if the query is using parameter markers then explain the query with parameter markers with out substituting them with literal values. If the data is skewed or non-uniform the plans generated by both the methods will be significantly different.
- Use literal values inplace of parameter markers if the predicates used by the query have skewed distribution of data for better access plan generation and optimization.
How Does A Good Plan Look ?
This is a question I get a lot from fellow DBA’s and after years of tuning SQL these are some of the things that I learned.
- To me, a good plan should get the data you requested by accessing the least number of rows possible therefore reducing the logical IO.
- Always tune to reduce logical IO, no one can guarantee a data/index page to be in the buffer pools 100% of the time.
- When you reduce the IO, CPU and memory will follow. This approach gave me the most success while performance tuning.
Ways To Reduce Logical IO:
- Table scans will cause a lot of IO. Especially when these scans reside on the INNER side of the nested loop Joins.
- Look out for any inefficient indexes in the plan.
- Ensure that the tables are being joined in the right order.
- Unnecessary sorting could cause IO.
You could even look at objects that are responsible for most IO in the database by querying MON_GET_TABLESPACE, MON_GET_TABLE, MON_GET_INDEX. Then you could use MON_GET_PKG_CACHE_STMT table function to find queries from the package cache that are referencing the table or use USAGE LISTS to capture the offending queries.