Raghu On Tech: Pro tips on managing data at scale
How To Find Redundant/Unused Indexes?
In my book, a redundant index is one whose leading columns match the leading columns of another existing index on the same table. As a DBA I have worked in many places where this is a recurring problem. Many DBAs are hesitant to drop these redundant indexes despite the number of disadvantages they bring along with them.
How do these redundant indexes make it into the database in the first place? I can think of a few situations…
- When DBAs use db2advis (Be careful with db2advis), it leans heavily towards index-only access and can recommend a lot of indexes that will not really benefit the query.
- DBAs might add a new index without verifying that there is another existing index that could be modified to suit the new use case.
Why are these indexes a problem?
- Redundant indexes waste disk space.
- A bigger and more important point is that B-Tree indexes add significant write overhead (i.e. on inserts/updates/deletes).
- They consume CPU time unnecessarily.
- They increase the temp space and transaction log space needed by REORG/LOAD jobs, because these redundant indexes are rebuilt during those processes.
Recently I have been working on a performance tuning effort to make our production environment more efficient and bring down resource utilization. As part of this effort I discovered that our REPLICATION was using tons of CPU; upon digging further I found a lot of redundant/unused indexes causing the high CPU utilization.
Read on to find out how to determine the redundant and unused indexes.
All you need to do is run the SQL statement below to find the indexes that share the same first 2 leading columns on any table in the database. You can easily change the SQL’s WHERE clause to look for any number of identical leading columns, or to filter on a given SCHEMA/TABLE. In our case we started with 5 leading columns and slowly cleaned up redundant indexes by getting more restrictive with colcount.
with t1 as (
  select ix.creator, ix.name,
         strip(strip(ix.tbcreator)||'.'||strip(ix.tbname)) tab,
         ic.colseq, ic.colname, ic.colorder
  from sysibm.sysindexes ix,
       sysibm.sysindexcoluse ic
  where ix.name = ic.indname
    and ix.creator = ic.indschema
    and ic.colseq < 3
),
t2 as (
  select tab,
         strip(creator)||'.'||strip(name) ind,
         listagg(strip(colseq)||'/'||strip(colname)||'/'||strip(colorder), '@')
           within group (order by colseq) cls
  from t1
  group by tab, creator, name
),
t3 as (
  select ind, strip(tab) tab, strip(cls) cols
  from t2
)
select a.ind as ind, b.ind
from t3 a, t3 b
where a.ind < b.ind
  and a.cols = b.cols
  and a.tab = b.tab
order by 1, 2
with ur;
When you run the above statement it will list all the indexes in the database where the leading 2 columns are the same for any table. Your sample output would look something like the one below; when I originally ran this query against my database it gave me hundreds of indexes that needed to be worked on.
In the above example, you have the following options to handle the situation:
- Merge them into a single index on (COLA, COLB, COLC, COLD)
- Merge them into a single index on (COLA, COLB, COLD, COLC)
- Leave the indexes as they are, because there is a legitimate use case for them (which is often not the case).
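The leading-column comparison that the SQL above performs can be sketched in plain Python. This is purely illustrative, not the author’s tooling; the index definitions below are hypothetical examples.

```python
def redundant_pairs(indexes, leading=2):
    """Return pairs of index names on the same table whose first
    `leading` (column, sort-order) entries are identical."""
    by_signature = {}
    for name, table, columns in indexes:
        # Signature: the table plus the first `leading` (column, order) entries,
        # mirroring the colseq < 3 filter and listagg grouping in the SQL.
        sig = (table, tuple(columns[:leading]))
        by_signature.setdefault(sig, []).append(name)
    pairs = []
    for names in by_signature.values():
        names.sort()
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                pairs.append((names[i], names[j]))
    return sorted(pairs)

# Hypothetical catalog rows: (index name, table, [(column, order), ...])
indexes = [
    ("IX1", "RAGHU.TEST", [("COLA", "A"), ("COLB", "A"), ("COLC", "A")]),
    ("IX2", "RAGHU.TEST", [("COLA", "A"), ("COLB", "A"), ("COLD", "A")]),
    ("IX3", "RAGHU.TEST", [("COLB", "A"), ("COLA", "A")]),
]
print(redundant_pairs(indexes, leading=2))  # → [('IX1', 'IX2')]
```

Note that IX3 is not flagged even though it uses the same columns: column order matters for B-Tree access, just as it does in the SQL’s signature string.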
How you merge these indexes really depends on the workload and on how well you know the data and the dependencies between these columns/attributes. This is where a DBA’s experience with the data model comes in handy. Another good practice is to add comments to the indexes (for example with COMMENT ON INDEX) so that you can later read back why an index was designed a certain way.
Also, a lot of DBAs are afraid to drop any existing indexes, and rightfully so, because you could slow down some part of the application. But identifying these redundant indexes and cleaning them up, after testing the application with the new indexes, will benefit you in the long run, just like it did for us.
Similarly, you can find unused indexes in a database/schema/table by using variations of the SQL below. The following SQL gives you a list of all the indexes on table RAGHU.TEST that have not been used in the last 30 days.
select substr(s.indname, 1, 75) as indname,
       LASTUSED, t.IID, INDEX_SCANS, INDEX_ONLY_SCANS
from table(MON_GET_INDEX('', '', -2)) as t,
     syscat.indexes as s
where t.tabschema = s.tabschema
  and t.tabname = s.tabname
  and t.tabname = 'TEST'
  and t.tabschema = 'RAGHU'
  and t.iid = s.iid
  and INDEX_SCANS = 0
  and INDEX_ONLY_SCANS = 0
  and s.indname not like 'FK%'
  and s.UNIQUERULE not in ('P', 'U')
  and LASTUSED < current date - 30 days
group by t.IID, LASTUSED, indname, INDEX_SCANS, INDEX_ONLY_SCANS
with ur;
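The filters that query applies can be expressed compactly in Python too, which is handy if you fetch the monitoring rows once and want to experiment with different thresholds offline. A minimal sketch, assuming hypothetical row dictionaries shaped like the query’s result set:

```python
from datetime import date, timedelta

def unused_candidates(rows, today, days=30):
    """Keep indexes with zero scans that are not primary/unique, are not
    FK-named, and were last used more than `days` days ago — the same
    filters as the MON_GET_INDEX/SYSCAT.INDEXES query."""
    cutoff = today - timedelta(days=days)
    return [
        r["indname"]
        for r in rows
        if r["index_scans"] == 0
        and r["index_only_scans"] == 0
        and not r["indname"].startswith("FK")
        and r["uniquerule"] not in ("P", "U")
        and r["lastused"] < cutoff
    ]

# Hypothetical monitoring rows for illustration only
rows = [
    {"indname": "IX_OLD", "index_scans": 0, "index_only_scans": 0,
     "uniquerule": "D", "lastused": date(2020, 1, 1)},
    {"indname": "IX_HOT", "index_scans": 500, "index_only_scans": 10,
     "uniquerule": "D", "lastused": date(2021, 6, 1)},
    {"indname": "FK_CHILD", "index_scans": 0, "index_only_scans": 0,
     "uniquerule": "D", "lastused": date(2020, 1, 1)},
    {"indname": "PK_TEST", "index_scans": 0, "index_only_scans": 0,
     "uniquerule": "P", "lastused": date(2020, 1, 1)},
]
print(unused_candidates(rows, today=date(2021, 7, 1)))  # → ['IX_OLD']
```

Keeping the primary/unique and FK-named indexes out of the candidate list matters: even if they are never scanned, they enforce constraints or support referential integrity checks.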
We recently dropped 57 indexes that had not been used in over a year, and reclaimed about 8% CPU on a box that usually runs at 40% CPU, which is an enormous gain.
Like any other change that you perform on your database environment, these changes need to be tested in the lower environments first!
PS: You could also use tools such as USAGE LISTS and WLM to monitor database object usage, i.e. to capture the SQL statements that are being run against an index/table. They will ease your nervousness about which SQL may or may not be using a specific database object.