Raghu On Tech

Pro tips on managing data at scale

DB2 Finding Duplicate Rows

by | Oct 31, 2019 | 0 comments

Problem:

As a DBA, you may get asked by development/other teams to get the list of duplicate rows based on a certain column or a key (non-primary). In this short blog post I am going to discuss a couple of ways to get the list of duplicates.

Solution:

For e.g. lets assume you have a below data set

Now as you can see there are two sets of duplicates. One way to display the duplicates is to use the age old method of using “GROUP BY WITH HAVING”. In the past decade or so WINDOW functions aka OLAP functions became mainstream and lot of folks are using WINDOW functions as well.

db2 "select ID FROM DBA.TEST1 GROUP BY ID HAVING COUNT(ID)>1 WITH UR"

 

db2 "select ID FROM (select ID ,ROW_NUMBER() OVER(PARTITION BY ID) AS ROWNUM FROM DBA.TEST1) WHERE ROWNUM>1 WITH UR"

As you can see both methods produced the desired output and I am sure there are other ways to do it using self joins etc. but these are the two basic ways to pull out the duplicates.

Now lets say you want to delete one of these two duplicate records and you only want to keep the latest of the two based on a timestamp (see dataset below). You could easily accomplish this by using WINDOW/OLAP functions.

Before you delete anything always run a SELECT on the data just to make sure you are deleting the right data set.

db2 "SELECT ID, NAME, ROWTS FROM (select ID, NAME, ROWTS, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ROWTS DESC) AS ROWNUM from DBA.TEST1) WHERE ROWNUM>1"

As you can see from the above output, the older of the two records was selected. Now all you need to do is run a DELETE! Another thing that I always do on my DELETE/UPDATE’s is to run it with “-m” command line option.

db2 -m "DELETE FROM (select ID, NAME, ROWTS, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ROWTS DESC) AS ROWNUM from DBA.TEST1) WHERE ROWNUM>1"

It would be a great investment of your time to learn OLAP functions, if you use SQL in your day to day work. There are a lot of nice online resources or you can start with below IBM knowledge center link.

DB2 OLAP Specification

I am going to post another blog post about WLM and how I took advantage of OLAP functions to perform some really cool reporting.

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Shares
Share This

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close