Raghu On Tech

Pro tips on managing data at scale

How To Deep Compare Two Db2 Environments?

Dec 19, 2017

What is DB2 Deep Compare?

Have you ever tried to compare DDL between two schemas or databases and had to deal with the tedious task of text processing to figure out the differences? Did you ever run into a situation where you needed to compare two tables and their dependent objects, i.e. their columns, column order, data types, primary keys, foreign keys, tablespaces, bufferpools, etc.? Have you tried to solve this problem with SQL, and someone had to bring a fire extinguisher to stop your hair from catching fire? If so, please read on… A picture is worth more than a thousand words, so take a look at the data tree below: how would you go about comparing two similar data trees, one from the source and another from the target?
In the interest of readers' time, before discussing the hoops I had to jump through to arrive at a solution, I am going to show how to run Db2Diff and compare two DB2 environments. Remember, you can even compare two different schemas within the same database.

What does Db2Diff compare anyway?

It compares a whole lot! This initial version compares the following:

Operating System:

  • OS NAME
  • KERNEL VERSION
  • TOTAL CPUs
  • CONFIGURED CPUs
  • TOTAL MEMORY
  • OS VERSION
  • ARCHITECTURE TYPE

Registry Variables:

  • MISSING/UNSET Registry Variables between the two environments
  • Registry Variables set to different values

Instance Environment:

  • INSTANCE NAME
  • RELEASE NUMBER
  • SERVICE LEVEL
  • BUILD LEVEL
  • PROGRAM TEMPORARY FIX (PTF)
  • FIXPACK NUMBER
  • DB2 VERSION

DBM Configuration (DBM CFG):

  • Compares DBM CFG parameters that are not set to AUTOMATIC in either environment and reports mismatches
  • When a parameter is set to AUTOMATIC in one environment, reports a mismatch only if it is not set to AUTOMATIC in the other

DB Configuration (DB CFG):

  • Compares DB CFG parameters that are not set to AUTOMATIC in either environment and reports mismatches
  • When a parameter is set to AUTOMATIC in one environment, reports a mismatch only if it is not set to AUTOMATIC in the other (see the sketch after this list)
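
For clarity, here is that AUTOMATIC rule expressed as a small Java sketch. This is my reading of the behavior, not the actual Db2Diff source; the class and method names are mine.

class CfgCompare {
    // Report a flag mismatch when exactly one side is AUTOMATIC; compare the
    // values themselves only when neither side is AUTOMATIC.
    static String compare(String name, String src, String trg) {
        boolean srcAuto = src.contains("AUTOMATIC");
        boolean trgAuto = trg.contains("AUTOMATIC");
        if (srcAuto != trgAuto)
            return "CFG Parameter \"" + name + "\" AUTOMATIC FLAG MISMATCH --> SOURCE \""
                    + (srcAuto ? "AUTOMATIC" : "NONE") + "\" TARGET \""
                    + (trgAuto ? "AUTOMATIC" : "NONE") + "\"";
        if (!srcAuto && !src.equals(trg))
            return "CFG Parameter \"" + name + "\" MISMATCH --> SOURCE \"" + src
                    + "\" TARGET \"" + trg + "\"";
        return null; // both AUTOMATIC, or identical static values: nothing to report
    }

    public static void main(String[] args) {
        System.out.println(compare("SORTHEAP", "256", "AUTOMATIC(256)"));
        System.out.println(compare("LOGPRIMARY", "3", "13"));
    }
}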

Database Objects:

  • As of now, only tables and all their dependent objects, such as PKs, FKs, indexes, tablespaces, and bufferpools, along with their properties, are compared. You can view the high-level items being compared in the image shown above.

How do you run Db2Diff?

  1. Determine the version of your Java runtime environment. You can run this utility from Linux, UNIX, or Windows, anywhere, as long as you have Java installed; it comes prepackaged with all the libraries you need. Run the following command to determine the version of your JRE.
java -version
java version "1.6.0_40"
OpenJDK Runtime Environment (IcedTea6 1.13.12) (rhel-1.13.12.6.el6_8-x86_64)
OpenJDK 64-Bit Server VM (build 23.40-b40, mixed mode)

2. Download the respective db2utils.jar and db2diff.prop files, depending on the version of your JRE, from the link below.

Download db2utils.jar

3. Update the properties file (db2diff.prop) with the proper connection information for your source and target databases. If you want to compare the whole databases between the two environments, simply leave the srcschema and trgschema properties empty. If you want to compare just a single schema (the schemas can differ, e.g. srcschema="PROD" and trgschema="TEST"), populate srcschema and trgschema respectively. An illustrative example follows.
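
For illustration, a filled-in db2diff.prop might look something like the sketch below. The srcschema and trgschema properties are the ones named above; the connection-related key names are my guesses, so consult the property file shipped with the download for the exact keys.

# Source and target connection details (key names are illustrative guesses;
# see the shipped db2diff.prop for the exact ones)
srchostname=server1
srcport=12345
srcdbname=sample
srcusername=dbuser
trghostname=server2
trgport=12345
trgdbname=test
trgusername=dbuser
# Leave both empty to compare the whole databases; populate both to compare a
# single schema on each side (the schemas may differ)
srcschema=PROD
trgschema=TEST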

4. Run the downloaded jar file as follows; remember to keep db2diff.prop in the same directory as the db2utils.jar file. It will display clean, formatted output of all related database objects for the tables, as shown below.

5. It is important to remember that in the command "java -cp db2utils-1.x.jar Db2Diff", Db2Diff is case sensitive: it is the name of the Java class that gets invoked when you run the command. This utility should work on all DB2 versions above V10.1.

6. If you run this utility to compare just one schema, the environment comparison is bypassed, i.e. Db2Diff will not compare Hardware Info, Registry Variables, DBM CFG, or DB CFG.

 
[U:\Desktop\db2utils]java -cp db2utils-1.7.jar Db2Diff
Connected Successfully to jdbc:db2://server1:12345/sample:user=dbuser
Connected Successfully to jdbc:db2://server2:12345/test:user=dbuser
COMPARING SYSTEM CONFIGURATION BETWEEN SOURCE AND TARGET:
        Comparing HARDWARE & OS Configurations:
                No Discrepancies
        Comparing REGISTRY VARIABLE Configuration:
                MISSING/UNSET REGISTRY VARIABLE IN TARGET -->                             "DB2DBDFT", SOURCE set to "SAMPLE"
        Comparing DATABASE MANAGER Configuration:
                DBM CFG Parameter "NODETYPE" MISMATCH -->                                 SOURCE "DPF_SERVER" TARGET "DSF_SERVER"
                DBM CFG Parameter "SVCENAME" MISMATCH -->                                 SOURCE "50010" TARGET "60008"
                DBM CFG Parameter "SPM_NAME" MISMATCH -->                                 SOURCE "xxxxxxx" TARGET "yyyyyyy"
                DBM CFG Parameter "CUR_EFF_CODE_LVL" MISMATCH -->                         SOURCE "V:10 R:5 M:0 F:8 I:0 SB:0" TARGET "V:10 R:5 M:0 F:6 I:0 SB:0"
                DBM CFG Parameter "CUR_EFF_ARCH_LVL" MISMATCH -->                         SOURCE "V:10 R:5 M:0 F:8 I:0 SB:0" TARGET "V:10 R:5 M:0 F:6 I:0 SB:0"
        Comparing DATABASE Configuration:
                DB CFG Parameter "SORTHEAP" AUTOMATIC FLAG MISMATCH -->                   SOURCE "NONE" TARGET "AUTOMATIC"
                DB CFG Parameter "LOCKLIST" AUTOMATIC FLAG MISMATCH -->                   SOURCE "NONE" TARGET "AUTOMATIC"
                DB CFG Parameter "PCKCACHESZ" AUTOMATIC FLAG MISMATCH -->                 SOURCE "NONE" TARGET "AUTOMATIC"
                DB CFG Parameter "SHEAPTHRES_SHR" AUTOMATIC FLAG MISMATCH -->             SOURCE "NONE" TARGET "AUTOMATIC"
                DB CFG Parameter "CATALOGCACHE_SZ" MISMATCH -->                           SOURCE "-1" TARGET "300"
                DB CFG Parameter "LOGPRIMARY" MISMATCH -->                                SOURCE "3" TARGET "13"
                DB CFG Parameter "PAGESIZE" MISMATCH -->                                  SOURCE "8192" TARGET "4096"
                DB CFG Parameter "MAXLOCKS" AUTOMATIC FLAG MISMATCH -->                   SOURCE "NONE" TARGET "AUTOMATIC"
                DB CFG Parameter "CHNGPGS_THRESH" MISMATCH -->                            SOURCE "60" TARGET "80"
                DB CFG Parameter "LOGSECOND" MISMATCH -->                                 SOURCE "10" TARGET "12"
                DB CFG Parameter "LOGBUFSZ" MISMATCH -->                                  SOURCE "256" TARGET "2149"
                DB CFG Parameter "SELF_TUNING_MEM" MISMATCH -->                           SOURCE "OFF" TARGET "ON (Active)"
                DB CFG Parameter "DB_SEED" MISMATCH -->                                   SOURCE "3802569814" TARGET "3957787418"
                DB CFG Parameter "LOGFILSIZ" MISMATCH -->                                 SOURCE "1000" TARGET "1024"

COMPARING SCHEMA(S) BETWEEN SOURCE AND TARGET:
        Db2Diff is now comparing the source schema TEST with target schema TEST
        Following table(s) do not exist in target schema TEST
                MISSING TABLE -->                                                         CUSTMER

        Comparing table ACCOUNT between source and target
                WARNING SYSTEM GENERATED PRIMARY KEY NAMES DIFFER -->                     SOURCE "SQL170901163232470" TARGET "SQL171220162026040"
                MISMATCH PRIMARY KEY COLUMN SEQUENCE -->                                  SOURCE "ACCTID" POSITION 1 TARGET "ACCTID" POSITION 2
                MISMATCH PRIMARY KEY COLUMN SEQUENCE -->                                  SOURCE "NAME" POSITION 2 TARGET "NAME" POSITION 1
                DATA TS BUFFERPOOL "NPAGES" MISMATCH -->                                  SOURCE "BP_16K" -2 TARGET BP_16K 10000
                DATA TS BUFFERPOOL NUMBER OF BLOCK PAGES MISMATCH -->                     SOURCE "BP_16K" 0 TARGET BP_16K 1000
                DATA TS BUFFERPOOL "BLOCKSIZE" MISMATCH -->                               SOURCE "BP_16K" 0 TARGET BP_16K 50

        Comparing table CHECKACCT between source and target
                WARNING SYSTEM GENERATED PRIMARY KEY NAMES DIFFER -->                     SOURCE "SQL170901163233010" TARGET "SQL171220162026080"
                MISMATCH FOREIGN KEY CHECKACCT_FK's COLUMN "NAME" SEQUENCE -->            SOURCE POSITION 2 TARGET POSITION 1
                MISMATCH FOREIGN KEY CHECKACCT_FK's COLUMN "ACCTID" SEQUENCE -->          SOURCE POSITION 1 TARGET POSITION 2
                DATA TS BUFFERPOOL "NPAGES" MISMATCH -->                                  SOURCE "BP_16K" -2 TARGET BP_16K 10000
                DATA TS BUFFERPOOL NUMBER OF BLOCK PAGES MISMATCH -->                     SOURCE "BP_16K" 0 TARGET BP_16K 1000
                DATA TS BUFFERPOOL "BLOCKSIZE" MISMATCH -->                               SOURCE "BP_16K" 0 TARGET BP_16K 50

        Comparing table CHECKBOOK between source and target
                WARNING SYSTEM GENERATED PRIMARY KEY NAMES DIFFER -->                     SOURCE "SQL170901163233370" TARGET "SQL171220162026120"
                DATA TS BUFFERPOOL "NPAGES" MISMATCH -->                                  SOURCE "BP_16K" -2 TARGET BP_16K 10000
                DATA TS BUFFERPOOL NUMBER OF BLOCK PAGES MISMATCH -->                     SOURCE "BP_16K" 0 TARGET BP_16K 1000
                DATA TS BUFFERPOOL "BLOCKSIZE" MISMATCH -->                               SOURCE "BP_16K" 0 TARGET BP_16K 50

        Comparing table CUSTOMER between source and target
                TABLE "COLCOUNT" MISMATCH -->                                             SOURCE 2 TARGET 3
                TABLE "COMPRESSION" MISMATCH -->                                          SOURCE R TARGET N
                TABLE "DATACAPTURE"  MISMATCH -->                                         SOURCE Y TARGET N
                MISMATCH COLUMN ORDER BETWEEN SOURCE AND TARGET -->                       "NAME" SOURCE POSITION 1 TARGET POSITION 0
                MISMATCH COLUMN ORDER BETWEEN SOURCE AND TARGET -->                       "ID" SOURCE POSITION 0 TARGET POSITION 1
                INDEX TS "PAGESIZE" MISMATCH -->                                          SOURCE USERSPACE1 8192 TARGET MYTBSP 4096
                INDEX TS BUFFERPOOL "NPAGES" MISMATCH -->                                 SOURCE "IBMDEFAULTBP" 1000 TARGET IBMDEFAULTBP -2
                INDEX TS BUFFERPOOL "PAGESIZE" MISMATCH -->                               SOURCE "IBMDEFAULTBP" 8192 TARGET IBMDEFAULTBP 4096
                DATA TS "PAGESIZE" MISMATCH -->                                           SOURCE USERSPACE1 8192 TARGET MYTBSP 4096
                DATA TS BUFFERPOOL "NPAGES" MISMATCH -->                                  SOURCE "IBMDEFAULTBP" 1000 TARGET IBMDEFAULTBP -2
                DATA TS BUFFERPOOL "PAGESIZE" MISMATCH -->                                SOURCE "IBMDEFAULTBP" 8192 TARGET IBMDEFAULTBP 4096

        Comparing table SALARY between source and target
                TABLE INDEX TS NAME MISMATCH -->                                          SOURCE INDEX_TS TARGET INDEX_TS1
                MISMATCH COLUMN "NAME" DATATYPE -->                                       SOURCE VARCHAR(30,0) TARGET VARCHAR(25,0)
                MISMATCH COLUMN "EMP_DESC" CLOB(102400,0) INLINE LENGTH -->               SOURCE 6000 TARGET 140
                MISMATCH COLUMN "BONUS" NULLABILITY -->                                   SOURCE Y TARGET N
                INDEX "SALARY_IDX1" PROPERTY UNIQUERULE MISMATCH -->                      SOURCE U TARGET  D
                INDEX "SALARY_IDX1" PROPERTY COMPRESSION MISMATCH -->                     SOURCE N TARGET  Y
                INDEX TS "PAGESIZE" MISMATCH -->                                          SOURCE INDEX_TS 8192 TARGET INDEX_TS1 4096
                INDEX TS BUFFERPOOL "NPAGES" MISMATCH -->                                 SOURCE "IBMDEFAULTBP" 1000 TARGET IBMDEFAULTBP -2
                INDEX TS BUFFERPOOL "PAGESIZE" MISMATCH -->                               SOURCE "IBMDEFAULTBP" 8192 TARGET IBMDEFAULTBP 4096
                DATA TS "PAGESIZE" MISMATCH -->                                           SOURCE USERSPACE1 8192 TARGET USERSPACE1 4096
                DATA TS BUFFERPOOL "NPAGES" MISMATCH -->                                  SOURCE "IBMDEFAULTBP" 1000 TARGET IBMDEFAULTBP -2
                DATA TS BUFFERPOOL "PAGESIZE" MISMATCH -->                                SOURCE "IBMDEFAULTBP" 8192 TARGET IBMDEFAULTBP 4096

        Comparing table VACATION between source and target
                DATA TS BUFFERPOOL "NPAGES" MISMATCH -->                                  SOURCE "BP_16K" -2 TARGET BP_16K 10000
                DATA TS BUFFERPOOL NUMBER OF BLOCK PAGES MISMATCH -->                     SOURCE "BP_16K" 0 TARGET BP_16K 1000
                DATA TS BUFFERPOOL "BLOCKSIZE" MISMATCH -->                               SOURCE "BP_16K" 0 TARGET BP_16K 50

My Journey Towards The Solution:

I tried multiple things before getting to the point of building an all-in-one jar file. Please read on if you would like to know a little more about the journey…

Method 1 (SQL):

Like any normal DBA, I first tried to solve this problem using export, import, and custom SQL with multiple joins, and ended up with a horribly performing query. I cannot, in good conscience, run a nasty SQL statement against a database. On top of the poor performance, using native SQL capabilities meant moving data from one database to the other just to compare the two. At this point I simply concluded that using SQL to solve this problem is a bad idea; even federation is not flexible enough.

Method 2 (Perl):

Next, I turned to my favorite scripting language, Perl, and I must admit things were looking really good: Perl's DBI/DBD drivers let you handle connections to any number of databases, and you can fetch query results into various data structures, or fabricate data structures from the data, to suit your needs. Although I made significant progress towards a solution with Perl, a few things bothered me about this approach. A few people besides myself will use this tool, and they may run it from different environments; for a Perl solution to work, each user needs to configure the Perl DB2 driver, which takes a bit of time and effort. What if I need to run this utility on a Windows environment? Not all Windows boxes have ActivePerl installed, and Perl was initially developed with Unix flavors in mind. Perl is outstanding for doing something quick and dirty, but as a program's complexity and line count grow, it can bring maintenance difficulties (especially when there are a lot of interdependent data structures). I learned this first hand by looking over what I wrote. Take a look at the Perl code below, which I wrote to solve this problem, and you will understand what I am trying to say.
#!/usr/bin/perl
# Compare two schemas or tables for names, types and length.
# Author                        Version         Comments
# Raghu.Cherukuru               v1.0            Initial version
# Raghu.Cherukuru               v1.1            Made this program strict pragma compatible
# Raghu.Cherukuru               v1.2            Version prior to DBI changes
# Raghu.Cherukuru               v1.3            Added DBI code to build source and target data structures for comparision
# Raghu.Cherukuru               v1.4            Moved variables to right section. E.g. DBI variables and also made the password user enterable
# Raghu.Cherukuru               v1.5	        Making this script more functional by putting code in functions, good for making modules later
# commenting out all the say statements for successful returns or matches. Turn on for debugging

use v5.10;
use strict;

use DBI;

# Script variables
my $srctabhash_ref;
my $srctab_name;
my $trgtabhash_ref;
my $trgtab_name;
my $srccol_name;
my $trgcol_name;
my %srctabs_lookup=();
my %trgtabs_lookup=();
my %srccol_lookup=();
my %trgcol_lookup=();
my %src_tab_to_cols_hash=();
my %trg_tab_to_cols_hash=();
my %srctab_to_coltype_lookup_hoh=();
my %trgtab_to_coltype_lookup_hoh=();

# DBI variables UPDATABLE Variables.
my $sourcedb="XXXXXXXX";
my $targetdb="XXXXXXXX";
my $src_hostname="xxxxxxx.yyy.zzz";
my $trg_hostname="xxxxxxx.yyy.zzz";
my $src_port="12345";
my $trg_port="12345";
my $source_dsn="DBI:DB2:database=$sourcedb;hostname=$src_hostname;port=$src_port";
my $target_dsn="DBI:DB2:database=$targetdb;hostname=$trg_hostname;port=$trg_port";
my $src_schema="XXXXXXX";
my $trg_schema="XXXXXXX";
my $src_uname="xxxxxxxx";
my $trg_uname="xxxxxxxx";

# Temporarily disable the screen input printing, so that password is not visible to other users
system("stty -echo");
say "Please enter the source db password:";
chomp(my $src_passwd=<STDIN>);
say "Please enter the target db password:";
chomp(my $trg_passwd=<STDIN>);
# Turn echo back on the screen for the rest of the program
system("stty echo");

# DBI Non-updatable variables.
my $src_dbh=DBI->connect($source_dsn, $src_uname, $src_passwd) or die "Could not connect to database $sourcedb";
my $trg_dbh=DBI->connect($target_dsn, $trg_uname, $trg_passwd) or die "Could not connect to database $targetdb";
my $src_tab_sth;
my $trg_tab_sth;
my $src_col_sth;
my $trg_col_sth;
my %SOURCE_TAB_DS=();
my %TARGET_TAB_DS=();


sub Build_Sourcetab_Ds
   {
    my $src_int_arrayref;
    my @src_table_names;
    $src_tab_sth = $src_dbh->prepare("SELECT TABNAME FROM SYSCAT.TABLES WHERE TABSCHEMA='$src_schema'");
    $src_tab_sth->execute();

    while (my $tab_row = $src_tab_sth->fetchrow_hashref())
       {
        # Build the Source tables list into an array
        push @src_table_names, $tab_row->{TABNAME};
       }

       foreach my $src_tab_name (@src_table_names)
          {
           $src_col_sth = $src_dbh->prepare("SELECT COLNAME,TYPENAME,LENGTH FROM SYSCAT.COLUMNS WHERE TABNAME='$src_tab_name' AND TABSCHEMA='$src_schema'");
           $src_col_sth->execute();

           my %tab_to_cols_hash=();    
    
           # Empty out the table hash before each iteration, so that array will have only one hash corresponding to a table.
           # Remember %tab_to_cols_hash=undef vs %tab_to_cols_hash=() not the same. First one will assign an undef value and key to the hash, nasty...
           %tab_to_cols_hash=(); 
     
           # Need to empty out the hash for each iteration of the table, since datatypes are not unique for each table and column.
           # Also this is needed to build hoh with tablename as key and hashref as value, where hashref contains column names as key and datatype as value.
           my %cols_to_types_hash=();

           while (my $col_row = $src_col_sth->fetchrow_hashref()) 
              {
               $cols_to_types_hash{$col_row->{COLNAME}}="$col_row->{TYPENAME}"."($col_row->{LENGTH})";
              }
            $tab_to_cols_hash{$src_tab_name}={%cols_to_types_hash};
            
            # Each element of the array here is the anonymous hash reference with source tablename as key and another anonymous hashref as value.
            # The deepest anonymous hashref contains values of column names as keys and their data types as values.
            push @{$src_int_arrayref}, {%tab_to_cols_hash};
           }

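# NOTE: the here-doc below is never assigned to anything, so Perl builds the
# string and immediately discards it; this is a quick trick for block-commenting
# the debug loop that follows. Remove the here-doc markers to re-enable it.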
<<'TESTCODE';
       foreach my $int_hashref (@{$src_int_arrayref})
          {
           say "$int_hashref";
           while ( my ($tab_key, $col_hash_value) = each %{$int_hashref} )
              {
               say "Table Name: $tab_key, $col_hash_value";
              }
           say;
          }
TESTCODE

   # Build the final Data structure for source schema
   $SOURCE_TAB_DS{$src_schema}=[@{$src_int_arrayref}];

   } 


sub Build_Targettab_Ds
   {
    my $trg_int_arrayref;
    my @trg_table_names;
    $trg_tab_sth = $trg_dbh->prepare("SELECT TABNAME FROM SYSCAT.TABLES WHERE TABSCHEMA='$trg_schema'");
    $trg_tab_sth->execute();

    while (my $tab_row = $trg_tab_sth->fetchrow_hashref())
       {
        # Build the target tables list into an array
        push @trg_table_names, $tab_row->{TABNAME};
       }

       foreach my $trg_tab_name (@trg_table_names)
          {
           $trg_col_sth = $trg_dbh->prepare("SELECT COLNAME,TYPENAME,LENGTH FROM SYSCAT.COLUMNS WHERE TABNAME='$trg_tab_name' AND TABSCHEMA='$trg_schema'");
           $trg_col_sth->execute();

           my %tab_to_cols_hash=();

           # Empty out the table hash before each iteration, so that array will have only one hash corresponding to a table.
           # Remember %tab_to_cols_hash=undef vs %tab_to_cols_hash=() not the same. First one will assign an undef value and key to the hash, nasty... 
           %tab_to_cols_hash=();

           # Need to empty out the hash for each iteration of the table, since datatypes are not unique for each table and column. 
           # Also this is needed to build hoh with tablename as key and hashref as value, where hashref contains column names as key and datatype as value.
           my %cols_to_types_hash=();

           while (my $col_row = $trg_col_sth->fetchrow_hashref())
              {
               $cols_to_types_hash{$col_row->{COLNAME}}="$col_row->{TYPENAME}"."($col_row->{LENGTH})";
              }
            $tab_to_cols_hash{$trg_tab_name}={%cols_to_types_hash};

            # Each element of the array here is the anonymous hash reference with target tablename as key and another anonymous hashref as value.
            # The deepest anonymous hashref contains values of column names as keys and their data types as values.
            push @{$trg_int_arrayref}, {%tab_to_cols_hash};
           }

<<'TESTCODE';
       foreach my $int_hashref (@{$trg_int_arrayref})
          {
           say "$int_hashref";
           while ( my ($tab_key, $col_hash_value) = each %{$int_hashref} )
              {
               say "Table Name: $tab_key, $col_hash_value";
              }
           say;
          }
TESTCODE

   
   $TARGET_TAB_DS{$trg_schema}=[@{$trg_int_arrayref}];

   }


sub Compare_Source_To_Target {
   foreach my $srcschema (keys %SOURCE_TAB_DS)
      {
      if (exists $TARGET_TAB_DS{$srcschema})
         {
          # say "Schema $srcschema exist in target.\n";
         } else {
          die "Schema $srcschema does not exist in target.\n";
         }
   
   
      # Build look up hashes with tables list for source schema from SOURCE_TAB_DS.
      foreach my $srctabhash_ref (@{$SOURCE_TAB_DS{$srcschema}})
         {
          foreach my $srctab_name (keys %{$srctabhash_ref})
             {
             # Build the hash for tables in source schema for lookup.
             $srctabs_lookup{$srctab_name}=1;
          
             # Build a hash with keys of source table names and values referencing to the deepest hash with columns information
             $srctab_to_coltype_lookup_hoh{$srctab_name}={%{$srctabhash_ref->{$srctab_name}}};
            }
         }

   
       # Build look up hashes with tables list for target schema from TARGET_TAB_DS.
       foreach my $trgtabhash_ref (@{$TARGET_TAB_DS{$srcschema}})
         {
          foreach my $trgtab_name (keys %{$trgtabhash_ref})
            {
             # Build the hash for tables in target schema for lookup
             $trgtabs_lookup{$trgtab_name}=1;
          
             # Build a hash with keys of target table names and values referencing to the deepest hash with columns information
             $trgtab_to_coltype_lookup_hoh{$trgtab_name}={%{$trgtabhash_ref->{$trgtab_name}}};
            }
         }
  
       # Compare if all the source tables exist in the target.
       foreach my $srctab_name(keys %srctabs_lookup) 
          {
          if (! exists $trgtabs_lookup{$srctab_name})
             {
              say "$srctab_name table DOES NOT EXIST in target schema\n";
             }
          else
             {
              
              # say "$srctab_name table exists in target schema";
              # Now lets compare the columns and types for each of those columns
              &Col_Typediff_Checker(\%{$srctab_to_coltype_lookup_hoh{$srctab_name}},\%{$trgtab_to_coltype_lookup_hoh{$srctab_name}},$srctab_name);
             }
          }
      }
}

   # In this function we will compare to see if the column names and types are identical between source and target otherwise complain

sub Col_Typediff_Checker
   {
    my ($source_coltype_hashref, $target_coltype_hashref, $tabname)=@_;
    my @source_cols=(keys %{$source_coltype_hashref});
    my @target_cols=(keys %{$target_coltype_hashref});
    my $missing_col_ind=0;
     
    if ( @source_cols != @target_cols )
       {
        say "\tNumber of columns for source: " . scalar(@source_cols) . " differ from number of columns on Target: " . scalar(@target_cols);
       }
       else
       {
        # say "\tNumber of columns between source: " . scalar(@source_cols) . " and target: " . scalar(@target_cols) . " are same";
       }

    foreach my $srccol_name (keys %{$source_coltype_hashref})
       {
        $srccol_lookup{$srccol_name}=1;
       }

    foreach my $trgcol_name (keys %{$target_coltype_hashref})
       {
        $trgcol_lookup{$trgcol_name}=1;
       }

    # Are the column names same for both source and target and whether all the columns in source are present in target and NOT viceversa

    foreach my $srccol_name(keys %{$source_coltype_hashref})
       {
        if (! exists $trgcol_lookup{$srccol_name})
           {
            ++$missing_col_ind;
            say "\t$srccol_name column is NOT PRESENT in target";
           }
        else
           {
            # say "\t$srccol_name column is present in target";
           }
        }


    # Are the datatypes same for both source and target ? Lets check...

    if (scalar grep { ($source_coltype_hashref->{$_} ne $target_coltype_hashref->{$_}) and (defined $target_coltype_hashref->{$_}) } (keys %{$source_coltype_hashref}))
       {
        say "Table: $tabname";
        say "\t\tAtleast one of the data types between Source and Target columns DID NOT MATCH";

        # Use a while loop if you would like to save memory. Every time each is called it only returns a pair of (key, value) element.

        while (my ($key,$value) = each %{$source_coltype_hashref})
           {
            say "\t\tSource column: $key -> $value, Target column: $key -> $target_coltype_hashref->{$key}" if (($target_coltype_hashref->{$key} ne $value) and (defined $trgcol_lookup{$key}));
           }
        say; # Format I say :-)
       }
    else
       {
        say "\t\tThere are some missing column(s): Otherwise all the remaining column's data types between source and target columns matched" if ($missing_col_ind >= 1);
        # say "\tAll the column's data types between source and target columns matched" if ($missing_col_ind == 0);
       } 
     %srccol_lookup=();      # Empty the hash lookup for the next table's columns iteration
     undef %trgcol_lookup;   # Apparently this does the same as above
    }

# Check if user wants to compare tables between source and target schemas ?
say;
say "\t$0 will let you control what database objects to compare by prompting you for yes/no";
say "\tEnter either y|yes, case does not matter to compare or anything else to skip";
print "\t$0 will compare tables between source and target schemas would you like to proceed ?[y|yes]";
chomp(my $tab_ind=<STDIN>);
say;

if ($tab_ind =~ /(y|yes)/i) {
   Build_Sourcetab_Ds();
   Build_Targettab_Ds();
   Compare_Source_To_Target();
}

$src_dbh->disconnect() or say "Failed to disconnect from $sourcedb";
$trg_dbh->disconnect() or say "Failed to disconnect from $targetdb";

So, I wanted a solution that I could use without having to worry about adversely impacting the environment, and that I could run anywhere, i.e. from my desktop, my laptop, my server, etc., without needing to do any extra setup.

I took a step back and clearly wrote down the problem definition. It was at this point that I realized most people try to solve a problem with whatever tools they know well, even though other tools may provide a more comprehensive solution. I learned the following things:
  • I need a tool that can be executed from a variety of environments with minimal configuration.
  • I need a language with good support for data structures in an object oriented fashion (as objects may contain nested data structures).
  • The language should have good garbage collection, as this tool should run with a minimal memory footprint.

Method 3 (Java):

After doing some analysis, I determined that Java could easily address all the concerns raised by the SQL and Perl solutions. With an object-oriented language such as Java, you can easily model things (tables, indexes, etc.) as objects: think of the table as an object, the PK as an object, the index as an object, and so on. You can put the PK object, the index objects, etc. inside the table object. Once you have the nested objects created, you simply compare the objects between source and target. Java has been around for a while, has strong garbage collection capabilities, and is platform independent, so I decided to go ahead with Java as the programming language of choice.
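
To make the idea concrete, here is a minimal sketch of such nested objects. It is illustrative only; the actual Db2Diff classes are not published, and all of the names below are mine.

import java.util.ArrayList;
import java.util.List;

class Column {
    final String name, type;
    final int position;
    Column(String name, String type, int position) {
        this.name = name; this.type = type; this.position = position;
    }
}

class Table {
    final String name;
    final List<Column> columns = new ArrayList<>();
    final List<String> pkColumns = new ArrayList<>(); // PK column names in key order

    Table(String name) { this.name = name; }

    // Each mismatch becomes one report line, in the spirit of the output shown earlier.
    List<String> diff(Table target) {
        List<String> report = new ArrayList<>();
        for (Column src : columns) {
            Column trg = target.find(src.name);
            if (trg == null) {
                report.add("MISSING COLUMN --> \"" + src.name + "\"");
                continue;
            }
            if (!src.type.equals(trg.type))
                report.add("MISMATCH COLUMN \"" + src.name + "\" DATATYPE --> SOURCE "
                        + src.type + " TARGET " + trg.type);
            if (src.position != trg.position)
                report.add("MISMATCH COLUMN ORDER --> \"" + src.name + "\" SOURCE POSITION "
                        + src.position + " TARGET POSITION " + trg.position);
        }
        if (!pkColumns.equals(target.pkColumns))
            report.add("MISMATCH PRIMARY KEY COLUMN SEQUENCE --> SOURCE " + pkColumns
                    + " TARGET " + target.pkColumns);
        return report;
    }

    private Column find(String colName) {
        for (Column c : columns)
            if (c.name.equals(colName)) return c;
        return null;
    }

    public static void main(String[] args) {
        Table src = new Table("CUSTOMER"), trg = new Table("CUSTOMER");
        src.columns.add(new Column("ID", "INTEGER", 0));
        src.columns.add(new Column("NAME", "VARCHAR(30)", 1));
        trg.columns.add(new Column("NAME", "VARCHAR(25)", 0));
        trg.columns.add(new Column("ID", "INTEGER", 1));
        src.diff(trg).forEach(System.out::println);
    }
}

The main method builds two hand-crafted CUSTOMER tables and prints mismatch lines in the same style as the report above; in the real tool, the objects would of course be populated from the catalog rather than by hand.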

Advantages of using Db2Diff:

  1. It can be run from any command line (Windows, Linux, OS X, AIX, etc.) as long as a Java runtime environment is installed.
  2. Minimal memory footprint, as Java is pretty good at garbage collection.
  3. Only metadata is fetched from the database. The data structures, along with their comparison, are handled inside the JVM (which could be on your desktop or Mac); as a result, this utility runs separately from the database engine, as the sketch below illustrates.
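
To make point 3 concrete, the sketch below shows the kind of catalog-only fetch involved, assuming the IBM Data Server Driver for JDBC (db2jcc4.jar) is on the classpath. The connection details are the placeholders from the sample output above, and the class name is mine.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FetchMetadata {
    public static void main(String[] args) throws Exception {
        // URL format matches the sample output: jdbc:db2://host:port/db
        try (Connection con = DriverManager.getConnection(
                "jdbc:db2://server1:12345/sample", "dbuser", "password");
             PreparedStatement ps = con.prepareStatement(
                "SELECT TABNAME, COLNAME, TYPENAME, LENGTH FROM SYSCAT.COLUMNS "
                + "WHERE TABSCHEMA = ? ORDER BY TABNAME, COLNO")) {
            ps.setString(1, "TEST");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // In the tool, rows like these would populate in-memory objects;
                    // only catalog metadata is read from the database, never user data.
                    System.out.printf("%s.%s %s(%d)%n", rs.getString("TABNAME"),
                            rs.getString("COLNAME"), rs.getString("TYPENAME"),
                            rs.getInt("LENGTH"));
                }
            }
        }
    }
}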
 

What's next for Db2Diff:

  1. The initial version of db2utils.jar shipped with deep comparison of only the most fundamental RDBMS object, i.e. the table. Future versions will handle routines, stored procedures, functions, and other object comparisons.
  2. I would love to hear back from the community on what other functionality you would like to see as part of this utility.
  3. Please leave comments or suggestions if you see any bugs while using this utility, and I will try to fix them as soon as I can.

19 Comments

  1. Chandra Sekhar Nookarapu

    A very detailed, well-written utility that can be useful for DBAs, developers, and test analysts. One needs a deep understanding of DB2 internals and functionality to develop this utility. It is also very well explained. Thank you for sharing this.

    Reply
    • Raghu

      Thank you Sekhar! I am glad you find this utility useful. I hope to find more time to incorporate more features into this utility.

      Reply
  2. Ram

    Superb. Thanks Raghu

    Reply
    • Raghu

      Thank you Ram!

      Reply
  3. markus fraune

    hey raghu, sounds like a really helpful tool. will test next week and come back to you!

    Reply
    • Raghu

      Markus, thank you for taking time to test this out. Let me know how it goes.

      Reply
  4. Steven James

    I’m trying to run your tool against two databases stored on an iSeries. It’s asking for a DB Connect license. I’ve used jdbc connections in the past without this license… do you know why it would be asking for this?

    Reply
    • Raghu

      Hi Steven,

      Sorry I should have made it clear in the blog post. This tool is only designed to work with DB2 for LUW.

      Thanks!
      Raghu

      Reply
      • JW

        Are you going to develop this in the future? What about the license, is it open source? Are you publishing the source code?

        Reply
        • Raghu

          I have plans to develop this in the future, potentially with support from the community. I am not spending as much time on this utility as I would love to.

          The plan is to get a GitHub account and publish the source code and documentation. It will be open source.

          Reply
          • Sandeep

            Hi Raghu,

            It is indeed the best script I have ever used to date. Can you also add a report writer to the Java utility for the comparison? Also, when I compared two different schemas I could not see the system comparisons in the redirected log; it checked all the tables. Please let me know how I can achieve this. I really appreciate the tool.

          • Raghu

            Sandeep,

            For it to compare the system configuration, you need to compare the whole database and not just one schema.
            If you ask for a single schema, it bypasses the system comparison.

            Thanks!
            Raghu

  5. Hung Tam Nguyen

    Looks like a great tool, been looking for this in the past. Take a look at lightweight commercial tools such as DBSolo if you need ideas for further improvements. 😉

    Reply
  6. Ruchita Panicker

    Hi Raghu,
    I am doing in-place masking, in which I want pre-masking data and post-masking data. How can I use this utility for that? I don't want to store the data anywhere.

    Reply
    • Raghu

      This tool is used for DDL comparison and not for data comparison…

      Reply
  7. K.Ericsson

    I like your process. It shows the differences. However, we have many differences that we need to apply from the source to the target db. How do you generate the DDL so that the target environment will resemble the source environment, please? Thank you!

    Reply
    • Raghu

      The tool does not have this feature today, as there are a number of variations in performing the alters, plus additional steps that may need to be performed, such as REORGs and rebuilding dependent objects.
      I have it in mind but could not find enough time to do it. I plan to make the code open source so that the community can contribute new features.

      Thanks!
      Raghu

      Reply
  8. ljc

    Hi Raghu,
    How do I enable an SSL connection?
    I tried adding the following in db2diff.prop, but it did not work.

    sslConnection=true
    sslTrustStoreLocation=/db2_src/jcekdb.jks
    sslTrustStorePassword=xxxx

    Reply
  9. ljc

    how to use ssl connection?

    Reply
