Monday, December 01, 2008

SQL - Find Possible Duplicate Records in Multiple Columns

 

Below is the code used to find possible duplicate records across multiple columns. Below I am trying to find duplicate rows in col1, col2 combined eg.

Consider the following ERD :- 

ERD

Table data:-

table data

Here row 2 and 3 are duplicate and the following SQL detects and finds such rows:-

SELECT a.* FROM test a INNER JOIN 
( 
    SELECT col1, col2 FROM test GROUP BY col1, col2 HAVING Count(*) >1 
) b 
ON (a.col1 = b.col1 
AND a.col2 = b.col2) 

3 comments:

  1. How about:

    select col1, col2, count(col2) from a group by col1,col2 having count(col2) > 1

    ReplyDelete
  2. try it out, this will too. that's why the group by is there :)

    ReplyDelete
  3. Anil, you are right...but the point I am trying to get out here is : Your query does "NOT" show the rows containing the dupes. It tells me what is duplicated. I want to print out all rows that are duplicated. Like in the above case - my query prints out A - B 2 times.

    Perhaps I have stripped off the example to a bare minimum and hence I am not able to explain the complexity involved. My work related query had sub selects - you really don't want to see PSoft queries do you ;)

    ReplyDelete

Thank you for your feedback