How to find duplicate records in table is a popular SQL interview
question which has been asked as many times as difference
between truncate and delete in SQL or finding second highest salary of
employee. Both of these SQL queries are must know for any one who is appearing
on any programming
interview where some questions on database and SQL are expected. In order
to find duplicate records in database
table you need to confirm definition of duplicates, for example in below contact table
which is suppose to store name and phone number of contact,
a record is considered to be duplicate if both name and phone number is same
but unique if either of them varies. Problem of duplicates in database arise
when you don't have a primary key or unique key on database and that's why its recommended to have a key
column in table. Anyway its easy to find duplicate records in table by using group
by clause of ANSI SQL. Group by clause is used to group data based upon any
column or a number of columns. Here in order to locate duplicate records we
need to use group by clause on both name and phone as shown
in second SQL SELECT query example. You can see in first query that it listed Ruby as duplicate
record even though both Ruby have different phone number because we only
performed group by on name. Once you have grouped data you can filter out
duplicates by using having clause. Having clause is counter part of where
clause for aggregation queries. Just remember to provide temporary name to count() data in
order to use them in having clause.
question which has been asked as many times as difference
between truncate and delete in SQL or finding second highest salary of
employee. Both of these SQL queries are must know for any one who is appearing
on any programming
interview where some questions on database and SQL are expected. In order
to find duplicate records in database
table you need to confirm definition of duplicates, for example in below contact table
which is suppose to store name and phone number of contact,
a record is considered to be duplicate if both name and phone number is same
but unique if either of them varies. Problem of duplicates in database arise
when you don't have a primary key or unique key on database and that's why its recommended to have a key
column in table. Anyway its easy to find duplicate records in table by using group
by clause of ANSI SQL. Group by clause is used to group data based upon any
column or a number of columns. Here in order to locate duplicate records we
need to use group by clause on both name and phone as shown
in second SQL SELECT query example. You can see in first query that it listed Ruby as duplicate
record even though both Ruby have different phone number because we only
performed group by on name. Once you have grouped data you can filter out
duplicates by using having clause. Having clause is counter part of where
clause for aggregation queries. Just remember to provide temporary name to count() data in
order to use them in having clause.