Online College Courses for Credit

+
Find Duplicate Rows

Find Duplicate Rows

Rating:
(0)
Author: Sophia Tutorial
Description:

Identify a strategy that would enable a database administrator to identify duplicate rows in a table.

(more)
See More
Tutorial

what's covered
This tutorial explores finding duplicate rows of data in a table.

Finding duplicate data in a table can be quite useful as it can help us identify potential issues or matches. A duplicate row is one that refers to the same thing or person as a whole other row. However, it is important to note that not all duplicate rows will have completely identical information as it will depend on what columns of data we want to search on. For example, we may have a large employee table that stores the employee’s social security number that uniquely identifies the employee in the US. We may want to use this to ensure that the employee is only listed once. If we had customers, we could search for duplicate accounts with an email address as traditionally, an email address should only belong to a single customer for most eCommerce sites. If we had multiple records for a single customer, it would be difficult to get a full order history for the customer as they would have several rows that we would have to compare against.

The fundamental of the structure of the query to find duplicates would look like the following:

SELECT <columnlist>
FROM <tablename>
GROUP BY <columnlist>
HAVING COUNT(*) > 1;

Note that the column list in the SELECT clause should match the column list in the GROUP BY clause. In addition, we could add a COUNT(*) in the SELECT clause so we can identify how many duplicates there were of the same criteria that we grouped by.

For example, we may want to verify that all of our customers have a unique phone number, we could do so by doing:

SELECT phone, COUNT(*)
FROM customer
GROUP BY phone
HAVING COUNT(*) > 1;

We can verify that there are no customers that meet this criterion:

Successful Query Result Message

Perhaps our organization has a criterion that only allows a customer to have a single order. This query could be used to check which customers have more than one order. If they have, how many are listed:

SELECT customer_id, COUNT(*)
FROM invoice
GROUP BY customer_id
HAVING COUNT(*) > 1;

Query Result Example

At a quick glimpse, we can be able to quickly identify those types of scenarios.


2. Duplicate for counting

Note too that the finding of duplicates can also be useful to identify rows of data that may occur more than once with the appropriate count. For example, we may want to identify the support_rep_id if they have served more than 1 customer:

SELECT support_rep_id, COUNT(*)
FROM customer
GROUP BY support_rep_id
HAVING COUNT(*) > 1;

Query Result Example

Or perhaps we may have a list of state/country of those that have more than one customer to gear our marketing to:

SELECT state, country, COUNT(*)
FROM customer
GROUP BY state, country
HAVING COUNT(*) > 1;

Query Result Example


try it
Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then enter in one of the examples above and see how it works. Next, try your own choices for which columns you want the query to provide.

summary
The use of finding duplicate rows in a table can be very useful for counting purposes.

Source: Authored by Vincent Tran