Tom’s Tutorials For Excel: Deleting Duplicate Records

Tom’s Tutorials For Excel: Deleting Duplicate Records

When you have a list of data, sometimes it is not enough to simply delete rows with duplicated information based only on the items in one column. Multi-column lists can have duplicated records based on the fact that every item in every column of a row’s data matches that of another row’s entire data, cell for cell. In those cases, you need to compare a concatenated (combined) string of each record’s (row’s) data, and compare that to the concatenated strings of all the other rows.

Take a close look at this Before and After picture. In the original list, all the items in rows 5 and 7 match, as do all the items in rows 3 and 10. This is a short list for demonstration purposes. If your list were a couple hundred thousand rows deep, such as an annual list of bank transactions or department store purchases, you would need a fast and easy way to delete duplicated records. The macro that follows the picture is one way to do the job. The comments in green explain each step.



Sub DeletingDuplicateRecords()

'Turn off ScreenUpdating to speed up the macro.
Application.ScreenUpdating = False

'Declare a range variable for the helper column being used.
Dim FilterRange As Range
'Define the range variables dynamic range.
Set FilterRange = Range("E1:E" & Cells(Rows.Count, 1).End(xlUp).Row)

'For efficiency, open a With structure for the FilterRange variable.
With FilterRange
'Enter the formula
'=SUMPRODUCT(($A$1:$A1=$A1)*($B$1:$B1=$B1)*($C$1:$C1=$C1)*($D$1:$D1=$D1))>1
'in all cells in column E (the helper column) that returns either TRUE
'if the record is a duplicate of a previous one, or FALSE if the record
'is unique among the records in all previous rows in the list.
.FormulaR1C1 = _
"=SUMPRODUCT((R1C1:RC1=RC1)*(R1C2:RC2=RC2)*(R1C3:RC3=RC3)*(R1C4:RC4=RC4))>1"
'Turn the formulas into static values because they will be filtered,
'and maybe deleted if any return TRUE.
.Value = .Value
'AutoFilter the helper column for TRUE.
.AutoFilter Field:=1, Criteria1:="TRUE"
'Error bypass in case no TRUEs exist in the helper column.
On Error Resume Next
'This next line resizes the FilterRange variable to exclude the first row.
'Then, it deletes all visible filtered rows, abbreviated by SpecialCells
'constant number 12.
.Offset(1).Resize(.Rows.Count - 1).SpecialCells(12).EntireRow.Delete
'Clear the Error object in case a run time error would have occurred,
'that is, if no TRUEs existed in the helper column to be deleted.
Err.Clear
'Close the With structure for the FilterRange variable object.
End With

'Exit (stop using) AutoFilter.
ActiveSheet.AutoFilterMode = False

'Clear all helper values (there would only be FALSEs at this moment).
'Note that Columns(5) means column E which is the fifth column from the left
'on a standard Excel spreadsheet.
Columns(5).Clear

'Clear the range object variable to restore system memory.
Set FilterRange = Nothing

'Turn ScreenUpdating back on.
Application.ScreenUpdating = True

End Sub

Share Button
Posted in Tom's Tutorials for Excel
Tags: , , , , , , , , , , , , ,
One comment on “Tom’s Tutorials For Excel: Deleting Duplicate Records
  1. Doris says:

    Good clean and clear cut. Nice, lots of explanation easy to follow.
    Thank you for sharing.

Leave a Reply

Your email address will not be published.

*