How do you remove duplicates in Numbers?
I have a Numbers file with about 5,000 entries. They are theres's a column for the first name and a column for the last name. How do I search and remove duplicate entries in sheet?
I have a Numbers file with about 5,000 entries. They are theres's a column for the first name and a column for the last name. How do I search and remove duplicate entries in sheet?
Here's a (moderately tested) script that should help:
SG
tell application "Numbers"
tell front document's active sheet
tell (first table whose selection range's class is range)
set vv to rows's cells's value
end tell
end tell
end tell
set {distinctFirstLast, newList} to {{}, {}}
repeat with r from 1 to vv's length
set fl to vv's item r's item 2 & vv's item r's item 3
if fl is not {missing value, missing value} then
if fl is not in distinctFirstLast then
copy fl to distinctFirstLast's end
copy vv's item r to newList's end
end if
end if
end repeat
set pasteStr to ""
set text item delimiters to tab
repeat with r from 1 to newList's length
set pasteStr to (pasteStr & newList'sitemr as string) & return
end repeat
set the clipboard topasteStr
-- return pasteStr -- can uncomment this line for testing
To avoid finding 'duplicates', where only the first name or the last name is repeated on another line, you should add two new columns; one in which you concatenate the first name and last name entries , the second (needed for the formula method, but not for the script method) in which you test for duplicates in the first.
For the example, the new columns are both inserted after column B, and become the new columns C and D.
Formula (assumes first name in column A, last name in column B. Will also work with LN in column A, FN in column B):
=A&B
Enter the formula in column C in the first row containing data, then fill down to the rest of the new column.
& is the concatenation operator, and simply joins the strings in 'this row' of columns A and B.
John Smith becomes JohnSmith
OR
Smith John becomes SmithJohn
formula method:
Using the new column C as the test-values column, enter the formula below in column D, in th first data row (assumed to be row 2).
D2: COUNTIF(C$1:C2,C2)
Fill down to the last row in the table.
The formula counts the number of appearances of the name in this row in all rows above this row and this row itself.
Before sorting, you need to convert this calculated value to a fixed value. To do that:
With all cells in this column (D) selected:
Copy
Go Edit > Paste Formula Results.
This replaces the formulas with the most recently calculated results.
Sort the table ascending on column D.
Scroll to the bottom of the table, and select all rows where the count in column D is greater that 1, then Delete Rows, using either the Table menu or the contextual menu opened by clicking the v that appears when you bring the mouse pointer to one of the selected rows.
When the duplicate rows have been deleted, the new columns (C and D) may also be selected then deleted.
Script method:
SGIII has written an AppleScript/Automator service to do this task, and will likely post that within the next 24 hours. You may find it a simpler method that the formulas above.
Regards,
Barry
For this problem, where you are dealing with more than one column, I think Barry's concatenate and COUNTIF method is probably the easiest. If you are doing this a lot (as opposed to it being just an occasional task) then I can post a customized script.
SG
I will be doing this several times over with most sheets having 5,000-10,000 rows. If you have script that would be amazing!
Ok, can you describe exactly do the tables look like. First name in column A, last name in column B?
SG
Column A: ID Number
Column B: First Name
Column C: Last Name
Column D: Email Adress
Thanks again!
When there is a duplicate first and last name will the IDs and email addresses be duplicated too? Or might they be different and you need to save both IDs and both address?
SG
ID numbers would be duplicated, as for the email address's, some are missing so I wouldn't rely on those.
!
This solved it! Thank you!
How do you remove duplicates in Numbers?