11 Replies Latest reply: Jun 25, 2013 5:48 PM by hotwheels 22
hotwheels 22 Level 1 Level 1 (0 points)

i have two csv databases with contact information from my laptop and my desktop.

 

i would like to /merge/ these two files and delete duplicate entries.

 

is this possible?

 

TIA


Mac Pro, Mac OS X (10.7.4), 64 bit Win7 in Parallels | iCloud
  • 1. Re: merging files and deleting duplicates?
    Wayne Contello Level 6 Level 6 (13,620 points)

    Sure it is possible

  • 2. Re: merging files and deleting duplicates?
    hotwheels 22 Level 1 Level 1 (0 points)

    hi wayne. how does one do this?

  • 3. Re: merging files and deleting duplicates?
    Wayne Contello Level 6 Level 6 (13,620 points)

    I would open both documents, then copy one table and paste after the other.  Then sort by the relavent columns so that like entries are near each other.  If the volumne of entries is large then another solution that is more automated may be necessary.

  • 4. Re: merging files and deleting duplicates?
    hotwheels 22 Level 1 Level 1 (0 points)

    hi, wayne. yes! thanks!!!

     

    i need an automated solution.

     

    do you know of one?

  • 5. Re: merging files and deleting duplicates?
    Wayne Contello Level 6 Level 6 (13,620 points)

    Can you provide any details regarding the volume of records? what fields are in each record?  The order of the fields (just so any example is close to what you have)?

  • 6. Re: merging files and deleting duplicates?
    Wayne Contello Level 6 Level 6 (13,620 points)

    Here is the least automated and most "Numbers" solution (that is no scripting):

     

    Assuming you have opened the two csv files in numbers and they are in tables (whether in the same doc or not) you can merge the two tables simply by select the rows or one table, then using the copy command (menu item "Edit > Copy"), then adding a row to the other table, select the first cell new row, then pasting (menu item "Edit > Paste):

     

    Screen Shot 2013-06-24 at 8.28.40 PM.png

     

    In my example I have two datasets in two different tables shrewdly titles "Dataset 1" and "Dataset 2".  Then I select the values in the table "Dataset 2", then copy and paste to the end of the table "Dataset 1" after adding a new row (select the table and use the menu item "Table > Add Row Below"  or "Above" depending on where the cursor is):

    Screen Shot 2013-06-24 at 8.30.03 PM.png

     

    Once you have moved the datainto a common table you can sort the table using the reorganize panel.  To open the reorganize panel select the table (in this example "Dataset 1") then select the menu item "Table > Show Reorganize Panel".

     

    Now you can sort by the last name, then first name in a single action:

    Screen Shot 2013-06-24 at 8.38.20 PM.png

     

    Now you can search through the list and delete duplicates.  This is safest incase there is info in one that you want to keep

  • 7. Re: merging files and deleting duplicates?
    hotwheels 22 Level 1 Level 1 (0 points)

    let me get this straight.

     

    you are suggesting i manually copy and paste one document into the other and then manually delete the duplicates?

     

    i have 3500 listing with two or three or four dupes.

     

    i wonder if anyone knows of an automated way to do this. i mean, the copying and pasting and re-sorting is helpful but i think i am needing an automated solution here ideally...

     

    THANKS!

  • 8. Re: merging files and deleting duplicates?
    Wayne Contello Level 6 Level 6 (13,620 points)

    No I made a suggestion based on no information (except two CSV files one from two different computers) with some of my own assumptions.

     

    Do you want the automated solution to decide whether information in one record is correct and throw out the other information?  Which duplicates are the ones to delete?

     

    With the sorting method you can add a single column (after you sort) that will easily identify the few dups and highlight the corresponding row to draw your attention.  Like this:

    Screen Shot 2013-06-24 at 9.09.25 PM.png

    D2=IF(AND(A2=A1, B2=B1), 1, 0)

     

    select D2 and fill down as needed or copy D2, then select the column then paste

  • 9. Re: merging files and deleting duplicates?
    Hiroto Level 5 Level 5 (5,025 points)

    Hello

     

    Try the following AppleScript script which is actually a simple wrapper of Perl script.

    It will ask you to choose source csv files and then yield output file named "merged.csv" on desktop. Source csv files are assumed to be in UTF-8 and use LF or CRLF for line ending. Output csv file will be in UTF-8 and use LF for line ending.

     

    main()
    on main()
        set outfile to (path to desktop)'s POSIX path & "merged.csv"
        set aa to choose file with prompt "Choose CSV files to merge." with multiple selections allowed
        set argv to ""
        repeat with a in aa
            set argv to argv & a's POSIX path's quoted form & space
        end repeat
        
        do shell script "/usr/bin/perl -CSDA <<'EOF' - " & argv & " > " & outfile's quoted form & "
    use strict;
    use open IN => ':crlf:utf8';
    
    local $\\ = qq(\\n);
    my %hash = ();
    my $h;
    while (<ARGV>) {
        chomp;
        $h = $_ unless $h;
        $hash{$_} = $_;
    }
    print $hash{$h};
    for (sort keys %hash) {
        print $hash{$_} if $_ ne $h;
    }
    EOF
    "
    end main
    

     

    Good luck,

    H

  • 10. Re: merging files and deleting duplicates?
    hotwheels 22 Level 1 Level 1 (0 points)

    thanks H.

     

    i have used applescript a couple of times and will find the brainspace to try this in a little bit.

     

    looks like a very interesting little tool you created.

     

    regards and thanks,

     

    jonathan

  • 11. Re: merging files and deleting duplicates?
    hotwheels 22 Level 1 Level 1 (0 points)

    hi wayne. thank you for your help.

     

    i am not sure i am following this exactly but are you indicating that i can make a visual comparison and then color coding the duplicates so that i can manually delete them?

     

    or are you indicating that somehow there is a feature in numbers that will identify exact duplicates...?

     

    THANKS