Skip navigation

merging files and deleting duplicates?

1014 Views 11 Replies Latest reply: Jun 25, 2013 5:48 PM by hotwheels 22 RSS
hotwheels 22 Level 1 Level 1 (0 points)
Currently Being Moderated
Jun 24, 2013 9:10 AM

i have two csv databases with contact information from my laptop and my desktop.

 

i would like to /merge/ these two files and delete duplicate entries.

 

is this possible?

 

TIA

Mac Pro, Mac OS X (10.7.4), 64 bit Win7 in Parallels | iCloud
  • Wayne Contello Level 6 Level 6 (12,800 points)
    Currently Being Moderated
    Jun 24, 2013 2:40 PM (in response to hotwheels 22)

    Sure it is possible

  • Wayne Contello Level 6 Level 6 (12,800 points)
    Currently Being Moderated
    Jun 24, 2013 3:50 PM (in response to hotwheels 22)

    I would open both documents, then copy one table and paste after the other.  Then sort by the relavent columns so that like entries are near each other.  If the volumne of entries is large then another solution that is more automated may be necessary.

  • Wayne Contello Level 6 Level 6 (12,800 points)
    Currently Being Moderated
    Jun 24, 2013 6:20 PM (in response to hotwheels 22)

    Can you provide any details regarding the volume of records? what fields are in each record?  The order of the fields (just so any example is close to what you have)?

  • Wayne Contello Level 6 Level 6 (12,800 points)
    Currently Being Moderated
    Jun 24, 2013 6:39 PM (in response to hotwheels 22)

    Here is the least automated and most "Numbers" solution (that is no scripting):

     

    Assuming you have opened the two csv files in numbers and they are in tables (whether in the same doc or not) you can merge the two tables simply by select the rows or one table, then using the copy command (menu item "Edit > Copy"), then adding a row to the other table, select the first cell new row, then pasting (menu item "Edit > Paste):

     

    Screen Shot 2013-06-24 at 8.28.40 PM.png

     

    In my example I have two datasets in two different tables shrewdly titles "Dataset 1" and "Dataset 2".  Then I select the values in the table "Dataset 2", then copy and paste to the end of the table "Dataset 1" after adding a new row (select the table and use the menu item "Table > Add Row Below"  or "Above" depending on where the cursor is):

    Screen Shot 2013-06-24 at 8.30.03 PM.png

     

    Once you have moved the datainto a common table you can sort the table using the reorganize panel.  To open the reorganize panel select the table (in this example "Dataset 1") then select the menu item "Table > Show Reorganize Panel".

     

    Now you can sort by the last name, then first name in a single action:

    Screen Shot 2013-06-24 at 8.38.20 PM.png

     

    Now you can search through the list and delete duplicates.  This is safest incase there is info in one that you want to keep

  • Wayne Contello Level 6 Level 6 (12,800 points)
    Currently Being Moderated
    Jun 24, 2013 7:10 PM (in response to hotwheels 22)

    No I made a suggestion based on no information (except two CSV files one from two different computers) with some of my own assumptions.

     

    Do you want the automated solution to decide whether information in one record is correct and throw out the other information?  Which duplicates are the ones to delete?

     

    With the sorting method you can add a single column (after you sort) that will easily identify the few dups and highlight the corresponding row to draw your attention.  Like this:

    Screen Shot 2013-06-24 at 9.09.25 PM.png

    D2=IF(AND(A2=A1, B2=B1), 1, 0)

     

    select D2 and fill down as needed or copy D2, then select the column then paste

  • Hiroto Level 5 Level 5 (4,815 points)
    Currently Being Moderated
    Jun 24, 2013 11:37 PM (in response to hotwheels 22)

    Hello

     

    Try the following AppleScript script which is actually a simple wrapper of Perl script.

    It will ask you to choose source csv files and then yield output file named "merged.csv" on desktop. Source csv files are assumed to be in UTF-8 and use LF or CRLF for line ending. Output csv file will be in UTF-8 and use LF for line ending.

     

    main()
    on main()
        set outfile to (path to desktop)'s POSIX path & "merged.csv"
        set aa to choose file with prompt "Choose CSV files to merge." with multiple selections allowed
        set argv to ""
        repeat with a in aa
            set argv to argv & a's POSIX path's quoted form & space
        end repeat
        
        do shell script "/usr/bin/perl -CSDA <<'EOF' - " & argv & " > " & outfile's quoted form & "
    use strict;
    use open IN => ':crlf:utf8';
    
    local $\\ = qq(\\n);
    my %hash = ();
    my $h;
    while (<ARGV>) {
        chomp;
        $h = $_ unless $h;
        $hash{$_} = $_;
    }
    print $hash{$h};
    for (sort keys %hash) {
        print $hash{$_} if $_ ne $h;
    }
    EOF
    "
    end main
    

     

    Good luck,

    H

Actions

More Like This

  • Retrieving data ...

Bookmarked By (0)

Legend

  • This solved my question - 10 points
  • This helped me - 5 points
This site contains user submitted content, comments and opinions and is for informational purposes only. Apple disclaims any and all liability for the acts, omissions and conduct of any third parties in connection with or related to your use of the site. All postings and use of the content on this site are subject to the Apple Support Communities Terms of Use.