How to merge many XML files into one?

Hi: I got a small project to combine many XML files into one and convert the combined XML file in Excel using AppleScript. My XML files look like this:


<?xml version="1.0" encoding="UTF-8"?>
<Metadataobject>
    <from>thorsten.jock@medienpartner-int.com</from>
    <jobname>B3_IM09MBDUF</jobname>
    <pages>2</pages>
    <priority>3</priority>
    <timezone>CEST</timezone>
    <year>2013</year>
    <month>7</month>
    <day>15</day>
    <hour>11</hour>
</Metadataobject>


and like this...


<?xml version="1.0" encoding="UTF-8"?>
<Metadataobject>
    <from>frank.falk@medienpartner-int.com</from>
    <jobname>P1_FR1330G006007_Kate_van der Vaart</jobname>
    <pages>2</pages>
    <priority>1</priority>
    <timezone>CEST</timezone>
    <year>2013</year>
    <month>7</month>
    <day>12</day>
    <hour>16</hour>
</Metadataobject>


I get many XML files like this. And I want them to be combined and shown like this:


<?xml version="1.0" encoding="UTF-8"?>
<Metadataobject>
    <job id="1">
    <from>thorsten.jock@medienpartner-int.com</from>
    <jobname>B3_IM09MBDUF</jobname>
    <pages>2</pages>
    <priority>3</priority>
    <timezone>CEST</timezone>
    <year>2013</year>
    <month>7</month>
    <day>15</day>
    <hour>11</hour>
    </job>
    <job id="1">
    <from>frank.falk@medienpartner-int.com</from>
    <jobname>P1_FR1330G006007_Kate_van der Vaart</jobname>
    <pages>2</pages>
    <priority>1</priority>
    <timezone>CEST</timezone>
    <year>2013</year>
    <month>7</month>
    <day>12</day>
    <hour>16</hour>
    </job>
</Metadataobject>


And finally the combined XML file converts in Excel sheet with column headings "Job ID", "From", "Job Name" and so on...


Or there is another best way to get the same result...


Thanks

Mac OS X (10.7)

Posted on Jul 15, 2013 7:18 AM

Reply
15 replies

Jul 15, 2013 12:31 PM in response to twtwtw

That is just an intermediary state to get to the excel version. Actually I get many small XML files (as shown above) from client and I want them all combined in an excel sheet with common column headings... like this...


fromjobname pagesprioritytimezoneyearmonthdayhourid
thorsten.jock@medienpartner-int.comB3_IM09MBDUF 23CEST2013715111
thorsten.jock@medienpartner-int.comB3_IM09MBDUF 23CEST2013715112


Thanks for your response.

Jul 17, 2013 5:54 AM in response to mrindia

mrindia wrote:


Is there nobody to help me?

I think the problem is that no one wants to help you make your life more difficult and your tasks more manual. You are taking data designed for automated processing and stuffing it into a tool designed for manual processing. That's backwards.


What is your end result? What do you want to accomplish? You can't say "get the data into Excel". There has to be some end goal in mind. It doesn't matter what the end goal is, the fastest way to it is NOT through Excel.

Jul 18, 2013 6:13 PM in response to mrindia

Hello


You may try something like the following AppleScript script to combine individual xml files into one xml file as you described.

Set the indir variable to the POSIX path of input directory under which tree the source xml files reside and set the outfile to the POSIX path of output xml file, which is currently set to file named "out.xml" in your desktop.


Excel scripting is another story. I'll try later.

H



set indir to (choose folder with prompt "choose input folder")'s POSIX path's text 1 thru -2
set outfile to (path to desktop)'s POSIX path & "out.xml"

do shell script "/usr/bin/perl -CSDA <<'EOF' - " & indir's quoted form & " > " & outfile's quoted form & "
use strict;
use XML::LibXML;

chdir $ARGV[0] or die qq($!);
local $/ = qq(\\0);
my @ff = qx(find . -type f -iname '*.xml' -print0);

my $parser = XML::LibXML->new();
my $doc0 = XML::LibXML::Document->new('1.0', 'UTF-8');
my $meta0 = $doc0->createElement('Metadataobject');

my $i = 0;
for my $f (sort @ff) {
    chomp $f;
    my $doc = $parser->parse_file($f);
    my @metas = $doc->findnodes('/Metadataobject');
    next unless @metas;
    my $meta = $metas[0];

    my $job = $doc0->createElement('job');
    $job->setAttribute('id', ++$i);
    for  ( $meta->childNodes() ) {
        $job->appendChild($_);
    }    
    $meta0->appendChild($job);
}
$doc0->addChild($meta0);
print $doc0->toString(1);
EOF"

Jul 19, 2013 9:29 AM in response to mrindia

Hello


Here're two scrips you may try.


The SCRIPT 1 is intended to save the data as Excel file but the part populating and saving Excel sheet is not tested because I don't have Excel at hand. Note that, even if it works, numbers will be treated as text in the result sheet. If you want numbers as numbers, we need to modify the code as such.


The SCRIPT 2 is a provision in case SCRIPT 1 fails. It will save the data as csv file which may be opened by Excel. Note that the field separator is set to comma and not semi-colon. You can change it to semi-colon if you wish by setting $, to qq(;) in Perl script.


In both scripts, you need to set infile to the combined xml file created by the previous script and set the outfile to the result file. Currently, infile is assumed to be a file named out.xml in your desktop and outfile will be out.xls (in SCRIPT 1) or out.csv (in SCRIPT 2) in your desktop.


Good luck,

H



--SCRIPT1
(*
    save xml data as xls file
*)
main()
on main()
    set infile to (path to desktop)'s POSIX path & "out.xml"
    set outfile to (path to desktop)'s POSIX path & "out.xls"

    set r to do shell script "/usr/bin/perl -CSDA <<'EOF' - " & infile's quoted form & "
use strict;
use XML::LibXML;

my $f = $ARGV[0] or die qq(Usage: $0 input_xml_file);
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($f);

local $\\ = qq(\\n);
local $, = qq(\\t);
print qw(from jobname pages priority timezone year month day hour id);

my @jobs = $doc->findnodes('/Metadataobject/job');
for my $j ( @jobs ) {
    my $from = $j->find('from');
    my $jobname = $j->find('jobname');
    my $pages = $j->find('pages');
    my $priority = $j->find('priority');
    my $timezone = $j->find('timezone');
    my $year = $j->find('year');
    my $month = $j->find('month');
    my $day = $j->find('day');
    my $hour = $j->find('hour');
    my $id = $j->getAttribute('id');

    print ($from, $jobname, $pages, $priority, $timezone, $year, $month, $day, $hour, $id);
}    
EOF"

    set rr to _text2array(r, tab, return) -- 2d-array representation of data
    --return rr -- for test
    save_array_as_xls(rr, outfile) -- # not tested
end main

on _text2array(t, cs, rs)
    script o
        property pp : _split(rs, t)
        property qq : {}
        repeat with p in my pp
            set end of my qq to _split(cs, p's contents)
        end repeat
        return my qq's contents
    end script
    tell o to run
end _text2array

on _split(d, t)
    (*
        string or list d : separator(s)
        string t : source string
        return list : t splitted by d
    *)
    local astid, astid0, tt
    set astid to a reference to AppleScript's text item delimiters
    try
        set {astid0, astid's contents} to {astid's contents, {} & d}
        set tt to t's text items
        set astid's contents to astid0
    on error errs number errn
        set astid's contents to astid0
        error errs number errn
    end try
    return tt
end _split

on save_array_as_xls(_array, _file) -- # not tested
    (*
        list _array : list of lists representing 2d array of values of target range
        string _file : POSIX path of file to be saved

        * target range starts with $A$1 and ends with the cell determined by the size of _array
    *)
    set ix to count _array -- row count in _array
    set jx to count (_array's item 1) -- column count in _array

    tell application "Microsoft Excel"
        set wbk to make new workbook
        tell active sheet
            set _range to "$A$1:" & (get address (row ix's cell jx))
            set value of range _range to _array
        end tell
        save workbook as wbk filename (_file as POSIX file as string)
        close active workbook saving no
    end tell
end save_array_as_xls
--END OF SCRIPT 1



--SCRIPT 2
(*
    save xml data as csv file
*)
set infile to (path to desktop)'s POSIX path & "out.xml"
set outfile to (path to desktop)'s POSIX path & "out.csv"

do shell script "/usr/bin/perl -CSDA <<'EOF' - " & infile's quoted form & " > " & outfile's quoted form & "
use strict;
use XML::LibXML;

my $f = $ARGV[0] or die qq(Usage: $0 input_xml_file);
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($f);

# 
#     CSV spec
# 
#     - record separator is CRLF
#     - field separator is comma
#     - every field is quoted
#     - text encoding is UTF-8
# 
local $\\ = qq(\\r\\n);    # CRLF
local $, = qq(,);        # comma
print qw(\"from\" \"jobname\" \"pages\" \"priority\" \"timezone\" \"year\" \"month\" \"day\" \"hour\" \"id\");    # header

my @jobs = $doc->findnodes('/Metadataobject/job');
for my $j ( @jobs ) {
    my $from = $j->find('from');
    my $jobname = $j->find('jobname');
    my $pages = $j->find('pages');
    my $priority = $j->find('priority');
    my $timezone = $j->find('timezone');
    my $year = $j->find('year');
    my $month = $j->find('month');
    my $day = $j->find('day');
    my $hour = $j->find('hour');
    my $id = $j->getAttribute('id');

    print map { s/\"/\"\"/og; qq(\").$_.qq(\"); } ($from, $jobname, $pages, $priority, $timezone, $year, $month, $day, $hour, $id);
}    
EOF"
--END OF SCRIPT 2


Message was edited by: Hiroto (Oops, fixed some code in SCRIPT 1. Sorry.)

Jul 19, 2013 12:30 PM in response to mrindia

I had to think about this for a couple of days - interesting problem - but here's an applescript script that should do what you want. Obvious once you get your head around it, not at all obvious until you do - lol.


-- choose list of xml files

set fileList to choose file with multiple selections allowed


-- reads data out of xml files into applescript list

tell application "System Events"

set dataList to {}

repeat with thisXMLFilePath in fileList

set thisXMLFile to XML file "/Users/tedwrigley/Test area/join test 1.xml"

set end of dataList to value of XML elements of XML element "Metadataobject" of thisXMLFile

end repeat

end tell


-- gets dimensions of list

set w to count of first item of dataList

set h to count of dataList


-- writes list to excel worksheet

tell application "Microsoft Excel"

tell worksheet 1 of workbook 1

set currRange to cell "A1"

set pasteRange to get resizecurrRangerow sizehcolumn sizew

set value of pasteRange to dataList

end tell

end tell

Jul 20, 2013 6:41 AM in response to mrindia

Hmm. That's strange, I mean that specific error is strange, because the variable r is the result of "do shell script" command, which is anything written to STDOUT or empty string if nothing is written to STDOUT in shell script and thus is always defined unless "do shell script" command raises error, in which case the error message is different.


As far as I can tell, I cannot reproduce the error and cannot think of a reason especially when the SCRIPT 2 works. Two Perl scripts only differ in the output form, one is TSV and another is CSV.


Please copy and paste the code of SCRIPT 1 into new script and try it afresh.


Regards,

H

Jul 20, 2013 6:48 AM in response to mrindia

Hello


That's easy. You may try this revised version. Now the year, month and day columns are combined into one date column whose value is in mm.dd.yyyy format.


--SCRIPT 2a
set infile to (path to desktop)'s POSIX path & "out.xml"
set outfile to (path to desktop)'s POSIX path & "out.csv"

do shell script "/usr/bin/perl -CSDA <<'EOF' - " & infile's quoted form & " > " & outfile's quoted form & "
use strict;
use XML::LibXML;

my $f = $ARGV[0] or die qq(Usage: $0 input_xml_file);
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($f);

# 
#     CSV spec
# 
#     - record separator is CRLF
#     - field separator is comma
#     - every field is quoted
#     - text encoding is UTF-8
# 
local $\\ = qq(\\r\\n);    # CRLF
local $, = qq(,);        # comma
print qw(\"from\" \"jobname\" \"pages\" \"priority\" \"timezone\" \"date\" \"hour\" \"id\");    # header

my @jobs = $doc->findnodes('/Metadataobject/job');
for my $j ( @jobs ) {
    my $from = $j->find('from');
    my $jobname = $j->find('jobname');
    my $pages = $j->find('pages');
    my $priority = $j->find('priority');
    my $timezone = $j->find('timezone');
    my $year = $j->find('year');
    my $month = $j->find('month');
    my $day = $j->find('day');
    my $hour = $j->find('hour');
    my $id = $j->getAttribute('id');
    
    my $date = sprintf('%02d.%02d.%04d', $day, $month, $year);
    print map { s/\"/\"\"/og; qq(\").$_.qq(\"); } ($from, $jobname, $pages, $priority, $timezone, $date, $hour, $id);
}    
EOF"
--END OF SCRIPT 2a



And just in case, here's the revised version of SCRIPT 1's main() handler to implement the same change. You can simply replace the original main() handler with this new main() in SCRIPT 1 in order to get mm.dd.yyyy date column.


--SCRIPT 1a : main()
on main()
    set infile to (path to desktop)'s POSIX path & "out.xml"
    set outfile to (path to desktop)'s POSIX path & "out.xls"
    
    set r to do shell script "/usr/bin/perl -CSDA <<'EOF' - " & infile's quoted form & "
use strict;
use XML::LibXML;

my $f = $ARGV[0] or die qq(Usage: $0 input_xml_file);
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($f);

local $\\ = qq(\\n);
local $, = qq(\\t);
print qw(from jobname pages priority timezone date hour id);

my @jobs = $doc->findnodes('/Metadataobject/job');
for my $j ( @jobs ) {
    my $from = $j->find('from');
    my $jobname = $j->find('jobname');
    my $pages = $j->find('pages');
    my $priority = $j->find('priority');
    my $timezone = $j->find('timezone');
    my $year = $j->find('year');
    my $month = $j->find('month');
    my $day = $j->find('day');
    my $hour = $j->find('hour');
    my $id = $j->getAttribute('id');
    
    my $date = sprintf('%02d.%02d.%04d', $day, $month, $year);
    print ($from, $jobname, $pages, $priority, $timezone, $date, $hour, $id);
}    
EOF"
    
    --return r -- for test
    set rr to _text2array(r, tab, return) -- 2d-array representation of data
    --return rr -- for test
    save_array_as_xls(rr, outfile)
end main
--END OF SCRIPT 1a  : main()



Good luck,

Hiroto

Jul 25, 2013 8:40 AM in response to Hiroto

I changed the script a little bit. I've given the path of indir and outfile instead of choose folder. But it shows error in do shell command "sh: /Volumes/Users/Admin/Desktop/Script/Merged_XML.xml: No such file or directory.



--Merge Multiple XMLs


set indir to ("Volumes:Users:Admin:Desktop:Script:XMLs:")'s POSIX path's text 1 thru -2

set outfile to ("Volumes:Users:Admin:Desktop:Script:Merged_XML:")'s POSIX path & "Merged_XML.xml"


do shell script "/usr/bin/perl -CSDA <<'EOF' - " & indir's quoted form & " > " & outfile's quoted form & "

use strict;

use XML::LibXML;


chdir $ARGV[0] or die qq($!);

local $/ = qq(\\0);

my @ff = qx(find . -type f -iname '*.xml' -print0);


my $parser = XML::LibXML->new();

my $doc0 = XML::LibXML::Document->new('1.0', 'UTF-8');

my $meta0 = $doc0->createElement('Metadataobject');


my $i = 0;

for my $f (sort @ff) {

chomp $f;

my $doc = $parser->parse_file($f);

my @metas = $doc->findnodes('/Metadataobject');

next unless @metas;

my $meta = $metas[0];


my $job = $doc0->createElement('job');

$job->setAttribute('id', ++$i);

for ( $meta->childNodes() ) {

$job->appendChild($_);

}

$meta0->appendChild($job);

}

$doc0->addChild($meta0);

print $doc0->toString(1);

EOF"

Jul 25, 2013 9:12 AM in response to mrindia

Hello


These POSIX paths look not correct:

/Volumes/Users/Admin/Desktop/Script/XMLs
/Volumes/Users/Admin/Desktop/Script/Merged_XML/Merged_XML.xml


They should have been either:

/Users/Admin/Desktop/Script/XMLs
/Users/Admin/Desktop/Script/Merged_XML/Merged_XML.xml


or:

/Volumes/VOLUME_NAME/Users/Admin/Desktop/Script/XMLs
/Volumes/VOLUME_NAME/Users/Admin/Desktop/Script/Merged_XML/Merged_XML.xml


where VOLUME_NAME is the mount point name of the remote volume.



In order to get correct POSIX paths, you may use these statements:

return (choose folder)'s POSIX path


or:

return (choose file)'s POSIX path


and the correct POSIX path will be given in result pane (or window) of AppleScript Editor.


Regards,

H

This thread has been closed by the system or the community team. You may vote for any posts you find helpful, or search the Community for additional answers.

How to merge many XML files into one?

Welcome to Apple Support Community
A forum where Apple customers help each other with their products. Get started with your Apple Account.