Merge XML Files as Automatic Script

Hey!
I have multiple XML files and want to merge them automatically.

They all share the structure below, with different information:


<?xml version="1.0" encoding="UTF-8" ?>
<tv generator-info-name="Rytec" generator-info-url="http://forums.openpli.org">
<channel id="Classica.de">
    <display-name lang="de">CLASSICA</display-name>
  </channel>
<channel id="VH1Classic.de">
     <display-name lang="se">VH1 Classic</display-name>
</channel>
<programme start="20160115011500 +0100" stop="20160115023000 +0100" channel="Classica.de">
    <title lang="de">Bach, Partiten für Violine solo</title>
    <sub-title lang="de">Solist Instr.: Gidon Kremer (Violine), Bildregie: Daniel Finkernagel, Alexander Lück</sub-title>
    <desc lang="de">Gidon Kremer spielt die Partiten für Violine solo Nr. 1 h-Moll BWV 1002, Nr. 2 d-Moll BWV 1004 und Nr. 3 E-Dur BWV 1004 von Johann Sebastian Bach (1685-1750). Anfang der siebziger Jahre begann Gidon Kremers Karriere, und seitdem hat er sich den Ruf eines Geigers von internationalem Format und ganz eigenem Stil erworben. Geboren wurde Kremer 1947 in der lettischen Hauptstadt Riga. Als Achtzehnjähriger spielte er David Oistrach vor, der ihn als einen seiner wenigen Schüler annahm und am Moskauer Konservatorium unterrichtete. 1967 errang Gidon Kremer beim Brüsseler Wettbewerb "Reine Elisabeth" seinen ersten internationalen Preis. 1970 erreichte er mit dem Ersten Preis des Moskauer Tschaikowsky-Wettbewerbs den vorläufigen Höhepunkt seiner noch jungen Laufbahn. Bald darauf zählte Kremer zu den gefragtesten Geigern der Welt. Nachdem Gidon Kremer 1981 die Sowjetunion für immer verlassen hatte, gründete er das Festival in Lockenhaus, bei dem sich alljährlich Musiker aus aller Welt treffen. Hier entstand diese Aufnahme.</desc>
  </programme>
</tv>

MacBook Pro, OS X El Capitan (10.11.2)

Posted on Jan 16, 2016 1:19 AM

10 replies

Jan 16, 2016 6:06 AM in response to HumanK

The following will merge .xml files that share your structure but contain different information. Based on your example, I created 4 test XML files with different content and ran the following Python script. All 4 XML files were merged quickly into a combined.xml file. Tested with the default Python 2.7.10 on El Capitan 10.11.2.


The Python script and its command-line invocation shown here are from a Stack Overflow post, in the answer named "High-tech answer".


Copy the Python source below into a programming editor and save it as xmlcombine.py. Then follow the interactive steps in the Terminal. This assumes that all of the XML files you want to combine are in the same folder as the Python script, and that you are using the Bash shell.


chmod +x ./xmlcombine.py

shopt -s extglob

./xmlcombine.py !(combined).xml > combined.xml

shopt -u extglob


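#!/usr/bin/env python
# Written for Python 2 (note the print statement below).

import sys
from xml.etree import ElementTree


def run(files):
    # The root element of the first file becomes the merged document.
    first = None
    for filename in files:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            first = data
        else:
            # Append the children of each later root to the first root.
            first.extend(data)
    if first is not None:
        print ElementTree.tostring(first)


if __name__ == "__main__":
    run(sys.argv[1:])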
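
The !(combined).xml pattern in the Terminal steps keeps an earlier combined.xml out of the input list. If extglob is not available, a rough equivalent can be built in Python instead. This is only a sketch, assuming the same single-folder layout; collect_inputs and its parameters are names made up for illustration and are not part of the original script.

```python
import glob
import os


def collect_inputs(folder=".", output_name="combined.xml"):
    """Return the sorted .xml paths in folder, skipping the merge output file."""
    paths = glob.glob(os.path.join(folder, "*.xml"))
    # Drop the output file so an earlier merge result is not merged again.
    return sorted(p for p in paths if os.path.basename(p) != output_name)
```

The returned list could be passed to run() in place of sys.argv[1:].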




Jan 16, 2016 7:35 AM in response to VikingOSX

Okay, sorry, there is an issue: the generated combined.xml is out of order.
For example:



<tv generator-info-name="Rytec" generator-info-url="http://forums.openpli.org">
<channel id="Classica.de">
    <display-name lang="de">CLASSICA</display-name>
  </channel>
  <channel id="W9.ch">
    <display-name lang="fr">W9</display-name>
  </channel>
<programme channel="Classica.de" start="20160115011500 +0100" stop="20160115023000 +0100">
    <title lang="de">Bach, Partiten f&#252;r Violine solo</title>
    <sub-title lang="de">Solist Instr.: Gidon Kremer (Violine), Bildregie: Daniel Finkernagel, Alexander L&#252;ck</sub-title>
    <desc lang="de">Gidon Kremer spielt die Partiten f&#252;r Violine solo Nr. 1 h-Moll BWV 1002, Nr. 2 d-Moll BWV 1004 und Nr. 3 E-Dur BWV 1004 von Johann Sebastian Bach (1685-1750). Anfang der siebziger Jahre begann Gidon Kremers Karriere, und seitdem hat er sich den Ruf eines Geigers von internationalem Format und ganz eigenem Stil erworben. Geboren wurde Kremer 1947 in der lettischen Hauptstadt Riga. Als Achtzehnj&#228;hriger spielte er David Oistrach vor, der ihn als einen seiner wenigen Sch&#252;ler annahm und am Moskauer Konservatorium unterrichtete. 1967 errang Gidon Kremer beim Br&#252;sseler Wettbewerb "Reine Elisabeth" seinen ersten internationalen Preis. 1970 erreichte er mit dem Ersten Preis des Moskauer Tschaikowsky-Wettbewerbs den vorl&#228;ufigen H&#246;hepunkt seiner noch jungen Laufbahn. Bald darauf z&#228;hlte Kremer zu den gefragtesten Geigern der Welt. Nachdem Gidon Kremer 1981 die Sowjetunion f&#252;r immer verlassen hatte, gr&#252;ndete er das Festival in Lockenhaus, bei dem sich allj&#228;hrlich Musiker aus aller Welt treffen. Hier entstand diese Aufnahme.</desc>
  </programme>
  <programme channel="Classica.de" start="20160115023000 +0100" stop="20160115030000 +0100">
    <title lang="de">Die sch&#246;nsten Opern aller Zeiten - Fidelio</title>
    <sub-title lang="de">Bildregie: J&#252;rgen Schindler</sub-title>
    <desc lang="de">Diese Folge der zehnteiligen Serie &#252;ber gro&#223;e Publikumslieblinge stellt die Oper "Fidelio" vor. Musik: Ludwig van Beethoven - Libretto: Joseph Ferdinand von Sonnleithner, Stephan von Breuning und Georg Friedrich Treitschke. Interviews mit namhaften S&#228;ngern, Dirigenten und Regisseuren, Ausschnitte aus bedeutenden Auff&#252;hrungen und nachgespielte Szenen geben einen informativen Einblick in das B&#252;hnenwerk, seine Entstehung und seine Handlung.</desc>
  </programme>
<channel id="eXplora.it">
    <display-name lang="it">eXplora HD</display-name>
  </channel>
  <channel id="DMAX.it">
    <display-name lang="it">DMAX</display-name>
  </channel>
<programme channel="VH1Classic.de" start="20160121220000 +0100" stop="20160121230000 +0100">
<title lang="se">Classic power ballads</title>
<desc>91491315</desc>
</programme>
<programme channel="VH1Classic.de" start="20160121230000 +0100" stop="20160122000000 +0100">
<title lang="se">The rock show</title>
<desc>91415995</desc>
</programme>


In other words, the script just appended each file's content to the XML, but all channels have to come first, followed by all programmes.

Right now the output is the channels and programmes from a.xml, with the channels and programmes from b.xml simply appended after them.

Jan 16, 2016 9:54 AM in response to HumanK

Just out of purism: XML is not supposed to be order-dependent. All semantic (meaning) elements should be part of the data contents; imposing them on the syntactic (structural) form is bad practice. Add a new ordering index key to the data, or sort them alphabetically or by date when the data is being used, but don't rely on the document structure for meaningful information.
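
As one reading of "sort them when the data is being used": the merged file can stay in whatever order it has, and the programmes get ordered at read time. A minimal sketch, assuming Python 3 (where strptime accepts the "+0100" offset through %z) and the start timestamp format shown in this thread; programmes_by_start is a name made up for illustration.

```python
from datetime import datetime
from xml.etree import ElementTree as ET


def programmes_by_start(root):
    """Return the <programme> children of root ordered by start time."""
    def start_time(programme):
        # Timestamps in this thread look like "20160115011500 +0100".
        return datetime.strptime(programme.get("start"), "%Y%m%d%H%M%S %z")
    return sorted(root.findall("programme"), key=start_time)
```

For example, programmes_by_start(ET.parse("combined.xml").getroot()) would give the programmes in broadcast order regardless of which source file they came from.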

Jan 16, 2016 12:51 PM in response to VikingOSX

VikingOSX wrote:


[I]s the output of the Python script for the OP correct now, or would it benefit from code change?

Proper format is a function of how the data is processed by whatever application is using it, so it's hard to say without knowing anything about that. From the initial example it looks as though the end-product wants a collection of channels and programs, and since each program has a channel key, just glomming them together ought to work right, so what you've done should be fine. But there are a lot of unknowns. Best thing would be for the OP to find a premade XML that contains multiple programs and channels, and see what its structure looks like (though again, I have no information about how these XML files are made, so that may not be feasible).
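
To compare a premade file against the merged one, it may help to look only at the order of the top-level elements. A small sketch, assuming Python's standard xml.etree module; top_level_layout is a hypothetical helper name, not something from the original script.

```python
from itertools import groupby
from xml.etree import ElementTree as ET


def top_level_layout(root):
    """Summarise consecutive runs of top-level tags as (tag, count) pairs."""
    tags = (child.tag for child in root)
    return [(tag, len(list(group))) for tag, group in groupby(tags)]
```

Running top_level_layout(ET.parse("combined.xml").getroot()) on both files shows at a glance whether channels and programmes are grouped or interleaved.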

Jan 17, 2016 8:06 AM in response to HumanK

Yes. The updated code was tested with the same data as before on OS X 10.11.2 using the default Python 2.7.10. The invocation steps in Terminal are the same as in the previous post.


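#!/usr/bin/env python
# coding: utf-8

import sys
from xml.etree import ElementTree as ET


def run(files):
    # The root element of the first file becomes the merged document.
    first = None
    for filename in files:
        data = ET.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)

    # Nothing to write when no input files were given.
    if first is not None:
        # Writing through ElementTree also emits the XML declaration.
        ET.ElementTree(first).write(sys.stdout, encoding="utf-8",
                                    xml_declaration=True)


if __name__ == "__main__":
    run(sys.argv[1:])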


The header is now written automatically after the XML files are merged:


<?xml version='1.0' encoding='utf-8'?>

<tv generator-info-name="Rytec" generator-info-url="http://forums.openpli.org">

<channel id="Classica.de">

<display-name lang="de">CLASSICA</display-name>

</channel>
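
If the application reading the file does need every channel before the first programme, as asked earlier in the thread, the merge step can collect the two kinds of element separately. This is a sketch under that assumption and not the tested script above; merge_channels_first is a name made up for illustration, and any top-level element that is not a channel is simply placed after the channels.

```python
from xml.etree import ElementTree as ET


def merge_channels_first(roots):
    """Merge <tv> roots so every <channel> precedes every other element."""
    roots = list(roots)
    if not roots:
        return None
    # Reuse the first root's tag and attributes for the merged document.
    merged = ET.Element(roots[0].tag, dict(roots[0].attrib))
    channels, others = [], []
    for root in roots:
        for child in root:
            (channels if child.tag == "channel" else others).append(child)
    merged.extend(channels)
    merged.extend(others)
    return merged
```

With roots = [ET.parse(f).getroot() for f in files], the result can be saved using ET.ElementTree(merged).write("combined.xml", encoding="utf-8", xml_declaration=True).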


