Looking for a glossary split up script

Hello,


I'm new to AppleScript and am looking for a script that would be useful to me as a translator: one that splits up a glossary whose source side contains alternatives separated by semicolons. I'm not sure whether this can easily be done with AppleScript or whether it would be better handled with Perl or another tool.


This is how the glossary looks (TAB stands for the tab character):


Ethanol;Ethylalkohol;Äthanol;Äthylalkohol;Weingeist;Spiritus;Sprit;Alkohol;HydroxyethanTABEthanol;ethyl alcohol;pure alcohol;beverage alcohol;drinking alcohol;CH3CH2OH;C2H5OH;C2H6O

Quarz;Tiefquarz;α-QuarzTABSiO2;quartz;citrine;rose quartz;smoky quartz;milk quartz;milky quartz


And this is how this UTF-8 text file should be rewritten:


Ethanol TAB Ethanol;ethyl alcohol;pure alcohol;beverage alcohol;drinking alcohol;CH3CH2OH;C2H5OH;C2H6O

Ethylalkohol TAB Ethanol;ethyl alcohol;pure alcohol;beverage alcohol;drinking alcohol;CH3CH2OH;C2H5OH;C2H6O

etc.


Quarz TAB SiO2;quartz;citrine;rose quartz;smoky quartz;milk quartz;milky quartz

Tiefquarz TAB SiO2;quartz;citrine;rose quartz;smoky quartz;milk quartz;milky quartz

α-Quarz TAB SiO2;quartz;citrine;rose quartz;smoky quartz;milk quartz;milky quartz


Thank you for any suggestions!


Hans

iMac, Mac OS X (10.6.2)

Posted on Jun 1, 2014 6:08 AM
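
For reference, the split requested above amounts to repeating the target column once for each source-side alternative. The following is only a minimal Python sketch of that idea, not the awk script mentioned later in the thread. It assumes a UTF-8 file with exactly one tab between the two columns; the function names split_source_side and main are made up for illustration.

```python
import sys


def split_source_side(line: str) -> list[str]:
    """Return one 'term<TAB>targets' row per source-side alternative."""
    text = line.rstrip("\n")
    source, sep, target = text.partition("\t")
    if not sep:
        # Assumption: a line without a tab is passed through unchanged.
        return [text]
    # Assumption: empty alternatives (e.g. from ';;') are dropped.
    return [f"{term}\t{target}" for term in source.split(";") if term]


def main(in_path: str, out_path: str) -> None:
    # Stream line by line so that large glossaries are never held in memory.
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            for row in split_source_side(line):
                dst.write(row + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as split_glossary.py, it would be run as: python3 split_glossary.py in.txt out.txt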

6 replies

Jun 2, 2014 12:13 AM in response to Mark Jalbert

Hello Mark,


The script works beautifully. I now have:


α-Quarz TAB SiO2;quartz;citrine;rose quartz;smoky quartz;milk quartz;milky quartz


If I want to split up the right side of the tab too, I could of course use a spreadsheet program and swap the columns. But that is a little clumsy, especially with large glossaries (500,000+ lines).


Could you provide a script that splits up the right side as well?


So that I'd get:


α-Quarz TAB SiO2

α-Quarz TAB quartz

α-Quarz TAB citrine

α-Quarz TAB rose quartz

etc.


Many thanks in advance!


Hans

Jun 2, 2014 8:28 PM in response to Hiroto

Last attempt to post.


perl -CSDA -w <<'EOF' - in.txt > out.txt
use strict;
while (<>) {
    chomp;
    my ($x, $y) = split "\t";
    my @xx = split ';', $x;
    my @yy = split ';', $y;
    for $x (@xx) {
        for $y (@yy) {
            printf "%s\t%s\n", $x, $y;
        }
    }
}
EOF


- in.txt can be either the original text or the "half-expanded" text produced by the previous awk script.


- Caution: if the original text contains 500K lines, this script will yield a huge text file of 25 to 50 million lines or more.
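
To check the size before running the full expansion, the output lines can be counted without writing them: each input line contributes (number of source terms) × (number of target terms) lines. The following is a rough Python sketch of that arithmetic, assuming a UTF-8, tab-separated file. The names pair_count and total_pairs are made up for illustration, and empty terms are ignored, which may differ slightly from how Perl's split treats them.

```python
import sys


def terms(column: str) -> list[str]:
    """Split a column on semicolons, ignoring empty terms (an assumption)."""
    return [term for term in column.split(";") if term]


def pair_count(line: str) -> int:
    """Number of source/target pairs that one glossary line expands to."""
    source, sep, target = line.rstrip("\n").partition("\t")
    if not sep:
        # Assumption: a line without a tab is not a glossary entry.
        return 0
    return len(terms(source)) * len(terms(target))


def total_pairs(lines) -> int:
    """Sum the pair counts over any iterable of lines, e.g. an open file."""
    return sum(pair_count(line) for line in lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        print(total_pairs(handle))
```

For the two sample lines at the top of the thread this gives 9 × 8 + 3 × 7 = 93 output lines.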


Regards,

H


PS. Hmm. This one went through. There seems to be some content filtering that decides whether to accept or reject a post. As far as I can tell, U+0023 NUMBER SIGN, such as in a shebang line, causes the post to be rejected.


/H


Message was edited by: Hiroto (added PS)

