How to prevent Automator's "run shell script" to create fully decomposed forms of my strings ?

I am using Automator's "run shell script" and I am seeing that it outputs fully decomposed forms of my strings.


For example, when I set the action to "echo été" in a service (with "Replace selected text" activated) and run it on a selection in a TextWrangler window, I get fully decomposed forms that TextWrangler won't understand. But when I simply type that command into Terminal, I get my string in composed form.


The problem is not the display issue, but the fact that if I want to run grep for example in "run shell script", I will not be able to find the proper strings since the forms are different.
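The mismatch described here can be reproduced without Automator. The sketch below builds both forms of "été" from octal escapes (so it does not depend on how the script text itself is encoded), dumps their bytes, and shows that a byte-wise grep for the composed form finds nothing in the decomposed text. The function names are made up for illustration.

```shell
# Composed "été": U+00E9 is the UTF-8 byte pair C3 A9 (octal 303 251).
composed() { printf '\303\251t\303\251\n'; }

# Decomposed "été": "e" followed by U+0301 COMBINING ACUTE ACCENT
# (UTF-8 bytes CC 81, octal 314 201).
decomposed() { printf 'e\314\201te\314\201\n'; }

# Hex dump helper so the two forms can be compared byte by byte.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

composed | hex      # c3a974c3a90a
decomposed | hex    # 65cc817465cc810a

# Count the lines of the decomposed text that contain the composed pattern.
# LC_ALL=C forces a plain byte comparison; "|| true" hides grep's
# non-zero exit status when nothing matches.
count_matches() { decomposed | LC_ALL=C grep -c "$(composed)" || true; }
count_matches       # prints 0: the strings look alike but do not match
```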

iMac, Mac OS X (10.6.7)

Posted on Jun 14, 2011 8:39 PM


Jun 14, 2011 9:33 PM in response to Jean-Christophe Helary

I'm not seeing the problem. I did this:

  • created new automator service.
  • set it to Service receives text in any application (with Replaces selected text checked).
  • added a Run Shell Script action.
  • added the text echo été in the action.
  • saved the service as Test.
  • opened TextWrangler, typed in some garbage text, selected it and ran the service

output in TW was été, not the decomposed form. Is your version of TW up-to-date? Mine is version 3.5.3. is the text encoding set to UTF-8? (check TW's preferences, Text Encoding panel for the default).

Jun 14, 2011 10:16 PM in response to Jean-Christophe Helary

Thanks twtwtw for trying.


I use TW 3.5.3 and Unicode (UTF-8) as default for new documents, and UTF-8 for unix scripts I/O.


How did you check that it was not the decomposed form ? Here, besides the display, which was obviously wrong, I used UnicodeChecker.app to make sure I was interpreting things correctly.


Another way to reproduce the issue is to use :


echo été | pbcopy


in the action, and after running the service, do a "paste" (Cmd+V) in a TW window.

Jun 14, 2011 10:33 PM in response to Jean-Christophe Helary

well, yeah, if I toss in the pbcopy bit it decomposes. That actually doesn't surprise me: in man pbcopy it says -


* Encoding:

pbcopy and pbpaste use locale environment variables to determine the encoding to be used for input and output. For example, absent other locale settings, setting the environment variable LANG=en_US.UTF-8 will cause pbcopy and pbpaste to use UTF-8 for input and output. If an encoding cannot be determined from the locale, the standard C encoding will be used. Use of UTF-8 is recommended. Note that by default the Terminal application uses the UTF-8 encoding and automatically sets the appropriate locale environment variable.


standard C encoding is plain text, and would naturally misread unicode.


Otherwise, UnicodeChecker says it's unicode, the text looks correct in TextWrangler, and the characters are treated as single characters.

Jun 14, 2011 10:48 PM in response to Jean-Christophe Helary

Ok. However, the original problem still appears with my TW/Terminal/Automator settings with or without the pbcopy addition.


And generally speaking, when I use Automator's run shell script to process the selection in a script by using the $@ variable I do not get satisfactory results because the forms are decomposed.


Could that be related to my locale ? Should I add a locale setting to the shell command in the action so that it does not decompose the forms ?

Jun 14, 2011 10:56 PM in response to Jean-Christophe Helary

Ok, the whole $@ thing is new information - where does that come into the picture? and are you passing text into the action from stdin or as arguments? (pull down menu on the left.)


did you try following my steps above, exactly, to make a new service, and seeing if the problem recurs? it may be that you and I did something subtly different making the service that's goofing things up for you but not for me.


maybe it would be best if you gave a full example of what you're trying to do.

Jun 15, 2011 1:07 AM in response to Jean-Christophe Helary

Originally I was using $@ to parse a string and get the result pasted by the service. That was a while ago. There, I noticed that some Japanese characters were messed up. Basically all the kana characters that come with voicing markers like が-ga (instead of か-ka) etc. I did not have the time to pursue that issue though.


Then, last night, I found that a colleague of mine had tried to use $@ to feed to a local dictionary application called ding (http://ftp.tu-chemnitz.de/pub/Local/urz/ding/). His problem was with characters that had umlauts. After verifying how he wrote his action I remembered that I had similar issues with Japanese.


Basically his command was "/path/to/ding $@"

That's supposed to use the selected string as an argument to pass to ding, which will launch a Wish application where the string is used as the searched item.


From Terminal, that works a treat. But the exact same line in Automator (with input passed as arguments, not to stdin) messed up the composition, and ding did not recognize the resulting string as a match for the entry it was supposed to find.
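One possible workaround for the argument case is to recompose each argument before handing it to the program. The sketch below is only an illustration: `nfc` and `with_nfc_args` are hypothetical helpers, and it assumes a `python3` on the PATH (not something mentioned in the thread) to do the Unicode NFC normalization. The bytes are decoded explicitly as UTF-8, so the result should not depend on the locale of the shell.

```shell
# Hypothetical helper: recompose UTF-8 text from stdin to NFC.
# Assumes python3 is available; bytes are read and written directly.
nfc() {
  python3 -c 'import sys, unicodedata
text = sys.stdin.buffer.read().decode("utf-8")
sys.stdout.buffer.write(unicodedata.normalize("NFC", text).encode("utf-8"))'
}

# Run a command with every argument recomposed to NFC.
# Usage: with_nfc_args COMMAND [ARG...]
with_nfc_args() {
  cmd=$1
  shift
  # Rotate through the argument list: drop each original argument from
  # the front and append its normalized form at the back.
  for arg in "$@"; do
    shift
    set -- "$@" "$(printf '%s' "$arg" | nfc)"
  done
  "$cmd" "$@"
}
```

In the action this would be called as `with_nfc_args /path/to/ding "$@"`. The double quotes around `$@` also keep a selection containing spaces together as one argument, which the unquoted form does not.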


So, I tried a few things to get to the core of the issue and found that a simple "echo [accented characters]" was enough to reproduce the difference in string handling between Automator and Terminal. That difference is also reproduced on a number of other people's machines.


I have a number of services that basically revolve around "run shell script" actions and involve 3rd party application outputs, preference files etc. so it would not be convenient to show that to you.


I have sent a mail about this issue to the automator list yesterday too:

http://lists.apple.com/archives/Automator-users/2011/Jun/msg00004.html

Jun 15, 2011 8:00 AM in response to Jean-Christophe Helary

Alright, I guess I need to give the spiel:


Diagnosing computer problems needs a scientific approach - few things on a computer are actually random, so to discover why things are working incorrectly, one needs to isolate factors. I mean, this is what I see:

  • You have an automator service that always produces a particular mistake.
  • I have an automator service that supposedly is identical, but never produces that mistake

When we discover the difference between your service/computer and my service/computer - and it might be a very tiny, seemingly insignificant and irrelevant difference - we will most likely be able to fix the problem entirely.


We're both using 10.6.7, and it's doubtful that you've done anything that would modify automator or unix in a significant way. So, what I'm asking is for you to carefully and attentively redo the bullet points from my first post, so that we can compare results and procedures - I'm looking for some tiny difference between what I did and what you did which explains the differing outcome.


getting second and third opinions is fine, on the off-chance someone has diagnosed this problem previously. But if not, the only way this will be solved is if we sit down together and walk through the steps of what we each did in painful detail to find what we did differently.

Jun 15, 2011 5:35 PM in response to Jean-Christophe Helary

I could send you a movie to prove to you that I followed exactly the same steps and still obtained a fully decomposed form.


I'm thinking that my locale could make a difference:

$ echo $LANG

fr_FR.UTF-8


I could set it to the same value as yours to see if there is a difference. After all, the shell must take a locale value from somewhere ?


By the way, instead of running "echo été" in the action, I also tried "echo $LANG" and the action did not paste anything back into the TW window.

Jun 15, 2011 6:05 PM in response to Jean-Christophe Helary

well, in terminal echo $LANG results in en_US.us-ascii - that's probably the setting in my local bash shell, which may not be at all the generic setting for other things (that would probably be set in .bashrc or some such, and I don't think those files are run by default in AppleScript or Automator).


from a Run Shell Script action I also get a blank result.


Just to be sure, your Run Shell Script action is using /bin/bash and input is passed to stdin? the latter shouldn't matter for this, but it's worth checking to be sure.
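Since `echo $LANG` prints nothing inside the action, the shell there appears to run without a locale, and the pbcopy man page excerpt quoted earlier says the standard C encoding is used in that case. A minimal sketch of setting it explicitly at the top of the action follows. The value `en_US.UTF-8` is the one named in that excerpt, and `show_lang` is a made-up helper. Whether this has any effect on the decomposition seen in the service output is not established in this thread.

```shell
# Put this at the top of the Run Shell Script action.
# Keep an existing LANG; if it is unset or empty, fall back to the
# UTF-8 locale named in the pbcopy man page excerpt.
: "${LANG:=en_US.UTF-8}"
export LANG

# Report the locale that the rest of the script (and tools such as
# pbcopy started from it) will see.
show_lang() { printf 'LANG=%s\n' "$LANG"; }
```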

Jun 15, 2011 6:38 PM in response to Jean-Christophe Helary

I used the default action settings so, yes: /bin/bash and "to stdin".


Regarding the locale, what I noticed (I discussed this on the Applescript list a while ago) is that $LANG is taken from System Preferences > Language and Text > Formats > Region and has nothing to do with any shell setting file.


I've just changed it to United States (Computer) and after restarting Terminal I get :


$ echo $LANG

en_US.UTF-8


I don't get a different result in TW with that setting though.


Also, the discussion that I started 2 days ago on the Automator list about that issue kind of reached a conclusion (although not 100% satisfying): if I use unicode escape sequences in Automator, I'll get the expected result. I'm guessing that iconv would help me there.
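The thread does not say which escape syntax was used, so the following is just one possible form of that workaround: spelling out the UTF-8 bytes of the precomposed character with octal escapes in `printf`, so that the script text contains no accented literals at all. Octal is used rather than `\u` escapes because `\u` in `printf` needs bash 4.2 or later, while Mac OS X 10.6 ships bash 3.2. The function name `ete` is illustrative.

```shell
# U+00E9 LATIN SMALL LETTER E WITH ACUTE is C3 A9 in UTF-8,
# which is octal 303 251.
ete() { printf '\303\251t\303\251\n'; }

ete   # writes the composed form of the word, followed by a newline
```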


If I could find how you managed to get your locale to be .ascii and not .UTF-8, then I could see if that's the problem...

Jun 15, 2011 7:10 PM in response to Jean-Christophe Helary

fascinating. I can't get it to be UTF-8. that's a real head-scratcher; it ought to be set in Language and Text, but I can't see any option for it, and can't find a link to it on the web.


I have a dim memory of specifying ascii somewhere at one point (because I do a lot of scripting, and I didn't want unicode hassles in Terminal) but for the life of me I can't remember where. I'll have to think on that.


just for sureness, do you get the same effect in TextEdit (in both plain text and rich text modes?) I just want to be sure we're looking at a system issue and not a TW issue.

Jun 15, 2011 7:27 PM in response to Jean-Christophe Helary

In fact, TextEdit _displays_ the characters properly but when I copy-paste one of the "é" into UnicodeChecker, it gives me "COMBINING ACUTE ACCENT" only and not "LATIN SMALL LETTER E WITH ACUTE", although the character that I have in the action is "LATIN SMALL LETTER E WITH ACUTE".


Since my issue is not so much how the characters display but the reason why they are decomposed (and how to prevent that), I'd appreciate it if you could check the value of the character you see in TW.
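For checking character values from the command line instead of a GUI tool, a small filter that lists each code point with its Unicode name can help. This is a sketch under the assumption that `python3` is on the PATH; `codepoints` is a hypothetical helper name.

```shell
# Hypothetical helper: print the code point and Unicode name of every
# character read from stdin (newlines are skipped).
codepoints() {
  python3 -c 'import sys, unicodedata
for ch in sys.stdin.buffer.read().decode("utf-8"):
    if ch != "\n":
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>")))'
}
```

A composed "é" should come out as a single line naming LATIN SMALL LETTER E WITH ACUTE, while a decomposed one gives two lines, a plain "e" followed by COMBINING ACUTE ACCENT. On the Mac, something like `pbpaste | codepoints` in Terminal would inspect the clipboard contents.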


Sorry for the hassle !

Jun 15, 2011 8:21 PM in response to Jean-Christophe Helary

that's the problem that's bugging you? I wish you'd clarified that earlier. yes, I get what you're calling the 'COMBINING' letter in TW and TextEdit, though they both look perfectly appropriate. I suspect that the reason the COMBINING one is being used is that the actual figure is part of the Latin-1 Supplement block, with the combining form listed as an equivalent, and the system designers probably thought the combining form would port better between different contexts.


at any rate, you probably can resolve this by using unix' iconv utility to convert the text to an appropriate encoding, so that the grep search doesn't fail. either that, or run your search criteria through unix as well: whichever way gets the two texts into the same encoding.
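As a sketch of what the iconv route could look like: Apple's iconv knows an encoding called UTF-8-MAC, which stands for the decomposed form used by the system, so converting from UTF-8-MAC to UTF-8 recomposes the text. The helper below (the name `to_composed` is made up) uses that conversion when the local iconv lists it and otherwise falls back to NFC normalization through `python3`, which is an assumption on my part and not something from the thread. UTF-8-MAC is Apple's own variant and is not guaranteed to agree with NFC for every character.

```shell
# Hypothetical helper: recompose decomposed UTF-8 text read from stdin.
to_composed() {
  if iconv -l 2>/dev/null | grep -qi 'utf-8-mac'; then
    # Apple's iconv: UTF-8-MAC denotes the decomposed form.
    iconv -f UTF-8-MAC -t UTF-8
  else
    # Fallback, assuming python3 is on the PATH: standard NFC.
    python3 -c 'import sys, unicodedata
text = sys.stdin.buffer.read().decode("utf-8")
sys.stdout.buffer.write(unicodedata.normalize("NFC", text).encode("utf-8"))'
  fi
}
```

With input passed to stdin, the action could then start with `to_composed | ...` so that everything downstream sees composed characters.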

Jun 15, 2011 9:37 PM in response to Jean-Christophe Helary

I think my problem was stated quite clearly from the start 🙂


"The problem is not the display issue, but the fact that if I want to run grep for example in "run shell script", I will not be able to find the proper strings since the forms are different."


The discrepancy between the Terminal and "Run shell script" makes it impossible to use international characters without using convoluted conversions that _are_ the subject of this question:


"How to prevent Automator's "run shell script" to create fully decomposed forms of my strings?"


Now that we agree that we are seeing the same thing even though your environment makes TW behave better than mine, I'd love to investigate what would be required to keep Automator from doing that.


Honestly, I have checked iconv's man page and I have no idea how to make it deal with composed forms...

Jun 15, 2011 9:46 PM in response to Jean-Christophe Helary

probably you were clear, I just didn't get it properly for some reason.


you might check out the answer on this page. for the purposes of grep, it's not so much that you need to convert them to a particular encoding; you just need to make sure that they are in the same encoding. that makes the task a little easier.
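The "same form on both sides" idea can be sketched as a small grep wrapper that normalizes the pattern and the searched text in the same way before comparing them. The helper names are hypothetical and `python3` on the PATH is assumed. NFC is chosen here, but any normalization form would do as long as both sides use it.

```shell
# Hypothetical helper: recompose UTF-8 text from stdin to NFC.
nfc() {
  python3 -c 'import sys, unicodedata
text = sys.stdin.buffer.read().decode("utf-8")
sys.stdout.buffer.write(unicodedata.normalize("NFC", text).encode("utf-8"))'
}

# grep_nfc PATTERN: search stdin for the fixed string PATTERN after
# bringing both the pattern and the input to the same form.
grep_nfc() {
  _pattern=$(printf '%s' "$1" | nfc)
  # -F treats the pattern as a literal string; LC_ALL=C compares bytes.
  nfc | LC_ALL=C grep -F -- "$_pattern"
}
```

Matching lines are printed in their normalized form, so the output is composed even when the input was not.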


If you want to do complex or frequent regexping with unicode, then I suggest you switch to AppleScript and get the Satimage osax. satimage is well-designed, and may avoid the problems you're having (which may come from the fact that your process is switching contexts so much - file to automator to shell to clipboard to…). If you can do it all in AppleScript you may sidestep all of that.
