Wildcards within AppleScript

I'm generating HTML and CSS automatically from Scrivener, a Mac-only author's writing program (a very good one).

I've got a lot of documents to process and I want to tweak the CSS beyond what Scrivener automatically generates, and at the same time make all documents adhere to a single standard. To make this happen I've created an AppleScript that runs through the HTML text and performs a series of search-and-replace operations. So far, this works very well.

However, there are some segments where the string between one predictable delimiter and another is unpredictable. What's called for is a wildcard to select all the text between the delimiters and replace it with something else.

Someone elsewhere has said there's no syntax for wildcards within AppleScript. Not sure of the truth of that. What's the procedure to achieve my objective?

Much appreciated.

G5 iMac 1.8 GHz, Mac OS X (10.4.11)

Posted on Jan 31, 2008 12:08 PM

11 replies

Jan 31, 2008 2:08 PM in response to Morley Chalmers

Sounds like what you're looking for is "regular expressions" (RegEx) search capability. This osax is free and has a very good implementation of it.

<http://www.satimage.fr/software/downloads/Satimage320.dmg>

The link should download the dmg file for you. If you haven't got it, you should. 🙂

You could also use the UNIX shell or Perl, but the learning curve will be steeper.
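For anyone curious about the shell/Perl route, here is a minimal sketch that stays inside AppleScript and hands the wildcard work to Perl through `do shell script`. The sample text, the variable names, and the replacement string are made up for illustration, and it assumes `perl` is on the default path (it ships with Mac OS X). Treat it as one possible approach, not the only way to do it.

```applescript
-- Hypothetical sample text; in practice this would be the HTML exported from Scrivener.
set sourceText to "<p>a</p><style type=\"text/css\">p {color: red}</style><p>b</p>"

-- Perl substitution: .*? is a lazy wildcard (it stops at the first closing tag),
-- the s flag lets . match line breaks, and g replaces every occurrence.
set perlCode to "s{<style.*?</style>}{<style>REPLACED</style>}gs"

-- quoted form of protects both strings from the shell.
-- -0777 makes perl read the whole input as one string instead of line by line.
set shellCommand to "printf %s " & quoted form of sourceText & " | perl -0777 -pe " & quoted form of perlCode

-- without altering line endings keeps line feeds as they are in the result.
set resultText to do shell script shellCommand without altering line endings
-- resultText is "<p>a</p><style>REPLACED</style><p>b</p>"
```

The only regular expression knowledge needed here is `.*?`, which means "any run of characters, as short as possible".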

Feb 1, 2008 8:44 AM in response to AvionicsTech

I've been told by others as well the regular expressions learning curve is steep. Have yet to find an effective tutorial. As far as I'm aware, what I'm trying to accomplish is fairly simple and straightforward -- select all the text between this delimiter and that one. I have no aspirations to master all the permutations of RegEx. Or am I misjudging the situation?

Know of an appropriate tutorial?

Much appreciated.

Feb 1, 2008 10:31 AM in response to Morley Chalmers

The regular expressions learning curve is indeed steep, but it is a useful skill for anyone doing UNIX admin. I learned regular expressions by reading a terrific O'Reilly book titled "Mastering Regular Expressions". The writing style keeps you engaged and the coverage gets very detailed. Here is the URL.

http://www.oreilly.com/catalog/regex3/index.html

Feb 1, 2008 10:09 PM in response to Morley Chalmers

Hello Morley Chalmers,

As an alternative to concise regular expressions, you may try something like the following code written in plain AppleScript. It finds text blocks delimited by a given starting tag (x) and ending tag (y) and replaces each of them with a given string (z).
(Please copy code from this web page, not from subscribed email text.)

Hope this may help,
H

PS.
Satimage-software's site has a brief tutorial on regular expressions:
http://www.satimage.fr/software/en/downloads/downloadscompanionosaxen.html
http://www.satimage.fr/software/en/smile/text/index.html

If I'm not mistaken, however, 'find' and 'change' commands in Satimage.osax neither support Unicode text nor 'lazy' matching. You'd need Smile (free regular edition or paid full edition) for 'ufind' and 'uchange' commands, which support Unicode text and 'lazy' matching.



--SCRIPT
set t to "1 (tag1)some text(/tag1) 2 (tag1)other text(/tag1) 3"
--set t to "(tag1)text in outer block and (tag1)text in inner block(/tag1)(/tag1)"
return replaceBlocks({"(tag1)", "(/tag1)"}, "REPLACED", t)

on replaceBlocks({x, y}, z, t)
    (*
        string x, y : block start tag, block end tag
        string z : replacing string for each block "x..y"
        string t : source text
        return string : replaced string -- [1]

        [1] This handler does not support nested blocks.
        (Only the inner most block will be replaced if nested.)
    *)
    script o
        property tt : {}
        property uu : {}
        property rr : {}
        property astid : a reference to AppleScript's text item delimiters
        try
            set astid0 to astid's contents
            set astid's contents to {x}
            set tt to t's text items
            set end of my rr to my tt's item 1
            set astid's contents to {y}
            repeat with i from 2 to count my tt
                set uu to my tt's item i's text items
                if (count my uu) = 1 then -- y not found after x in this segment
                    set end of my rr to x & my tt's item i
                else
                    set end of my rr to z & my uu's rest
                end if
            end repeat
            set astid's contents to astid0
        on error errs number errn
            set astid's contents to astid0
            error "replaceBlocks(): " & errs number errn
        end try
        return my rr's item 1 & my rr's rest
    end script
    tell o to run
end replaceBlocks
--END OF SCRIPT
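For readers new to text item delimiters, the core idea of the handler above can be reduced to a few lines. This is only a sketch for a single block, with made-up variable names; it does no error handling and does not cope with a missing closing tag, which the full handler does.

```applescript
set t to "1 (tag1)some text(/tag1) 2"
set savedDelims to AppleScript's text item delimiters

-- Split on the opening tag: everything before it, then everything after it.
set AppleScript's text item delimiters to {"(tag1)"}
set parts to text items of t -- {"1 ", "some text(/tag1) 2"}

-- Split the second piece on the closing tag: the block body, then the remainder.
set AppleScript's text item delimiters to {"(/tag1)"}
set pieces to text items of (item 2 of parts) -- {"some text", " 2"}

-- Always restore the delimiters; the setting is global to the script.
set AppleScript's text item delimiters to savedDelims

-- Keep what came before and after the block, drop the body, insert the replacement.
set resultText to (item 1 of parts) & "REPLACED" & (item 2 of pieces)
-- resultText is "1 REPLACED 2"
```

The "wildcard" is simply whatever ends up in item 1 of pieces: the script never needs to know what that text is.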

Feb 2, 2008 10:46 AM in response to Morley Chalmers

I posted this in your topic at macosxhints, but I'll post it here as well - it uses offset (as well as being a little easier to read):

on run -- example
    display dialog quote & (GetSubText of "This is some testing text" from "some" to "text") & quote
end run

to GetSubText of SomeText from StartItem to EndItem
    (*
        get a substring from SomeText, from StartItem to EndItem
        parameters - SomeText [mixed]: the text to get the substring from
                     StartItem [mixed]: the starting item (or 1 for the beginning)
                     EndItem [mixed]: the ending item (or -1 for the end)
        returns [text]: the substring, or "" if not found
    *)
    set SomeText to SomeText as text
    if StartItem is in {1, "1", ""} then
        set Here to 1
    else
        get offset of StartItem in SomeText
        if result is 0 then
            return ""
        else
            set Here to result + (length of StartItem)
        end if
    end if
    if EndItem is in {-1, "-1", ""} then
        set There to -1
    else
        get offset of EndItem in (text Here thru -1 of SomeText)
        if result is 0 then
            return ""
        else
            set There to (Here + result) - 2
        end if
    end if
    return (text Here thru There of SomeText)
end GetSubText
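Since the original question is about replacing the text between two delimiters rather than extracting it, the same offset arithmetic can be turned around. The handler below is a hypothetical sketch (replaceBetween is not part of any library) that replaces only the first occurrence and keeps both delimiters in place.

```applescript
set demo to replaceBetween("This is some testing text", "some", "text", " NEW ")
-- demo is "This is some NEW text"

on replaceBetween(sourceText, startTag, endTag, newText)
    -- Locate the first start delimiter; give the text back unchanged if it is absent.
    set startPos to offset of startTag in sourceText
    if startPos is 0 then return sourceText
    set afterStart to startPos + (length of startTag)
    -- Nothing follows the start delimiter, so there is nothing to replace.
    if afterStart > (length of sourceText) then return sourceText
    -- Look for the end delimiter only in the text after the start delimiter.
    set endRel to offset of endTag in (text afterStart thru -1 of sourceText)
    if endRel is 0 then return sourceText
    set endPos to afterStart + endRel - 1 -- absolute position of the end delimiter
    -- Keep everything up to and including the start delimiter, and from the end delimiter on.
    set headPart to text 1 thru (afterStart - 1) of sourceText
    set tailPart to text endPos thru -1 of sourceText
    return headPart & newText & tailPart
end replaceBetween
```

To handle every occurrence, the handler could be called in a loop on the tail of the text, at the cost of a little more bookkeeping.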

Feb 2, 2008 10:47 AM in response to Hiroto

Hello Hiroto

As expected, I'm too new to AppleScript syntax to quickly put your code into service without a bit of hand holding. Your assistance requested. Two issues:

1. I want my AppleScript to apply to a BBEdit file. I've built another AppleScript routine to do specific search and replace and it works beautifully. Therefore I copied the following to the top and bottom of your script:

"tell application "BBEdit 6.5.3"
activate"
Your script
"end tell"

Which produces this error: "Expected “end” or “end tell” but found “on”."

2. My two delimiters are "<style" and "</style>". I'm unclear where or how to declare them. My current guess, and it's only a guess, is to replace all instances of "(tag1)" and "(/tag1)" with "(<script)" and "(</script>)", leaving the brackets in place. Probably more efficient to simply declare:
tag1 = <script
and
/tag1 = </script>

Unsure. Prompts appreciated.

Feb 2, 2008 3:07 PM in response to Morley Chalmers

Ah - you are wanting to extract HTML elements. This is a little bit different than wildcards and substrings, and dropping some script snippets into an application tell statement. There are some existing scripts out there for extracting HTML, as well as those that read text files. BBEdit is also scriptable, so I think this wound up being a little more difficult than it needed to be. Anyway, now that I know more about what you are wanting to do, give the following script a try - I didn't know anyone else used a version of BBEdit as old as mine (TextWrangler works too).

The following script takes the text of the front BBEdit window, extracts the HTML elements specified, and creates a new window with the results:
on run
    tell application "BBEdit 6.5.3"
        get the text of the front window
        my GetHTMLElements(the result, "<script", "</script>", false)
        set text of (make new window) to the result as text
    end tell
end run

to GetHTMLElements(SomeText, OpenTag, CloseTag, ContentsOnly)
    (*
        return a list of the specified HTML elements in SomeText
        parameters - SomeText [text]: the text to look at
                     OpenTag [text]: the opening tag (can be a partial)
                     CloseTag [text]: the complete closing tag
                     ContentsOnly [boolean]: return just the contents, or the entire element
        returns [list]: a list of the HTML elements found - {""} if none
    *)
    set TextBuffer to SomeText as text
    set CurrentOffset to 1 -- the current offset in the text buffer
    set ElementList to {} -- the list of elements found
    try
        repeat
            set Here to offset of OpenTag in (text CurrentOffset thru -1 of TextBuffer) -- start of opening tag
            if Here is 0 then exit repeat -- not found
            set CurrentOffset to CurrentOffset + Here
            set CurrentTag to CurrentOffset - 1 -- mark the start of the element
            if OpenTag does not end with ">" then -- find the close of the tag
                set Here to offset of ">" in (text CurrentOffset thru -1 of TextBuffer) -- end of opening tag
                if Here is 0 then exit repeat -- not found
                set CurrentOffset to CurrentOffset + Here
            else -- complete opening tag given: skip past it to the start of the contents
                set CurrentOffset to CurrentTag + (length of OpenTag)
            end if
            set Here to CurrentOffset

            set There to offset of CloseTag in (text CurrentOffset thru -1 of TextBuffer) -- end tag
            if There is 0 then exit repeat -- not found
            set CurrentOffset to CurrentOffset + There - 2
            set There to CurrentOffset

            if ContentsOnly then
                set the end of ElementList to text Here thru There of TextBuffer & return
            else
                set the end of ElementList to text CurrentTag thru (There + (length of CloseTag)) of TextBuffer & return
            end if

        end repeat
    on error ErrorMessage number ErrorNumber
        if (ErrorNumber is -128) or (ErrorNumber is -1711) then -- nothing (user cancelled)
        else
            activate me
            display alert "Error " & (ErrorNumber as string) message ErrorMessage as warning buttons {"OK"} default button "OK"
        end if
    end try

    if ElementList is {} then set ElementList to {""}
    return ElementList
end GetHTMLElements

Feb 2, 2008 11:36 PM in response to Morley Chalmers

Hello Morley Chalmers,

I don't have BBEdit 6.5.3 and not sure how the rest of your script is written. Maybe you're using BBEdit's own commands to search and replace text directly in its front window?
Anyway, the code I posted does not work in such a way. You have to get the (whole) text of the target document first, then process it with the given handler, and finally set the (whole) text of the target document to the result.

Something like the following code, which will replace the contents of style elements with string ' REPLACED '. If you want to replace both tags and contents, use the line currently commented out. Also if you're replacing 'script' elements, use "<script" and "</script>" instead of "<style" and "</style>".

(I borrowed the part to get and set the text of the front window from red_menace's code, for I don't know how to script BBEdit. And again, please copy the code from this web page, not from subscribed email text, because I escaped some characters to prevent the forum software from intervening.)

Good luck,
Hiroto



--SCRIPT
local t, x, y, z
tell application "BBEdit 6.5.3"
    set t to text of window 1
    -- x is a partial opening tag, so z must supply the complete element (any attributes are dropped)
    set {x, y, z} to {"<style", "</style>", "<style> REPLACED </style>"} -- to replace contents only
    --set {x, y, z} to {"<style", "</style>", "REPLACED"} -- to replace both tags and contents
    set t to my replaceBlocks({x, y}, z, t)
    set text of window 1 to t
end tell

on replaceBlocks({x, y}, z, t)
    (*
        string x, y : block start tag, block end tag
        string z : replacing string for each block "x..y"
        string t : source text
        return string : replaced string -- [1]

        [1] This handler does not support nested blocks.
        (Only the inner most block will be replaced if nested.)
    *)
    script o
        property tt : {}
        property uu : {}
        property rr : {}
        property astid : a reference to AppleScript's text item delimiters
        try
            set astid0 to astid's contents
            set astid's contents to {x}
            set tt to t's text items
            set end of my rr to my tt's item 1
            set astid's contents to {y}
            repeat with i from 2 to count my tt
                set uu to my tt's item i's text items
                if (count my uu) = 1 then -- y not found after x in this segment
                    set end of my rr to x & my tt's item i
                else
                    set end of my rr to z & my uu's rest
                end if
            end repeat
            set astid's contents to astid0
        on error errs number errn
            set astid's contents to astid0
            error "replaceBlocks(): " & errs number errn
        end try
        return my rr's item 1 & my rr's rest
    end script
    tell o to run
end replaceBlocks
--END OF SCRIPT



Message was edited by: Hiroto (minor correction to typo and wording)

Feb 3, 2008 4:27 AM in response to Morley Chalmers

Hello Morley Chalmers,

From what I've found about BBEdit's scripting via Google, you may also try something like the following code. It uses a regular expression to replace each 'style' element in the text of the front text window.

It is NOT tested. If it works, you may change the attributes (currently "NEW ATTRS") and contents (currently "NEW CONTENTS") to suit your needs.
(Please copy code from this web page, not from subscribed email text.)

All the best,
Hiroto



--SCRIPT (NOT TESTED)
tell application "BBEdit 6.5.3"
    replace "<style[^>]*>[^<]*</style>" using "<style NEW ATTRS> NEW CONTENTS </style>" searching in text 1 of text window 1 ¬
        options {search mode:grep, starting at top:true, case sensitive:false}
end tell
--END OF SCRIPT
