Extract URL from Mail message

Hello Applescript-Meisters,

I have the following problem:
Every week we receive an email containing a secure URL (https://), with a different username and password each time, for downloading a ".tar" file.

Since I'm new to AppleScript, I tried to extract the paragraph of the email message that contains the URL, but it doesn't seem to work:

-------------------------------------------------------------------------------
tell application "Mail"

set theMessages to message 1 of mailbox "EXAMPLE" of account "AccountTest" whose subject begins with "Data request"
set theContent to content of theMessages

set someData to paragraphs of theContent
repeat with someData in theContent
if someData begins with "https://" then
set theURL to paragraph of someData
else
set theURL to paragraph 18 of theContent -- this works but this line can change weekly
end if
end repeat

end tell
--------------------------------------------------------------------------------

Any help or suggestions would be greatly appreciated.

Cheers,

gilles

iMac 24, Mac OS X (10.5.5)

Posted on Nov 27, 2008 12:11 PM

Nov 27, 2008 1:08 PM in response to gilcelli

Paragraphs are delimited by returns, so I am guessing that the URL is on a line all by itself, with no other text. You are setting the someData variable to the paragraphs, but then later use it as a variable in a repeat statement, where it becomes an item of theContent (which is a text string). You can just step through the paragraphs with the repeat statement:

set theURL to ""
repeat with someData in (get paragraphs of theContent)
if someData begins with "https://" then set theURL to someData
end repeat
if theURL is "" then
-- not found
end if
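Since this relies on how AppleScript splits text into paragraphs, here is a quick sketch (the sample text is made up) showing that 'paragraphs of' breaks on return, linefeed and a return+linefeed pair alike, and gives back a plain list of strings:

```applescript
-- Made-up sample text using three different line endings.
set sampleText to "Dear User," & return & "Some text" & linefeed & "https://abcde:123456@example.com/pub/" & return & linefeed & "Bye"

-- 'paragraphs of' treats CR, LF and CR+LF as line breaks
-- and returns an ordinary list of strings.
set theParagraphs to paragraphs of sampleText

set paragraphCount to count of theParagraphs -- 4
set urlLine to item 3 of theParagraphs -- "https://abcde:123456@example.com/pub/"
```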

Message was edited by: red_menace

Nov 27, 2008 1:15 PM in response to gilcelli

The most obvious problem is how you're trying to iterate through the paragraphs.

You start off with:

set theContent to content of theMessages


which is fine - theContent now contains the text of the email.

You then extract the paragraphs of that text:

set someData to paragraphs of theContent


But you then reset 'someData' to be the iterator in a loop:

repeat with someData in theContent


Additionally, you're iterating through theContent, which is a single block of text.

I think what you mean to do is iterate through the paragraphs of theContent, like:

repeat with someData in (get paragraphs of theContent)


Once you have this, you know someData is the paragraph you're currently looking at so you can:

set theURL to someData


Finally, you do not want an 'else ... set theURL' statement, because that will reset theURL every time you reach a line that doesn't begin with "https://". For example, if paragraph 10 starts with https:// you set theURL to that paragraph, but as soon as you move on to paragraph 11 you reset theURL, because that paragraph doesn't begin with https://.
To address this, just iterate through the paragraphs and only set theURL to paragraph 18 if you didn't find any matching line.
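To see the effect described above, here is a small sketch on made-up text where the URL is not the last line. The string "fallback" is a hypothetical stand-in for 'paragraph 18 of theContent', and the loops use an index so that each line is plain text:

```applescript
-- Made-up sample: the URL sits in the middle, not on the last line.
set theContent to "Dear User," & return & "https://abcde:123456@example.com/pub/" & return & "Regards"
set lineCount to count of paragraphs of theContent

-- Pattern with the 'else' branch: every non-matching line overwrites the result.
set urlWithElse to ""
repeat with i from 1 to lineCount
	set p to paragraph i of theContent
	if p begins with "https://" then
		set urlWithElse to p
	else
		set urlWithElse to "fallback" -- hypothetical stand-in for paragraph 18
	end if
end repeat

-- Pattern without 'else': search first, apply the fallback only if nothing matched.
set urlWithoutElse to ""
repeat with i from 1 to lineCount
	set p to paragraph i of theContent
	if p begins with "https://" then set urlWithoutElse to p
end repeat
if urlWithoutElse is "" then set urlWithoutElse to "fallback"

-- urlWithElse ends up as "fallback" because "Regards" came last;
-- urlWithoutElse holds the URL.
```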

Putting it all together:

set theURL to "" -- default value for theURL
set theContent to content of theMessages
repeat with someData in (get paragraphs of theContent)
if someData begins with "https://" then
set theURL to someData
end if
end repeat
if theURL = "" then
try
set theURL to paragraph 18 of theContent
end try
end if


Note that I've wrapped the 'paragraph 18' line in a try block. That way the script doesn't fail if there are fewer than 18 paragraphs in the message (you can't get paragraph 18 of a 17-paragraph document).
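The same try-block idea can be pulled out into a small helper. A minimal sketch; the handler name paragraphOrEmpty and the sample text are made up for illustration:

```applescript
-- Hypothetical helper: return paragraph n of theText, or "" if it doesn't exist.
on paragraphOrEmpty(theText, n)
	try
		return paragraph n of theText
	on error
		-- Asking for a paragraph past the end raises an error; treat it as "not found".
		return ""
	end try
end paragraphOrEmpty

set shortText to "first line" & return & "second line"
set secondLine to paragraphOrEmpty(shortText, 2) -- "second line"
set missingLine to paragraphOrEmpty(shortText, 18) -- ""
```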

Nov 27, 2008 1:21 PM in response to red_menace

Thanks for the quick reply, but if I use your code, the variable 'theURL' seems to contain the whole content of the message.
In case it helps, here is an example of the email we receive:

--------------------------------------------------------------------------------
Dear User,

the products you have ordered are available on a tar file named
ABC.YYYY DOY_DD_00010.tar, on line on the server "example.com", in the home directory of the account (case sensitive):

In order to access the product by HTTP, you can use the following URL (note the change of username & password):

https://abcde:123456@example.com/pub/

--------------------------------------------------------------------------------

Thanks

Nov 27, 2008 2:27 PM in response to gilcelli

Something must be different about the line endings. Is the content in HTML? If it is, the script could look for a link tag (if the URL is a link), or otherwise break the message up at the br tags. If you could post the raw source of some of the content, it would help to see what the format is, so that the end of the link can be determined.
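If the URL turns out not to sit on a line of its own (for example inside HTML, or with unusual line endings), one option is to search the whole text for "https://" instead of relying on paragraph boundaries. A rough sketch; the handler name extractURL and the set of characters treated as the end of the URL are assumptions, not something taken from the original message:

```applescript
-- Hypothetical helper: find "https://" anywhere in theText and return everything
-- up to the next space, return, linefeed, tab, double quote, "<" or ">".
on extractURL(theText)
	set startPos to offset of "https://" in theText
	if startPos is 0 then return "" -- no URL at all
	set tailText to text startPos thru -1 of theText
	set endPos to length of tailText
	repeat with i from 1 to length of tailText
		set c to character i of tailText
		if c is in {space, return, linefeed, tab, quote, "<", ">"} then
			set endPos to i - 1 -- stop just before the terminating character
			exit repeat
		end if
	end repeat
	return text 1 thru endPos of tailText
end extractURL
```

Because it only looks at characters, this works whether the link is on its own line, in the middle of a sentence, or inside an href attribute.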

Message was edited by: red_menace

Nov 28, 2008 2:29 AM in response to gilcelli

Hello

Since 'paragraph 18 of theContent' can return a URL, I suspect that what you're getting in theURL with red_menace's script is something like:

item 18 of every paragraph of "..."

instead of:

"https://..."



If that is the case, you're very close to your goal.
Just try something like the following script.
It will return the first paragraph which begins with "https://".

--SCRIPT
set theURL to ""
repeat with p in (get paragraphs of theContent)
set p to p's contents
if p begins with "https://" then
set theURL to p
exit repeat
end if
end repeat
return theURL
--END OF SCRIPT

* The loop variable in this form of repeat statement is a reference to an item in the given list, not an item itself. You may dereference the reference by getting the 'contents' of the reference.
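The same idea can be wrapped in a handler that always hands back a plain string. A minimal sketch; the handler name firstHTTPSParagraph is made up for illustration:

```applescript
-- Hypothetical handler: return the first paragraph of theText that begins
-- with "https://", as plain text, or "" if there is none.
on firstHTTPSParagraph(theText)
	repeat with p in (get paragraphs of theText)
		-- p is a reference to an item of the list, not the string itself;
		-- 'contents of' turns it into the actual text.
		set lineText to contents of p
		if lineText begins with "https://" then return lineText
	end repeat
	return ""
end firstHTTPSParagraph
```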

Hope this may help,
H

Nov 28, 2008 3:38 AM in response to Hiroto

Thanks Hiroto,

Your script solved my problem!!!
Just a correction: getting the URL from paragraph 18 was only an example that I forgot to delete from the code.

This morning I also found another solution based on Camelot's and red_menace's scripts:

I just need to take "paragraph 1" of the matching paragraph to get the URL, like this:

-- SCRIPT
repeat with someData in (get paragraphs of theContent)
if someData begins with "https://" then
set theURL to paragraph 1 of someData
end if
end repeat
-- END SCRIPT
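For what it's worth, 'paragraph 1 of someData' works because it fetches the line as text instead of keeping the loop's reference. A small sketch with made-up sample text showing that 'contents of' and 'as text' should give the same result:

```applescript
-- Made-up sample text standing in for the message content.
set theContent to "Dear User," & return & "https://abcde:123456@example.com/pub/"

set viaParagraph to ""
set viaContents to ""
set viaCoercion to ""
repeat with someData in (get paragraphs of theContent)
	if (contents of someData) begins with "https://" then
		-- Each of these yields the line itself as text rather than a reference.
		set viaParagraph to paragraph 1 of someData
		set viaContents to contents of someData
		set viaCoercion to someData as text
	end if
end repeat
```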

Thanks again all AppleScript-Meisters for your time and precious help.

Cheers,

gilles

Message was edited by: GilCel
