Reading XML data from .pages file

Hi All!


I've been trying to get a script up and running, but frankly I'm lost. The script should read the date, lesson no, and name fields from a .pages file and then use that information to send an e-mail.


What I'm having a problem with is reading the data from the .pages file and I don't know what the best way to do it is because I'm still a novice at this.


So there are two ways I can think of:


1. Read directly from the Pages document when it is open. The problem with this is that it's a Pages template and it contains tables. Perhaps the easiest thing would be just to select certain cells directly from Pages, but I have no idea how they're labeled or how to access them, so I came up with a second idea. If you happen to know how to do this, I would greatly appreciate a pointer.


2. Pages documents can be unzipped and one of the files is an xml file with a lot of tags and the data I need. I found the following post that talks about how to parse XML via an XSL stylesheet. After going through some tutorials, I got stuck because the xml itself in the Pages file is just a mess to me and I'm confused about which node to choose. In other xml docs you can see a clear, nicely laid out tabbed structure so you can at least figure out the path to the node you want. I stumbled across a nice script for TextWrangler which cleans up XML, but it just barks at the Pages file.


3. On top of that, I will need to use the name field and match that with an Address Book e-mail address. Should I change the template and make it into an address book field?
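For the second idea, the unzip step can be sketched in a few lines. This is only a rough sketch in Python (not AppleScript), and it assumes the .pages file really is a plain zip archive and that the XML member is called index.xml; that member name is a guess and is left as a parameter so it can be changed after listing the archive contents. The function names are made up for illustration.

```python
import zipfile


def list_package_members(pages_path):
    """Return the names of all files stored inside the .pages archive."""
    with zipfile.ZipFile(pages_path) as package:
        return package.namelist()


def read_package_xml(pages_path, member="index.xml"):
    """Return one archive member as text.

    "index.xml" is an assumed name; check list_package_members() first.
    """
    with zipfile.ZipFile(pages_path) as package:
        return package.read(member).decode("utf-8")
```

Calling list_package_members() first shows what is actually in the archive, so nothing has to be unzipped by hand before deciding which file to read.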


Thanks in advance,


Paul

Posted on Mar 23, 2012 12:19 PM

17 replies

Mar 23, 2012 3:24 PM in response to pmrozik

Okay I figured out that I could use something like this:


set myText to object text of text box 1 of front document


I couldn't find a way to get to the table cells. The data I want is not in text boxes and as far as XML goes I read somewhere that Apple hasn't released the Pages specs so perhaps it's time to give that up.


Right now I'm thinking that I should just add text boxes to those cells and it'll do for what I need. Any other ideas?

Mar 25, 2012 11:02 PM in response to Pierre L.

Thanks for your reply. I think I'd come across that page before, but I quote:


"This sample script is able to apply the “Strong Emphasis” style to any selected text in a Pages ’09 document, including selected text in a table cell."


I'm pretty much done with the script, so I didn't look into the code on that page too deeply, but I'd also like the script to be able to get text that's not selected so this may not be what I'm looking for.


I could probably tweak the current script a bit to make it faster, but so far it does what I expect it to, which is:


1. Exports Pages file to PDF

2. Reads the first and last names of the student, as well as the date, from the filename where the filename format is Student-Name-ddmmyyyy.

3. Gets the lesson no. via an input box

4. Extracts e-mail address from address book based on name

5. Makes a new e-mail message with the attachment, and adds the date and lesson number info in the subject field.

6. Deletes the PDF file
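Step 2 above (pulling the name and date out of the filename) can be sketched as follows. This is a Python illustration rather than the AppleScript actually used, and it assumes the filename is exactly First-Last-ddmmyyyy with an optional extension; the function name and the sample filename are invented for the example.

```python
from datetime import datetime
from pathlib import Path


def parse_review_filename(filename):
    """Split 'First-Last-ddmmyyyy(.ext)' into (first, last, date)."""
    stem = Path(filename).stem                      # drop the extension
    name_part, date_part = stem.rsplit("-", 1)      # date is the last chunk
    first, last = name_part.split("-", 1)           # rest is the name
    lesson_date = datetime.strptime(date_part, "%d%m%Y").date()
    return first, last, lesson_date
```

Splitting from the right for the date means a hyphen inside the surname still ends up in the last-name part instead of breaking the date.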



I wish it could just read the lesson no. from the file, but for now it'll do. The script takes ~8 seconds to execute, while it took me at least a minute to do the same.


Thanks again.

Mar 26, 2012 11:13 AM in response to pmrozik

Perhaps you could work out some hack. Ignore the fact that the file is xml.


Open the file in TextWrangler, that is, display the file unparsed. Identify some unique string like "lesson number 1". Do a string search for "lesson number" and pull out the 1.


This will be a bit flaky, but might work.
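A rough sketch of that hack, in Python for brevity: treat the file as plain text, find the label, and take the first run of digits that follows within a short distance. The label "lesson number" and the 40-character window are assumptions for illustration; the real label in the document may differ.

```python
import re


def find_lesson_number(raw_text, label="lesson number", window=40):
    """Return the first integer found shortly after the label, or None.

    Skips up to `window` non-digit characters (tags, spaces) after the label.
    """
    pattern = re.compile(
        re.escape(label) + r"\D{0,%d}?(\d+)" % window,
        re.IGNORECASE,
    )
    match = pattern.search(raw_text)
    return int(match.group(1)) if match else None
```

It will pick up the wrong number if a tag between the label and the value carries digits in an attribute, which is exactly the kind of flakiness to expect from ignoring the XML structure.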


If you post an example .pages file on the Internet somewhere, I'll take a look at it.


Robert

Mar 27, 2012 3:04 PM in response to rccharles

Hi Robert,


I did exactly as you said and opened up the file in TextWrangler and did a search. The data I want is still inside tags. I'd probably have to do some parsing and pull it out like you said.


I'm planning on moving to PHP sometime in the near future and having these reviews in XML form. I'd like to use the words and stick them in a database, but this time I want everything to be kept in XML format and the review itself to be ultimately html/css as I'm starting to see the limits of closed standards.


As I'd mentioned earlier, at the moment I have the script working and it pretty much does what I need it to. There's a bit of redundancy, I suppose, because I have the date, student's name, and lesson number in the review file itself, and the name of the file also contains the student's name and the date. I'll have to work on eliminating that.


As far as the template and script I have so far goes, I'm thinking about adding additional functions, such as:


1. Generating end-of-month reports to be sent via e-mail about the number of lessons completed for the month based on iCal or lesson review data

2. Perhaps adding a checkbox for "Homework completed" and then using it in the report as such, "This month, you completed 80% of all homework assignments."

3. Having lesson number/contract info in Excel and then sending reminders when the contract is almost up



If you have time, here's the .pages file zipped and I'm all ears should you have any suggestions and/or comments regarding what it looks like or how it can be improved and streamlined.


Thanks in advance.

Mar 27, 2012 4:26 PM in response to pmrozik

The script should read the date, lesson no, and name fields from a .pages file and then use that information to send an e-mail.


Maybe you might want to try the following script:


tell application "Pages"
    activate
    tell foreground layer of page 1 of front document
        select (text box 1 whose vertical position < 0.5)
        tell application "System Events"
            keystroke "c" using {command down} -- ⌘C
            keystroke "a" using {shift down, command down} -- ⇧⌘A
        end tell
        set theText to the clipboard
    end tell
end tell

set TID to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
set theTextFields to text items of theText
set AppleScript's text item delimiters to TID

set {theDate, theLessonNo, theStudent} to {item 2, item 4, item 6} of theTextFields
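To make the last part of the script easier to follow, here is the same split-on-tabs idea as a small Python sketch. It assumes the copied text comes out as alternating label and value fields separated by tabs, which is what picking items 2, 4 and 6 implies; the function name and the sample labels are hypothetical.

```python
def fields_from_copied_text(copied_text):
    """Return (date, lesson_no, student) from tab-separated copied text.

    AppleScript items 2, 4 and 6 are Python indexes 1, 3 and 5.
    """
    items = copied_text.split("\t")
    if len(items) < 6:
        raise ValueError("expected at least 6 tab-separated fields, got %d" % len(items))
    return items[1], items[3], items[5]
```

If the table is laid out differently, only the three index numbers need to change.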

Mar 28, 2012 10:39 AM in response to Pierre L.

Thanks Pierre. Unfortunately, text box 1 doesn't exist at vertical position < 0.5. There is no text box, there are just table cells and I think that's the main problem.


The file has 14 text boxes total and unfortunately the top is a table with cells and it seems like it's impossible to read from them. What I will try to do though is change them into text boxes in the template. That should definitely work.


Okay, never mind. The table is inside the text box, and I know which number it is. Let me try to fiddle around with the script a bit more.

Mar 29, 2012 9:10 AM in response to pmrozik

I think that Pierre L. has the right idea of using AppleScript to read the data.


I played around with PHP to attempt to analyze the data. I'm learning XML. I didn't recognize the style of XML used by Pages.


I'll have to study up more to see if the : has any special meaning.


<?xml version="1.0" ?>
<sl:document xmlns:sfa="http://developer.apple.com/namespaces/sfa" 
xmlns:sf="http://developer.apple.com/namespaces/sf" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns:sl="http://developer.apple.com/namespaces/sl" sl:version="72007061400"
 sfa:ID="SLPublicationModel-0"
 sl:generator="slingshot" sl:app_build_date="Jan 22 2008, 01:09:42">
 <sl:version-history>
 <sl:number sfa:number="2004042200" sfa:type="i"/>
 <sl:number sfa:number="2004060800" sfa:type="i"/>
 <sl:number sfa:number="2004061600" sfa:type="i"/>
 <sl:number sfa:number="2004062200" sfa:type="i"/>
 <sl:number sfa:number="2004062900" sfa:type="i"/>
 <sl:number sfa:number="2004072200" sfa:type="i"/>
 <sl:number sfa:number="2004091600" sfa:type="i"/>
 <sl:number sfa:number="2004093000" sfa:type="i"/>
 <sl:number sfa:number="2005091000" sfa:type="i"/>
 <sl:number sfa:number="2005091200" sfa:type="i"/>
 <sl:number sfa:number="2005140600" sfa:type="i"/>
 <sl:number sfa:number="72006110200" sfa:type="q"/>
 <sl:number sfa:number="72006110901" sfa:type="q"/>
 <sl:number sfa:number="72006111601" sfa:type="q"/>
 <sl:number sfa:number="72007010801" sfa:type="q"/>
 <sl:number sfa:number="72007012700" sfa:type="q"/>
 <sl:number sfa:number="72007061400" sfa:type="q"/>
 </sl:version-history>
<sl:publication-info>
<sl:SFWPCTShowDeletedTextProperty>
<sl:number sfa:number="1" sfa:type="c"/>
</sl:SFWPCTShowDeletedTextProperty>
<sl:SLCreationLocaleProperty>
<sl:string sfa:string="pl_PL"/>
</sl:SLCreationLocaleProperty>
<sl:decimalTab>
<sl:string sfa:string="."/>
</sl:decimalTab>
<sl:kSFWPHyperlinksEnabledProperty>
.. massively clipped ...
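On the colon: in XML it separates a namespace prefix from the element or attribute name, and each prefix (sl, sf, sfa) is bound to one of the xmlns: URLs at the top of the document. A parser needs those URLs to find anything. The following is a small Python sketch (not PHP) that reads the locale string from a cut-down copy of the fragment above; the SAMPLE text is trimmed by hand and the function name is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URL map copied from the xmlns: declarations in the posted XML.
NS = {
    "sl": "http://developer.apple.com/namespaces/sl",
    "sfa": "http://developer.apple.com/namespaces/sfa",
}

# Hand-trimmed sample; the real file is far larger.
SAMPLE = """<?xml version="1.0" ?>
<sl:document xmlns:sfa="http://developer.apple.com/namespaces/sfa"
 xmlns:sl="http://developer.apple.com/namespaces/sl">
 <sl:publication-info>
  <sl:SLCreationLocaleProperty>
   <sl:string sfa:string="pl_PL"/>
  </sl:SLCreationLocaleProperty>
 </sl:publication-info>
</sl:document>"""


def creation_locale(xml_text):
    """Return the sfa:string attribute of the creation-locale element, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(
        "sl:publication-info/sl:SLCreationLocaleProperty/sl:string", NS
    )
    if node is None:
        return None
    # Namespaced attributes are looked up as "{namespace-url}name".
    return node.get("{%s}string" % NS["sfa"])
```

Note that the value sits in an attribute (sfa:string) rather than between tags, which may be part of why the file looks so odd compared with tutorial XML.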


I did look at PHP and I was able to understand its XML extraction language. The PHP print statements stripped out the XML and left just the text.


I'll need to find out more about XML the next time I am at a library.

I'll need to find a tidy program to make the XML more readable.
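One possible stand-in for a tidy program is a few lines of Python using the standard library's minidom. This is only a sketch and assumes the XML is well formed and small enough to load into memory; the function name is made up.

```python
from xml.dom import minidom


def tidy_xml(xml_text, indent="  "):
    """Return the XML re-indented with one element per line."""
    dom = minidom.parseString(xml_text)
    pretty = dom.toprettyxml(indent=indent)
    # Drop whitespace-only lines that appear when the input was already indented.
    return "\n".join(line for line in pretty.splitlines() if line.strip())
```

The output is meant for reading only; it adds whitespace, so it should not be written back into the document package.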


I'll leave this be unless the applescript solution doesn't pan out.


I ran this php program from inside apache. I haven't figured out how to run php from the command line.


<html>
<head><title>Print XML</title></head>
<body>
<?php

// Implement multiple levels of debugging.
define('RUNsGREAT',0);   // Normal level for production
define('MINIMUM',1);
define('MEDIUM',2);
define('MAXIMUM',3);
// Current level of debugging.
define('DEBUG',MAXIMUM);

define('BR',"<br>");

// Display one line of debug information.
function debug($displayLine)
{
  if ( DEBUG >= MINIMUM )
  {
   echo $displayLine . BR;
  }
}

function formatCommandOutput($myArray)
{
  if ( DEBUG >= MEDIUM )
  {
    echo "\n" . BR . "in formatCommandOutput" .BR;
    //var_dump($myArray);
    //echo BR;
  }
  
  echo BR . "<tt>";
  // Display everything that was returned 
  foreach ($myArray as $string)
  {
    echo BR . "      " .  htmlentities( $string );
  }
  echo BR . "</tt>";
  
  return;
}

debug("Starting in " . __FILE__ );

// Invoke a Unix pwd command
$output = array();
$result = exec("pwd ", $output, $rc );
debug("rc= ".$rc);
formatCommandOutput($output);


$fileList = array();
$result = 
  exec("cat  ".escapeshellarg("/Users/mac/Sites/createDoc/apparent.xml"), $fileList, $rc );
debug("rc= ".$rc);
//formatCommandOutput($fileList);

$oneList = implode("\n",$fileList);

echo "------------------------------------" . BR . BR ;

$xml = simplexml_load_string($oneList);

//$xml->asXML("bigDoc.xml");

print $xml->asXML();



?>
</body>
</html>
