scrutinizer82

Q: Pages and Terminal output different results of word count

If I pass my .rtf file to the wc -w command, it returns 6909 words, whereas the word counter in Apple's Pages shows 2617. Why?

Mac OS X (10.7.5), MacBook Pro 15.4 mid-2012

Posted on Apr 29, 2016 3:42 PM

  • by scrutinizer82,

    scrutinizer82 May 4, 2016 3:35 PM in response to VikingOSX
    Level 1 (43 points)
    Mac OS X

    It doesn't accept input "as arguments", only "to stdin". But the AppleScript here that should trigger the dialog pop-up simply doesn't work. I guess System Events.app is slow to respond (but that's just my guess).

  • by VikingOSX,

    VikingOSX May 4, 2016 3:40 PM in response to scrutinizer82
    Level 7 (20,606 points)
    Mac OS X

    With the exception of replacing $f with "${f}" — your Run Shell Script action syntax is identical to mine. However, there may be additional code below your existing content that does not show.

     

    Hover your mouse over the bottom blue border of the Run Shell Script action. When the pointer turns into a horizontal bar with opposing vertical arrows, click and drag to expand the action downwards and see whether there is extra code in there that is bombing the action.

  • by VikingOSX,

    VikingOSX May 4, 2016 3:44 PM in response to scrutinizer82
    Level 7 (20,606 points)
    Mac OS X

    You didn't follow my Extract PDF Text action settings exactly. You told it to save output directly to the Desktop, but that field should read path, with the path variable edited to point to the Desktop. The Run Shell Script action will then accept the path variable as arguments.

    Screen Shot 2016-05-04 at 6.42.27 PM.jpg

  • by scrutinizer82,

    scrutinizer82 May 4, 2016 4:02 PM in response to VikingOSX
    Level 1 (43 points)
    Mac OS X

    OK, so I changed the output to the Path variable (it set itself to Desktop instantly). I'm still getting that error (I even switched the action to a custom, more Unix-friendly output name, to no avail). My version of Automator is 2.2.4, what's yours? I feel mine's buggy as ****. I checked: there is no extra code hiding.

     

    How do I run that code from within Terminal - just to check how it would work?

  • by VikingOSX,

    VikingOSX May 4, 2016 4:55 PM in response to scrutinizer82
    Level 7 (20,606 points)
    Mac OS X

    Type only the commands that follow the $ prompt in Terminal (they were shown in blue in the original post). The assumption is that you are in the same directory as the PDF file with the space-punctuated name. The first line is going to wrap here; there is no return between the first and last parenthesis.

    $ words=$(textutil -stdout -convert txt ./"that long space punctuated name.PDF" | wc -w | awk '$1=$1')

    $ osascript -e "display dialog \"Word Count: $words\" as text"
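
A note on the awk '$1=$1' step, as a minimal sketch: on OS X, wc -w pads its number with leading spaces, and reassigning the first field makes awk rebuild the line without them. The padded sample string below is an assumption standing in for real wc output. The second form, awk '{print $1}', is shown only as an alternative; it also prints a count of 0, which the pattern form drops because awk treats 0 as false.

```shell
# Sample of the padded output that BSD wc prints (assumed value, not from the thread).
padded='    2617'

# Reassigning $1 makes awk rebuild the record, dropping the leading spaces.
trimmed=$(printf '%s\n' "$padded" | awk '$1=$1')
echo "[$trimmed]"

# Printing the first field trims the same way and still works for a count of 0.
zero=$(printf '%s\n' '       0' | awk '{print $1}')
echo "[$zero]"
```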

  • by scrutinizer82,

    scrutinizer82 May 5, 2016 4:58 AM in response to VikingOSX
    Level 1 (43 points)
    Mac OS X

    Thank you for your patience in trying to help, but I'm letting it go. The action still fails. I still get either "file doesn't exist" or "user interaction not allowed" messages.

    Moreover, I couldn't even execute these shell lines in Terminal. It either displays the same "file doesn't exist" message or produces output that is a mess of strange symbols (in the latter case I had omitted the -stdout flag). I then figured out the problem with the filename and corrected it (it contained ./, which was incorrect, so I deleted it) and re-added -stdout, but it went nowhere: during conversion it displayed RTF text-handling data, and I couldn't tell where the converted file ended up.

     

    What a headache!!

  • by VikingOSX,

    VikingOSX May 5, 2016 5:11 AM in response to scrutinizer82
    Level 7 (20,606 points)
    Mac OS X

    Well, an oversight on my part. Textutil wants the Rich Text (RTF) document that is the result of the PDF to Text action. It does not handle PDF documents as input files. Change that, and the command-line syntax will work as posted. The './' syntax just tells Textutil that the input file is located in the current directory. I use it in examples because your Bash startup files may not have incorporated the current directory location in their directory search hierarchy.
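
For context on why the conversion to plain text matters before counting: wc -w counts every whitespace-separated token, so when it is run on raw RTF it also counts the markup. A minimal sketch, using a tiny hand-written RTF sample (an assumption for illustration, not the poster's file):

```shell
# A tiny hand-written RTF document (hypothetical sample).
rtf='{\rtf1\ansi\deff0 {\fonttbl {\f0 Helvetica;}}
\f0\fs24 Hello brave new world
}'

# Counting the raw RTF source: control words and braces count as "words" too.
raw_count=$(printf '%s\n' "$rtf" | wc -w | awk '{print $1}')

# Counting only the visible text of the same document.
text_count=$(printf '%s\n' 'Hello brave new world' | wc -w | awk '{print $1}')

echo "raw RTF: $raw_count words, visible text: $text_count words"
```

On a Mac, textutil -stdout -convert txt does the markup stripping that is done by hand here, which is why it sits in front of wc -w in the commands in this thread.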

  • by scrutinizer82,

    scrutinizer82 May 5, 2016 3:58 PM in response to VikingOSX
    Level 1 (43 points)
    Mac OS X

    I'm not that familiar with shell scripting syntax. I don't know, for example, what $f is for, what I have to change, or what it changes or stores. However, simple logic tells me that since the result of the Extract PDF Text action is passed to the shell script, all the script has to do is count the words and pop up a window, and for that the input should be text (not necessarily RTF; it can be plain text as well).

    I would also like to be able to just drag and drop a PDF file onto my app's icon to trigger the workflow.

  • by scrutinizer82,

    scrutinizer82 May 6, 2016 3:18 PM in response to VikingOSX
    Level 1 (43 points)
    Mac OS X

    Hi again, VikingOSX. I'm returning just to report on some progress I made over the last few days of investigating these errors. Your code had errors in the osascript part. Here's how it should have looked:

     

    for f in "$@"
    do
    words=$(textutil -stdout -convert txt  | wc -w | awk '$1=$1')
    osascript -e 'tell application "Finder"' -e 'activate' -e 'display dialog "Word Count: $words" as text
    end tell'
    done
    

    I now get the Finder dialog popping up, but it shows the literal line "Word Count: $words" instead of "Word Count: [number]".

     

    Script.png
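
The literal "$words" in that dialog is consistent with how the shell treats quotes: inside single quotes nothing is expanded, while inside double quotes variables are. A minimal sketch, with a sample value standing in for the real word count (no osascript involved, only the strings that would be handed to it):

```shell
words=2617   # sample value; in the thread it comes from wc -w

# Single quotes: the shell passes the text through untouched.
single='display dialog "Word Count: $words"'

# Double quotes: $words is expanded; the inner quotes are escaped with \".
double="display dialog \"Word Count: $words\""

printf '%s\n' "$single"
printf '%s\n' "$double"
```

The first line printed still contains $words; the second contains the number. This is why the backslashes go together with the outer double quotes and cannot simply be dropped when switching to single quotes.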

  • by VikingOSX,

    VikingOSX May 6, 2016 5:27 PM in response to scrutinizer82
    Level 7 (20,606 points)
    Mac OS X

    The following lines, in a Run Shell Script action that is set to "Pass input: as arguments", will produce a dialog with a word count value in it. There is no need to use the Finder tell block.

     

    for f in "$@"
    do
      words=$(textutil -stdout -convert txt "${f}" | wc -w | awk '$1=$1')
      osascript -e "display dialog \"Word Count:  $words\" as text & return & current date"
    done

    Screen Shot 2016-05-06 at 8.27.09 PM.jpg
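
For readers unsure what "$@" and "${f}" do in that loop: "$@" expands to the arguments the script received (here, the file paths Automator passes in), one per item, and f holds one of them on each pass. The sketch below uses hypothetical helper functions and made-up file names in place of the Automator action, and shows why the quotes matter for names containing spaces:

```shell
# Hypothetical stand-in for the action: a function receives its arguments
# in "$@" the same way the Run Shell Script action does.
list_inputs() {
  for f in "$@"
  do
    # "${f}" is one complete argument, even if it contains spaces.
    printf 'input: [%s]\n' "${f}"
  done
}

# Helpers that report how many arguments arrive with and without quotes.
count_args() { echo "$#"; }
quoted()     { count_args "$@"; }
unquoted()   { count_args $@; }

list_inputs "my long file name.rtf" "notes.rtf"
quoted   "my long file name.rtf"   # one argument
unquoted "my long file name.rtf"   # split on the spaces into four
```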

  • by scrutinizer82,

    scrutinizer82 May 7, 2016 4:29 AM in response to VikingOSX
    Level 1 (43 points)
    Mac OS X

    VikingOSX wrote:

     

    The following lines in the Run Shell Script that is set to pass input: as arguments will produce a dialog with a word count value in it. There is no need to use the Finder tell block.

     

    for f in "$@"
    do
      words=$(textutil -stdout -convert txt "${f}" | wc -w | awk '$1=$1')
      osascript -e "display dialog \"Word Count:  $words\" as text & return & current date"
    done

     

    And yet these lines produce constant error messages. The corresponding log entries (at the bottom of the Automator window) show warnings such as "no user interaction allowed" and syntax errors (like "expected expression but found end of line", "expected end of line but found identifier/expression/parameter", etc.). I dug into the issue and here's what I found:

     

    http://stackoverflow.com/questions/13484482/no-user-interaction-allowed-when-running-applescript-in-python

     

    The quote:

    from the command line notice this won't work...

    osascript -e 'display dialog "Now it will not work."'

    But this will work since we tell the Finder to do it...

    osascript -e 'tell application "Finder"' -e 'activate' -e 'display dialog "Now it works!"' -e 'end tell'


    Only after I omitted the "\" symbols and added the tell block did it stop spitting out those error messages and finally display the dialog, albeit not with the content it was meant to show. Beyond that, I can't for the life of me discover what is breaking it.



  • by VikingOSX,

    VikingOSX May 7, 2016 8:07 AM in response to scrutinizer82
    Level 7 (20,606 points)
    Mac OS X

    Apple usually makes changes to AppleScript (and Automator) with each new release of OS X. Some of these changes are not backwards compatible with older releases of OS X. The workflow and shell examples that I provided you are accurate and functional on El Capitan, but break on versions of OS X older than Mavericks. It was in Mavericks that one could begin using display dialog in osascript without a tell block to support it. I believe you may still be on Lion.

     

    This morning, I took my El Capitan workflow over to Snow Leopard, and had to rework the Run Shell Script action contents to get a word count dialog. When I brought this back to El Capitan, and ran the recompiled Automator workflow, it also worked as expected without any modifications. The link to stackoverflow that you provided was from 2012, and for that timeframe, a Finder tell block was required with display dialog.

     

    So here it is. It incorporates a HERE document with osascript to cut back on quote escaping.

    Screen shot 2016-05-07 at 10.46.48 AM.png

    Screen Shot 2016-05-07 at 10.57.05 AM.jpg
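
The screenshots are not reproduced here, so the following is only a sketch of the general technique (a here-document that carries a Finder tell block), not VikingOSX's actual script. The helper name dialog_script and the sample count are assumptions for illustration. The helper just prints the AppleScript text, so it can be inspected without a Mac:

```shell
# Hypothetical helper: print an AppleScript that shows the count via Finder.
# Inside the here-document the double quotes need no escaping, and $1 is
# still expanded because the AS delimiter is not quoted.
dialog_script() {
  cat <<AS
tell application "Finder"
  activate
  display dialog "Word Count: $1"
end tell
AS
}

dialog_script 2617

# On a Mac, the text could be piped to osascript, which reads a script from
# standard input when no file or -e option is given:
#   dialog_script 2617 | osascript
```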

  • by scrutinizer82,

    scrutinizer82 May 7, 2016 3:48 PM in response to VikingOSX
    Level 1 (43 points)
    Mac OS X

    Shell.png

    for f  in "$@" is also present there, just outside the capture

    Yes, I'm on Lion

  • by BobHarris,

    BobHarris May 7, 2016 3:53 PM in response to scrutinizer82
    Level 6 (19,272 points)
    Mac OS X

    From "man sh":

       If the redirection operator is <<-, then all leading tab characters are
       stripped from input lines and  the  line  containing  delimiter.   This
       allows  here-documents within shell scripts to be indented in a natural
       fashion.

    Chances are you have spaces before 'AS', and the <<- operator wants to see tab characters. Just a guess.
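
The tab-versus-space behaviour can be checked without typing literal tabs, as a minimal sketch: printf turns \t into a real tab, and the generated two-line script is handed to sh. AS is the delimiter mentioned above; the function names and the script body are made up for the example.

```shell
# Tab-indented here-document: <<- strips the leading tabs, so the closing
# AS is recognized and only the body line is printed.
tab_indented() {
  printf 'cat <<-AS\n\tdisplay dialog "hi"\n\tAS\n' | sh
}

# Same script indented with two spaces: <<- leaves spaces alone, so the
# closing AS is never recognized and ends up inside the text instead.
# Any warning the shell prints about the missing delimiter is discarded.
space_indented() {
  printf 'cat <<-AS\n  display dialog "hi"\n  AS\n' | sh 2>/dev/null
}

tab_indented
space_indented
```

If the second form is what ended up in the Run Shell Script action, the stray AS line would be fed to osascript as part of the AppleScript, which would fit the syntax errors reported earlier.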

  • by etresoft,

    etresoft May 7, 2016 3:58 PM in response to scrutinizer82
    Level 7 (29,056 points)

    Hello scrutinizer82,

    Before you get too far with this, you should realize that PDF is strictly a print format. It is designed to be printed, on a printer, and read by a human. There is really no other use case. From time to time you may get lucky and have a PDF that appears to have text content. But in no case will you ever extract the complete text from a PDF file.
