Count words in a document

1354 Views, 16 Replies, Latest reply: Nov 16, 2012 5:34 AM by VikingOSX
  • Jongware
    Nov 16, 2012 1:49 AM (in response to VikingOSX)

    VikingOSX, rather than writing my own code, good or bad as it might turn out, I was in fact hoping the OP would be able to convert the suggested algorithm into code (or *at least* give it an honest try). That's based on my personal assessment that the OP is learning how to write code (quite possibly this is even a school assignment), rather than having to perform this task as part of a larger piece of best-selling software that's due to hit the App Store in a few months.

     

    If the OP only needs to count words as a real-world task (unrelated to "programming" as a separate topic), he'd be best off with tried-and-tested code as provided by Apple:

     

    grep -ow "the" file.txt | wc -l

     

    There is nothing intrinsically "wrong" with your code, as it demonstrates proper initialization, assignment, looping, and a potentially very useful function I didn't even know existed ("strtok_r"). And it does the job.

    As for potential issues, a few that come to mind are:

     

    1. buffer length is limited to 80 characters:

     

    From the description of fgets(): "Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first."

    If a single line of input exceeds 80 characters, the buffer may end in the middle of a word. Simply increasing the buffer length does not solve the problem; it only shifts it forward.
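
One way around the line-length limit is to drop the line buffer altogether and read one character at a time. The following is only a sketch of that idea, not the code discussed in this thread: the function name count_keyword is hypothetical, a "word" is assumed to be a run of ASCII letters and digits, and matching is case-sensitive.

```c
#include <ctype.h>
#include <stdio.h>

/* Count whole-word occurrences of keyword in fp, one character at a time.
 * No line buffer is involved, so the length of a line does not matter.
 * Assumption: a word is a run of characters for which isalnum() is true. */
long count_keyword(FILE *fp, const char *keyword)
{
    long count = 0;
    size_t pos = 0;     /* characters of keyword matched so far */
    int in_word = 0;    /* currently inside a word? */
    int matching = 0;   /* does the current word still match keyword? */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (isalnum(c)) {
            if (!in_word) {          /* first character of a new word */
                in_word = 1;
                matching = 1;
                pos = 0;
            }
            if (matching && keyword[pos] == (char)c)
                pos++;
            else
                matching = 0;        /* mismatch, or word longer than keyword */
        } else {
            /* A word just ended: it counts only if all of keyword matched. */
            if (in_word && matching && keyword[pos] == '\0')
                count++;
            in_word = 0;
        }
    }
    /* The file may end without a trailing delimiter. */
    if (in_word && matching && keyword[pos] == '\0')
        count++;
    return count;
}
```

Because the comparison happens while reading, neither the input line nor the keyword needs a fixed-size buffer.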

     

    2. keyword length is limited to 20. Same sort of problem as above; why a limit of 20 characters? You would be better off making it a char * and using strdup.

    In itself there is no reason to create this variable and copy the argument into it; you could either use argv[1] directly, or make keyword a char * and have it point to argv[1].
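
Both options can be sketched as follows. The function names keyword_borrowed and keyword_copied are hypothetical and only serve to put the two approaches side by side; strdup() is POSIX rather than ISO C before C23, hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() on strict compilers */
#include <stdlib.h>
#include <string.h>

/* Option A: no copy at all. The argv strings stay valid for the whole
 * run of the program, so pointing at argv[1] is enough. */
const char *keyword_borrowed(int argc, char *argv[])
{
    return (argc > 1) ? argv[1] : NULL;
}

/* Option B: a private copy of any length. Returns NULL if there is no
 * argument or if allocation fails; the caller must free() the result. */
char *keyword_copied(int argc, char *argv[])
{
    return (argc > 1) ? strdup(argv[1]) : NULL;
}
```

Either way, a keyword longer than 20 characters is handled without truncation or overflow.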

     

    3. The delim array should consist of *all* not-a-word characters.
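
Rather than typing out every such character by hand, the delimiter string could be generated. This is a sketch under the assumption that the input is plain ASCII and that a word consists of letters and digits only; build_delims is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>

/* Fill delims with every byte value from 1 to 255 that is not a letter
 * or digit according to isalnum(). delims must have room for 256 chars.
 * Assumption: the default "C" locale, where all bytes above 127 are
 * treated as delimiters too (so this is not suitable for UTF-8 text). */
void build_delims(char delims[256])
{
    size_t n = 0;
    for (int c = 1; c < 256; c++) {
        if (!isalnum(c))
            delims[n++] = (char)c;
    }
    delims[n] = '\0';
}
```

The resulting string can be passed as the delimiter argument of strtok_r().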

     

    4. The code assumes the text file is 8-bit plain ASCII. My suggested algorithm would work with 16- or 32-bit Unicode, and with UTF-8 encoded text, with very minor adjustments. (In case you are wondering: (a) instead of using fgetc(), write a function to read one single character code in your encoding of choice; (b) adjust is-a-word and/or is-not-a-word to work with the full range of available codes in your encoding of choice.)
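
For UTF-8, step (a) might look roughly like this. It is a simplified sketch, not a full decoder: read_utf8 is a hypothetical name, and overlong encodings and surrogate values are not rejected.

```c
#include <stdio.h>

/* Read one UTF-8 encoded code point from fp.
 * Returns the code point, or -1 at end of file or on a malformed sequence. */
long read_utf8(FILE *fp)
{
    int c = fgetc(fp);
    if (c == EOF)
        return -1;

    long cp;
    int extra;  /* number of continuation bytes still expected */

    if (c < 0x80) {
        cp = c;
        extra = 0;                 /* plain ASCII */
    } else if ((c & 0xE0) == 0xC0) {
        cp = c & 0x1F;
        extra = 1;                 /* 2-byte sequence */
    } else if ((c & 0xF0) == 0xE0) {
        cp = c & 0x0F;
        extra = 2;                 /* 3-byte sequence */
    } else if ((c & 0xF8) == 0xF0) {
        cp = c & 0x07;
        extra = 3;                 /* 4-byte sequence */
    } else {
        return -1;                 /* stray continuation or invalid lead byte */
    }

    while (extra-- > 0) {
        c = fgetc(fp);
        if (c == EOF || (c & 0xC0) != 0x80)
            return -1;             /* truncated or malformed sequence */
        cp = (cp << 6) | (c & 0x3F);
    }
    return cp;
}
```

Step (b), deciding which of the returned code points belong to a word, is left out here.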

     

    5. The for loop starting on line 29 could be rewritten for clarity (it took me several readings to guess what happens in there).
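
The loop in question is not reproduced here, so the following is only a guess at an equivalent: a strtok_r() loop written as a plain while loop, which may be easier to follow than the for form used in the man page example. The function name count_matches and its parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() on strict compilers */
#include <string.h>

/* Count the tokens in line that are exactly equal to keyword.
 * Note that strtok_r() modifies line while splitting it. */
int count_matches(char *line, const char *delims, const char *keyword)
{
    int count = 0;
    char *saveptr;
    char *token = strtok_r(line, delims, &saveptr);  /* first token */

    while (token != NULL) {
        if (strcmp(token, keyword) == 0)
            count++;
        token = strtok_r(NULL, delims, &saveptr);    /* next token */
    }
    return count;
}
```

The first call passes the string and every later call passes NULL, which is the part the for form hides in its header.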

  • VikingOSX Level 5 (4,725 points)
    Nov 16, 2012 5:34 AM (in response to Jongware)

    Agreed.

     

    1. Yes, a little small for practical purposes.
    2. This was an arbitrary length that assumed few “words” would exceed 20 chars. Your approach sounds better.
    3. Yes.
    4. Plain ASCII took less of my time. You are right about the need for Unicode flexibility.
    5. Was that obnoxious or what? That came right from the example in strtok_r(3).