16 Replies. Latest reply: Nov 16, 2012 5:34 AM by VikingOSX
  • Jongware Level 2 (265 points)

    VikingOSX, rather than writing my own code, good or bad as it may turn out, I was in fact hoping the OP would be able to convert the suggested algorithm into code (or *at least* give it an honest try). That is based on my personal assessment that the OP is learning how to write code (quite possibly this is even a school assignment), rather than having to perform this task as part of a larger piece of best-selling software that's due to hit the App Store in a few months.

     

    If the OP only needs to count words as a real-world task (unrelated to "programming" as a separate topic), he'd be best off with tried-and-tested code as provided by Apple:

     

    grep -o "the" | wc -l

     

    There is nothing intrinsically "wrong" with your code, as it demonstrates proper initialization, assignment, looping, and a potentially very useful function I didn't even know existed ("strtok_r"). And it does the job.
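
    For readers who, like me, had not met strtok_r before: here is a minimal sketch of how it can split a line into tokens and count exact matches. This is not your code; the function name count_in_line and the delimiter set are my own assumptions, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r on strict compilers */
#include <string.h>

/* Count the tokens in `line` that are exactly equal to `keyword`.
 * strtok_r writes NUL bytes into `line`, so it must be a writable buffer.
 * count_in_line is a hypothetical name, not part of any library. */
int count_in_line(char *line, const char *keyword)
{
    const char *delim = " \t\r\n.,;:!?\"()";  /* assumed delimiter set */
    char *save = NULL;                         /* strtok_r keeps its position here */
    int count = 0;

    /* First call passes the buffer, later calls pass NULL to continue. */
    for (char *tok = strtok_r(line, delim, &save);
         tok != NULL;
         tok = strtok_r(NULL, delim, &save)) {
        if (strcmp(tok, keyword) == 0)
            count++;
    }
    return count;
}
```

    Unlike plain strtok, the position is kept in the caller's save pointer rather than in hidden static state, so two tokenizing loops can be nested or run in different threads.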

    As for potential issues, a few that come to mind are:

     

    1. The buffer length is limited to 80 characters. The documentation for fgets describes its behavior as follows:

    "Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first."

    If a single line of input exceeds 80 characters, a read may end in the middle of a word, so a keyword that straddles two reads would be missed. Simply increasing the buffer length does not solve the problem; it only shifts it forward.

     

    2. The keyword length is limited to 20. This is the same sort of problem as above: why a limit of 20 characters? You would be better off with a char * and strdup.

    In fact, there is no reason to create this variable and copy the argument into it at all; you could either use argv[1] directly, or make keyword a char * and have it point to argv[1].

     

    3. The delim array should contain *all* non-word characters.

     

    4. The code assumes the text file is 8-bit plain ASCII. My suggested algorithm would work with 16- or 32-bit Unicode, and with UTF-8 encoded text, with very minor adjustments. (In case you are wondering: (a) instead of using fgetc(), write a function to read one single character code in your encoding of choice; (b) adjust is-a-word and/or is-not-a-word to work with the full range of available codes in your encoding of choice.)

     

    5. The for loop starting on line 29 could be rewritten for clarity (it took me several readings to guess what happens in there).
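
    To illustrate how points 1 to 3 can be sidestepped together, here is a minimal sketch that reads one character at a time with fgetc() and compares against the keyword on the fly, so neither the line nor the word is ever stored in a fixed-size buffer. The names count_word and is_word_char are hypothetical, and the definition of a "word character" (ASCII letters, digits, apostrophe) is just an assumption for the example.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: decides what counts as part of a word.
 * Assumption: ASCII letters, digits and the apostrophe. */
static int is_word_char(int c)
{
    return isalnum(c) || c == '\'';
}

/* Count whole-word, case-sensitive occurrences of `keyword` in `fp`.
 * Input is consumed one character at a time and nothing is buffered,
 * so there is no limit on line length or word length. */
long count_word(FILE *fp, const char *keyword)
{
    long count = 0;
    size_t pos = 0;    /* keyword characters matched in the current word */
    int in_word = 0;   /* are we currently inside a word? */
    int matching = 1;  /* does the current word still agree with keyword? */

    for (;;) {
        int c = fgetc(fp);
        if (c != EOF && is_word_char(c)) {
            in_word = 1;
            /* keyword[pos] is NUL once the keyword is exhausted, which
             * never equals a word character, so pos cannot overrun. */
            if (matching && (unsigned char)keyword[pos] == c)
                pos++;
            else
                matching = 0;
        } else {
            /* A non-word character or EOF ends the current word. */
            if (in_word && matching && keyword[pos] == '\0')
                count++;
            in_word = 0;
            matching = 1;
            pos = 0;
            if (c == EOF)
                break;
        }
    }
    return count;
}
```

    Called as count_word(fp, argv[1]), this also uses the argument directly instead of copying it. Everything that is not a word character acts as a delimiter, so there is no delim array to keep complete.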

  • VikingOSX Level 6 (10,960 points)

    Agreed.

     

    1. Yes, a little small for practical purposes.
    2. This was an arbitrary length that assumed few “words” would exceed 20 chars. Your approach sounds better.
    3. Yes.
    4. Plain ASCII took less of my time. You are right about the need for Unicode flexibility.
    5. Was that obnoxious or what? That came right from the example in strtok_r(3).
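
    On point 4, a rough idea of what "a function to read one single character code" might look like for UTF-8. This is only a sketch under my own assumptions: read_utf8 is a hypothetical name, malformed input is reported as U+FFFD, and overlong encodings and surrogate values are not rejected, which a production decoder should do.

```c
#include <stdio.h>

/* Read one UTF-8 encoded code point from `fp`.
 * Returns the code point, -1 at end of file, or 0xFFFD (the Unicode
 * replacement character) for a malformed or truncated sequence.
 * Simplification: overlong encodings and surrogates are not rejected. */
long read_utf8(FILE *fp)
{
    int c = fgetc(fp);
    if (c == EOF)
        return -1;
    if (c < 0x80)
        return c;                      /* single byte: plain ASCII */

    long cp;
    int extra;                         /* continuation bytes still expected */
    if ((c & 0xE0) == 0xC0)      { cp = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
    else
        return 0xFFFD;                 /* stray continuation or invalid lead byte */

    while (extra-- > 0) {
        c = fgetc(fp);
        if (c == EOF)
            return 0xFFFD;             /* sequence cut off by end of file */
        if ((c & 0xC0) != 0x80) {
            ungetc(c, fp);             /* not a continuation byte: leave it unread */
            return 0xFFFD;
        }
        cp = (cp << 6) | (c & 0x3F);   /* append the 6 payload bits */
    }
    return cp;
}
```

    A word counter would then call this in place of fgetc() and test the returned code point, rather than a single byte, for being a word character.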