eliminating duplicate word entries in list

Is there any way of sorting a huge list of words so that it deletes duplicate entries?
I don't have iWork. At the moment the list is in .rtf format,
created in TextEdit.

I have MS Office but can't find a solution in it.
I am looking for freeware, if there is anything available.

Mac OS X (10.4.8)

Posted on Aug 16, 2007 6:19 PM

Question marked as Top-ranking reply

Posted on Aug 17, 2007 3:23 PM

You can do this in Terminal quite easily.

Start Terminal and change directories to the one containing the file you want to sort. At the prompt, type:

sort -u filename1 > filename2
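
For illustration, here is the same command run on a small sample. The file names `words.txt` and `unique.txt` are hypothetical stand-ins for filename1 and filename2. Two caveats, offered as suggestions rather than something tested on 10.4.8: `sort` works on plain text line by line, so the .rtf file should first be saved as plain text (in TextEdit, Format > Make Plain Text), otherwise the RTF formatting codes get sorted along with the words; and the output should go to a different file name than the input, because the shell empties the file named after `>` before `sort` ever reads it.

```shell
# Hypothetical sample list: one word per line, with duplicates
printf 'pear\napple\npear\nbanana\napple\n' > words.txt

# Sort the lines and keep only one copy of each (-u = unique)
sort -u words.txt > unique.txt

# Show the result
cat unique.txt
```

This prints apple, banana and pear, one per line.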


If you need more detail on how to use the Terminal, you can ask in the Unix Forum:

http://discussions.apple.com/forum.jspa?forumID=735

Aug 17, 2007 6:10 PM in response to Dieop

This can easily be done in Excel or another spreadsheet with a simple IF formula and some copying and pasting.

- To begin with, each word should be on its own line (separated by a carriage return) in your word processor.

- Select the entire list, copy it, and paste it into the spreadsheet. In Excel, paste into cell A1.

- Next sort column A. This will group any duplicates together.

- In B1 enter the following formula: =IF(A1=A2,"",A1)

Note: "" represents two consecutive quotation marks (an empty text value).

- Double-click the autofill/fill-down handle in cell B1 (located in the lower right corner of the cell) to copy the formula down column B, alongside every contiguous occupied cell in column A.

- Copy the results that now appear in column B.

- Click into any non-adjacent column and choose "Paste Special" from the Edit menu.

- Click the "Values" option in the resulting dialog box and then click OK.

- Sort the data appearing in the new column and do what you want with it, such as copying and pasting it back into a word processing document.
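
The spreadsheet recipe rests on the same idea as the Unix `uniq` command: once the list is sorted, duplicates sit next to each other, so comparing each row with the one below it is enough. The sketch below (using a hypothetical file `words.txt`) shows why the sort step matters.

```shell
# Hypothetical unsorted list: the two "pear" lines are not adjacent
printf 'pear\napple\npear\napple\n' > words.txt

# uniq only drops *adjacent* repeats, so on unsorted input nothing is removed
uniq words.txt > unsorted_result.txt

# Sorting first groups duplicates together (like sorting column A);
# uniq then keeps one copy of each run (like the IF formula in column B)
sort words.txt | uniq > sorted_result.txt

wc -l < unsorted_result.txt   # 4 lines: no duplicates were adjacent
wc -l < sorted_result.txt     # 2 lines: apple, pear
```

One difference to keep in mind: Excel's `=` compares text without regard to case, so "Apple" and "apple" count as duplicates there, whereas `uniq` compares lines exactly.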

I hope the above is correct and helpful. I apologize for any mistakes or misstatements.

Stu B

Aug 17, 2007 6:54 PM in response to Sagesse

Let me make sure I understand what you are trying to do, before I go into a long explanation of how to do the wrong thing.

You have a .rtf file with a list of many words in it, let's say a chapter of a novel. You would like to have a file of all the individual words in that file, with all the duplicates removed.

Do I have that right?

Is each word on a separate line, or is there more than one word per line?

Is it OK if the output file is sorted?

Just to be super clear: Am I right to assume your .rtf file might start something like:

Now we are engaged in a great civil war, testing whether that nation, 
or any nation so conceived and so dedicated, can long endure.


And you want the output file to look like:

a
and
any
are
can
civil
conceived
dedicated
endure
engaged
great
in
long
nation (x2)
now
or
so (x2)
testing
that
war
we
whether


Right?

If so I can probably "talk" you through Terminal. It may also be a good use of AppleScript or Automator. I haven't played with them much, but I'm kind of looking for an excuse.
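
If that interpretation is right, a Terminal pipeline along these lines might do it. This is a sketch under stated assumptions: `speech.txt` and `wordlist.txt` are hypothetical file names, the input is a plain-text copy of the document (not the .rtf itself), and a "word" is taken to be any run of the letters A to Z, ignoring case.

```shell
# Byte-wise, predictable behaviour for tr and sort
export LC_ALL=C

# Hypothetical input: the two sample lines above, as plain text
printf 'Now we are engaged in a great civil war, testing whether that nation, \nor any nation so conceived and so dedicated, can long endure.\n' > speech.txt

# 1. Turn every run of non-letters (spaces, commas, line breaks) into one newline
# 2. Lower-case everything so "Now" and "now" count as the same word
# 3. Sort and keep one copy of each word
tr -cs 'A-Za-z' '\n' < speech.txt \
  | tr 'A-Z' 'a-z' \
  | sort -u > wordlist.txt

cat wordlist.txt
```

On the sample this should give the 22 words of the list above, one per line, without the (x2) notes; replacing `sort -u` with `sort | uniq -c` would instead print how often each word occurs. Words with apostrophes, such as "don't", get split into "don" and "t" by this simple definition of a word.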


Welcome to Apple Support Community
A forum where Apple customers help each other with their products.