7 Replies Latest reply: Oct 24, 2011 3:04 PM by KOENIG Yvan
Rasuul Level 1 (0 points)

Hi there -


I have a series of text files with leading numbers that I'd like to remove. The files are lists, where each line is enumerated sequentially & these line-numbers have to be removed for further processing.


This is too painful to do manually (it takes forever). The content is not in English, so Excel, which lets me set the delimiter and parses the fields correctly, doesn't read the content properly (even after copying and pasting from a word processor that usually does). Pages handles the content, but for the life of me I can't get it to parse the fields, so it's effectively useless.


The other threads I've read basically suggest that one has to manually insert tabs between the fields - is this actually the case? Is there really no way to control how Pages '09 parses imported data?


As a final add-on: I know this would be relatively easy to do with a script, but I haven't a clue where to get started with scripting for the Mac, so if you read this & have a suggestion as to where I might get started (generally, or w.r.t. this particular project), I would very much appreciate the enlightenment.


Thanks in advance and all the best,



  • Jerrold Green1 Level 7 (29,960 points)



    In Pages or another text editor, replace whatever now separates the fields with a Tab character. Then Paste into Numbers.



  • Rasuul Level 1 (0 points)

    Hi Jerry -


    Thanks. I had actually seen your thread. As I mention above: I'm in disbelief that one would have to manually insert a tab delimiter: that's pretty silly. Controlling the parsing of incoming data is usually a feature available in the most basic of data-processing programs, so I wanted to mine the fields of knowledge here to know if there really isn't an automated method for this result. Thanks for the reply, though. Appreciate your taking the time to help.


    For those who are tackling an issue similar to the one I outlined above, the following is a command-line resolution. Open Terminal and navigate to the folder containing the document. Then run:


    sed -i.bak -e 's/^[0-9]* *//' t.txt

    Given that t.txt contains lines which begin with numbers, this removes the leading numbers and any spaces that follow them. The .bak is the suffix added to the backup copy of the original file. With the sed that ships with Mac OS X, -i needs a suffix argument; pass -i '' if you don't want a backup.
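
    To see what the pattern does before touching any real file, it can be run on a few lines piped in from printf. This is only a sketch: the function name strip_leading_numbers and the sample lines are made up for illustration, and without -i nothing on disk is changed.

```shell
# Read lines on stdin and print them without the leading number
# and the spaces that follow it. (Hypothetical helper name.)
strip_leading_numbers() {
  sed -e 's/^[0-9]* *//'
}

# Made-up sample input, just to see the effect of the pattern.
printf '1 alpha\n2  beta\n10 gamma\n' | strip_leading_numbers
# prints:
# alpha
# beta
# gamma
```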

    To iterate over all .txt files in the current directory (the quotes around $f keep file names with spaces intact):

    for f in *.txt ; do sed -i.bak -e 's/^[0-9]* *//' "$f" ; done
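
    If editing the files in place feels risky, a variant is to leave the originals alone and write the cleaned copies into a separate folder. The following is a rough sketch under that assumption; the function name strip_numbers_to_dir and the folder name in the usage line are invented for this example.

```shell
# strip_numbers_to_dir OUTDIR
# Write a copy of every .txt file in the current directory into OUTDIR,
# with the leading numbers removed. The originals are not modified.
strip_numbers_to_dir() {
  outdir=$1
  mkdir -p "$outdir" || return 1
  for f in *.txt; do
    [ -f "$f" ] || continue            # skip when no .txt file matches
    sed -e 's/^[0-9]* *//' "$f" > "$outdir/$f"
  done
}
```

    Usage would be something like: strip_numbers_to_dir cleaned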



    That should do the trick. However, I'm still interested to know from the rest of the community whether there is a workaround within Pages or iWork. It just seems crazy that they wouldn't have this.

  • Jerrold Green1 Level 7 (29,960 points)

    Most files that are intended to be imported into either a database or spreadsheet will have proper delimiters already placed appropriately. It's only if the files are not properly formatted that you will need to insert either commas or tabs, and Find and Replace will do that nicely in most situations.


    I will not judge or comment further on Apple's design decision in this case.



  • Rasuul Level 1 (0 points)

    Jerrold Green1 wrote:


    Most files that are intended to be imported into either a database or spreadsheet will have proper delimiters already placed appropriately.


    I don't really know how to respond to that: every time I've gone to Pages this has been an issue, and yet my data is regularly delimited &, when not in a foreign language, can be handled by Excel & other data software. I have a perfectly fine document with valid delimiters regularly placed, but there is no control over the import. I don't doubt that you're right: it can handle some documents, it just doesn't seem nearly as flexible as one might expect (or hope for) from Apple, which is often ahead of the curve.



    It's only if the files are not properly formatted that you will need to insert either commas or tabs, and Find and Replace will do that nicely in most situations.

    What are you using for "Find and Replace"? Another frustration I've had is that Automator is equipped to do this for a Word document, but not for a Pages or TextEdit document. If there is a good find & replace that makes such substitutions more or less readily available to the average user, I'd agree fully that this is alright (though still, adding a simple interface and avoiding the whole issue seems preferable. I mean, if Microsoft can do it . . . ).


    Thanks again for sharing your thoughts. I look forward to hearing 'bout the find & replace bit.




  • Jerrold Green1 Level 7 (29,960 points)



    Numbers is a much leaner application than Excel. It costs less, it does less, and in the process of doing less, has a rather cleaner and easier interface, for the things that it does do.


    Regarding parsing the raw data file with automation, I believe that Yvan has developed a script to parse some types of data, and if it doesn't serve your needs as-is, it could probably be modified. He will probably chime in here, but if not, you can search the discussions for his posts on the subject.


    Using Find and Replace in Pages is a basic skill that you can study in the Pages User Guide, downloadable from the Help menu. In short, you "Find" the current delimiters, perhaps space characters, and you "Replace" with the character(s) of your choice, either one at a time, or "Replace All" globally. The only trick in using Find and Replace is to make sure that you are using the long form of the tool. If the disclosure triangle is pointing down, click it to expand the pane.


    If more logic is required to parse the document, it can be done with formulas in Numbers, or with a script.



  • Badunit Level 6 (11,615 points)

    Find and Replace is a feature of TextEdit, Numbers, Pages, etc.  It does not require Automator.  Open your document, then press Command-F to bring up the Find/Replace window.  Type the incorrect delimiter in the "find" field and type the correct one in the "replace" field.  If you want to replace with a tab, use Option-Tab to type the tab.
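
    For anyone who prefers Terminal, the same kind of replacement can be sketched with sed. This assumes each line starts with a number followed by one or more spaces, and it only replaces that first run of spaces with a tab, so the number and the rest of the line can land in two columns. The function name number_to_tab is made up for the example.

```shell
# number_to_tab: read lines on stdin and replace the spaces that follow
# a leading number with a single tab. Other spaces are left untouched.
number_to_tab() {
  tab=$(printf '\t')   # literal tab, since not every sed understands \t
  sed -e "s/^\([0-9][0-9]*\)  */\1${tab}/"
}
```

    Usage would be something like: number_to_tab < t.txt > t-tabbed.txt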

  • KOENIG Yvan Level 8 (41,790 points)

    If the leading numbers are separated from the rest of each line by a space character, your file is not a tab-delimited one.

    If a file whose name ends with .txt is a tab-delimited one, it may be opened by Numbers with no problem.

    If its name ends with .csv, things are more complicated.

    If the system uses the period as decimal separator, the values MUST be separated by commas.


    If the system uses the comma as decimal separator, the values MUST be separated by semicolons.
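
    The rule above can be sketched as a small shell helper that turns a tab-delimited file into a .csv with the matching delimiter. It is a naive illustration only: the function names are invented, the decimal separator is passed in by hand rather than read from the system settings, and it assumes no field contains the delimiter or needs quoting.

```shell
# pick_csv_delimiter DECIMAL_SEPARATOR
# Print the field delimiter that goes with the given decimal separator.
pick_csv_delimiter() {
  case "$1" in
    .) printf ',' ;;   # period as decimal separator -> commas
    ,) printf ';' ;;   # comma as decimal separator  -> semicolons
    *) return 1 ;;     # anything else is not handled here
  esac
}

# tsv_to_csv DECIMAL_SEPARATOR
# Read tab-delimited text on stdin and replace every tab with the
# delimiter chosen above. No quoting or escaping is attempted.
tsv_to_csv() {
  delim=$(pick_csv_delimiter "$1") || return 1
  tr '\t' "$delim"
}
```

    Usage would be something like: tsv_to_csv , < data.txt > data.csv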


    I may have an explanation for your problem.

    Maybe the values are separated by one or several spaces, which is the awful format used in the typewriter era.

    I guess that with this structure, Excel uses its knowledge of the English language to split the lines.


    As I don't like to work in the Terminal, I wrote a short script achieving what you described.



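    -- Ask the user for a plain-text file
    set fichier_source to choose file with prompt "Select a text file" of type {"public.plain-text"}

    -- Convert the chosen file to a POSIX path for the shell
    set le_fichier to POSIX path of (fichier_source as text)

    -- Strip the leading numbers in place, keeping a .bak backup
    do shell script "sed -i.bak -e 's/^[0-9]* *//' " & quoted form of le_fichier

    -- Open the cleaned file in Pages
    tell application "Pages"

      open le_fichier

    end tell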



    Yvan KOENIG (VALLAURIS, France) Tuesday, October 25, 2011 00:03:57

    iMac 21”5, i7, 2.8 GHz, 4 Gbytes, 1 Tbytes, mac OS X 10.6.8 and 10.7.2

    My iDisk is : <http://public.me.com/koenigyvan>

    Please : Search for questions similar to your own before submitting them to the community