remove formatting in a pages document

I am importing (copy & paste) text into a new document in Pages. There are paragraph commands at the end of each line, visible by clicking on Show Invisibles under the View tab. When I try to format the text, I get weird results like single words on a line. If I remove the backward "P's" at the end of each line manually, that solves the problem; everything then acts as it should. I tried using "find and replace" but that does not work because it removes all the invisibles so you get one long sentence with no space between words. I tried using TextEdit to remove unwanted formatting commands, as found in another link, but that did not work either. Any suggestions on how to remove all the "P's"? I have a 50 page document to reformat so doing this manually is nuts.

MacBook Air, OS X El Capitan (10.11.2)

Posted on Feb 23, 2016 1:27 PM

Feb 24, 2016 5:43 AM in response to Yellowbox

Hi Dwight,


In Pages 5.6.1, you can't type an Invisible such as a Return character into the Find & Replace box. The trick is to select the Invisible character(s) in the document and Menu > Edit > Find > Use Selection for Find (or Use Selection for Replace).


Regards,

Ian.


Edit: The time-out beat me.


Hi again Dwight, after thinking this through, my workflow might go something like this:

  1. Preserve all the double Returns (the ones that you want to keep) by selecting a double Return and replacing it with some nonsense text that cannot possibly occur in your document. Maybe !@#$
  2. Replace all the unwanted single Returns with either nothing (blank) or a space* character.
  3. Replace !@#$ with double Returns.
  4. *Replace all double spaces with a single space.
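
For anyone who would rather run the four steps above outside Pages, here is a minimal Python sketch of the same workflow. It assumes the text has been exported as plain text and that each Return shows up as "\n"; the function name `unwrap_paragraphs` is made up for illustration, and the marker is the one suggested above.

```python
MARKER = "!@#$"  # placeholder from the steps above; must not occur in the text


def unwrap_paragraphs(text: str, marker: str = MARKER) -> str:
    """Remove single returns but keep double returns (hypothetical helper)."""
    if marker in text:
        raise ValueError("marker already occurs in the text; pick another one")
    # Step 1: protect the double returns that separate paragraphs.
    text = text.replace("\n\n", marker)
    # Step 2: replace the remaining (unwanted) single returns with a space.
    text = text.replace("\n", " ")
    # Step 3: put the double returns back.
    text = text.replace(marker, "\n\n")
    # Step 4: collapse double spaces until none are left.
    while "  " in text:
        text = text.replace("  ", " ")
    return text
```

Calling `unwrap_paragraphs(open("book.txt").read())` on a plain-text export (the file name is only an example) would return the unwrapped text, which can then be pasted back into Pages.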


Ian.

Feb 24, 2016 11:22 AM in response to dwightfromapex

To Barry and Ian: Thanks but I'm not understanding so I'll try the following. I have used Find and Replace before, replacing items with $%^# or whatever then working back to get the result needed, but none of that seems to work here; I keep getting thwarted.


The document was created by OCR'ing a PDF document using FineReader OCR Pro for Mac. When I OCR'd it, the end result was almost useless when converted to Word. So I tried outputting it to Excel, which seemed to clean it up more in that there were no blank pages, no double spaced paragraphs, and none of the other weird things that happened when I tried using Word as the target output. Below (a partial screen shot) is a portion of what I get when I copy and paste the result from Excel to Pages (6x9 page layout). I can highlight one "P" at the end of a line and paste it into the Find and Replace, and try to replace it with something, but then all the blue dots between each word also get replaced. Somehow, the "P's" and dots appear the same to Find and Replace. So if I could get rid of all the P's, even if that meant that all paragraph designations would be obliterated and the whole thing became one long paragraph, that would be fine. I do have the original 50 page book so I could manually insert the paragraph and page breaks. And if there is no solution, I can manually remove the "P's" (highlight and delete); that would take some time, but would be better than retyping the whole book. (The book was self-published by the author back in 2002 and there are no electronic records; I called the publisher.) Thanks.




User uploaded file

Feb 25, 2016 12:51 AM in response to Yellowbox

"Thanks for the screen shot. It seems that some app (PDF or OCR or Word or Excel or Pages) has inserted extra paragraph marks (backward Ps) by replacing space characters. That would be to make the text fit between your left and right page margins. I would blame PDF."


I'd guess the OCR software is to blame for that, inserting a return at the end of each line.


Regards,

Barry

Feb 23, 2016 6:34 PM in response to dwightfromapex

Hi Dwight,


Where are you copying the text from? I would guess that the hard returns have been inserted by a text editor app at some point in the process. If you are copying from a text editor, you may be able to find a setting that will toggle between having and not having hard returns at the end of each line. If so, set it to 'off' before doing the select and Copy.


Is there any way to distinguish these extra returns from the legitimate ones by examining the characters that immediately precede and follow them?

Returns at the end of a paragraph should immediately follow a punctuation mark; period, question mark, or exclamation mark, and should be immediately followed by the first character of the next paragraph.


End of line returns should not follow one of these marks except when that mark (and the space after it) come too close to the maximum line length to fit the following word onto the same line.


If you are lucky, and the document was produced by someone not wise to (or not trusting of) the ability of software to add space after a paragraph, you may find that two consecutive returns have been used (as I have in this message) to create that space.
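
Before choosing a method, it may help to count how often each of the patterns above actually occurs. The following is a rough Python sketch, assuming a plain-text copy of the document in which each return is "\n"; `classify_returns` is a made-up name, and the punctuation test is only the heuristic described above (it does not handle a closing quote after the end mark, for instance).

```python
import re


def classify_returns(text: str) -> dict:
    """Count returns by pattern: double, after an end mark, or mid-sentence."""
    counts = {"double": 0, "after_end_mark": 0, "mid_sentence": 0}
    for match in re.finditer(r"\n+", text):
        # Find the last non-space character before this run of returns.
        i = match.start() - 1
        while i >= 0 and text[i] == " ":
            i -= 1
        prev = text[i] if i >= 0 else ""
        if len(match.group()) >= 2:
            counts["double"] += 1          # two or more consecutive returns
        elif prev in (".", "?", "!"):
            counts["after_end_mark"] += 1  # possible paragraph end
        else:
            counts["mid_sentence"] += 1    # almost certainly an end-of-line return
    return counts
```

A large "double" count suggests marking double returns; if it is zero, the punctuation-based marking is probably the only option.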


Knowledge of these patterns can give you a three pass method for bulk removal of those extra returns using Find/Replace.

On the first pass, MARK the returns you want to keep.

- If double returns have been used, use two returns as your Find string, and two identical markers as the replace string. The markers should be a pair of characters that do not appear, or do not appear in pairs, in the document. "##" comes to mind.

- If you need to rely on punctuation, you'll need to do a separate pass for each punctuation mark that has been used at the end of a paragraph. For each, the Find string will be the mark followed by a return (for example, a period and a return), and the replace string the same mark followed by ## (for example, .##).

That first pass will replace all of the returns you want to keep with "##" (preceded in the second method by the punctuation mark that is also to be retained).


On the second pass, remove all of the returns that are left in the document. Find string: return Replace string: one space.

NOTE: This assumes that the end of line return has replaced the space that would normally occur between words at this point. If you have determined that there is an end of line return AND a space between the word at the end of one line and the word at the beginning of the next line, leave the Replace box empty.

CAUTIOUS route: Include the space, then see the fourth pass below.


On the third pass, re-insert the required returns. Find: ## Replace with: return

Note that if your first pass searched for and retained a punctuation mark, you do NOT need to restore this—it was kept during the first pass.


Optional fourth pass: One of the things I taught students when we started using word processing in the elementary grades was to never press the space bar twice in a row. You need one space between words, no spaces between a word and the punctuation mark after the word, and one space between the end mark of a sentence and the first word of the next sentence in the same paragraph. This Find/Replace pass removes extra spaces using Find: two spaces Replace with: one space.

Repeat this pass as many times as necessary (until Find/Replace reports 'none found' or a similar message).
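
The punctuation-based variant of the passes above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the text is a plain-text copy in which each return is "\n", the name `remove_line_returns` is invented, and "##" is assumed not to occur in the document.

```python
MARK = "##"  # marker from the description above; assumed absent from the text


def remove_line_returns(text: str, end_marks: str = ".?!", mark: str = MARK) -> str:
    """Keep returns that follow an end mark; turn all others into spaces."""
    if mark in text:
        raise ValueError("marker already occurs in the text; pick another one")
    # Pass 1: one pass per punctuation mark, keeping the mark itself.
    for p in end_marks:
        text = text.replace(p + "\n", p + mark)
    # Pass 2: every return still present is an end-of-line return.
    text = text.replace("\n", " ")
    # Pass 3: re-insert the returns that were marked in pass 1.
    text = text.replace(mark, "\n")
    # Pass 4 (optional): repeat until no double spaces are found.
    while "  " in text:
        text = text.replace("  ", " ")
    return text
```

As noted earlier, a sentence that happens to end exactly at the end of a line inside a paragraph will be kept as a paragraph break by this approach, so the result still needs a read-through.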


Regards,

Barry


PS: This method can probably also be written into an AppleScript, which would cut down on the necessary steps and simplify the whole thing.

B

Feb 25, 2016 12:16 AM in response to dwightfromapex

Hi Dwight,


Thanks for the screen shot. It seems that some app (PDF or OCR or Word or Excel or Pages) has inserted extra paragraph marks (backward Ps) by replacing space characters. That would be to make the text fit between your left and right page margins. I would blame PDF 😉.


I wrote:

In Pages 5.6.1, you can't type an Invisible such as a Return character into the Find & Replace box. The trick is to select the Invisible character(s) in the document and Menu > Edit > Find > Use Selection for Find (or Use Selection for Replace).

assumed you are using Pages 5 (from your profile of OS X El Capitan).

However, your reply:

I can highlight one "P" at the end of a line and paste it into the Find and Replace

makes me think you are using Pages '09.


I am sure that we can find a solution for either version of Pages, but before we can move forward, please reply with the version of Pages that you are using (in Pages, Menu > Pages > About Pages).


Regards,

Ian.


P.S. Those six consecutive returns are right aligned; strange, but we can handle them. Ian.

Feb 25, 2016 12:48 AM in response to dwightfromapex

Hi Dwight,


Actually, if that is a screen shot of the raw OCR output pasted into Word (or Pages) and displayed with Show Invisibles active, I am highly impressed with the job the OCR software has done!


I see only a single OCR error in the sample—misreading a lower case h as two letters, li in the first group of lines.


"I can highlight one "P" at the end of a line and paste it into the Find and Replace, and try to replace it with something, but then all the blue dots between each word also get replaced. Somehow, the "P's" and dots appear the same to Find and Replace."


The blue dots are showing spaces. The blue pilcrow sign ( ¶ ) shows a return. The single word lines in the first part of your sample are caused by the line of type being longer than will fit between the page margins. If the line is even one letter too long, Pages (and any other word processor) will break the line at the preceding space, and push the rest down to the next line. The return will force the next line down even if there is room for all or part of it on the same line as the single word from the preceding line.
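
The effect described above can be reproduced with a small Python sketch. It uses the standard `textwrap` module as a stand-in for the word processor's line breaking, which is an assumption for illustration only (Pages breaks lines by measured width, not character count); `rewrap_with_hard_returns` is a made-up name.

```python
import textwrap


def rewrap_with_hard_returns(text: str, width: int) -> list[str]:
    """Wrap each return-terminated line on its own, as a word processor would."""
    lines = []
    for line in text.split("\n"):
        # A hard return always starts a new line, even if the previous
        # visual line has room left on it.
        lines.extend(textwrap.wrap(line, width=width) or [""])
    return lines


source = "the quick brown fox jumps\nover the lazy dog again"
with_returns = rewrap_with_hard_returns(source, width=22)
without_returns = textwrap.wrap(source.replace("\n", " "), width=22)
```

With the return left in, "jumps" and "again" each end up alone on a line because every original line is slightly too long for the margin; with the return replaced by a space, the same words flow into three full lines.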


I have no idea why Find/Replace in Pages treats both as the same character, unless the return changes to a space when pasted into the Find or Replace with box.


Can you successfully replace a text string (such as my example, "##", or Ian's "!@#$") with a return, so that, for example:


abc def ghij##klmn opqrs tuv wxyz


becomes


abc def ghij

klmn opqrs tuv wxyz


If so, then try this on a one page sample of your text:


Type the marker ## at the end of the first paragraph. Select both characters and press command-C to Copy.

Place the insertion point at the end of the next paragraph, then press command-V to paste.

Repeat last step for each paragraph on the page.


Now use Find: return Replace with: space

to replace all returns with single spaces.

If this also replaces the existing spaces with new spaces, that still produces what you want at this stage.


Finally, use Find: ## Replace with: return

OR

Copy a return, then manually select each ## and paste the return in its place

to put all the paragraph breaks back into the document.


Regards,

Barry

Feb 26, 2016 10:47 AM in response to Barry

Thanks to both Ian and Barry for your help. And yes, it was the OCR program that was inserting the extra "stuff" into the translated end result -- or more importantly, my ignorance of how the OCR program worked and of how to make the right filtering selections in the program to produce a cleaner result. Because the search and replace and other suggestions were not quite making it, and because of your suggestion that maybe the OCR program was the source of the extra "stuff", I finally called the support line for FineReader OCR Pro for Mac and within 10 minutes, I was off and running. The new OCR'd file, created after support showed me how to "fine tune" the output, was almost without error. (Somehow I missed the fine tuning section.) Using the new output, in about an hour I was able to clean up 14 pages of the 53 total. I too was really impressed at how well FineReader works. Well worth the money and free support.


Again, thank you Ian and Barry.
