Automator to save rtf as html file

A few weeks ago I found a discussion that provided an Automator solution for converting an RTF file saved from Word into an HTML file. Now I cannot find that discussion (or its code). If anyone can point me to the discussion (or the code example), I would be extremely grateful! Thanks in advance!!

MBair, MBPro, Mini, iStuff

Posted on Oct 4, 2018 2:26 PM

4 replies

Oct 14, 2018 7:35 AM in response to leroydouglas

This Automator action was written in 2005. What are the chances it will work with Automator now? Its internal files show that it is simply an AppleScript wrapper around the command-line textutil utility. So yes, textutil can convert a .doc, .docx, or .rtf file to HTML, but all text attributes are lost, and the text is just dumped into HTML as paragraphs with no styling.


MS Word, being HTML aware, will likely export content with styling that approximates the source document, because it adds the appropriate CSS styling to the HTML. If the source document were .docx rather than .rtf, one could use the free pandoc, supply it with a .css file, and generate a styled HTML5 document.
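
As a rough sketch of that pandoc route (not tested against any particular document): the function name and the file names in the usage line are made up for illustration, and it assumes pandoc is already installed and on the PATH.

```shell
#!/bin/bash
# Sketch only: convert a .docx to a standalone HTML5 file with pandoc,
# linking an external stylesheet. docx_to_html is a hypothetical name.
docx_to_html() {
    local src="$1"                  # input .docx file
    local css="$2"                  # stylesheet the generated HTML should link to
    local out="${src%.docx}.html"   # same base name, .html extension

    # --standalone writes a complete document (head and body), not a fragment.
    # --css adds a <link> to the stylesheet; it does not embed the CSS.
    # --to html5 selects the HTML5 writer.
    pandoc --standalone --to html5 --css "$css" --output "$out" "$src"
}
```

Usage would be something like `docx_to_html newsletter.docx site.css`, which should leave newsletter.html next to the source. Because the stylesheet is linked rather than embedded, the .css file has to travel with the HTML.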


One would be better served to open the .rtf document in Word, and then save/export to HTML.

Oct 14, 2018 10:15 AM in response to keriah

Had you originally mentioned Dreamweaver, I would not have suggested MS Word to HTML, as I spent countless hours cleaning the Word styling crap out of Drupal editor sessions during a large site migration.


The command-line textutil utility would pass the minimum of code from .rtf to the final HTML output. As textutil is installed with the operating system, its textutil(1) man page gives additional information, and it would even be suitable in a Run Shell Script action that is passed any number of .rtf files as arguments for conversion to HTML.
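
A minimal sketch of what such a Run Shell Script action might contain, assuming the action's "Pass input" is set to "as arguments" so the selected files arrive in "$@". The function name is hypothetical, and this is not the code from the 2005 action.

```shell
#!/bin/bash
# Sketch for an Automator "Run Shell Script" action (Pass input: as arguments).
# rtf_to_html is a hypothetical helper name.
rtf_to_html() {
    local f
    for f in "$@"; do
        case "$f" in
            *.rtf)
                # By default textutil writes the result next to the source,
                # with the extension changed to .html.
                textutil -convert html "$f"
                ;;
            *)
                # Anything not ending in lowercase .rtf is left alone.
                echo "skipping (not .rtf): $f" >&2
                ;;
        esac
    done
}

# Convert whatever files Automator handed to the script.
rtf_to_html "$@"
```

The case pattern matches only a lowercase .rtf extension, so files named .RTF would be skipped unless the pattern is widened.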


I installed pandoc via homebrew (brew) package manager. I also have the following in my .bashrc file:


export HOMEBREW_NO_ANALYTICS=1


to gag Google.

Oct 14, 2018 7:37 AM in response to VikingOSX

Actually, it looks like it was bits from YOUR post in 2012 (Save Pages file as html?) that I found and used.


I'm not sure why you say that the styling is lost. Actually, bits of the styling are retained in the HTML that the action creates -- about a dozen 'span' styling elements and another dozen used on the paragraphs, named as generic 'sx' and 'px' styles. But, all in all, the action-generated HTML is decent and conforms to the basics of the very clean (nearly style-free) Word file that was the input.


So, indeed, the automator action works just fine -- better than Word.


Word produces a ton of 'junk' styling, all in conflict with the site's standards. I use Dreamweaver, which can strip out some of the offensive 'junk', but a whole lot of it remains. The HTML that's output from a Word-save is really, really ugly and the webmaster won't accept it as-is. Yes, I can edit in a text editor; yes, I can edit in Dreamweaver (which is what I had been doing). My objective here is to eliminate most of the 'hand work' of the HTML step and get acceptable output from Word, fairly directly.


The Word source is a docx. It uses a standard (minimal and standardized style) Word template to 'normalize' the content from various contributors into one uniform set of conventions. So perhaps pandoc will be even better for my purposes. Thanks for that suggestion.
