sips convert pdf to tiff to use in tesseract ocr

I'm using sips to convert a pdf file to a tiff file so that I can run the resulting tiff through tesseract ocr. I need the tiff to have no alpha channel. If I do this:

sips -s format tiff infile.pdf --out outfile.tif

It gives me a tiff file, but when I run it through tesseract, I get this:

command: tesseract outfile.tif ocr_data.txt
results: check legal_imagesize:Error:Only 1,2,3,5,6,8 bpp are supported:16

As a workaround, I use sips to convert the pdf to a jpeg first, then convert the jpeg to a tiff. That generates a tiff file that works with tesseract ocr. However, I don't want to do two image conversions to get a working tiff file. Are there any settings that would let me convert the pdf directly to a tiff file that works in tesseract?
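
For reference, the two-step workaround described above could be scripted roughly like this. It is only a sketch: `pdf_to_tiff_via_jpeg` is a made-up helper name, the scratch directory layout is an assumption, and it uses only the sips options already shown in this post (`-s format` and `--out`). It does not solve the direct pdf-to-tiff problem, and since jpeg is a lossy format the intermediate step may cost some image quality.

```shell
# Sketch of the two-step workaround: pdf -> jpeg -> tiff.
# pdf_to_tiff_via_jpeg is a hypothetical helper name, not part of sips.
pdf_to_tiff_via_jpeg() {
    in_pdf=$1
    out_tif=$2

    # Scratch directory for the intermediate jpeg (explicit template so the
    # same mktemp call works on both Mac OS X and Linux).
    tmp_dir=$(mktemp -d "${TMPDIR:-/tmp}/pdf2tif.XXXXXX") || return 1
    tmp_jpg=$tmp_dir/page.jpg
    status=0

    # Step 1: pdf -> jpeg. Step 2: jpeg -> tiff.
    # If either step fails, remember its exit status.
    sips -s format jpeg "$in_pdf" --out "$tmp_jpg" &&
        sips -s format tiff "$tmp_jpg" --out "$out_tif" || status=$?

    # Always remove the intermediate file.
    rm -rf "$tmp_dir"
    return "$status"
}
```

With that function loaded in the shell, `pdf_to_tiff_via_jpeg infile.pdf outfile.tif` should leave an outfile.tif that can be passed to tesseract the same way as before.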


Another unrelated question... Does anyone know how to convert a multi-page pdf to a multi-page tiff file? When I feed a multi-page pdf into sips, the resulting tiff is only the first page.

Mac OS X (10.5)

Posted on Jul 2, 2010 8:14 AM



