Can Document OCR not get text out of a scanned pdf?

Question

JayanthiK4197

Member since 2018

8 posts

Bank Of America

Posted: 23 Aug 2019 10:28 EDT
Last activity: 23 Aug 2019 17:00 EDT

Closed

I have tried processToXml and ProcessToPdf, tried putting ProcessToPdf before each of these, and tried everything with and without ocrImagesAndText set to true. Everything just returns false. I am trying to get text out of a PDF produced by scanning a paper document, but there are even some PDFs that the regular PDF connector can read and Document OCR cannot, unless I just cannot work out how to use it. I can make it get text from images in Word documents, and it can get text out of a PDF I make with print-to-PDF, so I know I am not doing everything wrong. Can this component really not get text from a scanned PDF?

Robotic Process Automation

Posted: 23 Aug 2019 14:04 EDT

ThomasSasnett

MOD

replied to JayanthiK4197

It can, but a low-quality scan may be the reason it fails. I would suggest opening a support request so that they can examine your specific PDF, unless you can attach it here for the community to examine.

Posted: 23 Aug 2019 16:54 EDT

JayanthiK4197

Bank Of America

replied to ThomasSasnett

The document is sensitive, but the scan is very high quality. Document OCR actually fails on some PDFs that the regular PDF connector can read. I made something that does what I want anyway; here you go.

// Place the method inside a class; these using directives go at the top of the file.
using System;
using System.Globalization;
using System.Reflection;
using Acrobat; // Acrobat SDK interop assembly

public void pngThat(string path)
{
    AcroPDDoc pdfd = new AcroPDDoc();
    pdfd.Open(path);

    // Get the document's JavaScript object and take its type (not the AcroPDDoc's type).
    object jsObj = pdfd.GetJSObject();
    Type jsType = jsObj.GetType();

    // saveAs arguments: output path, conversion ID for PNG, file system, copy flag, overwrite-prompt flag.
    object[] saveAsParam = { "out.png", "com.adobe.acrobat.png", "", false, false };
    jsType.InvokeMember("saveAs", BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance, null, jsObj, saveAsParam, CultureInfo.InvariantCulture);

    pdfd.Close();
}
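
A note on the output path: the saveAs call above passes "out.png" as a bare file name. Acrobat's JavaScript API describes its path arguments as device-independent paths (for example /c/temp/out.png rather than C:\temp\out.png), so if the PNG should land in a specific folder, a small conversion helper may be useful. The sketch below is only an illustration under that assumption: AcrobatPaths and ToDeviceIndependent are hypothetical names, not part of the Acrobat SDK, and the helper handles drive-letter paths only (no UNC shares or relative paths).

```csharp
using System;

public static class AcrobatPaths
{
    // Hypothetical helper, not part of the Acrobat SDK.
    // Converts an absolute Windows path such as C:\temp\out.png
    // into the device-independent form /c/temp/out.png.
    public static string ToDeviceIndependent(string windowsPath)
    {
        if (windowsPath == null)
        {
            throw new ArgumentNullException(nameof(windowsPath));
        }

        // Expect a drive letter, a colon, then a path separator.
        bool looksAbsolute =
            windowsPath.Length >= 3 &&
            char.IsLetter(windowsPath[0]) &&
            windowsPath[1] == ':' &&
            (windowsPath[2] == '\\' || windowsPath[2] == '/');
        if (!looksAbsolute)
        {
            throw new ArgumentException(
                "Expected an absolute drive-letter path.", nameof(windowsPath));
        }

        // Lower-casing the drive letter is a stylistic choice in this sketch.
        char drive = char.ToLowerInvariant(windowsPath[0]);

        // Drop the colon and switch backslashes to forward slashes.
        string rest = windowsPath.Substring(2).Replace('\\', '/');
        return "/" + drive + rest;
    }
}
```

With a helper like this, the first element of saveAsParam could be AcrobatPaths.ToDeviceIndependent(@"C:\temp\out.png") instead of the bare "out.png".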

Posted: 23 Aug 2019 17:00 EDT

JayanthiK4197

Bank Of America

replied to ThomasSasnett

Forgot to say: that code uses the Acrobat SDK, so it is only helpful to people with Acrobat Pro DC. We have to bring a freer, easier solution to the people, Tsasnett Sir. The Acrobat SDK is rough and not free.
