Extracting data from Scanned PDF using OCR/Robotics issue

Question

ThaveeshaC

Member since 2015

17 posts

Evonsys

Posted: Apr 10, 2025

Last activity: Apr 25, 2025


Hi there!

We are trying to extract a table from a PDF using Pega Robotics. However, the original PDF is a scanned image and had to be converted to a PDF with readable text using OCR so that the PDF connector can read the data.

Since it is a scanned image, the table structure will not be available for the PDF connector to work with, so the GetTables methods would be of no use.

As of now I am trying to use the pdfline/findline methods to go row by row. My question is whether there is a method or process to capture segments (pdfsegments) within a pdfline.

Each segment within a line would be a table column, so the pdfline would give the row and the pdfsegments would give the value for each column.

Or is there any other alternative for reading a table from a PDF with scanned images?
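
To illustrate the row/segment idea above outside of Pega, here is a minimal Python sketch. It is not the Pega Robotics API: it assumes the OCR step yields each table row as a plain text line in which columns are separated by at least two spaces. The function name `split_line_into_segments`, the `min_gap` parameter, and the sample rows are hypothetical.

```python
import re


def split_line_into_segments(line_text, min_gap=2):
    """Split one OCR text line into column segments.

    Assumption: columns are separated by runs of at least `min_gap`
    whitespace characters, while words inside a cell are separated
    by a single space.
    """
    pattern = r"\s{%d,}" % min_gap
    return [seg for seg in re.split(pattern, line_text.strip()) if seg]


# Hypothetical OCR output, one string per table row.
rows = [
    "Invoice No   Date         Amount",
    "INV-001      2025-04-10   120.50",
]

# Each line is a row; each segment in the line is a column value.
table = [split_line_into_segments(row) for row in rows]
```

This only works when the OCR text keeps wide gaps between columns. If it collapses spacing, the word positions are needed instead.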

Pega Robotic Process Automation

Pega Robotic Automation 22.1

Pega Robotic Desktop Automation

Pega Robotic Automation

Financial Services

Robotics System Architect

Posted: 25 Apr 2025 5:19 EDT

MarijeSchillern

MOD

replied to ThaveeshaC

@ThomasSasnett is this something you could answer?

Posted: 25 Apr 2025 9:21 EDT

ThomasSasnett

MOD

replied to ThaveeshaC

@ThaveeshaC I apologize for missing this post. Unfortunately, with scanned images, the table functionality is not available. You would need to read the document using words, segments, and lines. It has been a while, but I believe a PDF Line does have segments and words. You should be able to use those to read it. If you have a sample file you could upload without any real data, I am happy to take a look.
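
As a rough sketch of what reading by words, segments, and lines can look like, the Python below rebuilds rows and columns from word positions. It does not use the Pega PDF connector. The `Word` class, its coordinate fields, and the `y_tol` and `gap` thresholds are assumptions for illustration, and real values would depend on the scan resolution and font size.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR word box (not a Pega type)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical centre of the word box


def group_into_lines(words, y_tol=3.0):
    """Group words into lines: a word joins the current line when its y
    is within y_tol of that line's first word."""
    lines = []
    for w in sorted(words, key=lambda w: (w.y, w.x0)):
        if lines and abs(lines[-1][0].y - w.y) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Order the words in each line from left to right.
    return [sorted(line, key=lambda w: w.x0) for line in lines]


def split_into_segments(line, gap=15.0):
    """Split a left-to-right line into segments wherever the horizontal
    gap between neighbouring words exceeds `gap`."""
    segments, current = [], [line[0]]
    for prev, w in zip(line, line[1:]):
        if w.x0 - prev.x1 > gap:
            segments.append(" ".join(t.text for t in current))
            current = [w]
        else:
            current.append(w)
    segments.append(" ".join(t.text for t in current))
    return segments


# Made-up word boxes standing in for OCR output of a two-row table.
words = [
    Word("Invoice", 10, 50, 100.0), Word("No", 54, 66, 101.0),
    Word("Amount", 200, 240, 100.5),
    Word("INV-001", 10, 55, 120.0), Word("120.50", 200, 235, 119.0),
]

table = [split_into_segments(line) for line in group_into_lines(words)]
```

The line gives the row and each segment gives a column value. Empty cells would need extra handling, for example by matching each segment's x position against the header row.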
