Question

Evonsys
US
Last activity: 10 Apr 2025 8:31 EDT
Issue extracting table data from a scanned PDF using OCR and Pega Robotics
Hi there!
We are trying to extract a table from a PDF using Pega Robotics. However, the original PDF consists of scanned images and had to be converted with OCR into a PDF with readable text so that the PDF connector can read the data.
Since it's a scanned image, the table structure is not available for the PDF connector to work with, so the GetTables method is of no use.
As of now, I am trying to use the pdfline/findline methods to go row by row. My question is whether there is a method or process to capture the segments (pdfsegments) within a pdfline.
Each segment within a line would be a table column, so the pdfline would give the row and the pdfsegments would give the value for each column.
Or is there any other alternative for reading a table from a PDF made of scanned images?
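
To make the row/segment idea above concrete, here is a rough Python sketch of the behaviour we are after. It is not Pega Robotics code and does not use the PDF connector API. The `Word` class, the function names, and the `y_tol`/`gap` thresholds are all hypothetical, and it assumes the OCR step can return each word's text together with its bounding-box coordinates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One OCR word with its position on the page (hypothetical structure)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical centre


def group_rows(words: List[Word], y_tol: float = 3.0) -> List[List[Word]]:
    """Group words into rows: words whose vertical centres are close together
    are treated as one line (the 'pdfline' in the question)."""
    rows: List[List[Word]] = []
    for w in sorted(words, key=lambda w: w.y):
        if rows and abs(w.y - rows[-1][-1].y) <= y_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Order each row left to right.
    return [sorted(r, key=lambda w: w.x0) for r in rows]


def split_segments(row: List[Word], gap: float = 15.0) -> List[str]:
    """Split one row into segments (columns): a horizontal gap wider than
    `gap` between neighbouring words starts a new segment."""
    segments: List[str] = []
    current = [row[0]]
    for prev, w in zip(row, row[1:]):
        if w.x0 - prev.x1 > gap:
            segments.append(" ".join(x.text for x in current))
            current = [w]
        else:
            current.append(w)
    segments.append(" ".join(x.text for x in current))
    return segments


def extract_table(words: List[Word], y_tol: float = 3.0,
                  gap: float = 15.0) -> List[List[str]]:
    """Return the table as a list of rows, each row a list of column values."""
    return [split_segments(r, gap) for r in group_rows(words, y_tol)]


if __name__ == "__main__":
    sample = [
        Word("Item", 10, 40, 100), Word("Name", 44, 80, 101),
        Word("Qty", 150, 175, 100),
        Word("Blue", 10, 38, 120), Word("pen", 42, 62, 121),
        Word("3", 150, 158, 120),
    ]
    for row in extract_table(sample):
        print(row)
```

The thresholds would need tuning per document, since OCR output on scanned pages can be skewed or unevenly spaced. What we are looking for is whether Pega Robotics offers something equivalent out of the box.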