Read structured tabular content from a PDF document in Pega OpenSpan version 8.0.1096.0

Question

JayeshT2

Member since 2017

1 post

Capgemini

Posted: 4 Jul 2018 17:19 EDT
Last activity: 27 Jul 2018 21:37 EDT

Closed

I have a structured PDF document that contains tabular content. I cannot find a method that directly fetches the table content as a collection of rows.

The Text property returns a junk set of incomprehensible characters.

Please guide.

Any help or guideline is highly appreciated.

***Moderator Edit: Vidyaranjan | Updated Platform Capability***

Robotic Process Automation

Posted: 5 Jul 2018 23:11 EDT

grona

PEGA

replied to JayeshT2

Please review the PDF documentation here:

http://help.openspan.com/80/Components/PDFConnector_Component_Properties,_Methods,_and_Events.htm

http://help.openspan.com/80/Components/PDFViewer_Component_Properties,_Methods,_and_Events.htm

Posted: 9 Jul 2018 11:36 EDT

JayeshT2

Capgemini

replied to grona

I already did.

I used the methods to retrieve PDF text. But the text that's in tabular form is not retrieved as a table.

PDFs generated by the iTextSharp utility come back as incomprehensible text. My workaround was to use the iTextSharp utility within the Pega scripts to read the text, which then comes out as plain text.

There is no method that directly returns a table contained in a PDF. Additional manipulation is needed to read the table lines and map them to the respective table schema using spaces as the delimiter. This is only a workaround, not a guaranteed way to map the table rows/lines, because the spacing varies whenever a row contains a variable-length text field such as a name.
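To illustrate the space-delimiter mapping I mean, here is a minimal sketch in Python rather than the actual Pega script. The column names, the sample text, and the `parse_rows` helper are all made up for illustration. It assumes columns are separated by two or more spaces, which is exactly the assumption that is not guaranteed to hold.

```python
import re

# Hypothetical column names; the real table schema is not given here.
SCHEMA = ["id", "name", "amount"]


def parse_rows(text, schema=SCHEMA):
    """Map each non-empty line of plain text to a dict keyed by schema.

    Columns are assumed to be separated by two or more spaces, so a single
    space inside a field (such as a name) does not split it. Lines whose
    field count does not match the schema are collected separately.
    """
    rows, rejected = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = re.split(r"\s{2,}", line)
        if len(fields) == len(schema):
            rows.append(dict(zip(schema, fields)))
        else:
            rejected.append(line)
    return rows, rejected


# Made-up sample standing in for the plain text extracted from the PDF.
SAMPLE = """
101   John Smith        250.00
102   Ana Maria Lopez   75.50
103 Lee 12.00
"""

rows, rejected = parse_rows(SAMPLE)
```

The last sample line has only single spaces between columns, so it cannot be split reliably and ends up in `rejected` instead of being mapped to the wrong fields.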

Please suggest whether there is any way to read the structured table contents directly.

Thanks.

Posted: 27 Jul 2018 21:37 EDT

grona

PEGA

replied to JayeshT2

You could try pulling the data using OCR in OpenSpan version 8.0.1087 or later.
