How to extract a table from a PDF when the table spans multiple pages

Question

ManikandanT17003272

Member since 2023

5 posts

HTC

Posted: 3 Jan 2024 8:34 EST
Last activity: 5 Feb 2024 5:58 EST

Closed

Solved

Hi,

We are attempting to extract a table that spans multiple pages from a readable PDF file.

When we try to select the table region, the whole PDF is highlighted (please refer to the attachment).

Is there any way to extract a table that spans multiple pages from a readable PDF?

***Edited by Moderator Rupashree to add Capability tags***
***Edited by Moderator Marissa to add Support Case details***
***Edited by Moderator Marije to remove INC-B1427 (Pega.ServerDeploy issue) and replace with INC-B434 (pdf issue) ***
***Edited by Moderator Marije to add new BUG-849540 ***

Pega Robotic Desktop Automation

Intelligent Document Processing

Support Case Created

Accepted Solution

Posted: 5 Feb 2024 5:58 EST

MarijeSchillern

MOD

replied to AbhishekR17024116

@AbhishekR17024116 @ManikandanT17003272 BUG-849540 and INC-B434 (Issue with PDF Table in 22.1.24) were closed with the conclusion that this PDF file is not supported:

The issue under investigation was why the vertical and horizontal lines were not counted correctly after reduction.

The PDF was found to contain non-visible table lines: lines that are not displayed but are still listed in the structure of the PDF. As a result, the table recognition code interprets the table as extending from the top to the bottom of the PDF page.

This causes the PDF connector's table recognition code to identify the table as the full page. This structure is atypical of an ordinary PDF table.

A feature enhancement would be required for the PDF connector to distinguish visible from non-visible table lines in such a PDF structure.
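To illustrate the reasoning above, here is a minimal Python sketch. It is not the PDF connector's actual recognition code, and the `Line` model, the `table_region` helper, and all coordinates are hypothetical. It only shows how a table region computed from every line listed in a page can grow to the full page when undrawn lines are included, and how filtering on visibility would restore the expected region.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Line:
    """A straight line segment found in the PDF structure (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float
    visible: bool  # False for lines listed in the PDF but never drawn


def table_region(lines: Iterable[Line]) -> Tuple[float, float, float, float]:
    """Return (left, bottom, right, top) of the box enclosing all given lines."""
    segments = list(lines)
    if not segments:
        raise ValueError("no lines to build a table region from")
    xs = [x for s in segments for x in (s.x0, s.x1)]
    ys = [y for s in segments for y in (s.y0, s.y1)]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 612 x 792 page: four drawn table borders plus two undrawn,
# full-height lines at the page edges.
page_lines = [
    Line(50, 300, 560, 300, visible=True),   # bottom border
    Line(50, 500, 560, 500, visible=True),   # top border
    Line(50, 300, 50, 500, visible=True),    # left border
    Line(560, 300, 560, 500, visible=True),  # right border
    Line(0, 0, 0, 792, visible=False),       # undrawn line, left page edge
    Line(612, 0, 612, 792, visible=False),   # undrawn line, right page edge
]

print(table_region(page_lines))                             # (0, 0, 612, 792)
print(table_region(l for l in page_lines if l.visible))     # (50, 300, 560, 500)
```

With all lines included, the region equals the whole page, which matches the symptom of the entire PDF being highlighted. Restricting the input to visible lines yields only the drawn table.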

If support for such a PDF is needed, please request assistance from our GCS team in placing a feature enhancement request through the ticketing system.

cc @ThomasSasnett cc @Mitchell

Posted: 3 Jan 2024 13:45 EST

ThomasSasnett

MOD

replied to ManikandanT17003272

@ManikandanT17003272 Is it possible to attach an example of that PDF (without any real data, of course)? I believe it is possible to ignore the headers and footers, but it is not something I do regularly, so having the PDF to test with would be helpful.

Posted: 4 Jan 2024 1:16 EST
Updated: 4 Jan 2024 1:17 EST

AbhishekR17024116

HTC Global Services

replied to ThomasSasnett

@ThomasSasnett

Continuing from Mani's post: please find the test PDF attached.

Posted: 4 Jan 2024 11:13 EST

ThomasSasnett

MOD

replied to AbhishekR17024116

@AbhishekR17024116 I believe there is something odd with this specific PDF. I have opened a support request to get an explanation as to why this table is being misread. The INC is INC-B434.

Normally, you can simply elect to have the table span pages. In this case, however, the table seems to include the entire page. While it is possible to read and work with this, it is not ideal. If you had to work with this PDF now, you would have an extra column that essentially splits the Amount column into two parts; you could join them together in your automation to get the full value. In addition, the table would contain most of the values on each page, so you would need to exclude the information that you do not want. I believe there is an explanation for this PDF, though, and I will update once I hear back from support.
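The interim workaround described above can be sketched as plain row post-processing. This Python sketch is only an illustration outside of Pega Robot Studio: it assumes the over-wide table has already been read into rows of strings, and the helper names `is_number` and `clean_rows`, the column positions, and the sample values are all hypothetical rather than part of the PDF connector. It joins the two halves of the split Amount column and drops rows that do not look like real table rows.

```python
from typing import List, Sequence, Tuple


def is_number(text: str) -> bool:
    """Return True if text parses as a number once thousands separators are removed."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def clean_rows(
    rows: Sequence[Sequence[str]],
    amount_cols: Tuple[int, int] = (2, 3),  # assumed positions of the split Amount cells
    expected_cells: int = 4,                # assumed width of a genuine table row
) -> List[List[str]]:
    """Join the split Amount cells and keep only rows that look like data rows."""
    first, second = amount_cols
    cleaned = []
    for row in rows:
        # Skip rows picked up from outside the real table (wrong cell count).
        if len(row) != expected_cells:
            continue
        amount = (row[first] + row[second]).strip()
        # Skip rows whose joined amount is not numeric, such as header or page text.
        if not is_number(amount):
            continue
        kept = [cell.strip() for i, cell in enumerate(row) if i not in amount_cols]
        cleaned.append(kept + [amount])
    return cleaned


# Hypothetical rows as they might come back when the whole page is read as one table.
raw_rows = [
    ["Page 1 of 2", "", "", ""],
    ["Date", "Description", "Amo", "unt"],
    ["2024-01-03", "Invoice A", "1,2", "50.00"],
    ["Total"],
]

print(clean_rows(raw_rows))  # [['2024-01-03', 'Invoice A', '1,250.00']]
```

The numeric check on the joined amount is just one possible filter; which rows to exclude depends on what the rest of each page contains.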

Posted: 4 Jan 2024 11:14 EST

ThomasSasnett

MOD

replied to ThomasSasnett

@ThomasSasnett Here is a link to the documentation on working with PDFs.

https://docs.pega.com/bundle/robotic-automation-221/page/robotic-automation/pdf-connector/usepdfconnector-component.html

Posted: 11 Jan 2024 13:42 EST

ThomasSasnett

MOD

replied to ThomasSasnett

@ThomasSasnett The customer has opened INC-B1427 on this issue and the one I opened has been closed.

Posted: 19 Jan 2024 5:59 EST
Updated: 5 Feb 2024 5:47 EST

MarijeSchillern

MOD

replied to ThomasSasnett

@ThomasSasnett INC-B1427 does not relate to PDF but to Pega.ServerDeploy overrides.

INC-B434 Pega Support ticket (Issue with PDF Table in 22.1.24) is still open!

GCS will contact you today regarding testing of the PDF connector.

Update: BUG-849540 has been logged and the team is investigating further.
