Question
Posted: Jan 3, 2024
Last activity: Feb 5, 2024, 5:58 EST
Closed
Solved
How to extract a table from a PDF when the table spans multiple pages
Hi,
We are attempting to extract a table that spans multiple pages from a readable PDF file.
When we try to select the table region, the whole PDF is highlighted (please refer to the attachment).
Is there any way to extract a table that spans multiple pages from a readable PDF?
***Edited by Moderator Rupashree to add Capability tags***
***Edited by Moderator Marissa to add Support Case details***
***Edited by Moderator Marije to remove INC-B1427 (Pega.ServerDeploy issue) and replace with INC-B434 (pdf issue) ***
***Edited by Moderator Marije to add new BUG-849540 ***
@AbhishekR17024116 @ManikandanT17003272 BUG-849540 and INC-B434 (Issue with PDF Table in 22.1.24) were closed with the conclusion that the PDF file is not supported:
The issue under investigation was why the vertical and horizontal lines were not counted correctly after reduction.
The PDF was found to contain non-visible table lines: lines that are not displayed but are still listed in the structure of the PDF. These lines cause the table recognition code to interpret the table as running from the top to the bottom of the PDF page.
As a result, the PDF connector's table recognition code identifies the table as covering the full page. This structure is atypical of an ordinary PDF table.
A feature enhancement would be required for the PDF connector to distinguish visible from non-visible table lines in a PDF with this structure.
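To illustrate the behavior described above, here is a minimal Python sketch. It is not the PDF connector's code: the `Line` class, its `visible` flag, the `table_bounds` helper, and the coordinates are all hypothetical and made up for illustration. It only shows how non-visible lines that run the full height of a page can stretch a detected table region to the whole page, and how ignoring them would restore the real table bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """A straight line segment in page coordinates (hypothetical model)."""
    x0: float
    y0: float
    x1: float
    y1: float
    visible: bool  # assumption: True only if the line is actually drawn


def table_bounds(lines):
    """Return (left, top, right, bottom) enclosing all the given lines."""
    xs = [v for ln in lines for v in (ln.x0, ln.x1)]
    ys = [v for ln in lines for v in (ln.y0, ln.y1)]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up page of 612 x 792 points: a small visible table plus two
# non-visible lines that run from the top to the bottom of the page.
page_lines = [
    Line(72, 100, 540, 100, True),   # top border
    Line(72, 300, 540, 300, True),   # bottom border
    Line(72, 100, 72, 300, True),    # left border
    Line(540, 100, 540, 300, True),  # right border
    Line(72, 0, 72, 792, False),     # non-visible, full page height
    Line(540, 0, 540, 792, False),   # non-visible, full page height
]

# Counting every line stretches the region to the full page height.
naive = table_bounds(page_lines)

# Counting only visible lines recovers the table that is actually displayed.
filtered = table_bounds([ln for ln in page_lines if ln.visible])

print(naive)     # (72, 0, 540, 792)
print(filtered)  # (72, 100, 540, 300)
```

In this sketch the unfiltered bounds span the page from y = 0 to y = 792, which matches the symptom of the whole page being highlighted, while the filtered bounds cover only the drawn table.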
If support for such a PDF is needed, please ask our GCS team for assistance in placing a feature enhancement request through the ticketing system.
cc @ThomasSasnett cc @Mitchell