Question
Cognizant
US
Last activity: 9 Mar 2018 6:09 EST
How to read form data and especially checkboxes in a PDF file
Hi All,
I have a requirement to read a PDF file that contains standard data and form data (text boxes, check boxes, etc.). We tried the PDF Connector and were able to read some data from the PDF, but we are having trouble reading the value of a check box.
Please find a few more details below:
- How can we read a check box in a PDF file to identify whether it is checked or unchecked? In this case the check box has an ‘X’ mark and the PDF file has some data in read-only mode.
Kindly share your inputs, considering form data in both read-only and editable formats.
Thanks,
Raj
Pegasystems Inc.
US
The PDF Connector allows you to modify the settings to read your PDF file as lines, segments and words. The settings are very specific to how the PDF is constructed. Once you have found the settings that allow you to properly grab the lines, segments and words from your document, you can read through them one line at a time. Each segment or word has a Left property. This will tell you the offset in the line for the segment or word. With some detective work you can find the offset for your checkbox and then determine whether it is checked or not based on the value of the segment or word at that location.
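The offset check described above could look roughly like the following. This is a minimal sketch in Python of the logic only, not the PDF Connector API: the `(left, text)` tuples, the function names `word_at_offset` and `is_checked`, the tolerance, and the sample offsets are all assumptions made for illustration. It also assumes the mark is extracted as text, which depends on how the PDF was produced.

```python
def word_at_offset(line_words, target_left, tolerance=1.0):
    """Return the text of the word whose left offset is closest to
    target_left, or None if no word lies within the tolerance.

    line_words is a list of (left, text) tuples for a single line
    (a hypothetical shape, not the connector's own data type).
    """
    candidates = [
        (abs(left - target_left), text)
        for left, text in line_words
        if abs(left - target_left) <= tolerance
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


def is_checked(line_words, checkbox_left, mark="X", tolerance=1.0):
    """Treat the check box as checked when the word found at the
    check box offset equals the expected mark (assumed to be 'X')."""
    text = word_at_offset(line_words, checkbox_left, tolerance)
    return text is not None and text.strip().upper() == mark


# Hypothetical sample data: one line with a mark at offset 12, one without.
checked_line = [(12.0, "X"), (30.0, "Option"), (70.0, "A")]
unchecked_line = [(30.0, "Option"), (70.0, "B")]

checked_result = is_checked(checked_line, 12.3)
unchecked_result = is_checked(unchecked_line, 12.3)
```

The tolerance is there because extracted offsets rarely match a hand-measured value exactly. It would need tuning against the real document.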
Cognizant
US
Hi Jeff,
I have attached a screenshot of a sample PDF form in which the checked check boxes are highlighted in yellow. I have also attached a sample automation for reference.
As part of a POC we tried the following by trial and error. Here are a few details:
1. After using the PDF Connector, we could not find any unique flag that identifies whether a check box is checked or unchecked. We also tried to get the check box value using the Left property of the segment, but could not determine the value (checked/unchecked).
2. Can you please share some pointers on which category a check box falls under?
Note:
Please also confirm which category a radio button falls under in the PDF Connector.
Thanks,
Raj.
Frederic Chu
Pegasystems Inc.
US
It is very hard to tell you how to work with this without the actual PDF.
Infosys Limited
IN
Can you provide an example of how to identify a check box in a PDF through automation?
Cognizant
US
Hi Jeff,
I have attached the sample PDF we are working with, which contains check boxes. Please use this PDF and help us read and identify whether each check box is checked or unchecked.
Thanks,
Raj
Cognizant
US
Should I write C# code to identify whether a check box is checked or unchecked? Please share some pointers.
Pegasystems Inc.
US
I have been testing with your form. By setting the word threshold lower (it is 2.2 by default) you can isolate the check boxes from the text next to them. I don't know how they are represented when they are checked however.
I set the Word threshold to 1 and this is how the words were delimited. Notice how the checkboxes are highlighted by themselves.
Each word has a Left value, the starting pica value. If you read through the words, whenever the Left value is less than the previous Left value, it means you are on a new line. I use this to assign line numbers to each word. Then you can index each word by line number and left position. Here is an example.
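The example attached to this reply is not reproduced in the text. As a stand-in, the line-numbering idea can be sketched as follows. This is a minimal Python sketch of the logic only, not the automation from the reply: the `Word` class, the function names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: float  # starting offset of the word, as reported per word


def assign_line_numbers(words):
    """Pair each word with a line number, starting a new line
    whenever a word's left value is less than the previous one."""
    numbered = []
    line = 1
    prev_left = None
    for word in words:
        if prev_left is not None and word.left < prev_left:
            line += 1
        numbered.append((line, word))
        prev_left = word.left
    return numbered


def index_by_position(numbered):
    """Build a lookup keyed by (line number, left position)."""
    return {(line, word.left): word.text for line, word in numbered}


# Hypothetical words in reading order; the third word starts a new line.
words = [
    Word("Name:", 10.0),
    Word("Raj", 60.0),
    Word("X", 12.0),
    Word("Option", 30.0),
    Word("A", 70.0),
]
index = index_by_position(assign_line_numbers(words))
```

One limitation of this heuristic: a line break goes undetected when the first word of the next line does not start to the left of the last word of the current line, for example two consecutive one-word lines at the same offset.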
Pegasystems Inc.
US
After doing all of this, it does not appear that the checkboxes present the check in text form. I need to do a little more research for them.
Cognizant
US
Thank you, Jeff. Please share an update once your research is done. Until then, I will look into writing C# code and adding it as a DLL.