Question
Coforge BPM
US
Last activity: 21 Jul 2017 10:46 EDT
PDF Page Count
I am currently creating PDF documents with the HTMLToPDF activity. This is working as expected; however, one of my requirements is that I need a page count of the PDF, and not the page count in the footer of a PDF (page x of y). What I really need is a way to store the number of pages as a property. For example, if the PDF document I just created has 10 pages, I would like to set some property on the clipboard to 10.
Any help on this subject would be greatly appreciated,
Thanks!
PS: I am using PRPC 6.2 service pack 2
Accepted Solution
Updated: 15 Jul 2015 10:37 EDT
Pegasystems Inc.
GB
Hi Steve,
You can utilize the 'pdfbox' PDF library within PRPC (I am using 7.1.8 here) to count the pages of the resultant PDF created by HTMLTOPDF.
First check your system to make sure you have the library:
select distinct(pzjar) from pr_engineclasses where upper(pzjar) like '%PDF%';
On my 7.1.8, I get this:
prpdfbox-1.8.8.jar
So this looks like a 're-badged' version of the original Apache Library - we had better check the classnames/package-structure:
So run:
select * from pr_engineclasses where pzjar='prpdfbox-1.8.8.jar'
This gives me the following - which shows the package-names have been refactored here (and possibly some of the implementation has changed as well - I don't know that though).
prpdfbox-1.8.8.jar META-INF DEPENDENCIES
prpdfbox-1.8.8.jar META-INF LICENSE
prpdfbox-1.8.8.jar META-INF MANIFEST.MF
prpdfbox-1.8.8.jar META-INF NOTICE
prpdfbox-1.8.8.jar META-INF/maven/org.apache.pdfbox/pdfbox pom.properties
prpdfbox-1.8.8.jar META-INF/maven/org.apache.pdfbox/pdfbox pom.xml
prpdfbox-1.8.8.jar META-INF/services java.nio.charset.spi.CharsetProvider
prpdfbox-1.8.8.jar _pegainf_ jar.signature
prpdfbox-1.8.8.jar com/pega/apache/pdfbox ConvertColorspace$1.class
prpdfbox-1.8.8.jar com/pega/apache/pdfbox ConvertColorspace$ColorSpaceInstance.class
prpdfbox-1.8.8.jar com/pega/apache/pdfbox ConvertColorspace.class
prpdfbox-1.8.8.jar com/pega/apache/pdfbox Decrypt.class
prpdfbox-1.8.8.jar com/pega/apache/pdfbox Encrypt.class
prpdfbox-1.8.8.jar com/pega/apache/pdfbox ExportFDF.class
prpdfbox-1.8.8.jar com/pega/apache/pdfbox ExportXFDF.class
[...more...]
So we can (approximately) use the Apache Javadocs for PDFBOX 1.8.8 to help us here - and we have to re-factor the package names to 'com.pega.apache'.
I know from having used this library before, that the likely place to find this functionality is in the 'PDDocument' class:
http://pdfbox.apache.org/docs/1.8.8/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html
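As a small illustration of the package renaming described above, here is a minimal sketch that translates a class name taken from the Apache Javadoc into the re-badged name. The class `PegaPdfBoxNames` and its method `toPegaName` are hypothetical helpers invented for this example, and the two prefixes are assumptions based only on the jar listing shown earlier; check the listing on your own system before relying on them.

```java
public class PegaPdfBoxNames {
    // Assumed prefixes, taken from the jar listing above:
    // org/apache/pdfbox appears there as com/pega/apache/pdfbox.
    private static final String APACHE_PREFIX = "org.apache.pdfbox.";
    private static final String PEGA_PREFIX = "com.pega.apache.pdfbox.";

    /** Maps an Apache PDFBox class name to the re-badged name; other names pass through unchanged. */
    public static String toPegaName(String apacheClassName) {
        if (apacheClassName.startsWith(APACHE_PREFIX)) {
            return PEGA_PREFIX + apacheClassName.substring(APACHE_PREFIX.length());
        }
        return apacheClassName;
    }

    public static void main(String[] args) {
        // Prints: com.pega.apache.pdfbox.pdmodel.PDDocument
        System.out.println(toPegaName("org.apache.pdfbox.pdmodel.PDDocument"));
    }
}
```

So the Javadoc page for `org.apache.pdfbox.pdmodel.PDDocument` would correspond to `com.pega.apache.pdfbox.pdmodel.PDDocument` in a Java step, provided the listing on your system shows the same renaming.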
If you look at the code for 'HTMLTOPDF' you can see that it creates the PDF in memory as a BYTE array (Step 4 in the 7.1.8 version of the activity) like this:
//Get the PDF generation utility class
PDFUtils pdfUtil=tools.getPDFUtils();
//BUG- 32651 Back-Ported HFix-2932
//Get the PDF bytes.
byte[] byteArray = pdfUtil.generatePDF(HTMLStream,tools.getParameterPage());
if (byteArray == null || byteArray.length == 0) {
    oLog.error("PDFUtils did not return any content for HTMLtoPDF");
} else {
    //Put the byte array in a parameter
    tools.putParamValue("PDFDocument",byteArray);
}
So if we use the 'Pass Current Parameter Page' in our Activity - we should have access to the Java Object 'PDFDocument' - which is an array of bytes (the PDF itself).
The following Java Step in a custom 'wrapper' Activity for HTMLTOPDF achieves what we want here:
//Get the byte array from the parameter page
byte[] byteArray=(byte[])tools.getParameterPage().getParameterValue("PDFDocument");
java.io.ByteArrayInputStream bis = new java.io.ByteArrayInputStream(byteArray);
com.pega.apache.pdfbox.pdmodel.PDDocument doc=null;
try {
    doc=com.pega.apache.pdfbox.pdmodel.PDDocument.load( bis );
    oLog.infoForced( "Page Count:" + doc.getNumberOfPages() );
} catch(Exception e) {
    throw new PRRuntimeException(e);
} finally {
    //Release the parsed document as well as the stream
    try {
        if (doc!=null) {
            doc.close();
        }
        bis.close();
    } catch(Exception e) {
        throw new PRRuntimeException(e);
    }
}
Here's a screenshot of my custom activity , with that Java Step expanded:
Running the Activity logs an entry to my logfile - you could change this to set a Property on a Page and/or the Parameter Page instead.
2015-07-15 15:13:02,364 [http-bio-7180-exec-4] [ STANDARD] [ | ] [ | GCSApp:01.01.01] (rtToPDF.GCS_GCSApp_Work.Action) INFO xxxxxx|xx.xx.xx.xx Admin@GCS - Page Count:1 |
NOTE: The code is an example only - I haven't paid much attention to correct stream-handling/exception-handling here !
Pegasystems Inc.
IN
If you want HTMLtoPDF to add page numbers, you have to pass Param.footer as "Page ${page}". Along the same lines, if you have to store the value in a property on the clipboard, you can play with the same param and update the property accordingly.
I can also see an old PDN Forum discussion about the same kind of requirement: https://pdn.pega.com/forums/prpc/integration/need-page-numbers-in-the-pdf You can refer to that as well.
Hope this helps.
Coforge BPM
US
Thank you for your response. I have tried param.footer = "Page ${page}", and this does stamp the footer of each page with a page number. Unfortunately, the value of param.footer itself remains unchanged (i.e., param.footer = "Page ${page}").
Somehow I need to utilize that functionality to return the page count, not only on the pdf itself, but as a property on the clipboard.
Thank you
Pegasystems Inc.
IN
Hello
Can you please share the screenshot of the activity that you are using here.
Coforge BPM
US
I am just using the HTMLToPDF activity at this point to generate the pdf byte stream.
Venkat Raman Malaiarasan
Updated: 15 Jul 2015 11:00 EDT
Pegasystems Inc.
GB
Additionally: I have logged a new FEEDBACK ITEM ('enhancement request'):
FDBK-11924 "New OUTPUT Parameter: Page Count (others?)"
The customer is using HTMLTOPDF and they want to know the number of pages that the resultant PDF contains - they need a value on the CLIPBOARD to examine afterwards.
Since PDFUtilsImpl seems to know the page count (it is able to add the page count to each page, using the [undocumented?] Parameter 'footer' with the [undocumented?] string-format like 'page ${page} of ${total}'): could the engine class also set another output parameter with the total page count, along with the actual Java Object 'PDFDocument' (byte array)?
Maybe there are other useful bits of meta-data that could also be output after conversion - dunno.
So this feedback item will be reviewed by our Subject Matter Experts, who will assess the feasibility of including this feature in a future version of PRPC.
Cheers
John
Pegasystems Inc.
GB
I just double-checked your post - and I see you are using 62SP2 here - luckily 'pdfbox' also ships OOTB with this version - just at a lower version of the API:
prpdfbox-1.1.0.jar
I couldn't find an online hosted version of the Javadoc - but it can be downloaded and viewed here: http://jcenter.bintray.com/org/apache/pdfbox/pdfbox/1.1.0/pdfbox-1.1.0-javadoc.jar
Both methods of PDDocument (load, getNumberOfPages) are present in this version of the API as well - so it should work the same way (although I haven't tried this).
getNumberOfPages
public int getNumberOfPages()
Specified by:
getNumberOfPages in interface Pageable
load
public static PDDocument load(InputStream input)
throws IOException
This will load a document from an input stream.
Parameters:
input - The stream that contains the document.
Returns:
The document that was loaded.
Throws:
IOException - If there is an error reading from the stream.
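Since the post above says the 1.1.0 route is untried, one way to confirm that both methods really are present on a given system is a quick reflection check. This is only a sketch: `PdfBoxApiCheck` and `hasPublicMethod` are hypothetical names made up for this example, the class name `com.pega.apache.pdfbox.pdmodel.PDDocument` is taken from the 7.1.8 listing and is an assumption for 6.2 SP2, and `Class.forName` may behave differently inside PRPC depending on which class loader the Java step runs under.

```java
import java.io.InputStream;

public class PdfBoxApiCheck {
    /** Returns true if className can be loaded and exposes a public method with this name and parameter types. */
    public static boolean hasPublicMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // class is not visible to this class loader
        } catch (NoSuchMethodException e) {
            return false; // class found, but no such public method
        } catch (LinkageError e) {
            return false; // class found, but could not be linked or initialized
        }
    }

    public static void main(String[] args) {
        // Assumed class name; adjust it to whatever the jar listing on your system shows.
        String pdDocument = "com.pega.apache.pdfbox.pdmodel.PDDocument";
        System.out.println("load(InputStream): " + hasPublicMethod(pdDocument, "load", InputStream.class));
        System.out.println("getNumberOfPages(): " + hasPublicMethod(pdDocument, "getNumberOfPages"));
    }
}
```

If both lines report true, the Java step from the accepted solution should at least compile against that version of the library; whether it behaves the same at runtime still needs to be tried.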
Coforge BPM
US
Thank you very much. I will try this and update later
Coforge BPM
US
It works! Thank you very much for the assistance!
use param.footer = "page ${page} of ${total}"