Support Center

Question

stevenh5

Member since 2014

6 posts

Coforge BPM

Posted: 14 Jul 2015 18:01 EDT
Last activity: 21 Jul 2017 10:46 EDT

Closed

Solved

PDF Page Count

I am currently creating PDF documents with the HTMLToPDF activity. This is working as expected; however, one of my requirements is a page count for the PDF, and not the page count in the footer of the PDF (page x of y). What I really need is a way to store the number of pages as a property. For example, if the PDF document I just created has 10 pages, I would like to set some property on the clipboard to 10.

Any help on this subject would be greatly appreciated,

Thanks!

PS: I am using PRPC 6.2 service pack 2

***Updated by moderator: Lochan to add Categories***

***Updated by moderator: Lochan to close post***
This post has been archived for educational purposes. Contents and links will no longer be updated. If you have the same/similar question, please write a new post.

Data Integration

User Experience

Accepted Solution

Posted: 15 Jul 2015 10:30 EDT
Updated: 15 Jul 2015 10:37 EDT

JOHNPW_GCS replied to stevenh5

Hi Steve,

You can utilize the 'pdfbox' PDF library within PRPC (I am using 7.1.8 here) to count the pages of the resultant PDF created by HTMLTOPDF.

First check your system to make sure you have the library:

select distinct(pzjar) from pr_engineclasses where upper(pzjar) like '%PDF%';

On my 7.1.8, I get this:

prpdfbox-1.8.8.jar

So this looks like a 're-badged' version of the original Apache Library - we had better check the classnames/package-structure:

So run:

select * from pr_engineclasses where pzjar='prpdfbox-1.8.8.jar'

This gives me the following - which shows the package-names have been refactored here (and possibly some of the implementation has changed as well - I don't know that though).

prpdfbox-1.8.8.jar    META-INF    DEPENDENCIES

prpdfbox-1.8.8.jar    META-INF    LICENSE

prpdfbox-1.8.8.jar    META-INF    MANIFEST.MF

prpdfbox-1.8.8.jar    META-INF    NOTICE

prpdfbox-1.8.8.jar    META-INF/maven/org.apache.pdfbox/pdfbox    pom.properties

prpdfbox-1.8.8.jar    META-INF/maven/org.apache.pdfbox/pdfbox    pom.xml

prpdfbox-1.8.8.jar    META-INF/services    java.nio.charset.spi.CharsetProvider

prpdfbox-1.8.8.jar    _pegainf_    jar.signature

prpdfbox-1.8.8.jar    com/pega/apache/pdfbox    ConvertColorspace$1.class

prpdfbox-1.8.8.jar    com/pega/apache/pdfbox    ConvertColorspace$ColorSpaceInstance.class

prpdfbox-1.8.8.jar    com/pega/apache/pdfbox    ConvertColorspace.class

prpdfbox-1.8.8.jar    com/pega/apache/pdfbox    Decrypt.class

prpdfbox-1.8.8.jar    com/pega/apache/pdfbox    Encrypt.class

prpdfbox-1.8.8.jar    com/pega/apache/pdfbox    ExportFDF.class

prpdfbox-1.8.8.jar    com/pega/apache/pdfbox    ExportXFDF.class

[...more...]

So we can (approximately) use the Apache Javadocs for PDFBOX 1.8.8 to help us here - and we have to re-factor the package names to 'com.pega.apache'.
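The renaming described above can be sketched as a small helper. This is only an illustration: the class name PdfBoxNames and its two methods are made up for this example, and the only fact taken from the thread is that the classes in prpdfbox-1.8.8.jar sit under 'com/pega/apache/pdfbox' rather than 'org/apache/pdfbox'.

```java
// Sketch: translate Apache PDFBox names into the re-packaged Pega names.
// PdfBoxNames and its methods are hypothetical helpers, not part of PRPC or PDFBox.
final class PdfBoxNames {
    static final String APACHE_PREFIX = "org.apache.";
    static final String PEGA_PREFIX = "com.pega.apache.";

    // org.apache.pdfbox.pdmodel.PDDocument -> com.pega.apache.pdfbox.pdmodel.PDDocument
    static String toPegaName(String apacheClassName) {
        if (!apacheClassName.startsWith(APACHE_PREFIX)) {
            throw new IllegalArgumentException("Not an org.apache class: " + apacheClassName);
        }
        return PEGA_PREFIX + apacheClassName.substring(APACHE_PREFIX.length());
    }

    // Turn a (package path, class file) row from the pr_engineclasses listing
    // into a Java class name.
    static String fromJarEntry(String packagePath, String classFile) {
        String simple = classFile.endsWith(".class")
                ? classFile.substring(0, classFile.length() - ".class".length())
                : classFile;
        return packagePath.replace('/', '.') + "." + simple;
    }

    public static void main(String[] args) {
        System.out.println(toPegaName("org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println(fromJarEntry("com/pega/apache/pdfbox", "ExportFDF.class"));
    }
}
```

The first call gives the name to use in a Java step when the Apache Javadoc mentions org.apache.pdfbox.pdmodel.PDDocument; the second turns a listing row into a class name that can be looked up in that Javadoc.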

I know from having used this library before, that the likely place to find this functionality is in the 'PDDocument' class:

http://pdfbox.apache.org/docs/1.8.8/javadocs/org/apache/pdfbox/pdmodel/PDDocument.html

If you look at the code for 'HTMLTOPDF' you can see that it creates the PDF in memory as a BYTE array (Step 4 in the 7.1.8 version of the activity) like this:

//Get the PDF generation utility class
PDFUtils pdfUtil=tools.getPDFUtils();

//BUG- 32651 Back-Ported HFix-2932
//Get the PDF bytes.
byte[] byteArray = pdfUtil.generatePDF(HTMLStream,tools.getParameterPage());

if (byteArray == null || byteArray.length == 0)
{
    oLog.error("PDFUtils did not return any content for HTMLtoPDF");
}
else
{
    //Put the byte array in a parameter
    tools.putParamValue("PDFDocument",byteArray);
}

So if we use the 'Pass Current Parameter Page' in our Activity - we should have access to the Java Object 'PDFDocument' - which is an array of bytes (the PDF itself).

The following Java Step in a custom 'wrapper' Activity for HTMLTOPDF achieves what we want here:

//Get the byte array from the parameter page
byte[] byteArray=(byte[])tools.getParameterPage().getParameterValue("PDFDocument");
java.io.ByteArrayInputStream bis = new java.io.ByteArrayInputStream(byteArray);
com.pega.apache.pdfbox.pdmodel.PDDocument doc = null;

try {
    //Parse the PDF and log its page count
    doc = com.pega.apache.pdfbox.pdmodel.PDDocument.load( bis );
    oLog.infoForced( "Page Count:" + doc.getNumberOfPages() );
}
catch(Exception e) { throw new PRRuntimeException(e); }
finally {
    //Close the document to release its resources
    //(closing a ByteArrayInputStream has no effect, so bis needs no close)
    if (doc != null) {
        try { doc.close(); }
        catch(Exception e) { oLog.error("Could not close PDDocument: " + e); }
    }
}

Here's a screenshot of my custom activity, with that Java Step expanded:

Running the Activity logs an entry to my logfile - you could change this to set a Property on a Page and/or the Parameter Page instead.

2015-07-15 15:13:02,364 [http-bio-7180-exec-4] [ STANDARD] [ ] [ GCSApp:01.01.01] (rtToPDF.GCS_GCSApp_Work.Action) INFO xxxxxx|xx.xx.xx.xx Admin@GCS - Page Count:1

NOTE: The code is an example only - I haven't paid much attention to correct stream-handling/exception-handling here !
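As a rough cross-check on the value PDDocument reports, the page objects in the byte array can also be counted with the JDK alone. This is a heuristic sketch, not the PDFBox approach described above: it assumes the page objects are stored uncompressed in the file, so it will under-count PDFs that keep them in compressed object streams, and it may over-count files that contain superseded objects from incremental saves. The class name RoughPdfPageCount is made up for this example.

```java
// Heuristic sketch: count "/Type /Page" dictionaries in raw PDF bytes.
// This does NOT parse the PDF; treat the result as a sanity check only.
final class RoughPdfPageCount {
    // Matches "/Type /Page" or "/Type/Page" but not "/Type /Pages" (the page-tree node).
    private static final java.util.regex.Pattern PAGE_OBJECT =
            java.util.regex.Pattern.compile("/Type\\s*/Page(?![A-Za-z])");

    static int count(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so binary content survives decoding.
        String text = new String(pdfBytes, java.nio.charset.StandardCharsets.ISO_8859_1);
        java.util.regex.Matcher m = PAGE_OBJECT.matcher(text);
        int pages = 0;
        while (m.find()) {
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) {
        // Hand-written fragment that only imitates PDF object syntax: one page tree, two pages.
        String fake = "1 0 obj << /Type /Pages /Count 2 >> endobj\n"
                + "2 0 obj << /Type /Page >> endobj\n"
                + "3 0 obj << /Type/Page >> endobj\n";
        System.out.println(count(fake.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1)));
    }
}
```

If this number and getNumberOfPages() disagree, the PDFBox value is the one to trust, since it walks the actual page tree.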

Posted: 15 Jul 2015 5:16 EDT

Santanu replied to stevenh5

If you want HTMLtoPDF to add page numbers, you have to pass Param.footer as "Page ${page}". Along the same lines, if you have to store the value in a property on the clipboard, you can work with the same param and update the property accordingly.

I can also see an old PDN Forum discussion about the same kind of requirement: https://pdn.pega.com/forums/prpc/integration/need-page-numbers-in-the-pdf You can refer to that as well.

Hope this helps.

Posted: 15 Jul 2015 9:54 EDT

stevenh5

Coforge BPM

replied to Santanu

Thank you for your response. I have tried using param.footer = "Page ${page}", and this does stamp the footer of each page with a page number. Unfortunately, the value of param.footer itself remains unchanged (i.e., param.footer = "Page ${page}").

Somehow I need to utilize that functionality to return the page count, not only on the pdf itself, but as a property on the clipboard.

Thank you

Posted: 15 Jul 2015 10:02 EDT

Santanu replied to stevenh5

Hello

Can you please share a screenshot of the activity that you are using here?

Posted: 15 Jul 2015 10:06 EDT

stevenh5

Coforge BPM

replied to Santanu

I am just using the HTMLToPDF activity at this point to generate the pdf byte stream.

Posted: 15 Jul 2015 10:59 EDT
Updated: 15 Jul 2015 11:00 EDT

JOHNPW_GCS replied to JOHNPW_GCS

Additionally: I have logged a new FEEDBACK ITEM ('enhancement request'):

FDBK-11924 "New OUTPUT Parameter: Page Count (others?)"

The customer is using HTMLTOPDF and they want to know the number of pages that the resultant PDF contains - they need a value on the CLIPBOARD to examine afterwards.

Since PDFUtilsImpl seems to know the page count (it is able to add the page count to each page, using the [undocumented?] Parameter 'footer' with an [undocumented?] string-format like 'page ${page} of ${total}'), could the engine class also set another output parameter with the total page count, along with the actual Java Object 'PDFDocument' (byte array)?

Maybe there are other useful bits of meta-data that could also be output after conversion - I don't know.

So this feedback item will be passed to our Subject Matter Experts, who will review the feasibility of including this feature in a future version of PRPC.

Cheers

John

Posted: 15 Jul 2015 11:12 EDT

JOHNPW_GCS replied to JOHNPW_GCS

I just double-checked your post - and I see you are using 62SP2 here - luckily 'pdfbox' also ships OOTB with this version - just at a lower version of the API:

prpdfbox-1.1.0.jar

I couldn't find an online hosted version of the Javadoc - but it can be downloaded and viewed here: http://jcenter.bintray.com/org/apache/pdfbox/pdfbox/1.1.0/pdfbox-1.1.0-javadoc.jar

Both methods of PDDocument (load, getNumberOfPages) are present in this version of the API as well - so it should work the same way (although I haven't tried this).

getNumberOfPages

public int getNumberOfPages()

Specified by:

getNumberOfPages in interface Pageable

load

public static PDDocument load(InputStream input)

                       throws IOException

    This will load a document from an input stream.

    Parameters:

        input - The stream that contains the document.

    Returns:

        The document that was loaded.

    Throws:

        IOException - If there is an error reading from the stream.
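Since the 6.2 SP2 route is untested, one way to confirm that both methods are really there before writing the Java step is a small reflection check. This is a sketch under assumptions: ApiCheck and hasMethod are made-up names, the re-packaged class name 'com.pega.apache.pdfbox.pdmodel.PDDocument' is taken from the 7.1.8 example and may differ in prpdfbox-1.1.0.jar, and whether Class.forName can see the engine classes depends on the class loader the code runs under.

```java
// Sketch: check by reflection that a class exposes a given public method.
// ApiCheck is a hypothetical helper, not part of PRPC or PDFBox.
final class ApiCheck {
    // Returns true if the named class can be loaded and has a public method
    // with the given name and parameter types; false otherwise.
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class.forName(className).getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed class name; adjust it to whatever pr_engineclasses shows on your system.
        String doc = "com.pega.apache.pdfbox.pdmodel.PDDocument";
        System.out.println("load(InputStream): "
                + hasMethod(doc, "load", java.io.InputStream.class));
        System.out.println("getNumberOfPages(): "
                + hasMethod(doc, "getNumberOfPages"));
    }
}
```

Run outside PRPC, both lines print false simply because the jar is not on the classpath, so the check is only meaningful when executed where the engine classes are visible.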