Need to read data from DOC and PDF files
Hi,
I have a requirement where we receive different types of files (doc, docx, pdf) on a filesystem path. Based on a filename input, I pull the file from the server, read its contents, and copy them to a property whose control is a rich text editor. I am able to read the file contents; however, the formatting, alignment, images, and tables all come through as plain text, so the data is displayed without any formatting or alignment. I am reading the file from Java code because we have to pick only one specific file, which cannot be achieved with a file listener.
Please suggest whether there is a better approach, or what I need to change in the code copied below.
This is for PDF.
com.pega.apache.pdfbox.pdmodel.PDDocument pdDoc = null;
PRInputStream in = null;
ParameterPage pp = tools.getParameterPage();
String filePath = pp.getString("FullFilePathName");
try {
    // Open the file through PRFile/PRInputStream rather than java.io.File
    in = new PRInputStream(new PRFile(filePath));
    com.pega.apache.pdfbox.pdfparser.PDFParser parser = new com.pega.apache.pdfbox.pdfparser.PDFParser(in);
    parser.parse();
    pdDoc = new com.pega.apache.pdfbox.pdmodel.PDDocument(parser.getDocument());
    // getText() returns plain text only; layout, tables and images are not carried over
    com.pega.apache.pdfbox.util.PDFTextStripper pdfStripper = new com.pega.apache.pdfbox.util.PDFTextStripper();
    String parsedText = pdfStripper.getText(pdDoc);
    tools.putParamValue("ContentSourceAuthored", parsedText);
} catch (Exception e) {
    throw new PRRuntimeException("Unable to read file '" + filePath + "': " + e);
} finally {
    // Release the document and the stream even when parsing fails
    if (pdDoc != null) {
        try { pdDoc.close(); } catch (java.io.IOException ignored) { }
    }
    if (in != null) {
        try { in.close(); } catch (java.io.IOException ignored) { }
    }
}
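
For context on the symptom described above: PDFTextStripper.getText() produces plain text, so tables, images and alignment are not expected to survive this path. Below is a minimal sketch, assuming the rich text editor property accepts HTML, of how the extracted text could at least keep its paragraph and line breaks. The class name PlainTextToHtml and the method toHtml are hypothetical helpers for illustration; they are not part of Pega or PDFBox.

```java
// Hypothetical helper (not a Pega or PDFBox API): wraps plain text in basic HTML
// so paragraph and line breaks remain visible in an HTML-based rich text editor.
final class PlainTextToHtml {

    private PlainTextToHtml() { }

    /** Escapes the characters that have special meaning in HTML. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Converts plain text to HTML: blank lines separate paragraphs,
     * single newlines inside a paragraph become line breaks.
     */
    static String toHtml(String text) {
        if (text == null) {
            return "";
        }
        // Normalize Windows and old Mac line endings to '\n'
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n').trim();
        if (normalized.isEmpty()) {
            return "";
        }
        StringBuilder html = new StringBuilder();
        for (String paragraph : normalized.split("\n\\s*\n")) {
            html.append("<p>")
                .append(escape(paragraph.trim()).replace("\n", "<br/>"))
                .append("</p>");
        }
        return html.toString();
    }
}
```

Under that assumption, the last step of the activity could become tools.putParamValue("ContentSourceAuthored", PlainTextToHtml.toHtml(parsedText)). This only preserves paragraph structure; tables and images would still need a conversion that works from the document's layout rather than from its extracted text.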