Question
Booking.com
IN
Last activity: 28 Oct 2022 5:38 EDT
OCR is installed in Linux but error persists
@Piotr Skowronek Hi Piotr,
I have a use case in which the user uploads a PDF document that contains the customer name, customer address, and SSN. With the click of a button, I should be able to see that these values have been extracted from the PDF. Can this be done using Pega OCR?
Regarding the installation, I am now using a Linux system. The installation was successful.
Branched Information: Originally posted as reply here.
Updated: 5 Jul 2022 15:35 EDT
Pegasystems Inc.
PL
The out-of-the-box solution integrates with the Email channel to convert PDF attachments to text.
In your case, you could invoke the activity Work-Channel-Triage-Email#pyOCRTextExtractor directly. It requires some parameters, such as fileName, base64source (the base64-encoded file content), and extractedText (where the extracted text is written). You would then need to run NLP on the extracted content, or write regular expressions, to get the required values.
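The regular-expression route could look roughly like the Python sketch below. It assumes the OCR step has already produced plain text and that the document labels its fields as "Customer Name:", "Customer Address:" and "SSN:". Those labels, the sample text, and the name `extract_fields` are illustrative assumptions, not part of Pega OCR.

```python
import re

# Hypothetical label patterns; adjust them to the wording used in the real PDF.
FIELD_PATTERNS = {
    "name": re.compile(r"Customer\s+Name\s*:\s*(.+)", re.IGNORECASE),
    "address": re.compile(r"Customer\s+Address\s*:\s*(.+)", re.IGNORECASE),
    # SSN assumed to be written as three digit groups: 3-2-4.
    "ssn": re.compile(r"SSN\s*:\s*(\d{3}-\d{2}-\d{4})", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict of field name -> captured value (None when not found)."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


# Made-up sample standing in for the text returned by the OCR step.
sample_text = """Customer Name: Jane Doe
Customer Address: 12 Example Street, Springfield
SSN: 123-45-6789
"""

fields = extract_fields(sample_text)
print(fields)
```

Patterns like these only work when the OCR output keeps each label and its value on the same line, so they would need adjusting for real documents.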
-
Alekhyaprasanna Neturu
Updated: 5 Jul 2022 15:35 EDT
Booking.com
IN
@Piotr Skowronek Hello, we tried the activity mentioned in the post above. The activity completes successfully, but no value is populated in the extractedText parameter. Is there anything else we need to do? Thanks!
Updated: 5 Jul 2022 15:35 EDT
Pegasystems Inc.
PL
Please provide more details. Have you installed Pega OCR and its required software on your Linux server?
Anything interesting in the logs?
Updated: 5 Jul 2022 15:35 EDT
Booking.com
IN
@Piotr Skowronek Yes, OCR is installed on the Linux server. I have observed the error below in the logs:
Something went wrong during text OCR: java.lang.UnsatisfiedLinkError: Can't load library: /opt/ABBYY/FREngine12/Bin/libFREngine.Jni.so
I have corrected the path in the DT, but the error still appears in the logs. Are there any workarounds for this?
Updated: 5 Jul 2022 15:35 EDT
Pegasystems Inc.
PL
Please install the required software properly on the Linux box, following the instructions: https://docs-previous.pega.com/installing-pega-ocr-component
Updated: 5 Jul 2022 15:35 EDT
Booking.com
IN
INC-231473 has been raised for the issue. Can we please get in touch to sort it out as part of the incident?
Capgemini
IN
Have you installed the engine under the /opt/ path?
It looks like you installed it in the wrong path. Please try again after installing it under /opt/.
Booking.com
IN
@Bharath.Suntigikar It looks like this aligns with what is expected. Do you see any discrepancy?
Updated: 7 Jul 2022 5:28 EDT
Pegasystems Inc.
PL
I see 'abby' in the paths... are you sure you have installed it under /opt/abby, and not, for example, under '/opt/abbyy' or '/opt/ABBYY' (the latter is the default)? If you changed the paths, have you also updated the 'configureAbbyyFREngine' rule to reflect them?
Another question is whether you have updated ld.so.conf with the correct paths, and whether you restarted the system afterwards.
Please contact your Linux system administrator for advice. The prerequisite steps may require some Linux administration skills, as each Linux flavor and setup may be different.
Booking.com
IN
@Piotr Skowronek Hello, here are my observations:
The install path is /opt/ABBYY; here is a screenshot of it.
"Have you also updated the 'configureAbbyyFREngine' rule to reflect the paths?" - The path is updated to match the current installation folders.
"Whether you have updated ld.so.conf with correct paths" - No change was required, as the correct path is already in that file.
"Have you restarted the system afterwards?" - Yes, we have restarted the system.
Please let me know if you see any discrepancies in my response.
Updated: 7 Jul 2022 8:10 EDT
Pegasystems Inc.
PL
So far so good.
1) Were there any errors thrown by the installation script? Do you have the console output? The script runs a sanity check at the end - please take a look. The installation script also has a 'check' command (run it with `-?` to see the allowed commands and parameters).
2) With what parameters did you run the installation script?
3) Is the error in the Pega logs still the same when you try to execute OCR?
4) Does `/opt/ABBYY/FREngine12/Bin/libFREngine.Jni.so` exist?
5) Which Linux distro is it?
6) Yesterday you pasted a screenshot of the environment in which '/opt/abby' was present - what was that?
Booking.com
IN
@Piotr Skowronek Hello,
Thanks for a quick response. I am attaching the document with the details. Pls let me know if you need any more information.
Pegasystems Inc.
PL
Thanks for the answers. It looks good.
Please check the access rights on both the folder and the files under /opt/ABBYY/FREngine12/Bin/.
Every folder on every level of the path must have +x *), and every file under /opt/ABBYY/FREngine12/ must have +r *).
*) for the user (or group) that runs your application server (is it Tomcat?).
In other words, please provide the output of 'ls -la /opt/ABBYY/FREngine12/Bin'.
Depending on which application server you are using, you may need to relax its filesystem access restrictions (especially if access was blocked by configuration).
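The same check can be sketched in Python. This is only an illustration: the function name `check_access` is made up, and the result is meaningful only when the script is run as the same user that runs the application server, because `os.access` tests the permissions of the calling user.

```python
import os
from pathlib import Path


def check_access(target):
    """List access problems on the way to `target`, as seen by the current user.

    Every parent folder needs +x (traverse) and the file itself needs +r.
    """
    target = Path(target).absolute()
    problems = []
    # `parents` is ordered nearest-first, so reverse it to walk from / downwards.
    for folder in reversed(target.parents):
        if not os.path.isdir(folder):
            problems.append(f"folder {folder} not found or not reachable")
            return problems  # nothing below this point can be checked
        if not os.access(folder, os.X_OK):
            problems.append(f"missing +x on folder {folder}")
    if not os.path.isfile(target):
        problems.append(f"file {target} not found or not reachable")
    elif not os.access(target, os.R_OK):
        problems.append(f"missing +r on file {target}")
    return problems


# Library path taken from the error message in this thread.
for problem in check_access("/opt/ABBYY/FREngine12/Bin/libFREngine.Jni.so"):
    print(problem)
```

An empty result only means that plain file permissions are not the obstacle; it says nothing about restrictions imposed by the application server or by security software.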
Updated: 11 Jul 2022 2:44 EDT
Booking.com
IN
@Piotr Skowronek Hello, here is the output of the command:
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2022.07.11 10:44:46 =~=~=~=~=~=~=~=~=~=~=~=
clear
ls -la /opt/ABBYY/FREngine12/Bin/
total 762204
drwxr-xr-x. 4 root root 12288 Jul 7 07:42 .
Updated: 12 Jul 2022 6:26 EDT
Pegasystems Inc.
PL
"Please check access rights to both folder and files under /opt/ABBYY/FREngine12/Bin/" - I only see the access rights for the folder. Can you check the access rights on the files in it?
Additionally, you have to do the same for each parent folder (/opt/, /opt/ABBYY, /opt/ABBYY/FREngine12/).
Booking.com
IN
@Piotr Skowronek Hello pls find the log attached
Updated: 12 Jul 2022 8:20 EDT
Pegasystems Inc.
PL
Permissions look good.
1) What's your application server? Is it Tomcat? Is it being run natively or dockerized?
2) Is the application server reconfigured to block access to filesystem?
3) Can you check if your antivirus (CrowdStrike?) is not interfering with java trying to access ABBYY?
4) Pls check if for example `apparmor` (or anything similar) is not blocking the access (you may want to observe /var/log/syslog, /var/log/audit/audit.log or similar).
-
Alekhyaprasanna Neturu
Booking.com
IN
@Piotr Skowronek Here are the details:
1) What's your application server? Is it Tomcat? Is it being run natively or dockerized? - Tomcat is the application server, running natively (not dockerized)
2) Is the application server reconfigured to block access to the filesystem? - it's a vanilla installation not reconfigured to block access to the filesystem
3) Can you check if your antivirus (CrowdStrike?) is not interfering with java trying to access ABBYY? This is configured to alert, and not block any access
4) Pls check if for example `apparmor` (or anything similar) is not blocking the access (you may want to observe /var/log/syslog, /var/log/audit/audit.log or similar). - What exactly do we need to check in these log files? Is it some kind of deadlock or process interference?
Pegasystems Inc.
PL
ad 3) Can you temporarily turn off the AV and check?
ad 4) If `apparmor` or similar software is turned on in Linux (I am not sure whether it is on by default on CentOS, but I guess it is on Ubuntu), it may block access to files. In those logs, check whether there is anything about ABBYY or libFREngine.Jni.so. For example, if `apparmor` is blocking the access, you should see log entries about it (https://www.unix.com/man-page/centos/7/apparmor/).
Booking.com
IN
@Piotr Skowronek 3) Can you temporarily turn off AV and check? - This is controlled by the LTI Cloud team, so it may not be possible to make any changes to it.
4) SELinux is the CentOS equivalent of apparmor, and it is currently disabled. There are no entries in the audit.log file for ABBYY or libFREngine.Jni.so.
Updated: 13 Jul 2022 5:19 EDT
Pegasystems Inc.
PL
Please grep through all the logs in /var/log and /var/log/*/ (and use -i, otherwise the 'Abbyy' pattern will not match all possible upper/lower case variations).
Please ask your Linux administrator why Java might be unable to find the library in the system, and where to look for logs if access was denied.
Please verify that the libraries are properly registered in the system (https://stackoverflow.com/questions/9922949/how-to-print-the-ldlinker-search-path): `ld --verbose | grep -i abbyy` and `ldconfig -v 2>/dev/null | grep -i abbyy`.
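For the log search, a rough Python equivalent of a recursive, case-insensitive grep is sketched below. The function name `find_mentions` and the default pattern are assumptions chosen for illustration; files that cannot be read by the current user are silently skipped.

```python
import os
import re


def find_mentions(log_root, pattern=r"abbyy|libFREngine"):
    """Return (path, line number, line) for each case-insensitive match under log_root."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    # os.walk descends into subfolders and skips folders it cannot list.
    for folder, _subfolders, files in os.walk(log_root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # skip sockets, pipes and broken links
            try:
                # errors="replace" keeps binary or oddly encoded logs from raising.
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file, for example permission denied
    return hits
```

Calling `find_mentions("/var/log")` as a user who is allowed to read the logs would list every line that mentions ABBYY or the library, whatever the letter case.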
Booking.com
IN
@Skowronek Hi Piotr,
Here are the screenshots that our Linux admin sent.
Pegasystems Inc.
PL
ad 1st screenshot - please do so recursively including subfolders.
ad 2nd screenshot - definitely, there should be more .so libraries from abbyy over there. Can you re-run with 'ldconfig -v' and provide the whole output?
Booking.com
IN
@Piotr Skowronek
1. recursive checks
Attaching the log for #2
Updated: 14 Jul 2022 9:37 EDT
Pegasystems Inc.
PL
I've just taken a magnifying glass and I think I can see an unwanted space in the path. Can you make sure the paths in the DT are really correct and don't contain any unwanted spaces?
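A stray space like this is hard to spot by eye, so a small check can help. The Python sketch below is only an illustration: the function name `find_whitespace` and the sample value (with a space placed arbitrarily) are hypothetical, since the thread does not show where the space actually was.

```python
def find_whitespace(path):
    """Return (index, character) pairs for every whitespace character in path."""
    return [(index, char) for index, char in enumerate(path) if char.isspace()]


# Hypothetical configured value with a stray space after "Bin".
configured = "/opt/ABBYY/FREngine12/Bin /libFREngine.Jni.so"

print(repr(configured))  # repr() makes spaces, tabs and newlines visible
print(find_whitespace(configured))
```

Printing the value with `repr()` also exposes trailing spaces and tabs, which are invisible when the path is viewed in a form field or a screenshot.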
Booking.com
IN
@Piotr Skowronek Thanks a lot. Sometimes we miss the minor things, and your catch was really helpful. We are able to proceed.
Pegasystems Inc.
PL
Great! Good luck!
-
Alekhyaprasanna Neturu
Booking.com
IN
@Piotr Skowronek Hello, we are trying to extract data from OCR - I have uploaded a PDF which has data like
In short, the extracted data is not laid out the way it is in the PDF. Why is that? The side headings are added at the start, and the order of the text is mixed up.
Is there anything we can do to resolve this issue, or do you suggest any customizations?