moryh

PEGA

Posted: Mar 26, 2019

Last activity: Apr 13, 2019

Closed

Solved

NLP With Ruta Script

I have created a Decision Data rule for entity extraction. I am performing NLP using a Ruta script in Pega. My requirement is to extract a policy number from an email.

S represents an alphanumeric character; A represents a numeric character.

The policy number has the following formats:
1) With hyphens: SS-SSSSSSS-AAA
2) Without hyphens: SS SSSSSSS AAA
3) Without spaces: SSSSSSSSSAAA
4) Optionally, the policy number can also be prefixed with 1, so 1SS-SSSSSSS-AAA, 1SS SSSSSSS AAA and 1SSSSSSSSSAAA are also valid combinations.

So the policy number has three parts: the first part has length 2 (SS), the second part has length 7 (SSSSSSS) and the third part has length 3 (AAA). An optional "1" can be prefixed to the policy number as a fourth part.
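The format above can be written down as a single character-level pattern. The following is only a minimal sketch in Python, not Ruta code: the names `POLICY_RE` and `parse_policy_number` are made up for illustration. It assumes that "S" means any letter or digit. For the third part it assumes three letters, because the valid examples listed later in this post (for example GHI) and the regexp in the script both use letters there, even though "A" is described as numeric above.

```python
import re

# Assumption: "S" is any letter or digit; the last part is three letters,
# as in the post's examples (e.g. "GHI"). Separators are optional.
POLICY_RE = re.compile(
    r"""
    (?P<prefix>1)?          # optional leading "1"
    (?P<part1>[A-Z0-9]{2})  # first part: 2 alphanumeric characters
    [-\s]?                  # optional hyphen or whitespace
    (?P<part2>[A-Z0-9]{7})  # second part: 7 alphanumeric characters
    [-\s]?                  # optional hyphen or whitespace
    (?P<part3>[A-Z]{3})     # third part: 3 letters (assumed, see above)
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_policy_number(text):
    """Return the parts of a policy number, or None if text is not one."""
    match = POLICY_RE.fullmatch(text)
    return match.groupdict() if match else None


print(parse_policy_number("123-456ABC7-GHI"))
```

Because the pattern works on characters rather than on word or number tokens, the regex engine can backtrack and treat the leading 1 of 123 as the optional prefix and 23 as the first part.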

I have written a script for this, but it does not work for the combinations in which the policy number is prefixed with 1.

Below is the code from the script:

PACKAGE uima.ruta.example;
Document{-> RETAINTYPE(SPACE)};

DECLARE VarA;
DECLARE VarC;
DECLARE VarE;


("1")? W{REGEXP(".{2}")} ("-"|SPACE)? ((W* NUM* W* NUM* W* NUM* W*)|(NUM* W* NUM* W* NUM* W* NUM*)){REGEXP(".{7}")} ("-"|SPACE)? W{REGEXP(".{3}")->MARK(EntityType,1,6)};


(W* NUM*){REGEXP(".{2}")} ("-"|SPACE)? ((W* NUM* W* NUM* W* NUM* W*)|(NUM* W* NUM* W* NUM* W* NUM*)){REGEXP(".{7}")} ("-"|SPACE)? W{REGEXP(".{3}")->MARK(EntityType,1,5)};

((W|NUM)(NUM|W)*){REGEXP("(?i)\\b[1]{0,1}[A-Z0-9]{2}[A-Z0-9]{7}[A-Z]{3}\\b" )->MARK(EntityType)};

Valid policy numbers: AB-CD123EF-GHI, 1AB-CD123EF-GHI, ABCD123EFGHI, 23-456ABC7-GHI, 123-456ABC7-GHI, 1A3-456ABC7-GHI, 12A-456ABC7-GHI, etc.

I am not able to handle these combinations: 123-456ABC7-GHI, 1A3-456ABC7-GHI, 12A-456ABC7-GHI.

Please help me write a correct script that covers all possible combinations. Thanks in advance.

Pega Intelligent Virtual Assistant

Conversational Channels

Accepted Solution

Posted: 13 Apr 2019 8:40 EDT

moryh

PEGA

The UIMA Ruta seed annotations NUM and W each cover a whole number or a whole word.
Therefore, Ruta cannot split examples like 23456 or 123456 into subannotations, so a rule cannot match a leading 1 separately from the digits that follow it.
A solution is to use a pure regexp rule to annotate all the mentioned examples:

"\\w{2,3}[\\-\\s]?\\w{2,3}" -> EntityType;
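To illustrate why the token-based rules fail on the prefixed examples, here is a rough sketch in Python. It is not the real Ruta lexer: `TOKEN_RE` and `seed_tokens` are hypothetical names, and the tokenization is only an approximation that assumes a run of letters becomes one word token and a run of digits becomes one number token.

```python
import re

# Rough stand-in for Ruta's seed tokenization (an assumption, not the real lexer):
# a run of letters is one word token, a run of digits is one number token,
# a run of whitespace is one token, and any other character is a token of its own.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+|\s+|.")


def seed_tokens(text):
    """Split text into (kind, value) pairs resembling W, NUM, SPACE and other tokens."""
    tokens = []
    for value in TOKEN_RE.findall(text):
        if value.isalpha():
            kind = "W"
        elif value.isdigit():
            kind = "NUM"
        elif value.isspace():
            kind = "SPACE"
        else:
            kind = "OTHER"
        tokens.append((kind, value))
    return tokens


print(seed_tokens("1AB-CD123EF-GHI")[:2])  # [('NUM', '1'), ('W', 'AB')]
print(seed_tokens("123-456ABC7-GHI")[:2])  # [('NUM', '123'), ('OTHER', '-')]
```

In 1AB-CD123EF-GHI the prefix 1 is a token of its own, so a token-level rule can see it. In 123-456ABC7-GHI the prefix is buried inside the single number token 123, which is why a character-level regexp is the more practical tool here.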

Likes (3)

Siddhant Suryakant Jivane Dave Grunbaum Raghunath Mahakud
