Steps to use partitioning in pega marketing

Question

ABHINANDAN

Member since 2011

32 posts

Areteans Technology Solutions

Posted: Dec 31, 2017

Last activity: Nov 20, 2018

Posted: 31 Dec 2017 12:40 EST
Last activity: 20 Nov 2018 13:19 EST

Closed

Solved

Steps to use partitioning in pega marketing

Report

Our client's current pega marketing 7.12 on base pega 7.1.7 application has performance issues, as in to run a campaign and send offer SMS to a target customer population of 1.3 million it takes 7-8 hours may be more, no partitioning was implemented then.

So post upgarde to pega marketing 7.22 on base pega 7.3 we decided to use partition so to ensure parallel processing, we followed the below steps

1. We used database partitioning (underlying DB is oracle) for Customer table, Installed base table and defined 10 partitions in application settings, considering we will be having 4 jvms on two physical boxes hence 2*4 =8, hence kept 10 partitions, partition key ranging from 0-9 for both customer and Installed base.

2. Resaved all our existing segments, used partition keys in all data sets.

3. Added all the nodes in Data Flow service.

4. Also for Interaction history partitions we followed the instructions in the link https://pdn.pega.com/interaction-history-partitions/interaction-history-partitions and defined around 16 partitions, partition key ranging from 0-15.

I have following questions

1. Are there any other steps we need to follow to implement for partitioning?

2. Should the partition range be kept same i.e 0-9 for Interaction history? I mean keeping different partition key range for Customer and Intercation history will it create any problem?

So post upgarde to pega marketing 7.22 on base pega 7.3 we decided to use partition so to ensure parallel processing, we followed the below steps

2. Resaved all our existing segments, used partition keys in all data sets.

3. Added all the nodes in Data Flow service.

I have following questions

1. Are there any other steps we need to follow to implement for partitioning?

2. Should the partition range be kept same i.e 0-9 for Interaction history? I mean keeping different partition key range for Customer and Intercation history will it create any problem?

3. How to test if partitioning is working in development environment? In dev env we have two nodes, created a segment containing avg 100 customers belonging to different partitions, my expectation if I print customer id and partition keys in pega logs I will get half of the segment customer is will be printed in node 1 log and half in node 2 logs. But that's not how it's happening most of the time all the customer ids are printed in node 1 and some time few of them in node 2, some times I can see same customer ids are printed in both logs. Not sure how partitioning behaves, need some clarification, how to test the performance gains from it.

4. If partitioning works corrrectly as expected in production having 4 nodes, then can we expect 1/4 the time it takes currently to run a campaign for 1.3 million i.e 1/4 * 8 = 2 hours or less.

5. If tomorrow we increase number of nodes and want to increase partitioning range to 0-16, do we need to resave all segmemts(100+) again?

Regards

Abhi

Show Less

To see attachments, please log in.

Pega Marketing

Like (0)
Share this page Facebook Twitter LinkedIn Email Copying... Copied!

Posted: 7 years ago

Posted: 2 Jan 2018 5:02 EST

ManuVarghese

PEGA

replied to ABHINANDAN

Report

Hi Abhi,

1. i have following comments on below point.

"1. DB partitioning" - the number of partitions seems less, and i think it is not just a factor of how many nodes you have but also how powerful your hardware is, in terms of cpu, memory and disk IO, so that you could run multiple threads per node.
The Dataflow execution splits the amount of work into multiple assignments and one of the factor is partition key, more number of partitions the better. however below is the formula that will limit how many assignents will get created, so you can achieve better parallelism if number of partitions are slightly higher than below.

nos of Assignments = number of nodes in data flow cluster * number of configured threads * 2

"2. Resaved all our existing segments, used partition keys in all data sets.". - When you change the partition mode, you need to recreate the segments from scratch (instead of resave). To validate if the segment is picking partition information, please check the output Segment table, they should now have 2 columns customerid and partition key.

2. i am not sure of any relation between partitions on Customer and IH fact table.

Hi Abhi,

1. i have following comments on below point.

nos of Assignments = number of nodes in data flow cluster * number of configured threads * 2

2. i am not sure of any relation between partitions on Customer and IH fact table.

3. & 4. one method i can suggest to troubleshoot Campaign performance problems is to enable debug logging level on "ExecuteDDFInBatch" actitivy.
The campaign execution runs a dataflow in the background and when DF run completes successfully it deletes it.
Enabling debug on "ExecuteDDFInBatch" activity prevents this deletion.
Hence when the campaign completes, you can open the corresponding PR-xxx Dataflow run from Designer Studio -> Decisioning ->Decisions ->Dataflows ->Batch Processing landing page.
In the opened page, you will have 2 tabs in the DF execution page, "Run Details", and "Distribution details"
In the Run Details tab, you can view the component statistics and inspect the % of total time.
From this you can deduce which components are running slower and taking most % of the time from overall time.

in the distribution tab you can see how many records were processed on each node. In case of partitioned mode you should see multiple nodes here.

When running small customer segment, you wouldnt be covering records from all the partitions. You have to inspect the segment table check how many distinct partitions got picked using your segment criteria. you could also inspect the batchoutput table. Both segment and batch output table will have parition key column.


5. I could think of 2 possible functionality that could be affected.
a. Segments - you should rerun all the segments, no need to recreate, but segment table should be regenerated.
b. existing Campaign runs - you have to make sure your existing campaign run that are in wait state are completed as wait flow processing makes a join of batch output table with customer table.

Regards
Manu

Show Less

To see attachments, please log in.

Like (0)

Posted: 7 years ago

Posted: 3 Jan 2018 7:20 EST

ABHINANDAN

Areteans Technology Solutions

replied to ABHINANDAN

Report

Dear Manu,

Thanks for your reply, please find my comments below

1. We will see if we can increase the partitions , as per your instruction based on our proposed hardware we should be having 4(no of nodes)* 5 (number of thread) *2 = 40 partitions or more. But after increasing the partitions do you think only executing the segment once more will do, because we saw sql query with partition column are embedded in segment rule blob which changes when we resave the segment, please confirm this.

2. Actually what we did was we deleted the segment and restored again with same name that did the trick for us.

3. The relationship between Interaction table partition and customer partition is very important, can you point us to some one who can answer that.

4. We will use technique you mentioned to check performance post partitioning in dev environment.

5. Also can you please confirm if segment execution also uses partitioning, because segment also takes lot of time, so I was wondering the DeferredSegment agent which I believe is responsible for segment execution uses data flow and partitioning.

Regards

Abhi

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 31 Aug 2018 0:41 EDT

NabenduP

Accenture

replied to ABHINANDAN

Report

Hi all,

Wanted to ask one question here, For IH, did you make any changes at server level? if yes, will you be able to share what was needed.

As far as Customer table is concerned, did you do Data Partitioning or Table Partitioning. And was your Custmoer table in Cassandra or in normal table.

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 31 Aug 2018 1:09 EDT

ABHINANDAN

Areteans Technology Solutions

replied to NabenduP

Report

For IH partitioning we only changed the interactionHistory/numberOfPartition Dynamic System Setting.

Our customer table is a normal DB table and we use Table Partitioning.

Regards

Abhi

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 31 Aug 2018 1:09 EDT

ABHINANDAN

Areteans Technology Solutions

replied to ABHINANDAN

Report

For IH partitioning we only changed the interactionHistory/numberOfPartition Dynamic System Setting.

Our customer table is a normal DB table and we use Table Partitioning.

Regards

Abhi

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 24 Oct 2018 10:20 EDT

AnoojitD

Capgemini

replied to ABHINANDAN

Report

Hi Abhinandan,

we are also in the process of introducing partitioning in our CAR tables. As you mentioned that you did table partitioning, can you let me know what type of partitioning was done? like hash, range or just using a MOD function on the primary key and store the value in a new partition key column?

The application settings landing page asks for the range of the partitions, for us it would be around 64 based on the same formula you are using, how do we specify the range in the landing page?

Any insights would be helpful.

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 24 Oct 2018 10:37 EDT

ABHINANDAN

Areteans Technology Solutions

replied to ABHINANDAN

Report

Well if you are using DB partitioning then physical table has to have physical partition which is a DB feature. In our case the ETL process which populates the table is generating those partitions using MOD 25 on the last two digits our Customer ID which leads to 25 partitions i..e 00 -24, which is a reasonable number.

Then we specify this ranges in the application setting page.

Regards

Abhi

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 24 Oct 2018 10:50 EDT

AnoojitD

Capgemini

replied to ABHINANDAN

Report

Hi,

Thanks for the response. In our case we need to do the partitioning on the existing customer data and then probably include the function in the ETL ( our case its also a ETL feed) for subsequent updates. What I wanted to know is whether you had a column in the table with the partition key values from 00-24 or were they mentioned as partition names in the table definition ( not visible as columns).

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 24 Oct 2018 15:30 EDT

ABHINANDAN

Areteans Technology Solutions

replied to ABHINANDAN

Report

Yes we had a seperate column for holding the partition values, the physical partitions were also done based on the values of this column.

Regards

Abhi

To see attachments, please log in.

Like (0)

Posted: 6 years ago

Posted: 25 Oct 2018 8:00 EDT

AnoojitD

Capgemini

replied to ABHINANDAN

Report

Thanks Abhi for the response.

To see attachments, please log in.

Like (0)

Question

Steps to use partitioning in pega marketing

Need help or want to help others?

Experience the benefits of Support Center when you log in.

Question

Steps to use partitioning in pega marketing

Related content:

Need help or want to help others?

Experience the benefits of Support Center when you log in.

We'd prefer it if you saw us at our best.