Defining a partition key and using it in a report definition (data flow)

Question

AndrewN2

Member since 2016

4 posts

Scotiabank

Posted: Oct 26, 2017

Last activity: Jun 26, 2018


Closed


Could any teams out there describe how they chose their partition key (what should the value be?) and how a partition key behaves when it is used in the report definition source of a data flow? We currently have a batch run that takes a very long time because of the number of records involved, and we are looking for ways to optimize the batch data flow execution.

Also note: the table I want to partition will be truncated and reloaded on a nightly basis.



Decision Management

Reporting

Data Integration

Support Case Exists


Posted: 30 Oct 2017 18:07 EDT

GopiGanapathy

PEGA

replied to AndrewN2


Please pick the partition key column so that each thread gets an assignment to process. You can pick any column from the customer table.

The DB data set uses the query 'select distinct partitionKey from table' to get the list of partitions.

Do the table columns remain the same after the nightly refresh?


Posted: 30 Oct 2017 22:45 EDT

AndrewN2

Scotiabank

replied to GopiGanapathy


We will define a partition key column (e.g. SequenceNumber) and distribute the records in our data table evenly across twice as many partitions as we have JVMs (so with 16 JVMs, the partition key would range from 1 to 32, with the same number of records in each partition).

The question is, how does Pega's data flow rule (with a report definition source) distribute the load across our JVMs if we pick this sequence number as the partition key for the source component? Will Pega know to use the SequenceNumber (partition key), defined as an integer, to evenly distribute the record processing within the data flow to each node/JVM?

Our dev and test environments have a single node, which makes it harder to validate the performance improvement from defining a partition key. We would need to move to our non-functional testing environment (multi-node) to test this, and we want to know as much detail as possible before proceeding.

The table columns will remain the same; we will just truncate and reload the table (and rebuild the partition key/sequence number).
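The bucketing scheme described in this post (partition keys 1 to 32 for 16 JVMs) could be computed while the table is reloaded each night. The following is only a minimal sketch in Python, assuming round-robin assignment in load order; the function name, the record ids, and the factor of 2 partitions per JVM are illustrative assumptions taken from the example above, not anything prescribed by Pega.

```python
from collections import Counter


def assign_partition_keys(record_ids, jvm_count, partitions_per_jvm=2):
    """Map each record id to a partition key in 1..(jvm_count * partitions_per_jvm).

    Hypothetical helper for illustration only.
    """
    partition_count = jvm_count * partitions_per_jvm
    # Round-robin by load position, so bucket sizes differ by at most one record.
    return {
        record_id: (index % partition_count) + 1
        for index, record_id in enumerate(record_ids)
    }


# Made-up record ids standing in for the nightly reload.
record_ids = [f"CUST-{n:06d}" for n in range(100_000)]
keys = assign_partition_keys(record_ids, jvm_count=16)

bucket_sizes = Counter(keys.values())
print(len(bucket_sizes), min(bucket_sizes.values()), max(bucket_sizes.values()))
```

The point of the sketch is that the partition key is a small, repeating bucket number rather than a unique value per record, so the number of distinct keys stays at 32 no matter how many rows are loaded.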


Posted: 26 Jun 2018 1:27 EDT

Raju Botu

Coforge

replied to AndrewN2


I know it's an old thread, but could you please share the final outcome?

As far as I know, a partition key will not work if the source is a report definition (Pega 7.2). Please correct me if I am wrong.

I don't think a unique key such as a sequence number can be used as a partition key; there would be too many partitions.


Posted: 31 Oct 2017 14:32 EDT

GopiGanapathy

PEGA

replied to AndrewN2


For each row returned by 'select distinct partitionKey from table', Pega creates an assignment that is scheduled to process the matching set of customer records ('select * from table where partitionKey = ?'). Each thread takes an assignment and processes it. Once a thread completes an assignment, it checks for the next one, until no assignments remain.
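The behavior described above can be imitated outside Pega to get a feel for it on a single machine. The following is a rough Python simulation, not Pega's implementation: the customer table, its column names, the thread count of 2, and the use of SQLite are all assumptions made for illustration.

```python
import queue
import sqlite3
import threading

# Hypothetical table and column names, used only for this illustration.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, partition_key INTEGER)")
conn.executemany(
    "INSERT INTO customer (id, partition_key) VALUES (?, ?)",
    [(n, n % 4 + 1) for n in range(1, 101)],  # 100 rows spread over keys 1..4
)
db_lock = threading.Lock()  # serialise access to the single shared connection

# Step 1: create one assignment per distinct partition key value.
assignments = queue.Queue()
for (key,) in conn.execute("SELECT DISTINCT partition_key FROM customer").fetchall():
    assignments.put(key)

processed = {}  # partition key -> number of records handled


def worker():
    # Step 2: take an assignment, process its records, repeat until none remain.
    while True:
        try:
            key = assignments.get_nowait()
        except queue.Empty:
            return
        with db_lock:
            rows = conn.execute(
                "SELECT * FROM customer WHERE partition_key = ?", (key,)
            ).fetchall()
        processed[key] = len(rows)  # stand-in for the real record processing


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(processed.items()))
```

In this model the number of assignments equals the number of distinct key values, so a key with fewer distinct values than threads would leave some threads idle, while a unique key per record would create one assignment per row.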


Posted: 25 Nov 2017 0:13 EST

AndrewN2

Scotiabank

replied to AndrewN2


Hi Gopi,

As we have previously discussed, the partition key functions as you noted. However, how to choose the number of partitions still needs investigation. What should the range of values be, given the server hardware (e.g. does it depend on the number of JVMs/nodes, CPU, memory, etc.)?
