Does Elastic Search support higher volume of production data?

Question

Mr. Ravi Kumar Pisupati

GovCIO

Posted: 19 May 2016 18:47 EDT
Last activity: 8 Aug 2016 1:58 EDT

Closed

Hi,

We are working on a call center application that is currently being upgraded (V7.1.9). The application holds a huge volume of data (terabytes) and 50-60 billion records in production, so we are wary of enabling ES for work objects because of performance issues we cannot predict. Can someone tell me whether ES has any volume restriction, or whether it supports data volumes as high as ours? What precautions should we take when enabling ES in our prod environment?

Note: We disabled searching of work items in prod in the current V62 environment (Lucene) due to some performance issues seen earlier.

Thanks,

Ravi Kumar.

Updates

System Administration

Data Integration

Posted: 20 May 2016 5:25 EDT

nistr replied to Mr. Ravi Kumar Pisupati

In version 6.2, the Pega platform used Apache Lucene 3.0.0, which did have memory issues. These were fixed in Apache Lucene 3.5.0, which was used in version 6.3 of the Pega platform. You can find the details of the performance issue here: https://issues.apache.org/jira/browse/LUCENE-2205

That said, Elastic Search, introduced in version 7.1.7 of the Pega platform, uses Apache Lucene 4.6 underneath, so performance is much better.

As I understand it, the data volume is huge. How often does this data change? How does your database cope with this volume? Do you have an archive / purge strategy? Do you need to search all of this data all the time?

Posted: 20 May 2016 11:38 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to nistr

Thanks, Rajiv. I will get back to you with more details.

Posted: 24 May 2016 14:00 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to nistr

Rajiv,

Here are the responses for your questions.

1. How often does this data change? Once the cases are resolved, the data does not change at all. For assignment errors, the support team usually resolves these manually.

2. How does your database cope with this volume? Oracle MA, Oracle RAC architecture. DB utilization under the current load so far (without Elastic Search functionality) is roughly 40-50%. Keep in mind that DB utilization is much higher during peak season (from October to the end of January).

3. Do you have an archive / purge strategy? All cases older than 1 year are moved to the Archived DB and should not be needed in the indexes.

4. Do you need to search on all this data all the time? As per the use case: “Configure the ‘Quick Search’ feature to have the following pre-set search features to match on: Case-ID, Member-Name, Member ID, and User Name” (case search only; other OOTB searches will be removed).

Thanks,

Ravi Kumar.

Posted: 26 May 2016 8:08 EDT

nistr replied to Mr. Ravi Kumar Pisupati

Our current Archive and Search Wizard doesn't remove entries from search index for those instances which are purged from the database. So, this needs to be done by the developer / administrator themselves.

If you are going to search using "Case-ID, Member-Name, Member ID and User Name", then these searches can be done via database queries as well. Elastic Search is useful when you need the power of full text search, where you are not sure which field in the instance contains the string you are searching for.
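To illustrate the database-query alternative, here is a minimal sketch using an in-memory SQLite table. The table name `work_items`, its columns, and the sample rows are hypothetical and do not reflect the actual Pega work tables; the point is only that a lookup on a known field can be served by an ordinary indexed query.

```python
import sqlite3

# Hypothetical schema for illustration only; not the real Pega work tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_items ("
    "case_id TEXT PRIMARY KEY, member_id TEXT, member_name TEXT, user_name TEXT)"
)
# An index on the column users search by keeps exact-match lookups cheap.
conn.execute("CREATE INDEX idx_member_id ON work_items (member_id)")
conn.executemany(
    "INSERT INTO work_items VALUES (?, ?, ?, ?)",
    [
        ("S-1001", "M-42", "Jane Doe", "agent01"),
        ("S-1002", "M-42", "Jane Doe", "agent02"),
        ("S-1003", "M-77", "John Roe", "agent01"),
    ],
)


def find_by_member_id(member_id):
    """Exact-match lookup on a known, indexed column."""
    rows = conn.execute(
        "SELECT case_id FROM work_items WHERE member_id = ? ORDER BY case_id",
        (member_id,),
    )
    return [row[0] for row in rows]


print(find_by_member_id("M-42"))  # ['S-1001', 'S-1002']
```

Full text search becomes worthwhile only when the search string could sit in any field, which a fixed `WHERE` clause like this cannot express efficiently.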

Posted: 20 May 2016 7:07 EDT

AndrewWerden

replied to Mr. Ravi Kumar Pisupati

Bigger question - what is the use case and business value for full text search in this call center?

Most high volume call centers have very clearly defined access paths to old calls / cases - search by ref #/caseID, search by account / member / subscriber / provider, search by last name + date

Keep in mind that enabling full text search adds five database transactions every time a work object / case is saved: write to the ftsindexer queue, select from the ftsindexer queue, update the ftsindexer queue, read the work object, and delete from the ftsindexer queue. Regardless of index scalability, DB load is going to increase. If the system is truly very large, that may be an issue.
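As a rough sketch of what those five extra transactions mean for database load, the added work can be estimated from the number of saves per day. The figure of 500,000 saves per day below is a hypothetical input, not a number from this thread.

```python
# Five indexing-related database transactions per save, as listed above.
FTS_TRANSACTIONS_PER_SAVE = 5


def extra_db_transactions_per_day(saves_per_day):
    """Estimate the additional daily DB transactions caused by indexing."""
    return saves_per_day * FTS_TRANSACTIONS_PER_SAVE


# Hypothetical load: 500,000 work object saves per day.
print(extra_db_transactions_per_day(500_000))  # 2500000
```

Note that a single case is usually saved several times over its life, so saves per day can be well above the number of cases created per day.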

We do have several customer service / customer support systems that run with text search enabled (gcssupport is not built on customer service/cpm rules but is used for that role) but they are 'high touch' and not 'high volume'. How would you characterize your application?

Posted: 20 May 2016 11:38 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to AndrewWerden

Thanks, Andrew. I will get back to you with more details.

Posted: 24 May 2016 14:04 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to AndrewWerden

Hi Andrew,

Below are the responses.

1. What is the use case and business value for full text search in this call center?

The business was told at the time (by Pega consultants) that this is an OOTB capability.

6. We do have several customer service / customer support systems that run with text search enabled (gcssupport is not built on customer service/cpm rules but is used for that role) but they are 'high touch' and not 'high volume'. How would you characterize your application?

High Volume (peak call volumes/ number of cases).

The recent scenario was

1. New Calls / I-Cases creation - 145K / day;

2. Max. Transactions recorded – 25.8 Mil.

Thanks,

Ravi Kumar.

Posted: 24 May 2016 14:09 EDT

AndrewWerden

replied to Mr. Ravi Kumar Pisupati

What's the nature of this customer's product / service?

What are the customer's data retention requirements?

What are the logical ways in which one would be searching hundreds of millions of cases?

- work ID

- customer ID

- customer tel #

etc ....

Posted: 24 May 2016 16:56 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to AndrewWerden

Here are the responses.

What’s the nature of this customer's product / service? – To resolve customer-specific issues and concerns using this app.

What are the customer's data retention requirements? – Not sure about this.

If you processed 145k calls a day, with two service intents per call, that would be 435k work items per day. Assuming 260 work days per year and 2 years retention I get 226 million work items to search. Where did you get 25.8 million? – The 25.8M transactions include all the user hits between the app server and user sessions; it does not mean we are creating that many intent work objects (S-Cases).

What are the logical ways in which one would be searching hundreds of millions of cases? – Mainly Work ID and Member/Cust. ID

- work ID

- customer ID

- customer tel #

etc ....
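The volume estimate quoted above can be reproduced with a short calculation. This sketch assumes each call creates one interaction case plus one case per service intent, which is the reading that yields the 435k figure; the function name and parameters are illustrative.

```python
def work_items_to_search(calls_per_day, intents_per_call,
                         work_days_per_year, retention_years):
    """Return (work items per day, total work items retained)."""
    # Assumption: one interaction case per call plus one case per intent.
    items_per_day = calls_per_day * (1 + intents_per_call)
    total = items_per_day * work_days_per_year * retention_years
    return items_per_day, total


per_day, total = work_items_to_search(145_000, 2, 260, 2)
print(per_day)  # 435000
print(total)    # 226200000
```

With a 1-year retention window instead of 2, the same inputs give about 113 million work items.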

Posted: 20 May 2016 13:03 EDT

ranjr

PEGA

replied to Mr. Ravi Kumar Pisupati

Just to add to this, why do you need all of this data to be searchable? You could use purge and archive to store your very old data. (Just a thought.)

Posted: 20 May 2016 13:13 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to ranjr

With the move to Pega 7, the business is asking for some search features in the new app, but for now this feature is disabled in V62 to avoid performance issues while searching the data (I- or S- cases). By the way, our team contacted GCS about this issue earlier, and based on that we have kept this feature disabled until now. I don't remember the SR# for that earlier contact with GCS.

Thanks,

Ravi Kumar.

Posted: 2 Jun 2016 14:12 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to Mr. Ravi Kumar Pisupati

Hi Rajiv/Andrew,

We had an internal discussion on enabling the ES feature in our prod environment. Some interesting points came up that we would like to share with you.

1) Pega enhanced Elastic Search in PRPC 7.x using a third-party driver, Hazelcast, which supports the Hazelcast cache mechanism.

2) But the implementation of the Hazelcast cache mechanism had a bug in PRPC: it creates hung threads by scanning JVM nodes that are on different hosts as part of the bootstrap process, etc. It even bumps up CPU utilization and network traffic.

3) One of our team members mentioned that a hotfix, Hfix-21219, was provided to resolve some performance issues, but we are not sure whether the issues mentioned above are still present in V7.1.9.

Based on the above points, can either of you please advise us on what needs to be done?

Thanks,

Ravi Kumar.

Posted: 10 Jun 2016 3:20 EDT

nistr replied to Mr. Ravi Kumar Pisupati

Hi Ravi,

I am not sure I understand points 1 and 2. Elastic Search is meant to provide distributed search and failover. Hazelcast is meant to provide a distributed cache / remote execution facility in the product. While both are distributed technologies, their purposes are completely different.

This issue with Elasticsearch was fixed in 7.1.8 and backported to 7.1.7 via HFIX-21219. So 7.1.9 does have the fix.

-Rajiv

Posted: 5 Aug 2016 14:04 EDT

Mr. Ravi Kumar Pisupati

GovCIO

replied to nistr

Hi Rajiv,

Sorry for the late reply. Based on your last reply, we are planning to enable ES in our prod env, but I would like your recommendation first. Can we go ahead and enable it in prod given our current situation? If yes, we will need Pega support for any issues/concerns after enabling it. Please let me know.

Thanks,

Ravi Kumar.

Posted: 8 Aug 2016 1:58 EDT

nistr replied to Mr. Ravi Kumar Pisupati

Hi Ravi,

If you need the full text search capabilities on the cases that get created, then Elastic Search is the only way forward within the Pega 7 platform. As mentioned, if full text search is not needed, database queries can also work well.

Given the volumes you discussed, the following should be kept in mind before enabling it:

- Turn off indexing on classes which don't need to be searched. This can be done by checking the option "Exclude this class from search" on the Advanced tab of the class definition.
- Have a purge / archive mechanism which can clean up the instances in the Elastic Search index as well.
- Have ample resources for the index host nodes:
  1. Disk space should be up to 3 times the size of the initial index.
  2. Suitable memory to handle the resources of Elastic Search (you should use up to half of the available RAM for your heap, with a minimum of 8 GB).
  3. Inter-node connectivity over port range 9300~9399 should avoid any network blips.
- At least 2 nodes should be configured as index host nodes to provide failover.
- If you are using WebSphere and IBM Java, please read the platform support guide for the minimum WebSphere and IBM SDK versions.
- Enable attachment indexing on the search landing page only if necessary.

You should be able to reach out to GCS for any issues you face, or use the Pega Product Support community to seek help with any questions that you might have.
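The disk and memory guidance above can be turned into a quick sanity check for a candidate index host node. This is only a sketch of one reading of those rules (disk at 3 times the initial index size, heap capped at half of RAM, and that cap being at least 8 GB); the function name, parameters, and sample figures are hypothetical.

```python
MIN_HEAP_GB = 8  # minimum heap size from the guidance above


def index_host_sizing(initial_index_gb, available_ram_gb):
    """Derive rough disk and heap targets for one index host node."""
    disk_gb = 3 * initial_index_gb       # up to 3x the initial index size
    max_heap_gb = available_ram_gb / 2   # heap capped at half of RAM
    return {
        "disk_gb": disk_gb,
        "max_heap_gb": max_heap_gb,
        # False means the node has too little RAM to reach the minimum heap.
        "heap_meets_minimum": max_heap_gb >= MIN_HEAP_GB,
    }


# Hypothetical node: 200 GB initial index, 32 GB RAM.
print(index_host_sizing(200, 32))
```

Under this reading, a node with less than 16 GB of RAM cannot satisfy both memory rules at once.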

-Rajiv
