Question
Pegasystems Inc.
GB
Last activity: 3 Mar 2017 0:39 EST
Search Indexes on multinode environment
Hi, I'm trying to get a better understanding of how distributed searching works. We are on Pega 7.2.1.
We have 4 nodes in the environment. Three of these nodes have been defined as Search Indexing nodes and work as expected.
I can bring down each of the search nodes in turn, and as long as one node that is defined as a search node is up, search continues to work.
When I am left with only the node that is not defined as a search node, search fails as expected.
We then start bringing the nodes back, so we have one node with no index and one node with a search index.
At this point search still does not work in the portal.
We then bring a second search node back, so 3 of our 4 nodes are up, 2 of them with a search index defined.
At this point search works.
Why can we bring nodes down and have search working with only one search indexing node available, but when we start bringing nodes back, search does not work until two search index nodes are available?
Pegasystems Inc.
IN
In the scenario where you have a node which is not an index node and bring up one index node,
- Write operations would fail because the quorum is not met. By default, Elasticsearch needs int((primary + replicas) / 2) + 1 shard copies to be active in order for writes to succeed. In your scenario that is 2, i.e. at least 2 nodes should be active for writes to succeed.
- However, this won't be applicable to search requests.
- Do you notice any exceptions in the log when you perform a search operation?
- If the search is done on the node that has indices on it, does it return results?
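To make the quorum arithmetic above concrete, here is a minimal sketch of the default write-quorum rule as I understand it for the Elasticsearch 1.x line. The function name write_quorum is made up for illustration, and the replica count of 2 is an assumption (one primary plus one replica per additional index node in a 3-index-node setup), not something read from your system.

```python
def write_quorum(number_of_replicas: int, primaries: int = 1) -> int:
    """Shard copies that must be active for a write to succeed.

    Hypothetical helper: it mirrors the documented default rule
    int((primary + number_of_replicas) / 2) + 1 and is not a Pega or
    Elasticsearch API.
    """
    return (primaries + number_of_replicas) // 2 + 1


# Assumed setup: 3 index nodes, so 1 primary and 2 replicas per shard.
print(write_quorum(number_of_replicas=2))  # prints 2
```

With only one index node running, a single shard copy is active, which is below the quorum of 2, so writes are rejected while the copy itself may still be readable.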
Pegasystems Inc.
GB
I brought all the search index nodes down and then restarted one, so I have two nodes running: one with a search index and one without.
If I check the index file from either node, I get the message "Rule Index File Unavailable or Corrupt".
If I perform a search in the Developer portal on either node, no results are found.
In the logs, a search on the node with the search index gives a simple "search failed" error:
2017-02-01 18:09:40,408 [ tomcat-http--68] [ STANDARD] [ ] [ PegaHotel:07.03.31] (Search.Data_Find_Search.Action) ERROR lxxxxxxx.europe.intranet|lxxxxxxx.europe.intranet HAT04 - Search request failed. Query: [*test*], Index [RULE]
2017-02-01 18:15:50,357 [ tomcat-http--34] [ STANDARD] [ ] [ PegaHotel:07.03.31] (Search.Data_Find_Search.Action) ERROR lxxxxxxx.europe.intranet|lxxxxxxx.europe.intranet HAT04 - Search request failed. Query: [*test*], Index [RULE]
Searching on the node with no index gives the same error:
2017-02-01 18:15:57,727 [ tomcat-http--50] [ STANDARD] [ ] [ PegaHotel:07.03.31] (Search.Data_Find_Search.Action) ERROR lxxxxxxx .europe.intranet|lxxxxxxx.europe.intranet HAT01 - Search request failed. Query: [*test*], Index [RULE]
We do see that the write fails due to unavailable shards when we log in, which makes sense, but I don't see why search is not working.
Pegasystems Inc.
IN
Could you set the logging level for "com.pega.pegarules.search.internal.es.FTSQueryExecutor" to DEBUG and execute the search request again? This gives more details on why the search request failed.
You can change the logging level from Designer Studio -> System -> Operations -> Logs -> Logging level settings.
Pegasystems Inc.
GB
After setting the logging level to DEBUG, we see that all shards failed:
2017-02-02 15:14:19,746 [ tomcat-http--12] [ STANDARD] [ ] [ PegaHotel:07.03.31] ( internal.es.FTSQueryExecutor) DEBUG lxxxxxxx.europe.intranet|lxxxxxxx.europe.intranet HAT01 - Error trace:
com.pega.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query_fetch], all shards failed
at com.pega.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:272)
at com.pega.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:147)
at com.pega.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:55)
at com.pega.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction.doExecute(TransportSearchQueryAndFetchAction.java:45)
at com.pega.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at com.pega.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:112)
at com.pega.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
at com.pega.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at com.pega.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
at com.pega.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
Pegasystems Inc.
GB
This is the same error on both the node with a search index and the node without.
As soon as we add a further node, search works correctly.
Pegasystems Inc.
IN
I have tested this locally, and below are my observations for 3 index nodes and 1 non-index node:
1. 3 index nodes down, 1 non-index node (no index nodes alive) ->> Search doesn't work (expected)
2. 2 index nodes and 1 non-index node down (1 index node is alive) ->> Search works fine
3. 3 index nodes down, 1 non-index node is alive and bring up 1 index node ->> Search doesn't work
4. All 4 nodes down, 1 index node is brought up and 1 non-index node brought up (in this sequence) ->> Search doesn't work
The issue occurs because of the number of replicas, which gets configured each time an index node is added. When nodes are brought down, however, the replica count is not updated.
During Elasticsearch node start-up, it tries to make sure that all shards (including replica shards) are allocated. In scenarios 3 and 4, the cluster state remains RED because even the primary shard cannot be allocated. All search requests fail in this state.
When an additional index node is added, it is able to assign the primary as well as 1 replica shard, and the cluster comes to the YELLOW state.
However, in scenario 2, which is similar to 3 and 4 except that the index node hasn't been restarted, the cluster state remains YELLOW because the primary shard is still in an allocated state.
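The RED / YELLOW behaviour described above can be sketched as a small model. This is only an illustration of the standard cluster health colours for a single shard, not Pega or Elasticsearch code; the names cluster_health and search_available are hypothetical, and the configured replica count of 2 is assumed from the 3-index-node setup.

```python
def cluster_health(primary_allocated: bool,
                   replicas_allocated: int,
                   replicas_configured: int) -> str:
    """Health colour for one shard and its replicas (illustrative model)."""
    if not primary_allocated:
        return "RED"      # the primary itself is unassigned
    if replicas_allocated < replicas_configured:
        return "YELLOW"   # primary is assigned, some replicas are not
    return "GREEN"        # every copy is assigned


def search_available(health: str) -> bool:
    """Per the observations above, search only fails in the RED state."""
    return health != "RED"


REPLICAS = 2  # assumption: replica count left over from 3 index nodes

# Scenario 2: two index nodes stopped, the survivor still holds the primary.
print(cluster_health(True, 0, REPLICAS))   # prints YELLOW
# Scenarios 3 and 4: one index node restarted alone, primary not allocated.
print(cluster_health(False, 0, REPLICAS))  # prints RED
# A second index node joins: primary plus 1 replica are assigned.
print(cluster_health(True, 1, REPLICAS))   # prints YELLOW
```

The model shows why the same node count gives different results: what matters is whether the primary shard is currently allocated, not how many index nodes are running.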
Hope this helps