@FastwebPegaS can you please confirm that you logged support ticket INC-235356 for this? This will help the moderators track the issue and follow it through to conclusion.
This type of error can be seen during search initialization. It is a known issue with shard management during rolling restarts. It may have nothing to do with startup itself, other than that search is unlikely to operate in this environment. Shutting down all nodes, emptying the index directory, and reindexing should resolve the search initialization issue. If the problem still appears to be search-related, please share thread dumps from node startup with our support team.
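For reference, a common way to capture a thread dump from a running JVM node is to send the process SIGQUIT (equivalent to `kill -3 <pid>`); the HotSpot JVM then prints a full thread dump to its stdout/console log without stopping. A minimal sketch, where the pid is an assumption you must supply (find your node's JVM pid first, e.g. with `jps`):

```python
import os
import signal

def request_thread_dump(pid: int) -> None:
    """Ask a HotSpot JVM to print a thread dump to its stdout/console log.

    SIGQUIT triggers the dump without terminating the process. The pid
    is an assumption here -- locate your node's JVM pid yourself first.
    """
    os.kill(pid, signal.SIGQUIT)
```

Collect a few dumps a couple of seconds apart during node startup so support can compare thread states over time.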
The best course of action is to wait for our support team to analyse your logs and help you further.
Posted: 31 Aug 2022 9:15 EDT Updated: 18 Oct 2022 14:03 EDT
Marije Schillern (MarijeSchillern)
Senior Knowledge Management Specialist
Based on the above stack trace, it looks like the target is null and there is a partition issue. Specifically, when the target is null, this message means that this particular member doesn't have an owner set for a specific partition: the member didn't get its partition table updated in time (a request was made before the member was informed where the data lives in the grid).
What it means: In a healthy cluster, this should rarely occur, as Hazelcast has delivered fixes in past releases which prevent the race condition between looking for data and receiving updated partition table information. In a split-brain situation, when the cluster is fractured into many smaller clusters, partitions are lost (since some partitions may only have existed on nodes that are no longer part of a splintered group of nodes).
Frequent fracturing and merging also causes partition table updates to be delayed.
What to do:
In a healthy cluster, this should be a one-off occurrence and can safely be ignored. If the error is seen multiple times, it may indicate that the cluster is experiencing fracturing. Also check that no required ports are blocked by your firewall.
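To rule out a firewall problem, you can probe another cluster member's Hazelcast port (5701 by default) from the affected node. A minimal sketch, where the host name is a placeholder for your environment:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (hypothetical host): probe a peer's default Hazelcast port.
# print(port_reachable("node2.example.com", 5701))
```

A False result from a node that should be a cluster peer points at a firewall rule or network path problem rather than a Hazelcast issue.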
You can try the steps below as a workaround for now and check whether it helps:
1. Take a DB backup.
2. Bring all nodes down.
3. Truncate the pr_sys_statusnodes table (only after the DB backup from step 1).
4. Bring up one node (preferably an index host node).
5. Bring up the remaining nodes, in parallel or in series as you wish.
Note: It is recommended to perform the above steps during non-business hours, after taking the DB backup.
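The restart sequence above can be sketched as an ordered runbook. The step descriptions are placeholders (your actual backup, node stop/start, and SQL invocations will differ by environment); only the step order matters:

```python
# Each step pairs a short name with a placeholder description; replace the
# print with your environment's real backup / stop / SQL / start commands.
RUNBOOK = [
    ("backup",      "take a full DB backup"),
    ("stop",        "bring all cluster nodes down"),
    ("truncate",    "TRUNCATE TABLE pr_sys_statusnodes;  (via your DB client)"),
    ("start-first", "bring up one node, preferably an index host node"),
    ("start-rest",  "bring up the remaining nodes"),
]

def run_steps() -> list:
    """Print each step in order and return the step names executed."""
    executed = []
    for name, action in RUNBOOK:
        print(f"[{name}] {action}")
        executed.append(name)
    return executed
```

The key design point is that the truncate never runs before the backup, and exactly one node comes up before the rest so it can rebuild cluster state first.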
If you still see the issue after performing the above steps, support has asked that you provide the following details: