Stream Node Connection Issues
We are running our application in a customer-managed cloud. A few days ago, one of the stream nodes went down, and the Auto Scaling group replaced it with a fresh instance.
While the replacement was starting up, the other active nodes (web-user, for example) tried to connect to it. The new stream node stayed in PENDING_JOINING status for a long time, and then the logs indicated that the web-user node could not join it:
StreamServer.Default stream-ip-100-83-227-225.ec2.internal Proprietary information hidden JOINING_FAILED (was PENDING_JOINING) 8.7.3 c3459de5-ec6d-472a-89c7-8f808fe8a7c0
During this window, the web-user node experienced overhead, and API requests reaching this web-user node took longer (60 seconds).
After a few minutes, the join succeeded, and the web-user node showed the new stream node as normal.
In short, the web-user node was unstable for a few minutes, which caused hiccups in production.
Question: I believe the stream node startup has to be quick in order to prevent this issue. What options are available to speed up stream node startup/joining?
or
Is it possible to defer the joining (from the web-user node) until the stream node is up and running, to prevent the overhead?
Any steps or best practices to eliminate the overhead would be helpful.
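To make the "defer the joining" idea concrete, here is a rough sketch of the kind of gating I have in mind. It is generic Python, not a platform API: `wait_until_ready` and the `probe` callable are hypothetical names, and `probe` stands in for whatever readiness check is appropriate (I am assuming some such check exists or could be built). The caller would only attempt the join once the probe reports the stream node as ready, retrying with exponential backoff instead of hammering a node that is still starting.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `probe` is a hypothetical readiness check supplied by the caller,
    for example "is the new stream node accepting connections yet?".
    Returns True if the probe succeeded, False if the timeout was hit.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Exponential backoff, capped so checks do not become too sparse.
        delay = min(delay * 2, max_delay_s)
```

With something like this in front of the join attempt, the web-user node would spend the startup window sleeping between cheap checks rather than repeatedly attempting a join that is going to fail. Whether the platform exposes a hook where such a gate could be placed is exactly what I am asking.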