We have encountered a situation several times where we get fatal "too many open files" errors. It turns out we had 811,992 open sockets: 807,057 in CLOSE_WAIT state and 2,315 in a connected or listening state, against an open-descriptor limit of 4096 for the Java process. The 807k CLOSE_WAIT sockets are connections to the ZooKeeper port on the same host, e.g. "TCP host101:eforward->host101:39565 (CLOSE_WAIT)".
The eforward port (2181) is the default ZooKeeper client port; port 39565 is, of course, ephemeral.
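For anyone else debugging this, the commands below are roughly how we broke the numbers down (a minimal sketch using standard Linux tools; the PID is a placeholder for the affected Java process):

    PID=12345   # placeholder: PID of the affected Java process

    # Total open descriptors held by the process
    ls /proc/$PID/fd | wc -l

    # Open TCP sockets for the process, grouped by state
    # (tail skips the lsof header line)
    lsof -a -p $PID -i TCP | tail -n +2 | awk '{print $NF}' | sort | uniq -c | sort -rn

    # Per-process open-file limit (the 4096 mentioned above)
    grep 'Max open files' /proc/$PID/limits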
I saw a post describing a similar issue with too many sockets in CLOSE_WAIT state in a SOAP scenario; the suggested fix in that case was to change the Axis2 timeouts.
Is this a known issue? If so, is there a setting for Kafka or ZK that we need to consider?
Found the issue. The Tomcat config in conf/context.xml specifies the JDBC connection pool. It turns out that on some Tomcat instances the maxActive attribute is no longer used and has been replaced by maxTotal. It's confusing, as some Tomcat docs still refer specifically to maxActive. Using maxTotal has resolved the issue of sockets not closing when expected.
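For reference: Tomcat 8 and later ship Commons DBCP 2 as the default connection pool, and DBCP 2 renamed maxActive to maxTotal (and maxWait to maxWaitMillis); the old names only still apply if you explicitly select the tomcat-jdbc pool factory. Below is a minimal sketch of the corrected Resource in conf/context.xml; the resource name, driver, URL, credentials, and sizes are placeholders, not our actual config:

    <Context>
      <!-- DBCP 2 (the Tomcat 8+ default pool) does not recognize the old
           maxActive attribute, so the pool falls back to its default
           maxTotal of 8. maxTotal is the DBCP 2 replacement. -->
      <Resource name="jdbc/appDb"
                auth="Container"
                type="javax.sql.DataSource"
                driverClassName="com.mysql.cj.jdbc.Driver"
                url="jdbc:mysql://dbhost:3306/app"
                username="app"
                password="secret"
                maxTotal="100"
                maxIdle="10"
                maxWaitMillis="10000"/>
    </Context>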
We are not sure of the exact relationship between the JDBC pool connections and the ZooKeeper sockets failing to close; we have only surmised what might be happening.