Kafka com.pega.dsm.kafka.impl.RetryableOperationException SSL handshake error
Kafka: We are receiving com.pega.dsm.kafka.impl.RetryableOperationException as the top exception class in our non-prod Kafka cluster, across multiple namespaces. What causes the Kafka SSL handshake error, and what is its current status?
@IvanP276
To stop the Kafka SSL handshake failures that surface as com.pega.dsm.kafka.impl.RetryableOperationException in non-prod, work through the checks below.

First, confirm you are actually hitting an SSL listener (security.protocol = SSL or SASL_SSL) and not pointing an SSL client at a PLAINTEXT port or at the wrong advertised.listeners address.

Next, verify the broker certificate: it must not be expired, it must carry a SAN that matches the hostname you call, and the full chain must be present, intermediates included. Most handshakes fail on a SAN mismatch or missing intermediates. Ensure the client truststore contains the issuing CA(s) for the broker certificate; if you use mutual TLS, also load the client keystore with the private key and certificate chain.

Align protocols and ciphers: the Java runtime on the Pega nodes and the brokers must support a common TLS version (TLSv1.2 or later), and corporate policies sometimes disable older ciphers, which causes silent failures. Check endpoint identification: if the certificate CN/SAN does not match DNS, either fix the certificate or temporarily set ssl.endpoint.identification.algorithm to an empty value (fixing the certificate is the better option).

Validate the passwords, paths, and store types (JKS vs PKCS12) for ssl.truststore and ssl.keystore; a wrong type or password stops the store from loading and the connection fails. Check clock skew on the nodes; a large skew can break certificate validity checks. If you use SASL_SSL, also confirm that the SASL mechanism and JAAS config are correct; SASL authentication runs over the TLS connection, so a failure there can look like a connection problem even when TLS itself is fine.

On the cluster, verify the listener.name.*.ssl.keystore/truststore settings and that the brokers reload after certificate changes. From a stream node, run openssl s_client against the broker host:port to see the exact TLS error and the certificate chain. In Pega, restart the Stream/Kafka client tier after updating DSS/keystores, and purge stale connection pools.

Finally, scan the broker and client logs for "PKIX path building failed," "certificate_unknown," or "no suitable protocol" to pinpoint whether the problem is trust, identity, or a protocol/cipher mismatch. The fix will map directly to that message.
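
If openssl is not available on the stream node, the handshake check and the log scan can be approximated with a short Python script. This is only a sketch under assumptions: the host name and port in the usage note are hypothetical, the CA file is whatever PEM bundle you exported from your truststore, and the hint texts in LOG_HINTS are my own reading of the three log fragments quoted above, not official Kafka or Pega diagnostics.

```python
import socket
import ssl

# Log fragments quoted in the reply above, mapped to the area to investigate.
# The hint wording is an interpretation, not an official diagnostic.
LOG_HINTS = {
    "PKIX path building failed": "trust: truststore is missing the issuing CA or an intermediate",
    "certificate_unknown": "trust or identity: the peer rejected the certificate it was shown",
    "no suitable protocol": "protocol: no common TLS version or cipher suite",
}


def classify(log_line):
    """Return a hint for a log line, matching the fragments case-insensitively."""
    lowered = log_line.lower()
    for fragment, hint in LOG_HINTS.items():
        if fragment.lower() in lowered:
            return hint
    return "unknown: read the full stack trace"


def probe(host, port, cafile=None, timeout=5.0):
    """Attempt a TLS handshake with hostname verification and describe the outcome."""
    # With cafile=None the system default CA bundle is used.
    context = ssl.create_default_context(cafile=cafile)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return f"ok: {tls.version()} {tls.cipher()[0]}"
    except ssl.SSLCertVerificationError as exc:
        # Covers an untrusted chain, an expired certificate, and a hostname mismatch.
        return f"certificate verification failed: {exc.verify_message}"
    except ssl.SSLError as exc:
        # Typically a protocol or cipher mismatch, or a non-TLS (PLAINTEXT) port.
        return f"TLS failure: {exc}"
    except OSError as exc:
        return f"connection failure: {exc}"
```

For example, print(probe("broker-1.example.internal", 9093, cafile="broker-ca.pem")) reports the negotiated TLS version and cipher on success, or the verification message on failure, and classify(line) can be applied to each line of a client log. The probe does not present a client certificate, so against a broker that requires mutual TLS it only validates the server side of the handshake.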