Last activity: 14 Mar 2023 8:14 EDT
Implementation of Apache Kafka® with Apache Avro serialization in Pega
The attached document captures lessons learned, for future reuse, with the implementation details of integrating the Pega Platform with the Apache Kafka streaming platform using Apache Avro serialization.
It demonstrates how to consume Apache Avro messages from the Apache Kafka event streaming platform using a Pega real-time Data Flow run.
Apache Kafka® is a distributed streaming platform with three key capabilities:
- publish and subscribe to streams of records, like a message queue or enterprise messaging system
- store streams of records in a fault-tolerant durable way
- process streams of records as they occur
Kafka is generally used for two broad classes of applications:
- building real-time streaming data pipelines that reliably get data between systems or applications
- building real-time streaming applications that transform or react to streams of data
More about Apache Kafka: https://kafka.apache.org/
"Apache Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format."
More about Apache AVRO: https://avro.apache.org/
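To make the "compact binary format" concrete, here is a minimal hand-rolled sketch of Avro's binary encoding in plain Python (for illustration only; a real project would use an Avro library). Per the Avro specification, ints and longs are encoded as zigzag varints, and strings as a varint length prefix followed by UTF-8 bytes; a record is just its fields concatenated in schema order, with no field names in the payload. The example record and its fields are made up for this sketch.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a (64-bit) integer as an Avro zigzag varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag: maps signed ints to unsigned
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Avro string: zigzag-varint length prefix + UTF-8 bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

# Record per a hypothetical schema:
#   {"type": "record", "fields": [{"name": "name", "type": "string"},
#                                 {"name": "age",  "type": "int"}]}
# A record body is simply its fields concatenated in schema order.
message = encode_string("Alice") + zigzag_varint(30)
print(message.hex())  # 0a416c6963653c
```

Note that the payload carries no field names at all, which is why the reader needs the same (or a compatible) schema to decode it.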
Is the Schema Registry component available on Pega Exchange? If not, when can we expect it to be available?
The Pega Exchange component is not officially available yet. The described component was delivered and tested on Pega 8.2.x and sanity-checked on Pega 8.3. I have asked the product team whether there is a plan to release it as an official Pega Exchange component and whether the current version can be shared here; I will update you as soon as I know more.
Are you asking about the partition key(s)? When you define the Kafka Data Set, you select the Kafka instance and topic, and you define the partition key there.
Not the partition keys. I want to know about writing a key and value to a topic.
It's awesome to see this Avro support for Kafka.
We have two questions:
1) Can we also write messages to a queue/topic using Avro?
2) We can see from your PDF that it was tested in real time; will it also work for batch?
Many thanks, and we appreciate your response.
I did not test writing into the topic, but I think it should work too, as reading works. To use Avro serialization in Pega, you need to define Pega classes with properties that represent the Avro message. With that in place, when you create the Kafka Data Set in the class that is the root of the message, you should be able to read from and write to Avro messages. I assume that to write, you need to prepare messages that match the Avro schema in the flow, e.g. using a Convert shape or a data transform.
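As a small sketch of what "messages matching the Avro schema" means in practice, the check below verifies that a page of properties carries exactly the fields the record schema declares before attempting serialization. All names here (the `Customer` schema, its fields) are hypothetical examples, not anything defined by Pega or this component.

```python
# Hypothetical Avro record schema for the topic (field names are examples only)
schema = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "customerId", "type": "string"},
        {"name": "balance", "type": "double"},
    ],
}

def matches_schema(page: dict, schema: dict) -> bool:
    """Check the page carries exactly the fields the record schema expects,
    since Avro encodes fields positionally and a mismatch misaligns the payload."""
    expected = {f["name"] for f in schema["fields"]}
    return set(page) == expected

page = {"customerId": "C-1001", "balance": 12.5}
print(matches_schema(page, schema))  # True
```

A real Avro library performs this kind of validation (plus type checking and default handling) for you; the point is simply that writers must produce records in the exact shape the schema declares.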
When the Kafka Data Set is used on the input side of the Data Flow (reading data), the Data Flow run can only be a real-time run, because the incoming messages come from the stream; Pega automatically recognizes it as a real-time Data Flow run, not a batch run.
You can create a batch run when you use a non-stream input to the Data Flow, such as a Data Set of type Table. With that, you should be able to write into the Kafka Data Set. When a Kafka Data Set is configured on both the input and the output of the Data Flow, the Data Flow run will be real-time only.
Many thanks for your response; we are eagerly waiting for your component to become available.
Would you be so kind as to provide a tentative timeline? Our engagement leaders will also reach out to Pega about this.
The official Kafka AVRO component will be shared via Pega Exchange, so it should be available here: https://community.pega.com/marketplace
I do not know the official timeline for when this component will be shared on the marketplace, as it has to be tested and prepared for different Pega versions. The Avro package described in my article was delivered for a project need as a first version of this component and was tested on Pega Platform 8.2.1-8.2.3.
Do you have any news regarding it?
We'd like to download this component and test it on our project.
We have a requirement to read Avro messages from Kafka.
Thanks & regards
Out-of-the-box Avro support is challenging, as every single customer uses it in a different way. This component has been tested on 8.2.x and sanity-checked on 8.3, and it is not supported in production by Pega GCS. The component will not be released via the Marketplace, primarily due to the very long time it takes to publish a component and keep it up to date. That being said, all new versions and fixes will be delivered via GitHub. Tentatively, there is a plan to add native platform Avro support in 8.6.
This component is published in the Pega GitHub repository:
You can find the released version and supporting documentation here:
I think this message closes the Avro conversation in this topic for now, as I am no longer working on the development or maintenance of this component. As I mentioned, the code can be reused, but with the risk of no production support for now.
@nowap Hi, can we know how to set a header when we do custom serialization?
@nowap I mean, how to add a custom header for Kafka.
@BalamuraliKrishnan which version of Pega are you using? It should be possible in the Kafka Data Set configuration from Pega 8.6.
@nowap I have raised a separate question for this with some details. We are using Pega 8.6.3.
@nowap Can we deserialize the Avro message into a different property name? For example, the field name in the schema is "header", but we want to use a "Header" property in Pega; is that possible?
Yes, you can write messages to the topic as well.
All you need to do is set all the properties on a page based on the Avro format, then call an activity with the Execute Data Set method and select Save. It will publish to the topic.
To send a whole batch, copy the pages onto a page of class Code-Pega-List, into pxResults, and send that.
I have a question: how does the offset commit happen?
The options I see in the Data Flow are:
1) Read existing and new records
2) Only read new records
What happens if the cluster fails, or if Pega goes down while messages are still being written? How does Pega read those messages?
Did you see this article about Data Flow resilience? It should answer your question:
Hope this helps.
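In plain Kafka consumer terms, the two read options roughly map to the standard `auto.offset.reset` policy, which is consulted only when the consumer group has no committed offset; after a crash or restart, a consumer resumes from the last committed offset, so messages written during the outage are not lost. A minimal sketch using standard Kafka client property names (the broker address and group id are made-up examples; this shows Kafka client configuration, not Pega's internal settings):

```python
# Standard Kafka consumer properties (librdkafka / Java client naming).
# "Read existing and new records" roughly corresponds to earliest;
# "Only read new records" roughly corresponds to latest.
# auto.offset.reset applies ONLY when no committed offset exists for the
# group; after a failure, consumption resumes from the last committed offset.
consumer_config = {
    "bootstrap.servers": "broker:9092",   # example broker address
    "group.id": "pega-dataflow-run-42",   # hypothetical consumer group id
    "enable.auto.commit": True,           # commit offsets periodically
    "auto.offset.reset": "earliest",      # read existing and new records
}
print(consumer_config["auto.offset.reset"])  # earliest
```

The key point is that offset commits are tracked per consumer group on the broker side, so a restarted reader picks up where the group left off rather than re-applying the reset policy.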