Knowledge Buddy REST APIs
In this article you can find an introduction on what the Pega Knowledge Buddy API is and how can it be useful as a RAG Service. To evaluate this approach I used the following categories:
- Retrieval Quality
- Context Management
- Integration Capabilities
I also included results of the Pega Knowledge Buddy API testing to let you understand basics of this integration.
Pega Knowledge Buddy
Pega Knowledge Buddy is an AI-powered application that uses Natural Language Processing (NLP) and machine learning to help users quickly find accurate information. By effectively understanding user queries more effectively, it provides more personalized responses based on available data sources that you provide.
Pega Knowledge Buddy offers concise answers to questions that would typically require browsing through multiple articles as users would in a conventional search experience. Furthermore, Knowledge Buddy responses are customized according to the audience and their corresponding access permissions.
Key benefits of Knowledge Buddy include:
- Natural Language Processing (NLP) capabilities to understand and interpret user queries.
- Customizable data sources for personalized and relevant responses.
- Security features for access roles, data sources, and articles that are incorporated or ingested into Knowledge Buddy by using APIs.
Depending on your business environment, you can integrate Knowledge Buddy into the following applications:
- Pega Knowledge
- Pega CRM
- Other Pega applications
- Any third-party application
How does Pega Knowledge Buddy work? First, you create or identify the content that you want your buddy to use as a basis for its responses. This content then serves as a data source.
Once you have configured one or more data sources, you can create a buddy – a Large Language Model-based tool that employs RAG to answer questions using the knowledge you provided. Pega Knowledge Buddy enables you to create multiple buddies, each with a unique set of data sources, and each with a different query purpose, including sales, customer service, marketing and more.
Data Ingestion
Customers can input any text-based content into Knowledge Buddy using the REST API service. When used in combination with Pega Knowledge Management software, data ingestion can be run automatically whenever a knowledge article is published. Once the content is uploaded, Pega Knowledge Buddy generates a set of content chunks and processes them through the Pega GenAI gateway. This step creates embeddings – segments of content with detailed semantic features attached to them. Embeddings are finally saved in the Pega GenAI Vector Store.
Asking Questions
Pega Knowledge Buddy receives the question from the application through the corresponding REST API service. The question is then passed through the Pega GenAI gateway, where it is converted into embeddings. Next, Pega Knowledge Buddy runs a similarity search in the vector database to find relevant content embeddings that match the question. Once the matching content is identified, Knowledge Buddy adds it to a specially formulated prompt. This prompt, along with the user role information, is sent to the Pega GenAI gateway again. The Large Language Model within the GenAI gateway processes the prompt and generates the answer to the user's question.
API
Knowledge Buddy provides REST APIs for various functions, including ingesting content, asking questions, deleting content, providing feedback on received responses, and performing semantic searches on content within the Knowledge Buddy.
Knowledge Buddy APIs use the OAuth 2.0 authentication model, but also support Basic authentication. If required, you can change the authentication method.
Ingestion API
Ingestion API ingests new content or updates existing content in the Knowledge Buddy database.
The request contains attributes of content that is ingested. The attributes provide metadata/configuration on the data source, chunking configuration, and the content.
Below is a sample request that demonstrates required structure. The `content` attribute contains all relevant information to answer coming queries. However, the `text` attributes include the same information as well, to allow filtering when searching through the knowledge base.
{
"dataSource": "top_songs",
"collection": "spotify",
"objectId": "001",
"title": "MILLION DOLLAR BABY",
"chunkingMethod": "NONE",
"chunkSize": 200,
"chunkOverlap": 200,
"roles": [
{
"value": "KnowledgeBuddy:Public"
}
],
"text": [
{
"content": "1. \\"MILLION DOLLAR BABY\\" by Tommy Richman (Released April 26, 2024): This is currently the #1 ranked track with an impressive track score of 725.4.It's performing exceptionally well on social media, particularly on TikTok with over 5.7 million posts and a staggering 5.3 billion views. The song has nearly 390 million Spotify streams and appears in over 30,000 playlists. Its high Spotify popularity score of 92 and significant YouTube presence (84 million views) indicate it's a major mainstream hit. It's a clean (non-explicit) track that's getting strong radio play with over 40,000 AirPlay spins.",
"roles": [
{
"value": "KnowledgeBuddy:Public"
}
],
"attributes": [
{
"name": "artist",
"values": [
{
"value": "Tommy Richman"
}
]
},
{
"name": "releaseDate",
"values": [
{
"value": "2024-04-26"
}
]
},
{
"name": "trackScore",
"values": [
{
"value": "725.4"
}
]
},
{
"name": "spotifyStreams",
"values": [
{
"value": "390000000"
}
]
},
{
"name": "tiktokPosts",
"values": [
{
"value": "5700000"
}
]
},
{
"name": "tiktokViews",
"values": [
{
"value": "5300000000"
}
]
},
{
"name": "youtubeViews",
"values": [
{
"value": "84000000"
}
]
},
{
"name": "playlistCount",
"values": [
{
"value": "30000"
}
]
},
{
"name": "airplaySpins",
"values": [
{
"value": "40000"
}
]
},
{
"name": "spotifyPopularityScore",
"values": [
{
"value": "92"
}
]
},
{
"name": "explicit",
"values": [
{
"value": "false"
}
]
}
]
}
]
}
Find more about the Ingestion API.
Question API
Use the Question API to ask questions to a Knowledge Buddy and receive answers.
The request contains metadata/configuration of the question. It may include question-related attributes, a flag to include search results, or filtering attributes.
Find more about the Question API.
Retrieval Quality
I tested various queries. You can find the examples below.
Question 1 - general question:
{
"question": "What is the #1 ranked track?",
"questionSource": "chat",
"questionTag": "user query",
"includeSearchResults": false,
"buddy": "APITesting",
"userEmail": "[email protected]"
}
Answer to the question 1:
{
"questionID": "ASK-72038",
"answer": "The #1 ranked track is \\"MILLION DOLLAR BABY\\" by Tommy Richman, released on April 26, 2024. It has an impressive track score of 725.4 and is performing exceptionally well on social media, particularly on TikTok with over 5.7 million posts and 5.3 billion views. The song has nearly 390 million Spotify streams and appears in over 30,000 playlists. It has a Spotify popularity score of 92 and significant YouTube presence with 84 million views.",
"status": "Results found"
}
Question 2 - asking on the content attribute:
{
"question": "What is the released date of the #2 ranked track?",
"questionSource": "chat",
"questionTag": "user query",
"includeSearchResults": false,
"buddy": "APITesting",
"userEmail": "[email protected]"
}
Answer to the question 2:
{
"questionID": "ASK-72026",
"answer": "The #2 ranked track, \\"Not Like Us\\" by Kendrick Lamar, was released on May 4, 2024.",
"status": "Results found"
}
Question 3 - asking that involves content attribute of multiple contents:
{
"question": "What are the tracks that have Spotify popularity of 92?",
"questionSource": "chat",
"questionTag": "user query",
"includeSearchResults": false,
"buddy": "APITesting",
"userEmail": "[email protected]"
}
Semantic search is a search method that understands the intent and contextual meaning of a query, not just matching keywords.
Question 4 - Semantic search:
{
"question": "Who sings the second top song?",
"questionSource": "chat",
"questionTag": "user query",
"includeSearchResults": false,
"buddy": "APITesting",
"userEmail": "[email protected]"
}
Answer to the question 4:
{
"questionID": "ASK-72040",
"answer": "The second top song is \\"Not Like Us\\" by Kendrick Lamar.",
"status": "Results found"
}
Overall, quality and relevance of retrieved results were acceptable. Probably, to achieve even better results I would need to play more with chunking attributes of the Ingestion API.
Context management
When it comes to context management it's possible to retrieve specific portions of a document. By enabling the includeSearchResults
flag in the request the Knowledge Buddy API returns chunks of data that is returned during semantic search.
Because the REST API is stateless, it does not maintain context between related pieces of information. You can achieve it by handling state on the client side that is calling the Question API.
Integration capabilities
Pega Knowledge Buddy API can undoubtedly serve content that can be integrated with your application. The API's responds with JSON format that provides proper structuring of the retrieved content. It simplifies using the integration in your application.
Conclusion
Pega Knowledge Buddy API proves to be a robust solution for implementing RAG (Retrieval-Augmented Generation) services. Its key strengths include:
- Strong retrieval capabilities with semantic search functionality that understands context and intent beyond simple keyword matching.
- Flexible content ingestion system that allows for detailed metadata and attribute configuration
- Well-structured JSON responses that enable easy integration with external applications.
While the API's stateless nature requires client-side context management, the overall architecture provides a solid foundation for building knowledge-based applications. The combination of OAuth 2.0 security, customizable chunking options, and comprehensive documentation makes it a viable choice for organizations looking to implement AI-powered information retrieval systems.