Question
Capgemini
IN
Last activity: 20 Aug 2015 4:16 EDT
lucene search with caret (^)
In our application (v6.3), when we search with "ABC1234^6789", two cases are returned. One has this value in one of its properties, but the second case has a different value in that property: "ABC1234^6790". Could you please help me understand what is happening here and how it can be resolved?
Accepted Solution
Pegasystems
IN
To get exact matches, you should use double quotes. By default we add a suffix wildcard to the search string when you search for work items.
Can we see the indexed values from the index file?
You can use the Luke tool outside of Pega to look into the Lucene index: https://code.google.com/p/luke/downloads/detail?name=lukeall-3.5.0.jar&can=2&q=
Pegasystems Inc.
US
Not sure, but I think the ^ is causing Lucene to tokenize your value into two … and “promoting” any search hits with “ABC1234” to the top of your list.
Try searching with ABC1234\^6789
Refer to the following document for Lucene wildcards: https://www.drupal.org/node/375446
The caret boosts a search term. If you want the term "lucene" to be more relevant, boost it using the ^ symbol along with the boost factor next to the term. You would type:
lucene^4 drupal
This will make documents with the term lucene appear more relevant. By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g. 0.2).
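To make the boost explanation concrete, here is a minimal Python sketch of how a `term^boost` string splits into a term and a boost factor. It is a simplified illustration only, not Lucene's actual query parser; the helper name `split_boost` is hypothetical, and it only handles a single term with a numeric boost suffix.

```python
import re
from typing import Tuple

# Simplified illustration only: this is NOT Lucene's query parser.
# An unescaped caret followed by a number at the end of the term is
# treated as a boost suffix.
_BOOST = re.compile(r"^(?P<text>.*?)(?<!\\)\^(?P<boost>\d+(?:\.\d+)?)$")


def split_boost(term: str) -> Tuple[str, float]:
    """Return (term_text, boost) for a single query term (hypothetical helper)."""
    match = _BOOST.match(term)
    if match is None:
        # No unescaped boost suffix: the boost defaults to 1 and an
        # escaped caret (\^) is kept as a literal caret.
        return term.replace("\\^", "^"), 1.0
    return match.group("text"), float(match.group("boost"))


print(split_boost("lucene^4"))
print(split_boost("ABC1234^6789"))
print(split_boost(r"ABC1234\^6789"))
```

Under this reading, ABC1234^6789 is a search for the term ABC1234 with a boost of 6789, which would explain why a case containing ABC1234^6790 also matches.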
Capgemini
IN
Searching with ABC1234\^6789 doesn't return any results.
Pegasystems
IN
We don't keep all special characters as part of the Lucene terms. The special characters that are kept as part of a term are:
-_!%$.
All other special characters are considered delimiters.
The PDN article refers to NBAA, but it applies to any full-text search in Pega: https://community.pega.com/support/support-articles/unable-search-keywords-special-characters-nbaa
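The delimiter rule described above can be sketched in Python as follows. This assumes the kept characters are exactly the ones listed in this reply and that terms are otherwise ASCII letters and digits; the `tokenize` helper is hypothetical and is not Pega's actual analyzer.

```python
import re
from typing import List

# Characters the reply lists as kept inside a term. This is an assumption
# taken from the post, not from Pega's analyzer configuration.
KEPT_SPECIALS = "-_!%$."

# A token is a run of ASCII letters, digits, or kept specials;
# every other character acts as a delimiter.
_TOKEN = re.compile("[A-Za-z0-9" + re.escape(KEPT_SPECIALS) + "]+")


def tokenize(value: str) -> List[str]:
    """Split a value into terms under the assumed delimiter rule."""
    return _TOKEN.findall(value)


first = tokenize("ABC1234^6789")
second = tokenize("ABC1234^6790")
print(first, second)
print(set(first) & set(second))  # terms the two values have in common
```

Under these assumptions the caret splits each value into two terms, so both values share the term ABC1234, while a value such as abc3600$18792 stays a single term.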
Capgemini
IN
Thanks Rajiv,
The support article mentions "-", "_", "!", "%", "@", and "." as valid characters. Does this mean that only text containing these characters, and no other special characters, can be searched?
Also, searching with the "!" character behaves as follows. When I search with abc3600!18793, the results returned are abc3600$18792, abc3600^18791 and abc3600^18792, but not the text I searched for. But if I enclose it in quotes ("abc3600!18793"), it is the only result returned.
Can we see the indexed values from the index file?