Monday, February 29, 2016

Sitecore Lucene searching by phrase and each of its words

A simple search implementation in Sitecore typically finds items in the index whose content is like, or contains, the exact search term. This works well when a user searches for a single keyword. When they enter a phrase, however, we also want to return results that match individual words in that phrase (albeit with less of a boost).

This can be achieved using the predicate builder in Sitecore, along with the standard search code and indexes in Lucene. In this example I am searching on a simple Sitecore page template, and in particular three of its fields: title, page description and main text. These fields have been included in the custom Lucene index XML with their storage type set so that the field values are stored in the index (this means I can read the field data straight from the search index without having to get the Sitecore item).

The example defines two objects: the SearchResult object, which contains the fields from the index that we want to pass through to the front end, and the SearchModel object, which maps the data from the Lucene index.
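As a rough sketch of what these two objects could look like, assuming the standard Sitecore ContentSearch attributes: the index field names ("title", "page_description", "main_text") and the property names are my guesses for illustration, so swap in whatever your index configuration actually uses.

```csharp
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

// Maps a document in the Lucene index to strongly typed properties.
// The index field names below are assumptions; use the names from your index config.
public class SearchModel : SearchResultItem
{
    [IndexField("title")]
    public string Title { get; set; }

    [IndexField("page_description")]
    public string PageDescription { get; set; }

    [IndexField("main_text")]
    public string MainText { get; set; }
}

// Plain object handed to the front end; it carries only the values the view needs.
public class SearchResult
{
    public string Title { get; set; }
    public string PageDescription { get; set; }
    public string MainText { get; set; }
}
```

Keeping the front-end object separate from the index model means the view never depends on ContentSearch types.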

The GetSearchPredicate method builds up the predicate that each indexed item is evaluated against. If an indexed item satisfies the logic inside this predicate, it is returned as a search result. In this case there are three main parts to the predicate:
  1. Searching the indexed fields for content that is like the search term. This has a boost of 1.2f because like is not an exact match.
  2. Searching the indexed fields for content that contains the search term. This would be considered an exact match so has a boost of 2.0f.
  3. Searching the indexed fields for content that contains each of the words that make up the search phrase. This has a boost of 1.0f, because the content might not be exactly what the user is searching for but may be relevant.
These pieces are all joined with OR clauses, which means a result only needs to match one of them to be returned. For stricter searching you could combine them with AND clauses instead.
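The three parts above could be put together roughly as follows. This is a sketch rather than the exact code behind this post: it assumes the Like, Contains and Boost support in Sitecore.ContentSearch.Linq, uses a pared-down stand-in for SearchModel (with assumed index field names) so the snippet compiles on its own, and starts from PredicateBuilder.False so that OR-ing clauses onto it does not match every item.

```csharp
using System;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;

// Pared-down stand-in for the index model; field names are assumptions.
public class SearchModel
{
    [IndexField("title")] public string Title { get; set; }
    [IndexField("page_description")] public string PageDescription { get; set; }
    [IndexField("main_text")] public string MainText { get; set; }
}

public static class SearchPredicates
{
    public static Expression<Func<SearchModel, bool>> GetSearchPredicate(string searchTerm)
    {
        // Start from False: an OR chain should match only when at least one clause does.
        var predicate = PredicateBuilder.False<SearchModel>();

        // 1. Fuzzy ("like") match on the whole phrase, boosted slightly.
        predicate = predicate
            .Or(x => x.Title.Like(searchTerm).Boost(1.2f))
            .Or(x => x.PageDescription.Like(searchTerm).Boost(1.2f))
            .Or(x => x.MainText.Like(searchTerm).Boost(1.2f));

        // 2. Whole phrase contained in a field, treated as the strongest match.
        predicate = predicate
            .Or(x => x.Title.Contains(searchTerm).Boost(2.0f))
            .Or(x => x.PageDescription.Contains(searchTerm).Boost(2.0f))
            .Or(x => x.MainText.Contains(searchTerm).Boost(2.0f));

        // 3. Each individual word of the phrase, at the default boost.
        var words = searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            var w = word; // local copy so each clause captures its own word
            predicate = predicate
                .Or(x => x.Title.Contains(w).Boost(1.0f))
                .Or(x => x.PageDescription.Contains(w).Boost(1.0f))
                .Or(x => x.MainText.Contains(w).Boost(1.0f));
        }

        return predicate;
    }
}
```

Swapping an Or call for And would require that clause to match as well, which is one way to tighten the results.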

Then, in the main DoSearch method, we get the search index and create a context for it. We query this index for items that match the search predicate built up earlier (items that match any of the contains/like clauses). We then take the search results and map them into a custom object for use on the front end. Please note that fetching every result can be expensive, so it is recommended that you retrieve the results a page at a time (with Google, how often do you go past the first page or two?).
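A paged version of this method might look something like the sketch below. It is written generically so it stands alone: the index name, the mapping function and the paging parameters are all supplied by the caller and are assumptions of mine, not names taken from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;

public static class SearchService
{
    // pageIndex is zero-based: 0 returns the first pageSize results.
    public static IList<TResult> DoSearch<TModel, TResult>(
        string indexName,
        Expression<Func<TModel, bool>> predicate,
        Func<TModel, TResult> map,
        int pageIndex,
        int pageSize)
    {
        var index = ContentSearchManager.GetIndex(indexName);

        // The context wraps the underlying index searcher, so dispose it when done.
        using (var context = index.CreateSearchContext())
        {
            var results = context.GetQueryable<TModel>()
                .Where(predicate)
                .Page(pageIndex, pageSize) // fetch only the requested page
                .GetResults();

            // Materialise inside the using block, before the context is disposed.
            return results.Hits.Select(hit => map(hit.Document)).ToList();
        }
    }
}
```

You would call it with the predicate from GetSearchPredicate and a lambda that copies the stored field values from the SearchModel into a new SearchResult.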

This example builds on my earlier post, Sitecore 8 Lucene search index, with some help from Stack Overflow.
