Monday, February 29, 2016

Sitecore Lucene searching by phrase and each of its words

A simple search implementation in Sitecore typically finds items in the index whose content is like, or contains, the exact search term. This works well when a user searches for a single keyword. However, if they enter a phrase, we would also want to return results that match individual words in that phrase (albeit with a lower boost).

This can be achieved using the predicate builder in Sitecore, along with the standard search code and indexes in Lucene. In this example I am searching on a simple Sitecore page template, and in particular three of its fields: title, page description and main text. These fields have been included in the custom Lucene index XML with storage enabled (storageType="YES"), which means I can read the field values straight from the search index without having to get the Sitecore item.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.Linq.Utilities;

public static class Searcher
{
    public static List<SearchResult> DoSearch(string searchTerm)
    {
        var searchIndex = ContentSearchManager.GetIndex("MySearchIndex"); // Get the search index
        var searchPredicate = GetSearchPredicate(searchTerm); // Build the search predicate

        using (var searchContext = searchIndex.CreateSearchContext()) // Get a context of the search index
        {
            var searchResults = searchContext.GetQueryable<SearchModel>().Where(searchPredicate); // Search the index for items which match the predicate

            // This will get all of the results, which is not recommended
            var fullResults = searchResults.GetResults();

            // This is better and will get paged results - the first page with 10 results per page (the page index is zero-based)
            //var pagedResults = searchResults.Page(0, 10).GetResults();

            var myResults = new List<SearchResult>();

            foreach (var hit in fullResults.Hits)
            {
                myResults.Add(new SearchResult
                {
                    PageDescription = hit.Document.PageDescription,
                    Title = hit.Document.ItemName,
                    Url = hit.Document.Url
                });
            }

            return myResults;
        }
    }

    public static Expression<Func<SearchModel, bool>> GetSearchPredicate(string searchTerm)
    {
        // Start from False so that only items matching at least one Or clause are returned
        var predicate = PredicateBuilder.False<SearchModel>();

        // Search the whole phrase - LIKE
        // Boost is applied inside each lambda so that it becomes part of the query
        predicate = predicate.Or(x => x.DisplayName.Like(searchTerm).Boost(1.2f));
        predicate = predicate.Or(x => x.PageDescription.Like(searchTerm).Boost(1.2f));
        predicate = predicate.Or(x => x.Introduction.Like(searchTerm).Boost(1.2f));
        predicate = predicate.Or(x => x.Maintext.Like(searchTerm).Boost(1.2f));

        // Search the whole phrase - CONTAINS
        predicate = predicate.Or(x => x.DisplayName.Contains(searchTerm).Boost(2.0f));
        predicate = predicate.Or(x => x.PageDescription.Contains(searchTerm).Boost(2.0f));
        predicate = predicate.Or(x => x.Introduction.Contains(searchTerm).Boost(2.0f));
        predicate = predicate.Or(x => x.Maintext.Contains(searchTerm).Boost(2.0f));

        // Search the individual words, ignoring empty entries caused by repeated spaces
        foreach (var t in searchTerm.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            var tempTerm = t; // Copy to a local so each lambda captures its own word

            predicate = predicate.Or(x => x.DisplayName.Contains(tempTerm).Boost(1.0f));
            predicate = predicate.Or(x => x.PageDescription.Contains(tempTerm).Boost(1.0f));
            predicate = predicate.Or(x => x.Introduction.Contains(tempTerm).Boost(1.0f));
            predicate = predicate.Or(x => x.Maintext.Contains(tempTerm).Boost(1.0f));
        }

        return predicate;
    }

    /// <summary>
    /// Search item mapped to Lucene index
    /// </summary>
    public class SearchModel
    {
        public string ItemName { get; set; }

        public string DisplayName { get; set; }

        public string TemplateName { get; set; }

        public string Url { get; set; }

        public string PageDescription { get; set; }

        public string Introduction { get; set; }

        public string Maintext { get; set; }
    }

    /// <summary>
    /// Custom search result model for binding to front end
    /// </summary>
    public class SearchResult
    {
        public string Title { get; set; }

        public string Url { get; set; }

        public string PageDescription { get; set; }
    }
}

In the example above there are two objects defined: the SearchResult object, which contains the fields from the index that we want to pass through to the front end, and the SearchModel object, which maps the data from the Lucene index.

The GetSearchPredicate method builds up the predicate that each indexed item is evaluated against. If an indexed item satisfies the logic inside this predicate, it is returned as a search result. In this case there are 3 main parts to the predicate:
  1. Searching the indexed fields for content that is like the search term. This has a boost of 1.2f because a like match is not an exact match.
  2. Searching the indexed fields for content that contains the search term. This is considered an exact match, so it has a boost of 2.0f.
  3. Searching the indexed fields for content that contains each of the words that make up the search phrase. This has a boost of 1.0f, because the content might not be exactly what the user is searching for but may still be relevant.
These pieces are all joined with Or, which means a result only needs to match one of them to be returned. For more complex searching you could also combine predicates with And.
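
As a rough illustration of how Or-chaining behaves, here is a minimal sketch that runs without Sitecore. The PredicateSketch class and its False and Or helpers are hypothetical stand-ins written for this example, not Sitecore's PredicateBuilder, and the predicate is evaluated against plain strings in memory rather than a Lucene index. It assumes the chain starts from a predicate that matches nothing, so a value only passes when at least one Or clause matches.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateSketch
{
    // Hypothetical stand-in for a "match nothing" seed predicate.
    public static Expression<Func<T, bool>> False<T>()
    {
        return x => false;
    }

    // Hypothetical stand-in for an Or combinator: left OR right.
    public static Expression<Func<T, bool>> Or<T>(
        this Expression<Func<T, bool>> left,
        Expression<Func<T, bool>> right)
    {
        // Invoke the right-hand lambda with the left-hand lambda's parameter.
        var invoked = Expression.Invoke(right, left.Parameters.Cast<Expression>());
        return Expression.Lambda<Func<T, bool>>(
            Expression.OrElse(left.Body, invoked), left.Parameters);
    }
}

public static class OrChainDemo
{
    public static void Main()
    {
        var predicate = PredicateSketch.False<string>();

        // Add one Or clause per word of the phrase.
        foreach (var word in "lucene search".Split(' '))
        {
            var term = word;
            predicate = predicate.Or(text => text.Contains(term));
        }

        var matches = predicate.Compile();
        Console.WriteLine(matches("Sitecore search tips")); // True
        Console.WriteLine(matches("Unrelated page"));       // False
    }
}
```

Starting the chain from a predicate that always returns true would make every value match, whatever the Or clauses say, which is why the sketch seeds it with false.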

Then, in the main DoSearch method, we get the search index and create a context for it. We query this index for items which match the search predicate built up earlier (items that match any of the contains/like clauses), then get all of the search results and map them into a custom object for use on the front end. Please note that fetching every result can be expensive, so it is recommended that you page the results instead (with Google, how often do you go past the first page or two?).
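
To make the paging idea concrete, here is a small in-memory sketch. GetPage is a hypothetical helper written for this example, not part of Sitecore. It assumes a zero-based page index and uses the standard LINQ Skip and Take operators over a list of fake result ids instead of a real search index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Hypothetical helper: returns one page of results for a zero-based page index.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException("pageIndex");
        if (pageSize <= 0) throw new ArgumentOutOfRangeException("pageSize");

        // Skip the earlier pages, then take at most one page of items.
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }
}

public static class PagingDemo
{
    public static void Main()
    {
        // 25 fake result ids stand in for search hits.
        var hits = Enumerable.Range(1, 25);

        var firstPage = PagingSketch.GetPage(hits, 0, 10);
        var lastPage = PagingSketch.GetPage(hits, 2, 10);

        Console.WriteLine(string.Join(",", firstPage)); // 1,2,3,4,5,6,7,8,9,10
        Console.WriteLine(string.Join(",", lastPage));  // 21,22,23,24,25
    }
}
```

Only the requested page is materialised, so the cost of building the front-end objects stays proportional to the page size rather than to the total number of hits.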

This example builds on Sitecore 8 Lucene search index, with some help from Stack Overflow.
