Thursday, March 17, 2016

Sitecore Lucene scheduling a search index to rebuild with IntervalAsynchronousStrategy

For larger Lucene search indexes in Sitecore (which can take some time to rebuild), it is not a good idea to have them update after publishing via the onPublishEndAsync or rebuildAfterFullPublish index strategy. In my case the index contained 20,000 items, more than half of them PDFs, so it took close to an hour to process.

Luckily with Sitecore there is an index strategy which allows the index to rebuild automatically on a schedule. This is done via the IntervalAsynchronousStrategy index strategy and you are able to define the database and interval for when it should run.
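As a rough sketch, a strategy definition along these lines could be added to the ContentSearch configuration. The strategy name intervalAsyncWeb, the web database and the one-hour interval are assumptions for illustration only. The type and parameter layout follow the stock IntervalAsynchronousStrategy definitions that ship with Sitecore, so check them against the configuration files of your own version.

```xml
<!-- Hypothetical strategy name; database and interval are example values -->
<intervalAsyncWeb type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
  <!-- Database whose changes the strategy watches -->
  <param desc="database">web</param>
  <!-- hh:mm:ss, how often the strategy runs -->
  <param desc="interval">01:00:00</param>
  <CheckForThreshold>true</CheckForThreshold>
</intervalAsyncWeb>
```

The index would then reference this strategy from its own list of strategies, in place of the publish-based one.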

Sitecore Lucene search indexing sublayout or rendering data sources

Most Sitecore sites will use renderings or sublayouts in Sitecore as a method to display data, and more often than not the data is a custom template assigned via data source. It's this sort of logic which could then be extended to personalize or A/B test which content works best. The only trouble is ensuring that this content is able to be added to the Lucene search index against the correct item (that which the rendering/sublayout is assigned to).

The following computed index field can be used to index the templates used as datasources against renderings or sublayouts. You may also be interested in indexing child content against an item.

Wednesday, March 16, 2016

Sitecore no lists are appearing in the list manager

If you have contact lists under the All Lists item in your content tree (/sitecore/system/List Manager/All Lists) but the lists are not showing in the List Manager, this is likely an issue with a system index. To fix this issue:

Tuesday, March 15, 2016

Sitecore faceting search results based on top level site architecture

Many web sites have a structure/site architecture made up of a number of top level categories, each of which contains the relevant pages/content. When a user is searching that same web site, it can be useful to have a facet which allows the user to narrow down the results by category (aka the top level architecture).
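One way to get a category value to facet on is to derive it from the item's path. The helper below is only a sketch of that idea in plain C#: the class name, method name and example paths are hypothetical, and it is not tied to any Sitecore API. The result could, for instance, be stored in the index by a computed index field.

```csharp
using System;

public static class CategoryHelper
{
    // Returns the first path segment below homePath, or null when the path
    // is not under homePath (or is the home item itself).
    public static string GetTopLevelCategory(string itemPath, string homePath)
    {
        if (string.IsNullOrEmpty(itemPath) || string.IsNullOrEmpty(homePath))
        {
            return null;
        }

        // Normalise the home path so it always ends with a single slash.
        string prefix = homePath.TrimEnd('/') + "/";
        if (!itemPath.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Everything after the home path; the first segment is the category.
        string remainder = itemPath.Substring(prefix.Length);
        int slash = remainder.IndexOf('/');
        string segment = slash >= 0 ? remainder.Substring(0, slash) : remainder;
        return segment.Length > 0 ? segment : null;
    }
}
```

For example, with a home path of /sitecore/content/Home, an item at /sitecore/content/Home/Products/Shoes/Boots would fall under the Products category.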

Friday, March 11, 2016

Sitecore WFFM email not sending on submit

A colleague came across an interesting error with a Web Forms for Marketers form that was not completing the send email save action. The mail settings were set up for the save action (/sitecore/system/Modules/Web Forms for Marketers/Settings/Actions/Save Actions/Send Email Message), the success message was showing, and there were no errors in the logs. However, no email was generated.

Sitecore Lucene highlight search term for best matching field

In a previous post I outlined an example of some code to highlight the search term in the search results using Lucene in Sitecore. This works well if you have one field you want to display on the search results page (the page description for example). But if you have multiple fields which could contain the search terms, it's a good idea to check them all to see which has the best match.

Sitecore Lucene highlighting the search term(s) in the search results

A popular feature with any good search engine is the ability to highlight the search terms in the content shown for each search result. The main benefit is that the user is able to see the context in which the search term appears for each result returned, which allows them to choose the result which is most relevant to them.

This feature is not provided out of the box with Sitecore, but you can implement it if you update the Lucene DLL in your Sitecore web site.

Wednesday, March 9, 2016

Sitecore Lucene getBestFragment returning null with search results highlighting

I have been using the getBestFragment method of the Highlighter class, available through the Lucene contrib library, to highlight the search term(s) on the search results page. It worked as expected in most cases, however I noticed that some results were coming back as null. After looking through the documentation, I found out that:
Returns: highlighted text fragment or null if no terms found
So it is important to handle a null result in your code and not just expect that there will be an output from this method.
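A minimal sketch of that handling, assuming you already have the value returned by getBestFragment and the original field text. The helper name and the maxLength cut-off are hypothetical, and falling back to a truncated copy of the field is just one option, not necessarily what the original code did.

```csharp
using System;

public static class HighlightHelper
{
    // bestFragment: the value returned by getBestFragment (null when no terms were found).
    // fullText: the original field value.
    // maxLength: hypothetical cap on the length of the fallback summary.
    public static string FragmentOrFallback(string bestFragment, string fullText, int maxLength)
    {
        // Use the highlighted fragment whenever the highlighter produced one.
        if (!string.IsNullOrEmpty(bestFragment))
        {
            return bestFragment;
        }

        // Nothing to fall back on.
        if (string.IsNullOrEmpty(fullText))
        {
            return string.Empty;
        }

        // Otherwise show the start of the field, without highlighting.
        return fullText.Length <= maxLength
            ? fullText
            : fullText.Substring(0, maxLength) + "...";
    }
}
```

The search results page can then always render the returned string without a separate null check.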

Sitecore Lucene boosting content of a specific template

When testing out my new Sitecore Lucene search engine, I noticed that for some key search terms, news articles were outranking the content pages (which should have been the highest matches). This was due to a high scoring based on keyword density in the news articles.

Sitecore error when highlighting search keywords with Lucene

When referencing the Lucene.Net DLL that comes out of the box with Sitecore (versions 7 and 8 in my tests), you will encounter the following error when attempting to highlight the search keywords.
Method not found: 'System.Collections.Generic.ISet`1<!!0> Lucene.Net.Support.Compatibility.SetFactory.CreateHashSet()'.
This is because the version used with Sitecore is not compatible with the Lucene contrib libraries. The contrib library, which is maintained by contributors with special rights, includes the highlight functionality (among other features such as the spellchecker).

Sitecore accessing items in a computed index field

When the search index is indexing a computed index field for a given item, there is no Sitecore context. This means that any code which relies on the Sitecore context (to get an item, for example) will throw a null reference exception.

I found this when I was using a helper class to generate the custom title of my pages - to then be indexed. This class used the context to get a configuration item and therefore was throwing an exception for every page.

Tuesday, March 8, 2016

Sitecore accessing a search index whilst it is rebuilding

Depending on the indexing strategy of your Sitecore search index, there will be times when the index needs to rebuild. This may be at the end of a publish (onPublishEndAsync) or after a full publish (rebuildAfterFullPublish). What you might not realize is that the index will be unavailable (and therefore the search itself) during this rebuild. This is not best practice as it can lead to large amounts of downtime depending on the size of the index and how often it's rebuilt.

Sitecore Lucene search index when an item was last updated

With your Sitecore search it can be useful to the end user to provide sorting based on the last updated date of the Sitecore item. A good use case of this is actually when using Google to search for Sitecore help - if you set the results to show from the past year, you are more likely to get results relating to the current version of Sitecore.

Sitecore Lucene search all documents have the same score of 1

I spent a bit of time banging my head against the wall when my Sitecore Lucene search was returning all documents with the score of 1. Even with boosting on key fields, less relevant documents were appearing first (because they all had the same score).

In this particular case the issue was that when building the search query I was using Filter instead of Where. Filter only narrows the result set and does not contribute to relevance scoring, so every document comes back with the same score.
var searchResults = searchContext.GetQueryable<SearchModel>().Where(searchPredicate);
would be correct instead of:
var searchResults = searchContext.GetQueryable<SearchModel>().Filter(searchPredicate);

Wednesday, March 2, 2016

Facets with Sitecore lucene search

With search engines, faceting is a concept which allows users to filter the result set to get results more relevant to what they are searching for. A common example would be an online clothing store; when searching for clothing they would have facets on type (men's, women's or children's) and even sizing (small, medium, large, etc.). These facets are great from a user's perspective, because they allow the user to filter out results that are not relevant to them (to use that clothing store example again, I would only be interested in men's clothing in my size).

With Sitecore search using Lucene, facets are simple to implement and make for a much better search experience for users.
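To make the concept concrete, here is a small sketch of what a facet amounts to, using plain LINQ over in-memory data rather than Sitecore's search API. The Product class and the sample values are made up for the clothing store example: a facet is a count of results per field value, and selecting a facet value narrows the results to that value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical search result for the clothing store example.
public class Product
{
    public string Name { get; set; }
    public string Type { get; set; }
}

public static class FacetDemo
{
    // Builds the facet: number of results for each distinct Type value.
    public static Dictionary<string, int> CountByType(IEnumerable<Product> results)
    {
        return results
            .GroupBy(p => p.Type)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Applies a facet selection: keeps only the results of the chosen type.
    public static List<Product> FilterByType(IEnumerable<Product> results, string type)
    {
        return results
            .Where(p => string.Equals(p.Type, type, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

The counts are what you would show next to each facet link, and the filter is what runs when the user clicks one.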

Sitecore Lucene facet phrases or sentences with spaces

I came across an interesting error with an implementation of Sitecore search using Lucene. I had a facet based on a phrase (page category) that contained spaces. When the facets came back from the search, they were split up into individual words - "New" and "Zealand" instead of "New Zealand".

Tuesday, March 1, 2016

Sitecore 8 Lucene indexing PDF content

By default, Sitecore will not index the content inside document types such as PDF or DOC. It requires the use of an iFilter and a custom Sitecore index field. Simply put, the custom index field will read the content of the document (using the iFilter) and then the content will be inside the Lucene index and available for searching.

Sitecore allowing content authors to exclude items from search

Third party search engines like Google will find pages to index via a sitemap or what is linked off of other pages on the internet. This makes it easier to control what pages are index-able and able to be found by users. With Sitecore any item which has a template included in the search index (or not excluded) will be indexed - as long as it's published to the web database. For system pages or campaign pages for example, you might not want users to be able to find them in your custom site search (on a case by case basis). There is a simple solution to this, which empowers content authors to exclude a given page/item as required.