Thursday, March 17, 2016

Sitecore Lucene: scheduling a search index to rebuild with IntervalAsynchronousStrategy

For larger Lucene search indexes in Sitecore, which can take some time to rebuild, it is not a good idea to update them after every publish via the onPublishEndAsync or RebuildAfterFullPublish index strategies. In my case the index contained 20,000 items, more than half of them PDFs, so it took close to an hour to process.

Luckily, Sitecore has an index strategy that rebuilds the index automatically on a schedule. It is called IntervalAsynchronousStrategy, and it lets you define the database to watch and the interval at which it should run.

Define the strategy

First, define the indexing strategy. This is done in the Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config file, located in Website/App_Config/Include.
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
        <intervalAsync3Hour type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
         <param desc="database">web</param>
         <param desc="interval">03:00:00</param>
In the example above (the configuration has been trimmed for clarity), an index strategy of type IntervalAsynchronousStrategy is defined on the web database with an interval of 3 hours. The strategy also accepts a CheckForThreshold element, which is among the parts trimmed from the example. When it is set to true, the index is fully rebuilt if more items are affected than the ContentSearch.FullRebuildItemCountThreshold setting allows.
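
For reference, the untrimmed definition might look like the following patch file. This is a sketch rather than the exact file: the element nesting is inferred from the ref path contentSearch/indexConfigurations/indexUpdateStrategies, the patch namespace is the standard Sitecore one, and the CheckForThreshold value of true is an assumption for illustration.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <indexUpdateStrategies>
          <!-- The element name is the strategy name that indexes reference -->
          <intervalAsync3Hour type="Sitecore.ContentSearch.Maintenance.Strategies.IntervalAsynchronousStrategy, Sitecore.ContentSearch">
            <param desc="database">web</param>
            <!-- hh:mm:ss, so this runs every 3 hours -->
            <param desc="interval">03:00:00</param>
            <!-- Assumed value: rebuild fully when the changed item count exceeds the threshold setting -->
            <CheckForThreshold>true</CheckForThreshold>
          </intervalAsync3Hour>
        </indexUpdateStrategies>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

One option is to keep a definition like this in its own patch file under App_Config/Include rather than editing the stock file, which tends to make upgrades easier.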

Assigning the strategy to an index

Now that the strategy is defined, it needs to be assigned to your custom search index. Note that the name of the strategy defined above (intervalAsync3Hour) is the last segment of the ref path used in the search index configuration below.
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="MySearchIndex" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">          
            <strategies hint="list:AddStrategy">
              <!-- NOTE: the order of these controls the execution order -->
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsync3Hour" />
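
With the closing tags restored, the assignment might be sketched as follows. The sitecore and contentSearch wrapper elements are assumed from the standard Sitecore config layout, and the index's other children (params, configuration reference, crawler locations) are left out here just as they are in the trimmed example above, so this is not a complete index definition.

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="MySearchIndex" type="Sitecore.ContentSearch.LuceneProvider.SwitchOnRebuildLuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <!-- Other index children omitted, as in the trimmed example -->
            <strategies hint="list:AddStrategy">
              <!-- The last path segment must match the strategy element name -->
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/intervalAsync3Hour" />
            </strategies>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```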

Testing that the index is running

To check that the index is running on schedule, open the latest Crawling.Log file (found in Data/Logs). The following line should appear:
INFO  [Index=MySearchIndex] Initializing IntervalAsynchronousUpdateStrategy with interval '03:00:00'.

Your custom search index will now update on schedule! It's also worth noting that this index strategy should not be combined with SynchronousStrategy or OnPublishEndAsync.
