Monday, November 2, 2015

Configuring Sitecore Lucene Search Indexes

Introduction

Sitecore maintains its search indexes by crawling items stored in the database. Whenever an item is created, updated or deleted, Sitecore runs a job that updates the search index(es). For example, when an item is created, it is added to the index immediately afterwards.

The search index is a set of physical files (effectively a searchable database) maintained by the crawler. Searches run against the index, whose entries link back to the actual Sitecore items. The indexes are stored in the Sitecore data folder and can be viewed (and even queried) with a third-party tool such as Luke 3.5.

These indexes are configured via XML in the App_Config folder of the Sitecore web site. A default Sitecore installation has a number of files in the Website/App_Config/Include folder:
  1. Default configuration files: These files are the active configuration for search in Sitecore. Generally the file of interest is Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration, which is the default index configuration for Lucene.

  2. Example configuration files: These are example configuration files which can be used to control search indexing in Sitecore. They include three example indexes for Lucene, one for each of the Sitecore databases, along with everything needed to configure Solr and some advanced logging.


Lucene default configuration

The Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration file mentioned above contains the default configuration for any search index you create. It holds a number of settings which can be changed to optimize search for a given Sitecore implementation. Some of the settings to note are as follows:

  • indexAllFields: by default this option is set to true, which means that every field is indexed.
  • The <fieldNames hint="raw:AddFieldByFieldName"> section lists the fields which are indexed by default. These are the general fields such as template name, datasource, ID, language and so on. Note that when storageType is set to YES, the value of the field is stored in the index, so you can read it in your search code without having to query Sitecore for the full item (which should lead to better performance).
  • The <fieldTypes hint="raw:AddFieldByFieldTypeName"> section contains the list of field types available in Sitecore and how they are stored in the index. For example, most fields are stored as a string, but Date/DateTime are stored as DateTime and integer as integer. If you add a custom field type to Sitecore, you will need to add it here if you wish to have it indexed.
  • <exclude hint="list:ExcludeTemplate"> lists the templates which should never be indexed (across all search indexes you define). Generally you would put templates here which don't need to be searched or don't render on the front end (a template used to populate a category Droplist, for example).
  • <exclude hint="list:ExcludeField"> is a list of fields that are excluded from the search index. By default Sitecore will place a lot of system fields in here, but any fields you don't want indexed can be added.
  • <include hint="list:IncludeField"> lists any custom fields you want included in the index. Generally you would put common fields such as metadata (description or keywords) here, and leave fields specific to one content type to the search index definition for that type. This is only relevant if indexAllFields is set to false.
  • <include hint="list:IncludeTemplate"> is a list of templates which are included in the index by default. You will need to specify every template (including those inherited as base templates).
It's important to be careful with the default index configuration, because if you exclude a template in here it won't be indexed, even when included in another index definition.
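
As a rough illustration, the settings above sit in the default configuration file along the following lines. This is an abridged sketch, not the stock file: the hints and attributes are the ones named above, but the wrapping element name is an assumption that varies between Sitecore versions, and the element names and GUIDs inside the include/exclude lists are placeholders. Compare it against the file shipped with your installation before changing anything.

```xml
<!-- Abridged sketch only. The wrapping element name is an assumption;
     all GUIDs and the names inside include/exclude are placeholders. -->
<defaultLuceneIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">

  <!-- true: index every field; false: only fields listed under IncludeField -->
  <indexAllFields>true</indexAllFields>

  <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
    <fieldNames hint="raw:AddFieldByFieldName">
      <!-- storageType="YES" keeps the value in the index so search code can read it directly -->
      <field fieldName="_templatename" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
    </fieldNames>
    <fieldTypes hint="raw:AddFieldByFieldTypeName">
      <!-- Field types map to .NET types; a custom field type would need its own entry here -->
      <fieldType fieldTypeName="datetime|date" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.DateTime" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
    </fieldTypes>
  </fieldMap>

  <!-- Templates that are never indexed, in any index -->
  <exclude hint="list:ExcludeTemplate">
    <CategoryOption>{00000000-0000-0000-0000-000000000001}</CategoryOption>
  </exclude>

  <!-- Fields that are never indexed -->
  <exclude hint="list:ExcludeField">
    <InternalNotes>{00000000-0000-0000-0000-000000000002}</InternalNotes>
  </exclude>

  <!-- Only consulted when indexAllFields is false -->
  <include hint="list:IncludeField">
    <MetaDescription>{00000000-0000-0000-0000-000000000003}</MetaDescription>
  </include>

  <!-- Every template to index, including base templates -->
  <include hint="list:IncludeTemplate">
    <ContentPage>{00000000-0000-0000-0000-000000000004}</ContentPage>
  </include>

</defaultLuceneIndexConfiguration>
```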

Creating a search index

With the default Lucene settings defined, it can be useful to create a custom index definition to query in your search logic. Separating the search index from your default settings allows for multiple search indexes (perhaps a general one for pages and another for a custom content type) and for optimization, since each index indexes and stores only the templates and fields it requires.

A basic search index configuration is described below. You would save this as its own file (named Sitecore.ContentSearch.Lucene.Indexes.Shared.Web.MyIndexName.config, for example) alongside the other indexes.
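
A minimal sketch of what such a definition might look like is shown here. It is an approximation and should not be treated as a definitive listing. The index id, database and root come from the notes that follow; the template GUID and the four field names are placeholders. The shape of the nested <configuration> element is an assumption, since some Sitecore versions reference the default configuration with a ref attribute instead. Check the example index files shipped with your version.

```xml
<!-- Sketch of a custom index definition. The template GUID and field names
     are placeholders; the nested <configuration> shape is an assumption. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="NewsSearch" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />

            <!-- Rebuild index entries after each publish to the web database -->
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>

            <!-- Crawl only the news section of the web database -->
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/Home/News/</Root>
              </crawler>
            </locations>

            <configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
              <!-- Placeholder GUID standing in for the news item template -->
              <include hint="list:IncludeTemplate">
                <NewsItem>{00000000-0000-0000-0000-000000000000}</NewsItem>
              </include>

              <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
                <fieldNames hint="raw:AddFieldByFieldName">
                  <!-- Hypothetical field names; storageType="YES" stores the value in the index -->
                  <field fieldName="title" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
                  <field fieldName="summary" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
                  <!-- A field with a space in its name is listed both with the space and with an underscore -->
                  <field fieldName="article date" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.DateTime" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
                  <field fieldName="article_date" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.DateTime" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
                </fieldNames>
              </fieldMap>
            </configuration>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```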

This search configuration is a simple example for indexing news items. Please note that:
  • The name of the index is set at: <index id="NewsSearch" ...
  • One template is set to be indexed (in the <include hint="list:IncludeTemplate"> section): the news item template.
  • Four fields have been set to be indexed with storageType set to YES (<fieldNames hint="raw:AddFieldByFieldName"> section):
    • Fields with a space in their name (article date for example) need to be added twice, once with the space and once with an underscore in its place, to ensure the field value is actually stored.
  • The database is set to web and the root is further narrowed to "/sitecore/content/Home/News/" as the news items are under here.
This index can be further configured using any of the include/exclude options outlined in the Lucene default configuration section.
