There is already a documentation section on this exact topic on the official Orchard website, at http://www.orchardproject.net/docs/Search-and-indexing.ashx. However, it only covers the site administrator and user experience: how to set up and use the search engine. What it doesn't cover is how search works internally, and how developers can reuse it or customize the search experience. And believe me, it's great!
This post provides details on the Indexing implementation and usage. The next one will focus on querying the index and customizing the search experience.
In order to have a search box in your website, you need to enable the Indexing module, then the Search module and finally the Lucene module.
Indexing is responsible for adding content to an IIndexProvider implementation as soon as a content item has been modified. A background task runs on a separate thread and consumes IndexingTask records. These records are created by a specific content handler event, which also builds the indexed document information. This point will be covered in detail later.
Search provides a Search Form widget, along with Search Settings to select which indexed fields should be part of the search.
Lucene provides a default implementation of IIndexProvider, which the Indexing module depends on to save the indexed document information to physical storage. It also provides an ISearchBuilder implementation, which is called to build a search query.
Customizing Indexed Content
The Indexing module doesn't know anything about the content in Orchard. Yet it has to be able to index anything inside a website, even content from new modules like Products, News, or anything else. For that to happen, modules can create implementations of ContentHandler and explicitly call OnIndexing<T>() to provide information about a specific content part. It is defined as follows:
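As a rough, self-contained sketch of that registration pattern (not the actual Orchard source): ContentHandlerSketch, ProductPart and ProductPartHandler are hypothetical stand-ins, and the dispatch logic is simplified so the sample compiles without the framework. The shape of OnIndexing<TPart>() follows the description in this post, a single delegate receiving the context and the typed part.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types so the sketch compiles on its own;
// in Orchard the real ones come from the framework.
public interface IContent { }
public class IndexContentContext { }

// Hypothetical, simplified stand-in for Orchard's ContentHandler base class.
public abstract class ContentHandlerSketch {
    private readonly List<Action<IndexContentContext, IContent>> _indexingHandlers =
        new List<Action<IndexContentContext, IContent>>();

    // Assumed shape of the registration method: one delegate taking
    // the indexing context and the strongly typed content part.
    protected void OnIndexing<TPart>(Action<IndexContentContext, TPart> handler)
        where TPart : class, IContent {
        _indexingHandlers.Add((context, content) => {
            var part = content as TPart;
            if (part != null) handler(context, part); // only run for matching parts
        });
    }

    // Called by the (imaginary) indexing pipeline for each piece of content.
    public void Indexing(IndexContentContext context, IContent content) {
        foreach (var handler in _indexingHandlers) handler(context, content);
    }
}

// Hypothetical part and handler, as a "Products" module might define them.
public class ProductPart : IContent {
    public string Sku { get; set; }
}

public class ProductPartHandler : ContentHandlerSketch {
    public readonly List<string> IndexedSkus = new List<string>();

    public ProductPartHandler() {
        OnIndexing<ProductPart>((context, part) => IndexedSkus.Add(part.Sku));
    }
}

public static class Program {
    public static void Main() {
        var handler = new ProductPartHandler();
        handler.Indexing(new IndexContentContext(), new ProductPart { Sku = "ABC-1" });
        Console.WriteLine(string.Join(", ", handler.IndexedSkus));
    }
}
```

In a real module the handler would derive from Orchard's own ContentHandler, and the lambda would write to the context's document index rather than to a list.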
This means you have to provide a lambda which takes exactly two parameters: an IndexContentContext instance, and the content part you should extract indexed information from. IndexContentContext inherits from ContentContextBase, which provides common information about the current content item:
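A minimal sketch of what such a base context might expose, assuming it carries the item's id, its content type name and the item itself. The member list is an assumption rather than a copy of the Orchard source, ContentItem is a stand-in, and the DocumentIndex property is left out here.

```csharp
using System;

// Stand-in for Orchard's ContentItem, reduced to what the sketch needs.
public class ContentItem {
    public int Id { get; set; }
    public string ContentType { get; set; }
}

// Assumed shape: the context copies basic facts from the content item.
public class ContentContextBase {
    protected ContentContextBase(ContentItem contentItem) {
        ContentItem = contentItem;
        Id = contentItem.Id;
        ContentType = contentItem.ContentType;
    }

    public int Id { get; private set; }
    public string ContentType { get; private set; }
    public ContentItem ContentItem { get; private set; }
}

// Stand-in for the indexing context (its document index is omitted).
public class IndexContentContext : ContentContextBase {
    public IndexContentContext(ContentItem contentItem) : base(contentItem) { }
}

public static class Program {
    public static void Main() {
        var context = new IndexContentContext(new ContentItem { Id = 42, ContentType = "Page" });
        Console.WriteLine(context.Id + " " + context.ContentType); // prints "42 Page"
    }
}
```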
It also contains a DocumentIndex property of type IDocumentIndex which looks like this:
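A sketch of such a fluent interface, limited to the members this post discusses. The exact set of Add() overloads is an assumption, and the real interface may contain more members than shown. Every method returns the document itself so that calls can be chained.

```csharp
using System;

public interface IDocumentIndex {
    // Add a named field to the document (overload set assumed).
    IDocumentIndex Add(string name, string value);
    IDocumentIndex Add(string name, int value);
    IDocumentIndex Add(string name, DateTime value);
    IDocumentIndex Add(string name, bool value);

    // Modifiers applying to the field that was added last.
    IDocumentIndex Store();      // keep the original value retrievable
    IDocumentIndex Analyze();    // tokenize the value for full-text search
    IDocumentIndex RemoveTags(); // strip HTML/XML markup before indexing
}
```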
This interface is used to directly provide all the information which has to be indexed. The Add() methods add a named field to the index document, while Analyze() tells the concrete index implementation that the text should be tokenized (to differentiate text content like a title from metadata like an Id or an integral value). For instance, when OnIndexing() is called on the BodyPart (the part which holds the body of a Page, for instance), the following handler is executed:
The text of the body and its format are both indexed. The text goes through RemoveTags(), which strips all HTML or XML tags from the content: as a BodyPart mostly contains HTML, we don't want a search query on "div" to return every document. Analyze() is then called so that each individual word of the content can be searched. The format is saved in a separate field, and as it should only contain a mime type like text/html, there is no need to analyze it or to remove tags. We do mark this field with a call to Store(), though, in order to be able to retrieve the exact value from a search result, as indexing generally breaks up the original content. This should be done for any metadata added to the index.
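Here is a runnable sketch of that chain, under a few assumptions: the field names "body" and "format" are taken to match what the handler uses, RecordingIndex is a hypothetical in-memory stand-in for the Lucene-backed document, and the interface is trimmed to the four methods involved. In an actual handler the same chain would be registered through OnIndexing<BodyPart>() and applied to the context's DocumentIndex.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Trimmed stand-in for the interface, only the members used below.
public interface IDocumentIndex {
    IDocumentIndex Add(string name, string value);
    IDocumentIndex Store();
    IDocumentIndex Analyze();
    IDocumentIndex RemoveTags();
}

public class Field {
    public string Name;
    public string Value;
    public bool Stored;
    public bool Analyzed;
}

// Hypothetical in-memory index: it only records what was asked of it.
public class RecordingIndex : IDocumentIndex {
    public readonly List<Field> Fields = new List<Field>();

    // Modifiers apply to the most recently added field.
    private Field Last { get { return Fields[Fields.Count - 1]; } }

    public IDocumentIndex Add(string name, string value) {
        Fields.Add(new Field { Name = name, Value = value });
        return this;
    }
    public IDocumentIndex Store() { Last.Stored = true; return this; }
    public IDocumentIndex Analyze() { Last.Analyzed = true; return this; }
    public IDocumentIndex RemoveTags() {
        // Naive tag stripping, good enough for a sketch.
        Last.Value = Regex.Replace(Last.Value, "<[^>]*>", "");
        return this;
    }
}

// Stand-in for the body part: its text and the format of that text.
public class BodyPart {
    public string Text;
    public string Format;
}

public static class Program {
    // The body text is stripped of tags and analyzed; the format is stored as-is.
    public static void IndexBody(IDocumentIndex index, BodyPart body) {
        index
            .Add("body", body.Text).RemoveTags().Analyze()
            .Add("format", body.Format).Store();
    }

    public static void Main() {
        var index = new RecordingIndex();
        IndexBody(index, new BodyPart { Text = "<div>Hello <b>world</b></div>", Format = "text/html" });
        foreach (var f in index.Fields)
            Console.WriteLine(f.Name + " = '" + f.Value + "' stored=" + f.Stored + " analyzed=" + f.Analyzed);
    }
}
```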
NB: You can add several values to the same field name, even from different handlers. An example will be provided later.
This should be enough to understand how any module can contribute its own data to the index for a specific content item. For more examples, just take a look at how Tags, Comments or Routable add their own properties to the index.