Wednesday, August 08, 2012

Search Indexing Handlers


Introduction

Let's start with a business case to give you an idea of when to use custom search indexing handlers.

When a Component with localized children is changed, all translations need to be updated as well. This, however, is not visible in the Content Manager. While I was working on a proof of concept for a bank, the bank asked me to show that we could write an extension to support their translation process.

It had to be possible to easily see and work with all Components that require translation:

  • When a parent is updated all localized children require translation
  • When a shared item is localized it requires translation

Usually this kind of information is stored as Metadata on the Component. However, I did not want editors to be able to change this information, and I did not want to impose requirements on the content model (Schemas) just because of this extension.

Therefore I decided to store the Translation Metadata as Application Data on the Components.

But since Application Data is not indexed, I had to implement a custom search indexing handler that indexes the Translation Required field on a Component, allowing me to find all Components that require translation.

So when you want to find items based on specific metadata, Application Data, or even properties (e.g. PublishLocationUrl) stored in an item's Application Data, implementing a custom search indexing handler makes sense.

In this blog article I will guide you through the steps necessary to implement a handler, and build one that indexes a Component’s Expiry Date, set in the Component Metadata.

When implementing a search indexing handler, three steps need to be covered:

  • The index field needs to be defined in the Solr Tridion schema.
  • The handler itself needs to be developed in a class that implements ISearchIndexingHandler.
  • This class needs to be configured in the searchIndexer section of Tridion.ContentManager.config.

Field Schema Configuration


You can find the Solr schema field configuration here: Tridion\solr-home\tridion\conf\schema.xml. For each custom index field you need to add a field element to the fields section.

  <fields>

    <!-- ContentAuthoring.ExpiryDates -->
    <field name="ExpiryDate_dyn_s_dte" type="date" indexed="true" stored="true" />
    <!-- ContentAuthoring.Translation -->
    <field name="TranslationRequired_dyn_s_bln" type="boolean" indexed="true" stored="true" />
    <field name="ParentRevisionDate_dyn_s_dte" type="date" indexed="true" stored="true" />
    <!-- ContentAuthoring.PublishedTo -->
    <field name="PublishedToTargetIds_dyn_mvl_txt" type="string" indexed="true" stored="false" />
    <field name="PublishedToTargetTitles_dyn_mvl_txt" type="text" indexed="true" stored="false" />
  </fields>

The example defines five fields: Expiry Date, Translation Required and Parent Revision Date, and Published To Target Ids and Titles.

The name of the field needs to follow a specific format so that the field value can be parsed from the result set, e.g. ExpiryDate_dyn_s_dte.

Besides the name of the field itself, the name attribute contains information on whether the field is dynamic (_dyn), stored (_s) or multivalued (_mvl).

NB: A stored field is returned in the search result.

The last part of the field name must contain a type keyword:

Type      Type Code   Field Class Name
Boolean   bln         BooleanField
Date      dte         DateField
Double    dbl         DoubleField
Integer   int         IntegerField
Float     flt         FloatField
Long      lng         LongField
String    s           StringField
TcmUri    uri         TcmUriField
Text      txt         TextField
Xml       xml         XmlField

Table 1: Field definition

The dyn, mvl, s and [type] keywords are configured for Solr in the field element, and are also included in the name to support the SDL Tridion search functionality.
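To make the naming convention concrete, here is a minimal sketch that splits a field name into the parts described above. IndexFieldName is a hypothetical helper written for this article, not part of the Tridion API, and it assumes that the base name itself contains no underscore, that the last part is always the type code, and that everything in between is a flag.

```csharp
using System;
using System.Linq;

// Illustration only: IndexFieldName is a hypothetical helper, not a Tridion class.
public sealed class IndexFieldName
{
    public string BaseName { get; private set; }
    public bool IsDynamic { get; private set; }
    public bool IsStored { get; private set; }
    public bool IsMultiValued { get; private set; }
    public string TypeCode { get; private set; }

    public static IndexFieldName Parse(string fullName)
    {
        if (string.IsNullOrEmpty(fullName))
            throw new ArgumentException("Field name is required.", "fullName");

        var parts = fullName.Split('_');
        if (parts.Length < 2)
            throw new FormatException("Expected at least a base name and a type code.");

        // Assumption: first part is the base name, last part is the type code,
        // and every part in between is one of the flags dyn, s or mvl.
        var flags = parts.Skip(1).Take(parts.Length - 2).ToArray();

        return new IndexFieldName
        {
            BaseName = parts[0],
            TypeCode = parts[parts.Length - 1],
            IsDynamic = flags.Contains("dyn"),
            IsStored = flags.Contains("s"),
            IsMultiValued = flags.Contains("mvl")
        };
    }
}
```

For example, IndexFieldName.Parse("PublishedToTargetIds_dyn_mvl_txt") gives the base name PublishedToTargetIds, IsDynamic and IsMultiValued set to true, IsStored set to false, and the type code txt.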

To work with index fields you can use the helper classes shown in Table 1, which are defined in the Tridion.ContentManager.Search.Fields namespace.

NB: Although technically you're not configuring a dynamic Solr field, the custom field is marked dynamic. I will revise this article later with some more details.

Keep in mind that when you define a string field, the value is indexed in its entirety; only text fields allow partial and more advanced search matches.

Implementing the Search Indexing Handler

You can create your search indexing handler by developing a class that implements the ISearchIndexingHandler interface. This interface requires two methods:


public void Configure(SearchIndexingHandlerSettings settings)

public void ExtractIndexFields(IdentifiableObjectData subjectData, Item item)

The Configure method can be left empty, and ExtractIndexFields contains the actual implementation of your search index handler.

The following example indexes any ExpiryDate metadata field present in the Component:


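using System;
using System.Linq;
using System.Xml.Linq;
using Tridion.ContentManager.CoreService.Client;   // IdentifiableObjectData, VersionedItemData
using Tridion.ContentManager.Search.Fields;        // DateField
// Also import the namespaces that contain ISearchIndexingHandler,
// SearchIndexingHandlerSettings and Item in your Tridion version.

public sealed class ExpiryDateSearchIndexingHandler : ISearchIndexingHandler
{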
    const string ExpiryDateFieldName = "ExpiryDate";

    public void Configure(SearchIndexingHandlerSettings settings)
    {
    }

    public void ExtractIndexFields(IdentifiableObjectData subjectData, Item item)
    {
        var itemData = subjectData as VersionedItemData;

        if (SubjectHasMetadata(itemData))
        {
            AddExpiryDateToIndexItem(itemData, item);
        }
    }

    public void Dispose()
    {
    }

    static bool SubjectHasMetadata(RepositoryLocalObjectData itemData)
    {
        return itemData != null &&
                !string.IsNullOrEmpty(itemData.Metadata);
    }

    static void AddExpiryDateToIndexItem(RepositoryLocalObjectData itemData, Item item)
    {
        var expiryDate = GetExpiryDate(itemData.Metadata);

        if (expiryDate.HasValue)
        {
            item.Add(new DateField(ExpiryDateFieldName, expiryDate, true));
        }
    }

    static DateTime? GetExpiryDate(string xml)
    {
        var xe = XElement.Parse(xml);

        var expiryDateValue =
            xe.Descendants().Where(
                el => string.Equals(el.Name.LocalName, ExpiryDateFieldName, StringComparison.InvariantCultureIgnoreCase)).
                Select(el => el.Value).FirstOrDefault();

        DateTime expiryDate;
        return DateTime.TryParse(expiryDateValue, out expiryDate) ? expiryDate : default(DateTime?);
    }
}
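
To see what the lookup in GetExpiryDate does with a concrete document, the sketch below runs the same LINQ to XML query against a small metadata sample, without needing a Tridion installation. The sample XML, its namespace and the class name ExpiryDateMetadataExample are made up for illustration. One deliberate difference: the sketch parses with the invariant culture so the result does not depend on the server's regional settings, whereas the handler above uses the current culture.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Illustration only: a standalone version of the metadata lookup.
public static class ExpiryDateMetadataExample
{
    // Hypothetical metadata XML; the root element and namespace are invented.
    public const string SampleMetadata =
        "<Metadata xmlns=\"urn:example:metadata\">" +
        "<ExpiryDate>2012-12-31T00:00:00</ExpiryDate>" +
        "</Metadata>";

    public static DateTime? ReadExpiryDate(string xml)
    {
        // Take the first descendant whose local name is ExpiryDate,
        // regardless of the namespace the Schema uses.
        var value = XElement.Parse(xml)
            .Descendants()
            .Where(el => string.Equals(el.Name.LocalName, "ExpiryDate",
                                       StringComparison.OrdinalIgnoreCase))
            .Select(el => el.Value)
            .FirstOrDefault();

        // TryParse returns false for null or unparsable input,
        // in which case no date is reported.
        DateTime parsed;
        return DateTime.TryParse(value, CultureInfo.InvariantCulture,
                                 DateTimeStyles.None, out parsed)
            ? parsed
            : (DateTime?)null;
    }
}
```

Calling ReadExpiryDate(SampleMetadata) returns 31 December 2012, while metadata without an ExpiryDate element returns null, so nothing would be added to the index for that item.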




Configuring a Search Indexing Handler



The third step is to configure your search indexing handler in the content manager configuration.

You can configure the handler in the searchIndexer section:

<searchIndexer heartbeatTimeout="300" indexingThreadCount="1" hostUrl="http://localhost:8983/tridion" hostUsername="[Domain]\[Account Name]" hostPassword="[Password]">
   <searchIndexingHandlers>
        <add type="Tridion.ContentManager.Search.Indexing.Handling.DefaultSearchIndexingHandler" assembly="Tridion.ContentManager.Search.Indexing, Version=6.1.0.996, Culture=neutral, PublicKeyToken=ddfc895746e5ee6b" />
        <add type="[Customer Name].ContentManager.Extensions.ContentAuthoring.ExpiryDates.ExpiryDateSearchIndexingHandler" assembly="[Customer Name].ContentManager.Extensions, Version=1.0.0.0, Culture=neutral, PublicKeyToken=9d4a72f7ed42e8d3, processorArchitecture=MSIL" />
    </searchIndexingHandlers>
</searchIndexer>

In your installation these elements may be encrypted. If this is the case, you can encrypt the section again by resetting the host password using the SDL Tridion MMC Snap-in.

Expired Content Search Folders

When you create a Search Folder, the query needs to use the full field name as specified in the Solr schema.

The following example shows the Search Folder source to find the content that is expiring in the coming week:

<SearchFolder xmlns="http://www.tridion.com/ContentManager/5.1/SearchFolder">
    <GeneralParameters>
        <SearchQuery>ExpiryDate_dyn_s_dte:[NOW TO NOW+7DAYS]</SearchQuery>
        <SearchIn xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="tcm:0-6-1" Recursive="true"></SearchIn>
    </GeneralParameters>
    <AdvancedParameters></AdvancedParameters>
</SearchFolder>

The following example shows the Search Folder source to find Components published to two specific targets, Live (Public) and Live Secure:

<SearchFolder xmlns="http://www.tridion.com/ContentManager/5.1/SearchFolder">
    <GeneralParameters>
        <SearchQuery>PublishedToTargetTitles_dyn_mvl_txt:Live AND PublishedToTargetTitles_dyn_mvl_txt:"Live Secure"</SearchQuery>
        <SearchIn xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="tcm:0-4-1" Recursive="1"></SearchIn>
    </GeneralParameters>
    <AdvancedParameters></AdvancedParameters>
</SearchFolder>
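
If you generate Search Folder sources like the ones above from code, composing the XML with LINQ to XML avoids escaping mistakes in the query (note the quotes around "Live Secure"). The following is a minimal sketch under that assumption. SearchFolderXml and Build are hypothetical names, and the sketch only produces the XML string; how you then store it on a Search Folder is outside its scope.

```csharp
using System.Xml.Linq;

// Illustration only: builds the Search Folder source shown above.
public static class SearchFolderXml
{
    static readonly XNamespace Sf =
        "http://www.tridion.com/ContentManager/5.1/SearchFolder";
    static readonly XNamespace XLink = "http://www.w3.org/1999/xlink";

    public static string Build(string query, string searchInUri)
    {
        var root = new XElement(Sf + "SearchFolder",
            new XElement(Sf + "GeneralParameters",
                // XElement escapes special characters in the query for us.
                new XElement(Sf + "SearchQuery", query),
                new XElement(Sf + "SearchIn",
                    new XAttribute(XNamespace.Xmlns + "xlink", XLink.NamespaceName),
                    new XAttribute(XLink + "href", searchInUri),
                    new XAttribute("Recursive", "true"))),
            new XElement(Sf + "AdvancedParameters"));

        return root.ToString();
    }
}
```

A call such as SearchFolderXml.Build("ExpiryDate_dyn_s_dte:[NOW TO NOW+7DAYS]", "tcm:0-6-1") produces a document with the same elements as the expiry example, with the full field name in the query.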

Wrapping up and some additional advice


Now restart the COM+ application and the search-related services, then run TcmReIndex.exe /all to rebuild your entire search index.

To wrap up I would like to share some points I came across:


  • Updating Application Data does not trigger an update of the search index. You need to force this manually, which is not possible using the API; the solution I found is unsupported.
  • You receive a Tridion.ContentManager.CoreService.Client.IdentifiableObjectData data object in the handler, not the actual TOM.NET Tridion item. This means that when you need to do anything with the item (e.g. retrieve Application Data) you need to use the Core Service.
  • When searching for these custom fields you need to use the full name as specified in the Solr schema.
  • You can define multivalued index fields (nice for keeping track of all publication targets a Component is published to!), but they can't be returned in the search result. Tridion doesn't know how to handle these results, so set the stored attribute in the Solr schema configuration to false.

Good luck! I hope you find this article useful, and when you need any help, let me know.
