Excluding Stuff from Search with Search API

The requirement was simple: don't index [some-content] in search. This may be private information, or just content you don't want showing up in search results.

For a site using Search API, the answer is as simple as implementing hook_search_api_alter_callback_info().

In my case, I wanted to keep private files out of the index while using search_api_attachments. This hook is the preferred approach because it creates a filter that site administrators can see, enable, and disable on the Search API index's “Filters” configuration screen.

First, implement the hook in your module to tell Search API you are adding a new filter:

/**
 * Implements hook_search_api_alter_callback_info().
 *
 * Replace "mymodule" with your module's machine name (always lowercase).
 */
function mymodule_search_api_alter_callback_info() {
  $callbacks = array();
  // Register a filter that excludes private files from the index.
  $callbacks['exclude_private_files'] = array(
    'name' => t('Exclude private files'),
    'description' => t('Excludes private files from being indexed in search.'),
    'class' => 'SearchApiExcludePrivateFiles',
  );
  return $callbacks;
}

The array key can be anything you like (“exclude_private_files” in my case). The 'name' and 'description' values are what administrators see on the Filters screen, and 'class' is the name of the class we will create next, which does the actual filtering.

We will define our class as follows:

/**
 * Search API alter callback that keeps private files out of the index.
 */
class SearchApiExcludePrivateFiles extends SearchApiAbstractAlterCallback {

  /**
   * Only offer this filter on indexes of file entities.
   */
  public function supportsIndex(SearchApiIndex $index) {
    return $index->getEntityType() === 'file';
  }

  /**
   * Removes every file whose URI uses the private:// scheme.
   */
  public function alterItems(array &$items) {
    foreach ($items as $k => $item) {
      // Check for the scheme prefix rather than "private://" anywhere in the URI.
      if (strpos($item->uri, 'private://') === 0) {
        // Unsetting the item means Search API will not index it.
        unset($items[$k]);
      }
    }
  }

}

Our class extends SearchApiAbstractAlterCallback. The “supportsIndex” function we override tells Search API which indexes the filter can be used on. A filter meant for file entities shouldn't be applied to, say, a node index, so we return TRUE only when the index's entity type is 'file'. That way, the filter is only offered as an option on file entity indexes.
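To see the “supportsIndex” check in isolation, here is a minimal standalone sketch that runs with the plain PHP CLI, without Drupal. FakeIndex is a hypothetical stand-in for SearchApiIndex that exposes only getEntityType(). The supports_index() helper and its $allowed parameter are also hypothetical, added to show how you might allow more than one entity type. The original filter only ever allows 'file'.

```php
<?php
// Hypothetical stand-in for SearchApiIndex: only the method our check needs.
class FakeIndex {
  private $entityType;

  public function __construct($entityType) {
    $this->entityType = $entityType;
  }

  public function getEntityType() {
    return $this->entityType;
  }
}

// Mirrors SearchApiExcludePrivateFiles::supportsIndex(). The $allowed list is
// an illustrative assumption; the filter in this post only allows 'file'.
function supports_index(FakeIndex $index, array $allowed = array('file')) {
  // Strict comparison so only exact entity type strings match.
  return in_array($index->getEntityType(), $allowed, TRUE);
}

var_dump(supports_index(new FakeIndex('file')));  // bool(true)
var_dump(supports_index(new FakeIndex('node')));  // bool(false)
```

Returning a plain boolean is all Search API needs here. The filter is simply hidden on indexes where the method returns FALSE.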

The “alterItems” function we override does the actual filtering. It looks at each file item's URI (Drupal's stream-wrapper path for the file), and if that URI uses the private:// scheme, we know it's a private file. Unsetting the item from the $items array ensures it won't be indexed. The same pattern lets you build filters for almost any rule you need. Keep in mind that this code runs for every item being indexed, so it's worth keeping it fast.
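To sanity-check the filtering rule without a Drupal site, the sketch below reproduces the “alterItems” logic as a plain PHP function. It assumes each item is a simple object with a uri property, standing in for the file entities Search API passes in. The exclude_private_files() function and the sample URIs are hypothetical and only for testing.

```php
<?php
// Standalone copy of the alterItems() logic, runnable with the PHP CLI.
// Items are plain objects with a 'uri' property (an assumption standing in
// for Search API's file entities).
function exclude_private_files(array &$items) {
  foreach ($items as $k => $item) {
    // Only URIs that start with the private:// scheme are dropped.
    if (strpos($item->uri, 'private://') === 0) {
      unset($items[$k]);
    }
  }
}

// Hypothetical sample items, keyed by item ID as Search API does.
$items = array(
  1 => (object) array('uri' => 'public://docs/brochure.pdf'),
  2 => (object) array('uri' => 'private://hr/salaries.pdf'),
  3 => (object) array('uri' => 'public://images/logo.png'),
);

exclude_private_files($items);

// unset() leaves the remaining keys (the item IDs) untouched.
print implode(', ', array_keys($items)) . "\n";  // 1, 3
```

Note that unset() does not reindex the array. This matters because the remaining keys are the item IDs Search API uses to track what it has indexed.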