Use ElasticSearch to search and use external sources, like Wikipedia, inside eXo Platform

Since version 4, eXo Platform has shipped with a unified search, which greatly improves its search capabilities. All the platform’s resources (contents, files, wiki pages, etc.) can now be found from a single, centralized location.

[Screenshot: default unified search]

Besides these out-of-the-box capabilities, a new API lets you create custom search connectors to extend the search scope and enrich the results. This blog post explains how to implement and configure such a connector.

For this blog post, the search connector will retrieve data indexed by ElasticSearch, a powerful and easy-to-use search engine. It is of course up to you to decide what your search connector returns (data indexed by another search engine, data from a database, other custom data stored in eXo, etc.).

ElasticSearch

The first step if we want to use ElasticSearch is to install and configure it! The only thing to do here is to download it, extract it, and start it with:

Index data in ElasticSearch

Out of the box, ElasticSearch is empty: no data has been indexed, so we need to feed it. For this purpose, we will use the Wikipedia River plugin. A river is an ElasticSearch component that feeds ElasticSearch with data to index. The Wikipedia River simply feeds it with Wikipedia pages.

After stopping your ElasticSearch server you can install the plugin with:

After restarting ElasticSearch you should see logs similar to the following:

This confirms that the Wikipedia River plugin is correctly installed (look for loaded [river-wikipedia]).

We can now start indexing Wikipedia pages in ElasticSearch by creating the river with a REST call (we are using curl here; feel free to use your favorite tool):

A lot of data is now being indexed by ElasticSearch (yes, Wikipedia is a huge source of data :)). You can check this by executing a search with:

Warning: the Wikipedia River will index a lot of data. You should stop the river after a few minutes to avoid filling your entire disk space ;-). This can be done by deleting the river with:

Now that we have data indexed by ElasticSearch, let’s dig into the eXo search connector!

eXo search connector

A search connector is a simple class that extends org.exoplatform.commons.api.search.SearchServiceConnector and implements the “search” method:

It needs to be declared in the eXo configuration, either in an extension or directly in the jar which will contain the connector class. Let’s go for the jar method:

  • add the class in your jar
  • add a file named configuration.xml in conf/portal in your jar with the following content (the “type” tag contains the fully qualified name of your connector class):

We now have the skeleton of our search connector. The last step is to implement the search method.

Fetching results from ElasticSearch

We need to call ElasticSearch in order to fetch Wikipedia pages based on the input parameters of the search (query text, offset, limit, sort field, sort order). ElasticSearch provides a Java Client API (TransportClient). Sadly, it depends on Lucene artifacts, and since eXo Platform already embeds Lucene artifacts that are not necessarily in the same version as the ones needed by ElasticSearch, it can cause conflicts. Instead we will directly use the REST API:
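To make the mapping from the search parameters to a REST request concrete, here is a minimal sketch that builds the URL and the JSON body of an ElasticSearch search request using only the JDK. The class name EsSearchRequest, the base URL http://localhost:9200 and the index name wikipedia are assumptions made for illustration, not values taken from the original connector. The sort field is passed through unchanged, so translating eXo sort names into ElasticSearch field names is left out.

```java
/** Builds the URL and JSON body of an ElasticSearch search request (illustrative sketch). */
final class EsSearchRequest {

    // Assumed values: a local node on the default HTTP port and an index named "wikipedia".
    static final String BASE_URL = "http://localhost:9200";
    static final String INDEX = "wikipedia";

    /** URL of the _search endpoint for the assumed index. */
    static String searchUrl() {
        return BASE_URL + "/" + INDEX + "/_search";
    }

    /** Escapes the characters that may not appear raw inside a JSON string. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /**
     * Builds the request body: "from" and "size" carry the offset and limit,
     * "sort" is added only when a sort field is given, and the user's text
     * goes into a query_string query.
     */
    static String body(String query, int offset, int limit, String sortField, String order) {
        StringBuilder json = new StringBuilder();
        json.append("{\"from\":").append(offset)
            .append(",\"size\":").append(limit);
        if (sortField != null && !sortField.isEmpty()) {
            // Anything other than "desc" falls back to ascending order.
            String direction = "desc".equalsIgnoreCase(order) ? "desc" : "asc";
            json.append(",\"sort\":[{\"").append(escapeJson(sortField))
                .append("\":{\"order\":\"").append(direction).append("\"}}]");
        }
        json.append(",\"query\":{\"query_string\":{\"query\":\"")
            .append(escapeJson(query)).append("\"}}}");
        return json.toString();
    }
}
```

The resulting string can be sent as the body of a POST to the URL returned by searchUrl() with any HTTP client. Escaping the query text matters here because it comes straight from the user's search box.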

Requests and responses are plain JSON. You can find more details about the ElasticSearch query syntax in its documentation. The important point for the search connector is that each result has to be a SearchResult object, and that the results are returned in a collection.
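As a rough illustration of that last step, the sketch below turns an ElasticSearch response into a collection of result objects. It rests on several assumptions: the JSON has already been parsed into nested maps and lists by whatever JSON library you have at hand, each hit carries a _score and a _source with title and text fields, and the page URL can be derived from the title. The nested Result class is a local stand-in for eXo's SearchResult so that the example compiles on its own; the class names, the excerpt length and the relevancy scaling are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Maps a parsed ElasticSearch response to result objects (illustrative sketch). */
final class WikipediaHits {

    /** Local stand-in for eXo's SearchResult, so that the sketch is self-contained. */
    static final class Result {
        final String url;
        final String title;
        final String excerpt;
        final long relevancy;

        Result(String url, String title, String excerpt, long relevancy) {
            this.url = url;
            this.title = title;
            this.excerpt = excerpt;
            this.relevancy = relevancy;
        }
    }

    // Arbitrary excerpt size chosen for this example.
    static final int EXCERPT_LENGTH = 200;

    @SuppressWarnings("unchecked")
    static Collection<Result> toResults(Map<String, Object> response) {
        List<Result> results = new ArrayList<Result>();
        // Assumed response layout: {"hits": {"hits": [ {"_score": ..., "_source": {...}}, ... ]}}
        Map<String, Object> hitsNode = (Map<String, Object>) response.get("hits");
        if (hitsNode == null) {
            return results;
        }
        List<Map<String, Object>> hits = (List<Map<String, Object>>) hitsNode.get("hits");
        if (hits == null) {
            return results;
        }
        for (Map<String, Object> hit : hits) {
            Map<String, Object> source = (Map<String, Object>) hit.get("_source");
            if (source == null || source.get("title") == null) {
                continue; // skip hits without the fields this sketch relies on
            }
            String title = source.get("title").toString();
            Object text = source.get("text");
            String excerpt = text == null ? "" : text.toString();
            if (excerpt.length() > EXCERPT_LENGTH) {
                excerpt = excerpt.substring(0, EXCERPT_LENGTH) + "...";
            }
            // Scale the floating-point score to a long; the factor 100 is arbitrary.
            Object score = hit.get("_score");
            long relevancy = score instanceof Number
                    ? Math.round(((Number) score).doubleValue() * 100)
                    : 0L;
            // Simplified URL: spaces become underscores, other characters are not encoded.
            String url = "https://en.wikipedia.org/wiki/" + title.replace(' ', '_');
            results.add(new Result(url, title, excerpt, relevancy));
        }
        return results;
    }
}
```

In a real connector the loop would build SearchResult instances instead of the stand-in class, and the returned collection is what the search method hands back to the unified search.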

Deploy it in eXo, and enjoy!

We can now deploy our jar (which contains the search connector class and the XML configuration file) in the library folder of the application server (/lib in Tomcat, for example) and start eXo.

A search using the quick search in the toolbar now retrieves content from Wikipedia:

[Screenshot: search results preview]

When the unified search screen is displayed, we can see that a new Wikipedia filter is listed, and our search results contain some Wikipedia pages:

[Screenshot: search results, Wikipedia filter checked]

If you don’t want to see Wikipedia content in your results, simply uncheck the filter:

[Screenshot: search results, Wikipedia filter unchecked]

The source code is available here, as a Maven project.

Learn more about this project and what you can do with eXo Platform; join the eXo tribe!
