The title of this post might make this obvious, but just to be sure: I want to prove out the ability to run a multilingual site in Umbraco that supports language-specific search. While I could easily create a separate site for each language and reference them by top-level domains, the sites are going to share all the views, and only the text content is going to be different. Separate language sites would mean a lot of duplication. So I want to keep everything in one instance and use URLs to drive the language. I will quickly explain my multilingual setup, but the focus of this post is on the search setup.
First, I'll walk through my Proof of Concept Umbraco multilingual site search implementation. The key assumption is how the multilingual site is configured. I believe this setup is standard for multilingual Umbraco sites, but just in case, here are the details:
There are plenty of options to understand and consider when setting up the Umbraco Examine indexes. I found some articles referring to Lucene language-specific analyzers. I made some attempts at using these but was not successful, and ended up using the standard analyzer. I believe the choice of analyzer will need further investigation.
To create indexes that are specific to the language site roots and don't include results from other language sites, I had to point each index at its root node using the node id. The node id is just a number you can get by logging into the Umbraco back office, selecting a node and clicking the properties tab. Below is my index setup in config/ExamineIndex.config.
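As a minimal sketch of what such a setup can look like, assuming the Umbraco 7 style Examine configuration: each index set is restricted to one language root through its `IndexParentId` attribute. The set names, index paths and node ids here are placeholders for illustration, not the values from the PoC, and the field lists are left out, which as far as I know makes Examine index all fields.

```xml
<!-- Sketch only: add inside the existing <ExamineLuceneIndexSets> element. -->
<!-- SetName, IndexPath and IndexParentId values are hypothetical placeholders. -->
<ExamineLuceneIndexSets>
  <!-- Only content under node 1063 (assumed English root) is indexed here. -->
  <IndexSet SetName="EnglishIndexSet"
            IndexPath="~/App_Data/TEMP/ExamineIndexes/English/"
            IndexParentId="1063" />
  <!-- Only content under node 1080 (assumed French root) is indexed here. -->
  <IndexSet SetName="FrenchIndexSet"
            IndexPath="~/App_Data/TEMP/ExamineIndexes/French/"
            IndexParentId="1080" />
</ExamineLuceneIndexSets>
```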
Next, I added the indexers and the searchers. These need to be added to the ExamineSettings.config in the ExamineIndexProviders and ExamineSearchProviders sections.
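A sketch of the matching provider entries might look like the following, again assuming Umbraco 7 style configuration. The provider names and the `indexSet` values are hypothetical and must match whatever set names are used in ExamineIndex.config. The standard analyzer is used for both languages, as described above.

```xml
<!-- Sketch only: these <add> entries go alongside the existing providers. -->
<ExamineIndexProviders>
  <providers>
    <!-- One indexer per language index set (names are placeholders). -->
    <add name="EnglishIndexer"
         type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
         indexSet="EnglishIndexSet"
         supportUnpublished="false"
         supportProtected="false"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
    <add name="FrenchIndexer"
         type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
         indexSet="FrenchIndexSet"
         supportUnpublished="false"
         supportProtected="false"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
  </providers>
</ExamineIndexProviders>

<ExamineSearchProviders defaultProvider="ExternalSearcher">
  <providers>
    <!-- One searcher per language, reading from the same index set. -->
    <add name="EnglishSearcher"
         type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
         indexSet="EnglishIndexSet"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
    <add name="FrenchSearcher"
         type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
         indexSet="FrenchIndexSet"
         analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
  </providers>
</ExamineSearchProviders>
```

Keeping the indexer and searcher for a language on the same analyzer matters, since a term analyzed one way at index time and another way at search time may not match.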
With the indexes, indexers and searchers created, it is now possible to test them. Log into the Umbraco back office, navigate to the 'developer' menu (left-hand nav) and select the 'Examine Management' tab. The new indexes and searchers should be visible here. Several Umbraco events, such as a content publish, trigger the indexes to refresh. To force an index refresh, select the index by clicking its link, select 'Index info & tools' and click 'Rebuild Index'. A warning popup will be displayed because this operation can be hard on the infrastructure. If this is not an issue, click 'Ok' to proceed and the index will be rebuilt.
The searchers can be tested by selecting the searcher link and selecting the search tool. For my testing I stuck with the 'Text Search', as I believe it more closely resembles what the site search will be doing. At this point you should be able to search content that has been discovered by the indexer associated with this searcher. A text search here will display matching nodes, their ids, data and search 'scores'. The score is Lucene's ranking of how relevant a search result is to the entered search term. The search page in a site will most likely sort results by this score.
The searchers were returning results specific to the node roots, which meant I had achieved language-specific indexing and results. Success!
While the Umbraco search tools were useful to ensure the configuration was correct, I still wanted to build an actual search page to complete the PoC. The code for my simple search page is included below.
This search page view works for all languages. In my PoC I have a search node hanging directly off the site parent node, and I detect which language parent is the current parent by checking its node id. The production version of this page will likely make fewer assumptions, but for the PoC I had a working search page that returned only results for the selected language site.
Umbraco provides reasonable support for multilingual sites out of the box. Adding a functional search to these sites that keeps language results isolated to each site is easy to accomplish.
Photo by katie manning on Unsplash