Understanding the WebSphere Commerce Solr Integration

If you use WebSphere Commerce V7 then you may already use the Solr integration for search that is provided in the product, or you might be thinking about using it. This integration brings together two very complex pieces of software: WebSphere Commerce, which we know is complex, and Solr, an enterprise search platform.

There will have been a trade-off for IBM when working on the integration: giving your marketing team a single interface to manage both search and merchandising functionality, while at the same time supporting Commerce functionality like eSites.  It works well and can be customised, but are you really giving customers relevant results, or are they seeing too many ‘no results’ pages or irrelevant responses?  If you want to know more and get more from Solr, we will try to bring together these different areas and help you get the most from search: understanding the integration, and improving search relevancy for your customers when they are looking for the products that you sell.

Some Useful Terms

First let’s take a look at just some of the components and terms that you will use when working with the WebSphere Commerce Solr integration.

  • Solr – an open source enterprise search platform from the Apache Software Foundation, supporting many features including full text searching, faceted search and rich document support such as Word and PDF. It powers some of the largest sites in the world and many eCommerce vendors integrate with it to provide search functionality.
  • Preprocess – the di-preprocess command takes the WebSphere Commerce data and generates a series of tables that flatten the data so that it can be indexed by Solr.  A number of preprocess configuration files are provided out of the box, and when you run the command you will see a series of tables whose names start ti-…. created in your database instance.  When you become more advanced with Solr you may want to include additional data at preprocess time.
  • di-buildindex – for Solr to run it must have a Solr index, and this command builds it from the data that was generated by the preprocess step.  The index then needs to be kept up to date, either through a full build of all the data or a delta build that just picks up changed data.
  • Structured data – the structured data for Commerce is anything from the database, so your product information is part of your structured data.
  • Unstructured data – this would be your PDFs and other documents: anything not from the database that will be returned in your results.  We won’t really focus on this type of information yet; there is enough to get right with the structured data.
  • Solr document – a document in the Solr index holds the details of a product, item or category. The document contents are then returned as part of the Solr response.
  • Search term – the search terms are the words you are looking for within the Solr index.
  • Relevancy score – this is very important: it is how Solr has ranked a document when it performs a search against the terms.  That score can be affected by a wide variety of options, both Solr driven and down to how you have structured the data.  Understanding this score is understanding the results being produced.
  • Extended dismax – a query parser used when working with Solr. Prior to Feature Pack 6 IBM went with the very simple Solr query parser; from FEP6 onwards they started using dismax (though not fully).  The Solr parser is limited in what it can do. IBM did produce a cookbook example on how to work around this, but we think it is pointless, and we explain why in a forthcoming post.
  • Schema.xml – the schema.xml file defines the structure of the Solr index. The file can be modified if you want to, say, add longDescription into your search index, which is not used by default. You would also make changes here if you adjust configuration components such as the spellchecker.
  • Solr core – this allows us to have a single Solr instance with multiple Solr configurations and indexes. You will see a ‘default’ core that is available but not used.
  • CatalogEntry core – the index that covers everything about the products and items within WebSphere Commerce.  When a query is created you send it against that index; for example http://<myhostname>:<port if not 80>/solr/MC_10351_CatalogEntry_en_US/select?q=*:* will return information from the entry based index on the products and items in there.  You can see from the core name that it takes the master catalog ID as an identifier, as well as the language.  This means we can have multiple language indexes in use.
Solr WebSphere Commerce CatalogEntry Query

  • CatalogGroup core – the index that covers information about the categories within the store. An example query against the CatalogGroup core: http://<myhostname>:<port if not 80>/solr/MC_10351_CatalogGroup_en_US/select?q=*:*
Solr WebSphere Commerce CatalogGroup Query
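
The two example queries above follow the same pattern, so it can help to build them from their parts. The Python below is a minimal sketch rather than anything shipped with the product: the function names are our own, the core name pattern is simply read off the two examples (MC_, the master catalog ID, the index type, the language), and the host and port values are placeholders.

```python
from urllib.parse import urlencode


def core_name(master_catalog_id, index_type, locale):
    """Build a core name in the MC_<catalogId>_<type>_<locale> form seen above."""
    return f"MC_{master_catalog_id}_{index_type}_{locale}"


def select_url(host, core, query="*:*", rows=10, port=None):
    """Build a /select URL for a core; the port is left out when not supplied."""
    authority = host if port is None else f"{host}:{port}"
    # safe="*:" keeps the match-all query readable instead of percent-encoding it
    params = urlencode({"q": query, "rows": rows}, safe="*:")
    return f"http://{authority}/solr/{core}/select?{params}"


entry_core = core_name(10351, "CatalogEntry", "en_US")
group_core = core_name(10351, "CatalogGroup", "en_US")

print(select_url("myhostname", entry_core))
print(select_url("myhostname", group_core, port=8080))  # 8080 is an arbitrary example port
```

Swapping the locale in core_name is all that is needed to point the same query at another language index, assuming that index has been built.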

Working with Solr through Management Centre

The marketing team interact with Solr through Management Centre, which provides the ability to manage how results are produced based on what the customer is searching for.  The ‘Landing Page’, ‘Synonym’ and ‘Replacement Terms’ that follow are all found under the ‘Catalogs’ section of Management Centre, while ‘Search Rules’ are found in ‘Marketing’.  Splitting the functionality up in this way may not seem obvious, especially as when you first look, certain aspects of, say, a replacement term are repeated in a search rule.  But what you will find is that the power of the ‘search rule’ means more can be done than just altering the terms.  It will really be down to you to decide where you want to manage the functionality, because most users will have access to both areas; very few companies that we have come across restrict access in Management Centre.

Landing Page – although it comes with the other Solr components, the Landing Page is not actually doing anything with Solr.  That is really important to understand. If you have a landing page defined for a search that a user makes, it will be the first option evaluated.  If there is a match then the landing page is called and the search request never goes near Solr.  Instead the user will get a browser redirect to the page that has been defined, and the process will finish.
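
The order of evaluation described here can be sketched in a few lines of Python. This is only an illustration of the flow, not WebSphere Commerce code: handle_search, the landing_pages dictionary and the case-insensitive exact match are all assumptions made for the example.

```python
def handle_search(term, landing_pages, run_solr_search):
    """Check landing pages first; only call Solr when none matches.

    landing_pages maps a search term to a redirect URL. Matching on the
    lower-cased, trimmed term is an assumption made for this sketch.
    """
    key = term.strip().lower()
    if key in landing_pages:
        # Landing page hit: redirect, and Solr is never queried
        return ("redirect", landing_pages[key])
    return ("results", run_solr_search(term))


# Hypothetical data: one landing page and a stand-in for the Solr call
pages = {"returns policy": "/returns"}
solr_calls = []


def fake_solr(term):
    solr_calls.append(term)
    return ["doc1"]


print(handle_search("Returns Policy", pages, fake_solr))  # ('redirect', '/returns')
print(solr_calls)                                         # []
```

The empty solr_calls list after the first search is the point: a landing page match ends the request before any query is built.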

Synonym – a way of increasing the scope of the terms a user is searching on by adding in additional search terms. For example you might have two terms that have nearly the same meaning, such as ‘dog’ and ‘pooch’, or you might have different forms of the same word, such as ‘shelves’ and ‘shelf’. A synonym is also bi-directional, so if I enter ‘dog’ my search will be for both ‘dog’ and ‘pooch’, and if I enter ‘pooch’ it will also be for ‘dog’.
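
To make the bi-directional behaviour concrete, here is a minimal Python sketch. It is our own illustration of the idea, not how WebSphere Commerce stores or applies synonyms, and the function names are hypothetical.

```python
def build_synonym_index(groups):
    """Map every word in a synonym group to the full group (bi-directional)."""
    index = {}
    for group in groups:
        members = {word.lower() for word in group}
        for word in members:
            index.setdefault(word, set()).update(members)
    return index


def expand(term, index):
    """Return the term plus any synonyms, sorted for stable output."""
    key = term.lower()
    return sorted(index.get(key, {key}))


synonyms = build_synonym_index([["dog", "pooch"], ["shelves", "shelf"]])

print(expand("dog", synonyms))    # ['dog', 'pooch']
print(expand("pooch", synonyms))  # ['dog', 'pooch']
print(expand("cat", synonyms))    # ['cat']
```

Because every member of a group points at the whole group, it does not matter which of the words the customer types.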

One area that can cause unexpected results with synonyms is setting up multi-term synonyms; there is a really good article on why they are so awkward.

To keep your configuration tidy, synonyms should not be used for misspellings; that is what replacement terms are for. You don’t really want to be producing a search that has both the misspelt term and the correctly spelt term, as it just uses up processing time.

Replacement Term – a way of changing the search terms a user has entered, using either an ‘also search for’ or an ‘instead search for’. As an example, suppose we pick up in our analytics that a common customer search is for ‘fusa’; we could have an ‘instead search for’ that replaces the term with ‘fuchsia’, correcting the misspelling. We could then use the ‘also search for’ when they put in a term that does not match how we naturally list our products. If their search term is ‘notebook bag’, we could have an ‘also search for’ that extends ‘bag’ to include ‘sleeve’. That way we can pick up our products for ‘notebook sleeve’ as well as ‘notebook bag’.
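
The difference between the two replacement types can be sketched as follows. This is an illustrative Python model under our own assumptions (single-word rules keyed on the lower-cased term, and a hypothetical apply_replacements function); it is not the product's implementation.

```python
def apply_replacements(terms, rules):
    """Return, for each entered term, the list of alternatives to search for.

    rules maps a term to (mode, replacement), where mode is "instead"
    (swap the term) or "also" (keep the term and add the replacement).
    """
    result = []
    for term in terms:
        rule = rules.get(term.lower())
        if rule is None:
            result.append([term])           # no rule: term is unchanged
        elif rule[0] == "instead":
            result.append([rule[1]])        # original term is dropped
        else:
            result.append([term, rule[1]])  # original term is kept
    return result


rules = {"fusa": ("instead", "fuchsia"), "bag": ("also", "sleeve")}

print(apply_replacements(["fusa"], rules))             # [['fuchsia']]
print(apply_replacements(["notebook", "bag"], rules))  # [['notebook'], ['bag', 'sleeve']]
```

Each inner list is a set of alternatives for one position in the search, so the second result reads as ‘notebook’ with either ‘bag’ or ‘sleeve’.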

As with synonyms, you must be careful with multiple term replacements, as you can get some strange results. For example, if you have a replacement that says for ‘matteress topper’ instead search for ‘mattress topper’, to pick up the typo, you end up with a search term that looks like this.

+"matteress" +"topper" +"mattress topper"

This is how the query parameters are sent to Solr: we have the individual terms and we have the full term that has been substituted in.  The + sign tells Solr that the term is required, so all our search terms must match, and we will then get no matches.  The reason is that ‘matteress’ is still there, and it’s spelt wrong, so the AND will fail in this case.

The answer is to make single term replacements, not multi-term ones, and the use of ANY and ALL as your matching types will also help.
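
A toy model shows why the multi-term replacement returns nothing while a single term replacement works. The Python below is a deliberately simplified sketch: matches_all is our own stand-in for "every required clause must match", using plain word and phrase checks rather than Solr's real analysis and scoring, and the product name is made up.

```python
def matches_all(document_text, required_clauses):
    """Return True only if every required clause is found in the document.

    Single words must appear as whole words; clauses containing a space
    are treated as exact phrases. This ignores stemming and analysis.
    """
    text = " ".join(document_text.lower().split())
    words = set(text.split())
    for clause in required_clauses:
        clause = clause.lower()
        if " " in clause:
            if f" {clause} " not in f" {text} ":
                return False
        elif clause not in words:
            return False
    return True


product = "King size mattress topper"

# Multi-term replacement: the misspelt word is still a required clause
print(matches_all(product, ["matteress", "topper", "mattress topper"]))  # False

# Single term replacement of 'matteress' with 'mattress'
print(matches_all(product, ["mattress", "topper"]))                      # True
```

The first query fails on ‘matteress’ alone, even though the corrected phrase is present in the product name.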

Understanding how Solr Synonyms and Replacement terms integrate with WebSphere Commerce

The way synonyms and replacement terms work is not the same as if you were just using Solr on its own.  Instead WebSphere Commerce mimics some of the functionality that Solr provides, so it handles expanding the search terms if there are synonym matches, and does the same for replacement terms.  WC does this using two tables, SRCHTERM and SRCHTERMASSOC, which allows changes to be seen straight away and helps it support eSites.  But because this is done outside of Solr, it can cause issues when you look to use some of the more interesting Solr functionality, such as the minimum match parameter.  I will investigate this further, but just keep it at the back of your mind: it is good for integration, not so good for working with Solr.

The final option, and the one with the most functionality, is the search rule, which can be found in the Marketing area of Management Centre.
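
For readers who have not met minimum match before, this is roughly what a hand-built extended dismax request that uses it looks like. It is a sketch under stated assumptions: defType, qf, mm and debugQuery are standard Solr request parameters, but the field names, the boost and the 75% value are examples we picked, not the configuration WebSphere Commerce sends.

```python
from urllib.parse import urlencode


def dismax_params(user_query, query_fields, min_match):
    """Build the query string for an edismax request with a minimum match."""
    return urlencode({
        "defType": "edismax",            # use the extended dismax parser
        "q": user_query,                 # the terms as the customer typed them
        "qf": " ".join(query_fields),    # fields to search, with optional boosts
        "mm": min_match,                 # how many optional clauses must match
        "debugQuery": "true",            # ask Solr to explain each document's score
    })


# Field names and boost are assumptions for illustration only
print(dismax_params("notebook bag", ["name^2", "shortDescription"], "75%"))
```

The mm value is worked out against the clauses in the query Solr receives. If the terms have already been expanded before they reach Solr, that clause count may no longer be the number of words the customer typed, which is one way the two approaches can pull against each other.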

Search Rule – this is the key part of ‘searchandising’: it is here that you can really manipulate the interaction between the searches on the site and Solr.  The search rule is made up of two parts:

The target, where we can apply the rule to customers in a certain segment, at a certain time of day, when they search on a specific term.  There are a variety of targets including social participation or external site referral.

WebSphere Commerce Search Rule Target

The action, which allows us to modify and control what the user sees, so we might first of all adjust the search term, and then bring back a list of products with some boosted in that search.  We can be clever here and create actions such as canned searches, where we control the products a user sees by generating a search term that gets no other product matches.

WebSphere Commerce Search Rule Action

The search rules can also handle branching and experimentation. They are very powerful, but what is produced at the end is the query that will be passed into Solr.  And again, if you can understand that query you can also help in how the results are delivered.

Where to go from here?

That is a brief introduction to some of the areas we feel are useful when working with Solr, and there is a lot more that can be covered.  It is important to understand because good search results within the site can count as much as having good SEO off the site.

Right now we are creating articles on tuning search and getting the best relevancy, as well as on understanding and using dismax before Feature Pack 6.  The importance of Solr is increasing all the time, and with Feature Pack 8 to come there will be more new features to look at using.  It is a very powerful piece of software that needs time and attention to get the best results from.

We also have some existing articles looking at issues with delta builds, and some more around potential core issues when changing Feature Packs or installing APARs.