Hibernate-Search Multiple-value Facet-Counts

Introduction

The hibernate-search facet search API is pretty amazing, but it falls a little short in the following areas:

  • Facet-count values are incorrect where a facet matches a field that has multiple values. That is, the facet-counts are correct where the facet has a 1-to-many relationship with result-set items, but they are entirely incorrect where the facet has a many-to-many relationship with result-set items.
  • The facet-count values give the number of matching items within the current result-set, as distinct from the number of items that will be merged into the result-set if the facet is applied. This is highly undesirable where more than one facet-group is used conjunctively. Let’s explain…
    • Facet-counts are available before any facets are applied, and adjusted facet-counts are available after facets are applied. This is great because it facilitates two different usages. With the former, the facet-counts can remain static as facets are selected and deselected. With the latter, applying one facet will reduce the facet-counts for other facets; in fact, the other facet-counts can drop to zero. A zero count does not necessarily mean that applying the facet will yield no result-set items, so we don’t know whether clicking on the facet will yield new results or not. If a facet yields no results and it is used in conjunction with facets from another facet-group, this can lead to a zero-results situation. Confused? No problem: a lot of detail, description, examples and analysis are provided in this blog.
  • These issues are identified in the following discussion threads:
  • This blog delivers a comprehensive solution to all issues surrounding facet-counts in hibernate-search. The solution is derived from the information provided here: http://sujitpal.blogspot.ie/2007/04/lucene-search-within-search-with.html. A Java class that delivers the solution is provided, along with a fully working Maven project demonstrating the solution in action.
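To make the misleading zero concrete, here is a small hypothetical sketch that uses java.util.BitSet as a stand-in for the document sets Lucene would produce (it is not the hibernate-search API, and the squares and stripe colours are made up). It assumes, as in the demos, that facets within one facet-group are combined with OR and facet-groups are combined with AND. Once "green" is selected, the adjusted count for "blue" is zero, yet applying "blue" alongside "green" would add two squares.

```java
import java.util.BitSet;

/**
 * Hypothetical illustration of why an "adjusted" count of zero is misleading.
 * Made-up data: four squares (ids 0..3) and a single "Stripes" facet-group.
 */
final class MisleadingZeroDemo {
    /** Builds the set of square ids matched by a facet. */
    static BitSet squares(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    /** Size of the intersection of two document sets; neither argument is modified. */
    static int intersectCount(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        BitSet allSquares = squares(0, 1, 2, 3);
        BitSet green = squares(3);     // square 3 has a green stripe
        BitSet blue = squares(1, 2);   // squares 1 and 2 have blue stripes

        BitSet resultSet = green;      // the user has selected the "green" stripe facet

        // Adjusted count: blue squares already inside the current result-set.
        System.out.println(intersectCount(resultSet, blue));  // prints 0

        // Facets within one group are OR-ed (assumption), so applying "blue" is restricted
        // only by the other groups' selections (none here), not by "green".
        System.out.println(intersectCount(allSquares, blue)); // prints 2
    }
}
```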

Background

  • This blog describes how the different relationships between facets and result-set items (1-to-many versus many-to-many) lead to differing behavior patterns.
  • Here are some jsFiddles demonstrating how this form of faceted searching can work:
  • In the third example, the facet-counts show the number of result-set items that will be merged into the result-set if the facet is applied. This is the solution provided in this blog.

A java application leveraging the hibernate-search faceting API with a solution for facet-counts

The java application is very similar to the jsFiddle applications except that, for demonstration purposes, it optionally uses date-range filtering along with the border-color and stripe-color faceting.

  • Each square has one arbitrary date, for no reason other than demonstrating the facet-count solution.

 

The facet-count solution: BitSetFacetHitCounter.java

First, some terminology: in the image above there are three facet-groups (the squares are in fact rectangles).

  • The “Date Filtering” facet-group
  • The “Square Colors” facet-group (for which the facets have a 1-to-many relationship with squares)
  • The “Stripes” facet-group (for which the stripe facets have a many-to-many relationship with squares)

The BitSetFacetHitCounter class is the engine behind the facet-counts. It works by reducing the Lucene query (that is, restricting its hits) by every facet-group except the one for which counts are being generated.

In other words, to calculate the counts for the stripe facets, we reduce the query by the date-range facet and the selected square-color facets, and then get counts for the stripe facets.

Similarly, to get counts for the square-color facets, we reduce the query by the date-range facet and the selected stripe-color facets, and then get counts for the square-color facets. All of this happens after a conventional faceted search.
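The reduce-then-count idea above can be sketched in memory. This is a hypothetical stand-in, not the BitSetFacetHitCounter source: each facet is modelled as the java.util.BitSet of document ids it matches, facets selected within a group are assumed to be OR-ed, and the groups are AND-ed. The class and method names (OtherGroupsCounter, groupSelection, countGroup) are made up for illustration.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of "reduce by every other facet-group, then count".
 * java.util.BitSet stands in for Lucene document sets; this is not the Lucene API.
 */
final class OtherGroupsCounter {
    private final int docCount;

    OtherGroupsCounter(int docCount) {
        this.docCount = docCount;
    }

    /** Convenience factory: a BitSet with the given document ids set. */
    static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    /** Documents allowed by one group: the OR of its selected facets, or every document if none is selected. */
    BitSet groupSelection(List<BitSet> selectedFacets) {
        BitSet allowed = new BitSet(docCount);
        if (selectedFacets.isEmpty()) {
            allowed.set(0, docCount);
        } else {
            for (BitSet facet : selectedFacets) {
                allowed.or(facet);
            }
        }
        return allowed;
    }

    /** Reduces by every other group's selection (AND), then counts each facet of the target group. */
    Map<String, Integer> countGroup(Map<String, BitSet> targetGroupFacets, List<BitSet> otherGroupSelections) {
        BitSet reduced = new BitSet(docCount);
        reduced.set(0, docCount);
        for (BitSet selection : otherGroupSelections) {
            reduced.and(selection);
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> facet : targetGroupFacets.entrySet()) {
            BitSet hits = (BitSet) reduced.clone();
            hits.and(facet.getValue());
            counts.put(facet.getKey(), hits.cardinality());
        }
        return counts;
    }
}
```

For example, with six squares, a date range matching squares 0 to 3 and the red square-color facet selected (squares 0, 2 and 4), the reduced set is {0, 2}. A blue stripe facet matching {0, 1, 2} then counts 2 and a yellow stripe facet matching {2, 3, 4, 5} counts 1; square 2 is counted under both stripes, which is exactly the many-to-many case.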

 

Usage

Take a look at the code below to see how the reduce method is used ahead of obtaining the counts.

Usage occurs after the conventional hibernate-search facet filtering has been applied.

Note: keep in mind that each org.hibernate.search.query.facet.Facet contains a Lucene Query object (available via getFacetQuery()).

The reduce() Methods

The BitSetFacetHitCounter reduce method is overloaded so that a query can be reduced by pretty much any type of facet or, more accurately, by pretty much any filter or query.
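As a rough illustration of what an overloaded reduce can look like, the hypothetical class below offers two overloads: one taking a precomputed document set (think: the documents matched by a facet's query) and one taking an inclusive numeric range over a per-document value (think: a date-range filter). Both narrow the same running set of hits. The names are assumptions, java.util.BitSet stands in for Lucene's document sets, and this is not the real BitSetFacetHitCounter API.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of an overloaded reduce(): each overload narrows the same running
 * set of hits. java.util.BitSet stands in for Lucene document sets; not the real API.
 */
final class ReducibleHits {
    private final BitSet hits;

    ReducibleHits(int docCount) {
        hits = new BitSet(docCount);
        hits.set(0, docCount); // start from every document
    }

    /** Reduce by a precomputed document set, e.g. the documents matched by a facet query. */
    ReducibleHits reduce(BitSet docs) {
        hits.and(docs);
        return this;
    }

    /** Reduce by an inclusive numeric range over a per-document value, like a date-range filter. */
    ReducibleHits reduce(long[] valuePerDoc, long min, long max) {
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            if (valuePerDoc[doc] < min || valuePerDoc[doc] > max) {
                hits.clear(doc);
            }
        }
        return this;
    }

    int cardinality() {
        return hits.cardinality();
    }

    /** Defensive copy of the current hits. */
    BitSet snapshot() {
        return (BitSet) hits.clone();
    }
}
```

Returning this from each overload lets reductions be chained in whatever order the facet-groups are applied; because every reduction is an intersection, the order does not change the final hits.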

The count() Methods

The count method takes a list of TermQuery objects and returns a count for each, in the form of a list of FacetView objects (a custom bean class). Each discrete Facet object has a TermQuery.

As previously mentioned, the count method can be used after the Query has first been reduced by the other facet groups.

Observe the code to see how the reduce methods are used.

 

The complete code (BitSetFacetHitCounter.java)

The constructor throws an Exception, which is not ideal, but we’ll have to live with it.

The getTermQueryList(..) method is just for convenience.

Consult the hibernate-search documentation on filters if you use the reduce(…) overload that takes a NumericRangeFilter as an argument.

 

FacetView.java dependency

 

 

A slight problem with sorting

When generating the FacetingRequest object, ordering by facet-count will not yield correct results, because the counts we’re concerned with are only generated afterwards. Therefore, if sorting by count, the sorting has to be done manually. Sorting by field value still works fine.

FacetingRequest.includeZeroCounts() is also made redundant; handling zero counts is another task that has to be performed manually.

See the inline comments.
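Because the counts are recomputed after the FacetingRequest has run, sorting by count and dropping zero counts have to happen on the resulting list. A minimal sketch, assuming a hypothetical FacetCount bean (a facet value plus its recomputed count) in place of the blog’s FacetView class:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical post-processing of recomputed facet-counts: zero filtering and count ordering. */
final class FacetCountPostProcessing {
    /** Hypothetical stand-in for the FacetView bean: a facet value and its recomputed count. */
    static final class FacetCount {
        final String value;
        final int count;

        FacetCount(String value, int count) {
            this.value = value;
            this.count = count;
        }
    }

    /** Optionally drops zero counts, then sorts by count descending and by value for ties. */
    static List<FacetCount> sortByCountDesc(List<FacetCount> counts, boolean includeZeroCounts) {
        List<FacetCount> result = new ArrayList<>();
        for (FacetCount fc : counts) {
            if (includeZeroCounts || fc.count > 0) {
                result.add(fc);
            }
        }
        result.sort(Comparator.comparingInt((FacetCount fc) -> fc.count).reversed()
                .thenComparing(fc -> fc.value));
        return result;
    }
}
```

The tie-break on value keeps the order stable between requests, so facets with equal counts do not jump around in the UI as other facets are selected.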

 

Performance

I haven’t done any proper performance testing beyond looking at the Net -> XHR tab in Firebug, using a different application operating on a few thousand records.

This obviously isn’t even close to the testing required but “everything seemed fine”.

The high numbers are for the first request after a redeploy. Subsequent numbers are from when a facet was randomly selected or deselected. Two of the deploys used BitSetFacetHitCounter; one did not, and used the out-of-the-box facet-counting instead. Can you tell which was which?

Please feel perfectly free to contribute to this.

 

Maven Project

  • The Maven project is available for download here: HibernateFacetSearchManyToManyCount.
  • The project depends on a MySQL database called squares, which must be created first.
  • The SQL script to create the tables and populate them is in the WEB-INF/sql folder.
  • After the database has been created, the index needs to be built. The index location is specified in two places (yep, sorry): ‘database.properties’ and ‘hibernate.cfg.xml’. It’s set to c:\temp\squares_hs_index, so feel free to adjust it (in both files) to suit your needs. (In fact, much of the Hibernate configuration is duplicated; I would not use this setup as a golden template.)
  • A file called Indexer.java in the package com.outbottle.hibernatefacetsearchmanytomanycount.indexing needs to be executed to build the index. In NetBeans, right-click the file and choose Run File from the menu. Building the index is an essential prerequisite to running this application.
  • The project uses the Spring framework, but don’t worry if you’re not familiar with Spring; it’s incidental to the demonstration.
  • It uses AngularJS with jQuery for the user interface but, again, don’t worry about that; it’s just a means to an end for demonstration purposes.
  • BitSetFacetHitCounter.java is the core reusable class for facet-counts. SquareDaoImpl.java avails of BitSetFacetHitCounter while using the hibernate-search API in a fairly standard way. These, along with the dependencies FacetView.java and SquareDaoHelper.java, are the only classes of real importance for understanding the facet-count solution.
  • The methods searchIncludingDateFiltering(…) and search(…) in SquareDaoImpl are not a lesson in good code from a reusability perspective; they are deliberately stand-alone in order to make what’s happening as clear as possible. (Note that FacetView.java has a very weak DOM ID generator; don’t depend on it.)
  • Note the comments //End conventional facet query and //begin manual count of multi-valued facets or many-to-many facets. This is where the normal hibernate-search facet API stuff ends and the facet-counting with BitSetFacetHitCounter begins.

 

As ever, comments, feedback and critique are welcome, because I know very little about Lucene. There may be alternative ways of achieving this. There may also be issues with this code that I’m not aware of, but in any case it seems to work pretty well.

 

 

 

4 Comments

  • Hi!

    I used your code to fix the facet problem with Hibernate Search 4.4.2, and for the moment it seems to be fine. Do you know of any problems with this solution?

    Thanks in advance!

    • I’m using it myself for a project in development. So far, I haven’t found any issues or problems.

      I’ve added another count() method to the class which takes a List of Facet objects rather than a list of TermQuery objects, to avoid looping twice. However, the method does cast each Facet’s Query to a TermQuery, so it’s not exactly safe outside of a certain context.

      In addition, the FacetView List returned from the count method is not ideal because as mentioned in the blog, the DomId generation is daft. I’ve found it handier in fact to just return a list of Facet implementations rather than a list of (demonstration purposes only) FacetView objects.

      But, yep, in general, I’ve found no problems at all.

      If you do, I’d be very very interested to hear about them. Thank you for taking an interest in this, I was looking forward to some critique and feedback on this one in particular.

      Remember, full credit to the contributors on this thread: http://sujitpal.blogspot.ie/2007/04/lucene-search-within-search-with.html

  • Hi John, have you made any attempts at getting this to work with hibernate-search 5.1? I know the Lucene API changed from v3 to v4, so I’m not sure if any work has been done to get this working. It also looks like Hardy is working on a new Faceting implementation, however it’s looking like it’s a low priority task and could be some time before being completed. https://hibernate.atlassian.net/browse/HSEARCH-809

    • Hi George

      I haven’t looked into hibernate-search 5.1 at all actually. I take it this technique does not work in 5.1?

      If you do get something working please feel free to post a link or information here.

      Thanks
      John