Understanding Faceted Searching

Introduction

Tony Russell-Rose describes faceted searching in great detail in these posts:

http://isquared.wordpress.com/2011/04/12/interaction-models-for-faceted-search/

http://java.dzone.com/articles/designing-faceted-search

http://isquared.wordpress.com/2011/04/28/where-am-i-techniques-for-wayfinding-and-navigation-in-faceted-search/

http://isquared.wordpress.com/2011/02/17/reflections-on-faceted-search-and-beyond/

They’re well worth a read before even contemplating developing a faceted search.

There are many types of faceting techniques available. This blog focuses on a faceted search where the facets and results remain on a single page: selecting and deselecting facets dynamically reduces and expands the result-set in place. A good comparison might be the faceted people search on LinkedIn.com. This one is also a really good example: http://www.bikersbest.dk/en-gb/shop/accessories/fenders-etc

We’re going to describe faceted searching and its oddities through example. The examples increase in complexity towards an end result which will hopefully yield an understanding of how faceted searching can (or, dare I say, should) work. Some rules for development/behavior will emerge; these will be emphasized and summarized throughout. At the end there is some rambling and a solution for a hibernate-search implementation, but for the first 99% of the blog there is no focus on implementation; the focus is exclusively on the following.

  • What exactly a faceted search really is.
  • The behavior of the result-set in response to applying or removing facet selections.
  • What the numbers beside the facets (facet-counts) really mean (cardinality).
  • The different relationships between facets and result-sets e.g. 1-to-Many vs. Many-to-Many.

The examples used throughout are simple to understand but they will provoke a lot of thought, because filtering results by facets is not as straightforward a process as it may seem at first glance. It can be extremely confusing. Each example is a fully working jsFiddle.

Each example is similar. There are facets in the form of checkbox-label pairs on the left and “squares” on the right. The squares are the result-set. Squares are characterized by their border color which is one of an available set of colors. In subsequent examples, squares are also characterized by stripes, each stripe being a fixed color from the available color list. The facets are essentially a list of colors. Facet filtering is performed on border-color and/or stripe-color.

Example 1

The facets initiate filtering by border color. The relationship between facet and result-set is one-to-many. E.g. clicking on the “red” facet will simply filter the list to those squares with red borders. Each square has only one parent facet.

Example 2

The second example is similar except we introduce numbers beside each facet (facet-counts). There are options for what these numbers mean; this is examined in detail.

Example 3

The third example is again similar except that each square now has one or more stripes. The filtering is done by stripe-color rather than border-color; border-colors are ignored. This is a many-to-many relationship between the facets and the result-set items. E.g. a square with a “green” stripe and a “red” stripe will be targeted by both the “red” and “green” stripe-facets. This is very different from the previous examples because squares can now have more than one parent facet, where previously they had just one. Consequently, while filtering in general is similar, the facet-counts are more “analogue”: they can take any value between zero and the total rather than being all-or-nothing. The filtering becomes a little less straightforward and requires a little more thought to fully understand.

Example 4

The fourth example combines the previous examples. It’s now possible to filter by border-color and (Boolean AND) stripe-color. This has all kinds of consequences for the facet-counts (the numbers beside the facets). Successful implementation and correct behavior require that some of the facet-count options previously examined be adhered to, while others make no sense at all in this context.

Side note

Example four leads into this blog http://outbottle.com/hibernate-search-multivalue-facet-counts/ which will examine the following:

  • Implementations for example 3 and 4 are readily available in hibernate-search but the facet-count values obtained through the hibernate-search faceting API are incorrect for the many-to-many facets (or multi-valued results in general). This is a known situation for those particular requirements.
  • A solution to this problem is provided which allows example 3 and 4 to be 100% implemented using hibernate-search without additional dependencies.
  • A maven project is provided with everything required to demo this solution locally in a matter of minutes.

 

Concrete example 1 – One-to-Many facets

jsFiddle: http://jsfiddle.net/jralston/z2b47/embedded/result/

There are checkboxes on the left and a result-set of squares on the right. Each square is characterized by a border whose color is one of a number of possible colors. Clicking on any of the checkboxes causes the result-set to update. For each facet there are many (zero or more) squares. Facets do not share squares, i.e. each square has only one parent facet.

  • We can reduce the result-set to only those squares that have red borders by clicking on a “red” facet.
  • If the “green” facet is also clicked we now see those squares which have red OR green borders.

 

Facet Search Behavior (1):

  • Clicking on the first facet reduces the result-set.
  • Clicking on subsequent facets increases the result-set.
  • Consequently, having no facets selected is the same as having all facets selected.
  • Each facet (in any facet-group) is applied disjunctively, i.e. it’s a case of “red” OR “green” rather than “red” AND “green”. The latter would always yield zero results for a one-to-many relationship.
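
The rules above can be sketched in a few lines of Python. This is only an illustration and not the code behind the jsFiddle; the sample squares and the `filter_by_border` helper are hypothetical names made up for this sketch.

```python
# Hypothetical sample data: each square has exactly one border color (one-to-many).
SQUARES = [
    {"id": 1, "border": "red"},
    {"id": 2, "border": "red"},
    {"id": 3, "border": "green"},
    {"id": 4, "border": "blue"},
]


def filter_by_border(squares, selected):
    """Keep squares whose border matches ANY selected facet (disjunctive, OR).

    An empty selection is treated the same as having every facet selected.
    """
    if not selected:
        return list(squares)
    return [s for s in squares if s["border"] in selected]


print([s["id"] for s in filter_by_border(SQUARES, set())])             # [1, 2, 3, 4]
print([s["id"] for s in filter_by_border(SQUARES, {"red"})])           # [1, 2]
print([s["id"] for s in filter_by_border(SQUARES, {"red", "green"})])  # [1, 2, 3]
```

The first click shrinks the list from four squares to two; the second click grows it to three.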

 

Concrete example 2 – One-to-Many facets with facet-counts

jsFiddle: http://jsfiddle.net/jralston/XY4pk/embedded/result/

This example is evidently the same as the previous example except that facet-counts are present.

So, let’s examine facet counts independently….

 

Facet Counts

Facet counts can mean one of two things:

  1. A count of matching items in the current result-set

–OR–

  2. A count of matching items that will be added into the result-set if the facet is applied.

(Note these two options; they are referenced throughout the remainder of this blog)

Question

Which is correct?

Answer

Either. Whichever you like and whichever makes sense for your application and situation.

But when there is more than one facet-group, only the latter really makes sense. We’ll get to that.

Applied Example

  • A count of matching items in the current result-set
    • A count of currently visible squares with red borders with the “red” border-facet applied (all facets are applied when no facet is selected, so all squares are visible).
  • A count of matching items that will be added into the result-set if the facet is applied
    • A count of squares with red borders that will be displayed when the “red” border-facet is applied.
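
To make the two options concrete, here is a minimal Python sketch for a single one-to-many facet-group. The data and the function names (`counts_in_current_results`, `counts_if_applied`) are hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hypothetical data: one border color per square, plus a facet with no squares.
BORDERS = ["red", "red", "green", "blue"]
FACETS = ["red", "green", "blue", "yellow"]


def visible(borders, selected):
    """Squares currently shown; no selection means everything is shown."""
    return [b for b in borders if not selected or b in selected]


def counts_in_current_results(borders, selected):
    """Option 1: matching items in the current result-set."""
    tally = Counter(visible(borders, selected))
    return {f: tally[f] for f in FACETS}


def counts_if_applied(borders):
    """Option 2: items that would be shown if the facet were applied."""
    tally = Counter(borders)
    return {f: tally[f] for f in FACETS}


print(counts_in_current_results(BORDERS, {"red"}))
# {'red': 2, 'green': 0, 'blue': 0, 'yellow': 0}
print(counts_if_applied(BORDERS))
# {'red': 2, 'green': 1, 'blue': 1, 'yellow': 0}
```

With nothing selected both options give the same numbers; they only diverge once a facet is applied.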

 

Facet Counts – practical concerns

So now we know what facet counts are or can be. Consider the following scenario:

The first facet-count option is applied: i.e. facets show “A count of matching items in the current result-set“. The “red” facet is selected so we see X squares with red borders. The “red” facet-count is therefore also X. No other facets are selected so no other squares are visible. Because of our facet-count implementation (and our 1-to-many situation) all other facets besides “red” show a facet count of zero. Now the scene is set, so here is the dilemma:

Will clicking on the “green” facet add all the squares with green borders to the result-set?

…..

Of course it will….unless of course there were no squares with green borders to begin with. No problem right? The facet count will update to well….. zero! i.e. it won’t change. This is fine, no problems so far but bear with me.

Now consider that you want to eliminate facets from the facet-group that will not increase the result-set (to avoid having the user click on facets to no avail). How would you do it? Conventionally, a facet-count of zero would eliminate the facet from the facet-group, i.e. you could disable the facet or remove it completely. But hold on a minute: if selecting one facet sends all the other non-selected facet-counts to zero, we have no way of distinguishing the facets that will add to the result-set from those that will not, i.e. those that have children from those that do not. To solve this you would have to handle both facet-count situations: the former for display purposes and the latter to determine whether a facet should be disabled or hidden.
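
One way to picture “handling both situations” is to carry two values per facet: the number to display and a flag saying whether the facet has any children at all. The sketch below assumes that approach; `facet_view` and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical data: "yellow" is a facet with no squares at all.
BORDERS = ["red", "red", "green", "blue"]
FACETS = ["red", "green", "blue", "yellow"]


def facet_view(borders, selected):
    """Per facet: the displayed count (current result-set) and an enabled flag.

    The flag uses the 'would be added if applied' count behind the scenes,
    so a displayed zero does not automatically disable the facet.
    """
    total = Counter(borders)
    shown = Counter(b for b in borders if not selected or b in selected)
    return {f: {"count": shown[f], "enabled": total[f] > 0} for f in FACETS}


view = facet_view(BORDERS, {"red"})
print(view["green"])   # {'count': 0, 'enabled': True}
print(view["yellow"])  # {'count': 0, 'enabled': False}
```

Both “green” and “yellow” display zero, but only “yellow” is a dead end.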

Lost? Not to worry, this becomes much more apparent and obvious when dealing with more than one facet-group.

 

Concrete example 3 – Many-to-Many facets

jsFiddle: http://jsfiddle.net/jralston/uGF5e/embedded/result/

As before, there are facets on the left and squares (the result-set) on the right. This time it’s different though. The facets relate to the stripe-colors, not the borders. Selecting the “red” facet will yield those squares that have a red stripe irrespective of border color. (Note: ignore the borders completely for this example). As in the previous one-to-many example, for one facet there are many (zero or more) squares; however, facets can now share squares. I.e. a square has only one border but a square can have several stripes. Therefore, for many-to-many, any given square has one or more parent facets. This is the difference that makes it many-to-many.

The facet-counts correspond to the number of squares not the number of stripes.

But, in our example squares will not have duplicate stripes so the facet-count could also be interpreted as the stripe-count. They are essentially the same. I can’t think of a practical example where our metaphorical stripes would be duplicated.

This is an important point to understand. In addition, like the previous example, there are two different options for what the facet-counts can mean. Refer to the chapter on “Facet Counts” for detail of this.

The numbers (facet-counts) can mean:

“the number of matching squares currently being displayed”

–or–

“the number of squares that will be added to the result-set when the facet is applied”.

 

I.e. from the “Facet Count” chapter:

A count of matching items in the current result-set

–or–

A count of matching items that will be added into the result-set if the facet is applied

 

In each case the numbers can be a little confusing. Use the “reduce number of squares” option on the jsFiddle to better see the process. Remember that unlike the previous example, each square can correspond to more than one facet, specifically, the same square having a green stripe and a red stripe will match the “green” facet and also match the “red” facet.

As before, facets within this (one) facet-group should be applied disjunctively (OR’ing). Conjunctive (AND’ing) application can result in zero results and so can be considered an incorrect implementation.
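
A small Python sketch of the many-to-many case may help. It assumes each square is represented by the set of its stripe colors; the data and helper names are hypothetical and this is not the jsFiddle code.

```python
# Hypothetical data: square id -> set of stripe colors (no duplicate stripes).
SQUARES = {
    1: {"red", "green"},
    2: {"red"},
    3: {"green", "blue"},
    4: {"blue"},
}


def filter_by_stripe(squares, selected):
    """Keep squares having ANY selected stripe color (OR within the group)."""
    if not selected:
        return sorted(squares)
    return sorted(sid for sid, stripes in squares.items() if stripes & selected)


def stripe_counts(squares, facets):
    """Count squares (not stripes) that carry each stripe color."""
    return {f: sum(1 for stripes in squares.values() if f in stripes)
            for f in facets}


print(stripe_counts(SQUARES, ["red", "green", "blue"]))
# {'red': 2, 'green': 2, 'blue': 2}
print(filter_by_stripe(SQUARES, {"red"}))           # [1, 2]
print(filter_by_stripe(SQUARES, {"red", "green"}))  # [1, 2, 3]
```

Note that the counts add up to six although there are only four squares, and that selecting “red” and “green” yields three squares rather than 2 + 2 = 4. Square 1 is shared by both facets.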

 

One-to-Many vs. Many-to-Many Comparison

We’ve seen so far that the two types are not dissimilar at all. Behavior for the most part is pretty much the same and the facet-counts behave similarly. The main and obvious difference is that one-to-many filtering occurs directly on the squares, i.e. on the square’s border, whereas many-to-many filtering occurs indirectly on the squares, i.e. on the stripes. Borders do not share facets and neither do stripes but, in effect, squares ultimately share facets because of their stripes.

An alternative way of expressing the same sentiment is to consider that a square in a 1-to-many relationship can only ever be selected by one facet. In the many-to-many relationship, any given square can be selected (targeted) by one or more facets. Selection by one border versus selection by many stripes.

In a 1-to-many relationship, the facet-counts are either zero or X, almost digital in a sense. In a many-to-many relationship the facet-counts can be any number between zero and X. Don’t get too attached to this notion though because, as we’ll see now, this is only true for one facet-group. In a situation with more than one facet-group, the facet selections in one group will impact the facet-counts in the other groups.

 

Concrete example 4 – One-to-Many AND Many-to-Many facets

jsFiddle: http://jsfiddle.net/jralston/EeW97/embedded/result/

This example combines both types of facet-to-result-set relationships discussed so far, i.e. one-to-many and many-to-many. In this case it is clear to see that to avoid zero-result situations, the chosen facet-count type is critical. Facet-counts should ideally show “A count of matching items that will be added into the result-set if the facet is applied” (see chapter on “Facet Counts”). From this, as before the facet-counts within a facet-group should not change when a facet is applied from within that group. However, the application of a facet in one group can cause the facet-counts in the other facet-groups to change. Give this point some thought while you experiment with the jsFiddle.

Let’s expand on that: Critically, with the facet-count applied correctly (facets show “a count of matching items that will be added into the result-set if the facet is applied“) it is still possible to get a zero-results situation by clicking on a facet with a facet-count of zero. Therefore, ideally, facets with a facet-count of zero should be either disabled or not visible at all.

This point lends further weight to the realization that facet-counts showing “a count of matching items in the current result-set” are simply not a suitable implementation, because it’s not possible to determine if clicking on a facet with a facet-count of zero will add to the result-set or create a zero-results situation. This zero-results situation only applies in the combinational facet-group setup because in our previous examples we were not AND’ing with zeros, we were only OR’ing. (We OR within each facet-group and AND across facet-groups).

There is a note in the jsFiddle regarding this zero-results situation not being obvious at first, so let’s deal with that: With the reduced-number-of-squares option applied, deselect all facets. Click a stripe facet to reduce the number of squares, now click a border facet and observe that a border-facet facet-count was reduced to zero. This may not always happen so repeat with different facets until this situation is achieved (use the reset button if need be). Now, click on the border facet which had its facet-count reduced to zero. It will increase the result-set but hey, how were you to know that it would, when other border-facets with a facet-count of zero will reduce the result-set count to zero! (Experiment with this in the jsFiddle because describing all scenarios surrounding this could get very verbose).

Also worth noting is that the rule previously derived, “no-facets-selected is the same as all-facets-selected” still holds across facet-groups collectively.

 

Facet Search Behavior (2):

Disclaimer: facet-counts in this situation assume the “A count of matching items that will be added into the result-set if the facet is applied” option (either behind the scenes or on display). See chapter on “Facet Counts” above.

  • With more than one facet group, applying a facet should never result in the facet-counts within that facet-group changing.
  • With more than one facet group, applying a facet within one group can (and probably will) result in the facet-counts in the other groups changing.
  • While individual facets within each facet group are applied disjunctively (OR’ing); facet-groups are applied conjunctively (AND’ing). I.e. we get:
    • “green” OR “red” borders AND “green” OR “red” stripes
    • I.e. ( (red-border OR green-border) AND (red-stripe OR green-stripe) ).
      Left and right of the center AND will each yield a separate array of squares. The AND’ing occurs on the resultant squares (not stripes or anything). I.e. squares common to both sides of the AND will be retained while squares not occurring on both sides of the AND will be discarded.
  • With more than one facet group, clicking on a facet with a facet-count of zero, if it is the only facet selected in the facet-group, will reduce the result-set to zero items (this is a phenomenon that only emerges when combining facet-groups). This situation should be avoided by disabling or removing facets with a facet-count of zero.
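
The behavior listed above can be sketched as follows. This is one possible reading of the rules, not a definitive implementation: the counts for a group are computed against the squares that survive the other group’s selection only, which is why they stay fixed when you click within the group. All names and data here are hypothetical.

```python
# Hypothetical data: one border per square, one or more stripes per square.
SQUARES = [
    {"id": 1, "border": "red",   "stripes": {"green"}},
    {"id": 2, "border": "red",   "stripes": {"red", "blue"}},
    {"id": 3, "border": "green", "stripes": {"green"}},
    {"id": 4, "border": "blue",  "stripes": {"blue"}},
]
COLORS = ["red", "green", "blue"]


def matches_border(sq, borders):
    return not borders or sq["border"] in borders      # OR within the group


def matches_stripe(sq, stripes):
    return not stripes or bool(sq["stripes"] & stripes)  # OR within the group


def search(squares, borders, stripes):
    """AND across the two facet-groups."""
    return [s["id"] for s in squares
            if matches_border(s, borders) and matches_stripe(s, stripes)]


def border_counts(squares, stripes):
    """Border-facet counts depend only on the OTHER group's selection."""
    pool = [s for s in squares if matches_stripe(s, stripes)]
    return {c: sum(1 for s in pool if s["border"] == c) for c in COLORS}


def stripe_counts(squares, borders):
    """Stripe-facet counts depend only on the OTHER group's selection."""
    pool = [s for s in squares if matches_border(s, borders)]
    return {c: sum(1 for s in pool if c in s["stripes"]) for c in COLORS}


print(search(SQUARES, set(), {"green"}))     # [1, 3]
print(border_counts(SQUARES, {"green"}))     # {'red': 1, 'green': 1, 'blue': 0}
print(search(SQUARES, {"blue"}, {"green"}))  # [] (the zero-results situation)
print(stripe_counts(SQUARES, {"red"}))       # {'red': 1, 'green': 1, 'blue': 1}
```

With the “green” stripe-facet applied, the “blue” border-facet drops to zero, and clicking it anyway empties the result-set. That is the facet you would disable or hide.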

 

Max, Min or Range Faceting

It’s possible and common to filter results based on max and min values such as dates, cost etc. This type of faceting is pretty much just a one-to-many situation, except that the one(-to-many) facet is no longer a checkbox. It might be composed of one or two integer, decimal or date fields, but it’s still just one facet which will yield anything from zero to many results. One facet – many results.

http://www.carzone.ie/ is a good example of showing facet-counts for a range facet (select cars by year to see it).
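
As a rough sketch of a range facet, assume a list of cars with a year field and an inclusive min/max filter where either bound may be left open. The data and function names are hypothetical and are not taken from any of the sites mentioned.

```python
# Hypothetical data: cars identified by id with a registration year.
CARS = [
    {"id": 1, "year": 2008},
    {"id": 2, "year": 2010},
    {"id": 3, "year": 2010},
    {"id": 4, "year": 2013},
]


def filter_by_year(cars, min_year=None, max_year=None):
    """One range facet: inclusive bounds, None meaning 'no limit' on that side."""
    return [c for c in cars
            if (min_year is None or c["year"] >= min_year)
            and (max_year is None or c["year"] <= max_year)]


def range_count(cars, min_year=None, max_year=None):
    """Facet-count for a range: how many items the range would yield."""
    return len(filter_by_year(cars, min_year, max_year))


print([c["id"] for c in filter_by_year(CARS, min_year=2010)])  # [2, 3, 4]
print(range_count(CARS, 2009, 2012))                           # 2
print(range_count(CARS))                                       # 4
```

Leaving both bounds open behaves like having no facet selected: everything is returned.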

 

Facet Search Behavior (Complete)

Disclaimer: facet-counts in this situation assume the “A count of matching items that will be added into the result-set if the facet is applied” option (either behind the scenes or on display). See chapter on “Facet Counts” above.

  • Clicking on the first facet reduces the result-set.
  • Clicking on subsequent facets increases the result-set.
  • Consequently, having no facets selected is the same as having all facets selected.
  • Each facet (in any facet-group) is applied disjunctively, i.e. it’s a case of “red” OR “green” rather than “red” AND “green”. The latter would always yield zero results for a one-to-many relationship.
  • With more than one facet group, applying a facet should never result in the facet-counts within that facet-group changing.
  • With more than one facet group, applying a facet within one group can (and probably will) result in the facet-counts in the other groups changing.
  • While individual facets within each facet group are applied disjunctively (OR’ing); facet-groups are applied conjunctively (AND’ing). I.e. we get:
    • “green” OR “red” borders AND “green” OR “red” stripes
    • I.e. ( (red-border OR green-border) AND (red-stripe OR green-stripe) ).
      Left and right of the center AND will each yield a separate array of squares. The AND’ing occurs on the resultant squares (not stripes or anything). I.e. squares common to both sides of the AND will be retained while squares not occurring on both sides of the AND will be discarded.
  • With more than one facet group, clicking on a facet with a facet-count of zero will reduce the result-set to zero items (this is a phenomenon that only emerges when combining facet-groups). This situation should be avoided by disabling or removing facets with a facet-count of zero.

 

A Note on Hibernate-Search and Faceted Searching

Ignore this if you haven’t struggled with the hibernate-search faceting API yet.

  • Hibernate-search and its faceting facility are truly fantastic.
  • It works perfectly for our first two examples above. I.e. with one-to-many relationships between facets and result-sets. The facet-counts returned represent “A count of matching items in the current result-set“. This is perfect for certain types of facet searches but it lacks considerably on two separate potential requirements:
    • The facet-counts required in a one-to-many situation are the “number of items to be added to the result set” as opposed to the “number of items currently in the result set“.
      • Hibernate-search facet-counts are the latter once facets have been applied. (Before any facets are applied the facet-counts obviously match the former, but that is not necessarily helpful if (A) facets from another facet-group are used to reduce the result-set, because those original facet-counts are now wrong, or (B) you need to remove or disable those facets that yield no results, i.e. have zero children.) The facet-count of zero from hibernate-search here does not distinguish between a facet that has children and one that does not.
    • The facet-count returned by the hibernate-search API on a multiple-valued field (i.e. many-to-many relationship) is quite simply entirely incorrect. This is not necessarily a bug as such; it is simply the case that hibernate-search never intended to return facet-counts for this multiple-value situation. (The filtering on this multiple-value many-to-many situation works perfectly though).
  • So basically, hibernate-search out of the box does not support our combined one-to-many and many-to-many example because:
    • The facet-counts for the square-borders will be undesirable (zeros for all but the facets applied).
      • It is frequently desirable to not display facets with zero counts. However, clicking a border-facet would reduce all other border-facet counts to zero, thus eliminating them from the list, and hence we could not click them to increase the result-set. If we leave them there with their facet-counts of zero, we cannot determine if applying them will increase the result-set or not. We can also get a zero-results situation when more than one facet-group is in use.
    • As stated previously, the facet-counts for the many-to-many square-stripe faceting will simply be utterly incorrect (although the result-set will be filtered correctly).

This blog http://outbottle.com/hibernate-search-multivalue-facet-counts/ provides a solution to both of these problems thus allowing the one-to-many many-to-many combinational example to be implemented in hibernate search without additional dependencies.

 

Conclusion

  • It would appear that the squares were in fact rectangles.
  • The different relationships possible between facets and result-set items have been examined.
  • In both situations individually and both situations combined, overall behavior of result-set items in response to the facet de/selection has been examined in detail through example and detailed explanation.
  • Details surrounding facet-counts have been explored and best practices have been suggested.
  • Max, min and range faceting has been addressed.
  • Four jsFiddle examples to demonstrate behavior have been delivered.
  • For those of you familiar with the hibernate-search faceting API, you will also be aware of hibernate-search’s limitations. The items discussed throughout have been placed in a hibernate-search context. A solution to the hibernate-search dilemma is provided in here: http://outbottle.com/hibernate-search-multivalue-facet-counts/

Because facet-searching is a vast topic which becomes very specific very quickly depending on requirements, I expect this blog to be torn apart by critique and discussion. Both are very welcome so feel perfectly free to comment. It is a complex topic so if you’re having trouble, please pop a question however vague or however difficult it is to articulate the question. Discussion will help understanding.

When I first ventured into this I figured it would be simple enough; we’ve all used faceted searches on LinkedIn or Amazon or even email (the inbox, outbox etc. links are facets in a sense). Once I got in there, though, I realized it’s a monster and amazingly head-wrecking, so I hope this blog helps, but be careful it doesn’t also mislead. Different situations require different implementations/behavior. The intent of the blog is to explore faceted-searching, its complexities and oddities, not to define or direct any particular implementation.