Free Text vs. Filtered Search: Why Your Architecture Should Treat Them Differently


Since I am on the topic of “search”, I want to clarify something. When you run a search, there are typically two types of input: “free text” and “pre-determined filters”.

Free text is where you let the user enter a string of letters, numbers and symbols that you use to determine results.

On the flip side of that, you also often have a list of predetermined filters like color, size, or price ranges.

The two may seem similar, but if you know what you are doing, you can get some huge performance benefits from architecting your system to handle them separately.

Let's do some quick math.

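Say your website has 10 different pre-determined filters people can choose from, and each of those filters has 10 values they can select from. If a user picks exactly one value per filter, that is 10 to the 10th power (10 billion) possible filter combinations whose results you need to know. (Letting a filter stay unset makes it 11 to the 10th power, roughly 26 billion, and allowing multiple values per filter raises it further, but as the comparison below shows, that really isn’t that big of a deal.) A finite, enumerable space like this is comparatively easy to cache, index, pre-populate, etc.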
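Because the filter space is finite, every selection can be reduced to a stable key and looked up directly. Here is a minimal sketch of that idea, not a definitive implementation. The filter names, the `filter_cache_key` and `count_combinations` helpers, and the plain dict standing in for a real cache or index are all assumptions made up for illustration.

```python
from math import prod

# Hypothetical filter catalog: filter name -> allowed values.
FILTERS = {
    "color": ["red", "green", "blue"],
    "size": ["s", "m", "l"],
    "price": ["0-25", "25-50", "50+"],
}


def count_combinations(filters, allow_unset=False):
    """Number of distinct selections with at most one value per filter."""
    extra = 1 if allow_unset else 0  # "unset" counts as one more choice
    return prod(len(values) + extra for values in filters.values())


def filter_cache_key(selection):
    """Build an order-independent key from a {filter: value} selection."""
    for name, value in selection.items():
        if value not in FILTERS.get(name, ()):
            raise ValueError(f"unknown filter value: {name}={value}")
    # Sorting by filter name makes the key independent of selection order.
    return "|".join(f"{name}={value}" for name, value in sorted(selection.items()))


# A plain dict stands in for whatever cache or index you actually use.
results_cache = {}
key = filter_cache_key({"size": "m", "color": "red"})
results_cache[key] = ["sku-1", "sku-7"]  # placeholder result IDs

print(key)                                          # color=red|size=m
print(count_combinations(FILTERS))                  # 27
print(count_combinations(FILTERS, allow_unset=True))  # 64
```

The point of the sorted key is that the same selection always maps to the same cache entry, no matter what order the user clicked the filters in.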

Contrast that with free text searches limited to just 64 characters. Say we lowercase the input and remove all special characters except for spaces. That leaves 26 letters + 10 digits + 1 space character, for a total of 37 possible characters. Counting only the queries that use all 64 characters, there are 37 to the power of 64, or roughly 2.3 × 10^100, different searches that could be run against our system.
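The normalization described above might look something like the sketch below. The `normalize_query` name and the choice to truncate (rather than reject) over-long input are my assumptions, and real systems usually do more, such as collapsing repeated spaces.

```python
import re
import string

MAX_QUERY_LENGTH = 64
# 26 lowercase letters + 10 digits + the space character.
ALPHABET = string.ascii_lowercase + string.digits + " "


def normalize_query(raw):
    """Lowercase, drop everything except a-z, 0-9 and spaces, cap the length."""
    lowered = raw.lower()
    cleaned = re.sub(r"[^a-z0-9 ]", "", lowered)
    return cleaned[:MAX_QUERY_LENGTH]


print(len(ALPHABET))                           # 37
print(normalize_query("Red Shoes, Size 10!"))  # red shoes size 10

# Number of distinct full-length queries: a 101-digit integer.
print(len(str(len(ALPHABET) ** MAX_QUERY_LENGTH)))  # 101
```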

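Even if we 10x the number of options per filter, that is 100 to the 10th power, or 10^20, combinations, which is still a tiny amount compared to the possible free text inputs that affect your search results. (10x the number of filters is a different story: 10 to the 100th power is in the same ballpark as the 64-character count, because the total grows exponentially with the number of filters. But a real search box is also rarely limited to 64 characters from a 37-character alphabet.)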
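If you want to check the numbers yourself, Python's arbitrary-precision integers make it easy. This is just a sketch of the arithmetic. The function names are mine, and it assumes exactly one value per filter and queries that use all 64 characters.

```python
def filter_combinations(num_filters, values_per_filter):
    """Selections when exactly one value is chosen for every filter."""
    return values_per_filter ** num_filters


def free_text_inputs(alphabet_size, length):
    """Distinct strings of exactly `length` characters."""
    return alphabet_size ** length


def magnitude(n):
    """Order of magnitude of a positive integer (digit count minus one)."""
    return len(str(n)) - 1


baseline = filter_combinations(10, 10)       # 10^10
more_options = filter_combinations(10, 100)  # 100^10 = 10^20
free_text = free_text_inputs(37, 64)         # about 2.3 * 10^100

print(magnitude(baseline))      # 10
print(magnitude(more_options))  # 20
print(magnitude(free_text))     # 100

# How many free text inputs exist per filter combination, even after
# giving every filter 100 options: a number with 81 digits.
print(magnitude(free_text // more_options))  # 80
```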

What does this mean?

Free text search is tougher to run fast and cost-efficiently because it is difficult to anticipate what people will search for. Notice I said “difficult”, not necessarily impossible, at least for the searches that count.

In my upcoming posts, I will outline some ideas on how to make both of these work separately or in tandem.