Improve your search results using Index Versioning and A/B testing


Previously, I talked about how caching searches with a TTL can leave you open to attack.

Let’s examine one of the alternatives to this, which I call “Index Versioning”.

The first time I can remember hearing about this concept was in the book I'm Feeling Lucky: The Confessions of Google Employee Number 59.

Basically, you start building your result set quietly in the background, then, when the time is right, you simply swap to the new result set/index.

This eliminates the problem of uncached searches triggering load against your source-of-truth DB.

Instead, all of those searches, or at least the subset you choose to include, are run against the source of truth as a batch, at a pace you control so the DB never slows down or gets forced to autoscale.
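To make that pacing idea concrete, here’s a minimal sketch in Python using redis-py. Everything in it is a placeholder for your own system: popular_terms, query_source_of_truth, the v2 prefix, and the half-second delay are illustrative, not prescriptive.

```python
import json
import time

import redis

r = redis.Redis()


def popular_terms():
    # Stand-in: in practice this would be your top search terms by volume.
    return ["cat", "dog", "fish"]


def query_source_of_truth(term):
    # Stand-in: in reality this is the expensive query against your primary DB.
    return [f"{term}-result-{i}" for i in range(3)]


def rebuild_index(version, delay_seconds=0.5):
    """Populate the new versioned result set at a pace your DB can absorb."""
    for term in popular_terms():
        results = query_source_of_truth(term)
        r.set(f"{version}:{term}", json.dumps(results))
        time.sleep(delay_seconds)  # throttle so the rebuild never spikes DB load


rebuild_index("v2")
```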

Once you are happy with the new result set’s quality, you simply do an A/B swap from the old data source to the new one. This method has the added benefit of a really easy rollback if, for some reason, you need to pull the new results.

You don’t necessarily need to do an A/B swap on the underlying server infrastructure. You could just swap key prefixes if you were using something like Redis. So the key for the term “cat” for version 1 would look like v1:cat, you would populate a v2:cat key for the next version, and then either deploy or feature-flag an update to point at the v2 prefix when the time was right.
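As a rough sketch of what that prefix swap could look like, here the “flag” is just a pointer key stored in Redis itself (search:active_version is a made-up name); a deploy-time constant or a real feature-flag service would work the same way.

```python
import redis

r = redis.Redis(decode_responses=True)

ACTIVE_VERSION_KEY = "search:active_version"  # hypothetical pointer key


def lookup(term):
    """Read results for a term from whichever index version is currently live."""
    version = r.get(ACTIVE_VERSION_KEY) or "v1"
    return r.get(f"{version}:{term}")


def cut_over(new_version):
    """Swap every reader to the new result set in one write."""
    r.set(ACTIVE_VERSION_KEY, new_version)


def rollback(old_version="v1"):
    """Rolling back is just pointing the same key at the previous version."""
    r.set(ACTIVE_VERSION_KEY, old_version)


# Once the v2:* keys are fully populated and verified:
# cut_over("v2")
```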

Another thing I love about this approach is the ability to A/B test. This means you could keep 90% of your searches pointed at the old result set and start testing the new result set against a mere 10%. Gathering data like this is essential to help guide your decision-making process.
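One way to implement that split is deterministic bucketing, so a given user always lands on the same index version while you compare metrics. This is only a sketch: choose_version, the user ID, and the 10% rollout figure are illustrative.

```python
import hashlib


def choose_version(user_id, new_version="v2", old_version="v1", rollout_percent=10):
    """Hash the user ID into a stable bucket from 0-99 and pick a version."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return new_version if bucket < rollout_percent else old_version


# Roughly 10% of users read from the v2 keys; everyone else stays on v1.
version = choose_version("user-42")
key = f"{version}:cat"
```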