Redis Eviction Nightmare

What do you do when your ElastiCache cluster’s evictions start spiking and it starts deleting key/value pairs?
Recently I had a client see a massive spike in evictions that wreaked havoc on their systems. Without informing me, they went up two sizes for their cluster, which resulted in an increase of almost $1,000 in their AWS bill.
I am going to break this into two parts. The first explains what the issue is, and the second will give you a cool tool that helps you pinpoint the cause of the issue.
Evictions:
What are evictions? Evictions happen when Redis reaches its memory limit (maxmemory) and has to delete existing keys to make room for new ones. You can define an eviction policy, but by default AWS ElastiCache uses volatile-lru, which evicts the least recently used (LRU) keys from among those that have a TTL set.
Contrast this with out-of-the-box Redis (not running on ElastiCache), which defaults to noeviction and simply rejects new writes once memory is full, much like most traditional DBs.
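To see which policy a cluster is actually running and how close it is to its limit, you can read a few fields from the Redis INFO command. Here is a minimal sketch, assuming the redis-py client; the function names and the host you pass in are hypothetical placeholders, not something from the client's setup. The fields used (maxmemory_policy, maxmemory, used_memory, evicted_keys) are standard INFO fields.

```python
def eviction_summary(info):
    """Pull the eviction-related fields out of a Redis INFO dict."""
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))  # 0 means no limit is configured
    return {
        "policy": info.get("maxmemory_policy"),
        "evicted_keys": int(info.get("evicted_keys", 0)),
        "memory_used_pct": round(100 * used / limit, 1) if limit else None,
    }


def live_eviction_summary(host, port=6379):
    """Hypothetical helper: fetch INFO from a live node and summarize it."""
    import redis  # redis-py, imported here so eviction_summary works without it

    client = redis.Redis(host=host, port=port)
    return eviction_summary(client.info())
```

Calling live_eviction_summary with your own endpoint returns a small dict. If evicted_keys keeps climbing between two calls while memory_used_pct sits near 100, the cluster is actively evicting.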
Diagnosing Eviction Causes:
The obvious question is “What is being added that is causing the database to fill up and evict other existing key/value pairs?” If your systems are anything like the ones I work on, you potentially have billions of key/value pairs, so it’s not as simple as just running keys * (that is horrible for performance, so don’t do that anyway).
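One rough way to get a feel for what is in the cluster without running keys * is to walk a bounded sample of keys with SCAN and count them by prefix. This is only a sketch under a few assumptions: it uses redis-py's scan_iter, it assumes your keys follow a "prefix:rest" naming convention, and all function names here are hypothetical. SCAN does not return keys in random order, so treat the counts as a hint rather than a true distribution.

```python
from collections import Counter
from itertools import islice


def key_prefix(key, delimiter=":"):
    """Return the part of a key before the first delimiter."""
    if isinstance(key, bytes):
        key = key.decode("utf-8", errors="replace")
    return key.split(delimiter, 1)[0]


def prefix_distribution(keys, sample_size=100_000, delimiter=":"):
    """Count key prefixes over at most sample_size keys from any iterable."""
    counts = Counter()
    for key in islice(keys, sample_size):
        counts[key_prefix(key, delimiter)] += 1
    return counts


def sample_live_node(host, port=6379, sample_size=100_000):
    """Hypothetical helper: sample keys from a live node using SCAN."""
    import redis  # redis-py, imported here so the helpers above work without it

    client = redis.Redis(host=host, port=port)
    # scan_iter pages through the keyspace with SCAN instead of blocking like KEYS
    return prefix_distribution(client.scan_iter(count=1000), sample_size)
```

Calling most_common(10) on the result shows the ten biggest prefixes in the sample. Key counts are not the same thing as memory usage, so a prefix with few but very large values can still be the culprit.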
How do you do this? To my knowledge, there are no logs or metrics by default that tell you what the distribution of keys is in a cluster. (I say that, and then tomorrow AWS will release that feature. They always do that as soon as I write or record a video saying it doesn’t exist.)
Questions For You: In the next post I will give you a cool tool to help you diagnose issues, but before I give you that, how would you debug this?