Server Fire Fighting Pro Tip 2

Ever found yourself fighting a server outage that is costing your company upward of $100,000 an hour? I have.
In my consulting practice, it’s not uncommon to get a call from a new customer whose existing infrastructure is currently on fire, and I need to figure out the problem quickly.
How do you quickly, under pressure, get a solid picture of what the system looks like as a whole? Where do you even start?
Unless I’m given explicit direction as to where in that giant haystack the needle of a problem is, I oftentimes work my way from the outside inward. This means starting with the DNS hosted on Route 53. If a client came to me and said “here is the URL, it's broken, good luck” (which does happen), I would start by figuring out what infrastructure that URL is pointing to. Is it an IP address? Is it an alias? Is it a broken DNS record?
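To answer “IP address, alias, or broken record?” quickly, it helps to summarize what each record actually contains. The sketch below assumes you have already dumped the zone (for example with `aws route53 list-resource-record-sets --hosted-zone-id <zone-id>`) and are looking at individual entries from its `ResourceRecordSets` list. The function name `describe_record` is hypothetical. It only flags structurally empty records as broken; a record pointing at a decommissioned IP or a deleted load balancer still looks fine here, so you would follow up by resolving the target.

```python
from typing import Any, Dict


def describe_record(record: Dict[str, Any]) -> str:
    """Summarize what one Route 53 record set points at.

    `record` is assumed to have the shape of one entry in the
    ResourceRecordSets list returned by ListResourceRecordSets.
    """
    name = record.get("Name", "?")
    rtype = record.get("Type", "?")

    # Alias records carry an AliasTarget instead of ResourceRecords.
    alias = record.get("AliasTarget")
    if alias:
        return f"{name} {rtype} ALIAS -> {alias.get('DNSName', '?')}"

    # Plain records list their values under ResourceRecords.
    values = [rr.get("Value", "") for rr in record.get("ResourceRecords", [])]
    values = [v for v in values if v]
    if not values:
        # No alias target and no values: nothing for clients to resolve to.
        return f"{name} {rtype} BROKEN: no alias target and no values"

    if rtype in ("A", "AAAA"):
        return f"{name} {rtype} IP -> {', '.join(values)}"
    if rtype == "CNAME":
        return f"{name} CNAME -> {values[0]}"
    return f"{name} {rtype} -> {', '.join(values)}"
```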
From there I would trace the request to the first resource it hits. Is it pointed at an ALB? An API Gateway? A CloudFront distribution? Or, heaven forbid, directly at an EC2 instance?
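Once you have the alias or CNAME target, the hostname itself usually tells you which kind of resource is next in line, because AWS assigns recognizable default DNS names. The sketch below is a heuristic under that assumption: the patterns cover the default names for API Gateway, Elastic Load Balancing, CloudFront, and EC2 public DNS, and anything else comes back as "unknown". The function name `classify_target` is hypothetical.

```python
import ipaddress
import re

# Default DNS name patterns AWS assigns to common entry points.
# Heuristics only: custom domains and other services will not match.
_PATTERNS = [
    # e.g. abc123.execute-api.eu-west-1.amazonaws.com
    (re.compile(r"\.execute-api\.[a-z0-9-]+\.amazonaws\.com$"), "API Gateway"),
    # ALB/Classic: name.region.elb.amazonaws.com; NLB: name.elb.region.amazonaws.com
    (re.compile(r"\.elb\.([a-z0-9-]+\.)?amazonaws\.com$"),
     "Elastic Load Balancing (ALB, NLB, or Classic)"),
    # e.g. d111111abcdef8.cloudfront.net
    (re.compile(r"\.cloudfront\.net$"), "CloudFront distribution"),
    # e.g. ec2-203-0-113-25.compute-1.amazonaws.com
    (re.compile(r"^ec2-[0-9-]+\..*compute(-1)?\.amazonaws\.com$"),
     "EC2 instance (public DNS)"),
]


def classify_target(target: str) -> str:
    """Guess which AWS resource type a DNS target refers to."""
    host = target.strip().rstrip(".").lower()
    try:
        ipaddress.ip_address(host)
        return "raw IP address (possibly an EC2 instance or Elastic IP)"
    except ValueError:
        pass  # Not an IP literal; fall through to hostname patterns.
    for pattern, label in _PATTERNS:
        if pattern.search(host):
            return label
    return "unknown"
```

One caveat on the design: an edge-optimized API Gateway custom domain targets a `cloudfront.net` name, so it will be reported as CloudFront, which is technically what sits in front of it.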
Don’t assume anything when fighting these fires. Misconfigured DNS is a thing, so double-check your DNS settings and make sure you know which resources they point at.
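Heck, even make sure your registered domain is actually delegated to the hosted zone you think it is: the name servers set at the registrar should match the name servers assigned to that hosted zone.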
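Checking the delegation comes down to comparing two lists of name servers. For a hosted zone, `aws route53 get-hosted-zone --id <zone-id>` returns them under `DelegationSet.NameServers`; for a domain registered through Route 53, `aws route53domains get-domain-detail --domain-name <domain>` lists what the registration points at (other registrars show this in their console or in `whois`). The sketch below assumes you have pasted both lists in and only does the comparison; `delegation_mismatch` is a hypothetical name.

```python
from typing import Iterable, Set, Tuple


def _normalize(names: Iterable[str]) -> Set[str]:
    # DNS names are case-insensitive and may or may not carry a trailing dot.
    return {n.strip().rstrip(".").lower() for n in names if n.strip()}


def delegation_mismatch(
    registrar_ns: Iterable[str], hosted_zone_ns: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (missing_at_registrar, extra_at_registrar).

    registrar_ns: name servers the domain registration points at.
    hosted_zone_ns: name servers Route 53 assigned to the hosted zone.
    Two empty sets mean the delegation matches.
    """
    reg = _normalize(registrar_ns)
    zone = _normalize(hosted_zone_ns)
    return zone - reg, reg - zone
```

A common way to end up with a non-empty result is deleting and recreating a hosted zone: the new zone is typically assigned a different set of name servers, while the registrar still points at the old ones.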