RAG Agents in Prod: 10 Lessons We Learned
As most of you know, I've been laid low by bicep surgery recently. The good news is that it gave me plenty of time to absorb information on hosting AI/ML workloads in the cloud. I wanted to share a video I found extremely interesting about running RAG agents in prod.
Now if you're not familiar with RAG (retrieval-augmented generation), I'd happily do a deeper dive on it. If you're interested, just shoot me a message or a comment. In short, it's a way to feed an LLM new, up-to-date information so it can process it for you.
See, LLM agents and LLM models have a training cutoff, so you don't want a model whose knowledge ends in January 2025 to miss out on everything that's happened since. That's what RAG solves, in a nutshell: it retrieves fresh, relevant documents and hands them to the model alongside your question.
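To make that concrete, here's a minimal sketch of the RAG idea: pull the most relevant documents for a question, then stuff them into the prompt so the model sees information newer than its training cutoff. This is illustrative only, with made-up example documents. A real system would use vector embeddings and an actual LLM call rather than simple keyword overlap.

```python
# Toy RAG pipeline: keyword-overlap retrieval + prompt augmentation.
# Hypothetical example data; a production system would use embeddings.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Score each document by how many words it shares with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Prepend the retrieved context to the user's question."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Example documents the model's training data wouldn't contain:
docs = [
    "The v2 API was released in March 2025 and deprecates the legacy endpoint.",
    "Our office dog is named Biscuit.",
    "Rate limits increased to 10,000 requests per minute in April 2025.",
]
prompt = build_prompt("What changed in the v2 API released this year?", docs)
print(prompt)
```

The prompt that comes out the other end contains the API release note, so the model can answer from supplied context instead of stale training data.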
So without further ado, please enjoy this video, "RAG Agents in Prod: 10 Lessons We Learned," from the AI Engineer channel. I found it extremely informative, and I think it'll help you figure out how to host your LLM agent at scale in production. It doesn't get down into the nitty-gritty details, but it gives you some really good high-level principles to keep in mind while you're building out your production infrastructure.
Do you want to learn more about hosting your AI/ML models at scale in production on AWS? Feel free to reach out to me, or sign up for my upcoming tech talk on using the Model Context Protocol (MCP) to enhance your AI/ML agents and connect them to the internet to take real actions on your behalf.