How to manage agentic/bot traffic in the new agentic world: Part 1
Expanding on my previous post, which posed the question “should I allow bots to crawl my product/service?”, I want to look at some ways you can manage that traffic so it is not abused.
There are typically two main concerns around bot traffic:
Are they stealing my data? If you have a lot of custom data that people find useful, then yes, it is quite possible.
Will it slow me down or cost me more money? If someone hits you with a million more requests in a day than you are used to, you will likely either spend more money to keep your site up or go down. Neither is ideal.
What can you do about it?
DDoS attacks and mass web scraping happen 24/7 on the internet. Scraping patterns range from professional-grade fleets of bots to consumer-grade bots, such as someone’s personal agent that got stuck in a loop while crawling your website.
The real pros can get around some of what I am about to recommend, but at that point you have a serious target on your back.
Getting around the rate limiting I am suggesting isn’t cheap, so your average consumer isn’t going to do it. Even some of the mid-level pros will be hesitant to spend the money trying to bypass it.
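To give a feel for what rate limiting can look like, here is a minimal token-bucket sketch in Python. It is one common approach, not necessarily the setup you will end up with. The class name and the rate and capacity values are made up for illustration, and a real deployment would usually do this at the proxy or CDN layer rather than in application code.

```python
import time


class TokenBucket:
    """Hypothetical per-client limiter: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True if the request may proceed, False if it should be rejected."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill based on elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Assumed policy for illustration: 1 request/second, bursts of 3.
bucket = TokenBucket(rate=1.0, capacity=3, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(5)]
after_wait = bucket.allow(now=2.0)
```

You would keep one bucket per client (per IP, API key, or user agent, whichever you trust most) and turn a False into something like an HTTP 429 response. A stuck personal agent burns through its burst and gets throttled, while a normal visitor never notices.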
Guide Traffic:
The first thing you can do is try to guide legitimate bot traffic to pages that require less bandwidth. Last I checked, Amazon’s home page was 260 KB, and it includes a ton of HTML that bots don’t need or want to process.
Use conventions like llms.txt to gently guide bots to the markdown versions of these pages, which are 90% smaller payloads to send across the internet.
That is without even counting binary media content, like images coming from the CDN, if the bot is running in a browser and bothers to load the whole page. Unless the user is paying a bunch for an image-to-text model to parse them, those images will be completely ignored.
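As a rough illustration of the llms.txt idea, here is a small Python sketch that builds one. It follows the layout of the llms.txt proposal as I understand it: a title line, a one-line summary in a blockquote, then sections of links. The function name, the site name, and the example.com URLs are all placeholders, not anything from a real site.

```python
def build_llms_txt(site_name, summary, sections):
    """Build llms.txt content from {section heading: [(title, url, note), ...]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"  # optional short description of the page
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site, pointing bots at markdown versions of pages.
llms_txt = build_llms_txt(
    "Example Store",
    "Markdown versions of our main pages for agents and crawlers.",
    {
        "Docs": [
            ("Pricing", "https://example.com/pricing.md", "Plans and limits"),
            ("FAQ", "https://example.com/faq.md", ""),
        ],
    },
)
print(llms_txt)
```

You would serve the result at /llms.txt on your site. Well-behaved agents that look for it get pointed straight at the lightweight pages instead of pulling the full HTML.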
Next Up:
This post got crazy long, so I am breaking it up into a few different posts. In the meantime, let me know how you are guiding bot traffic, legitimate or otherwise.