How do you tell legitimate bot traffic from malicious bot traffic?

9 April 2026

After analysing literally billions of requests with tools like CloudWatch Logs Insights, I have found a convenient way to tell good traffic from bad.

Legitimate crawlers like Meta’s link checkers put the link to their documentation right in the user agent:

meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)

Google, OpenAI, and Amazon all do the same thing:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36

Heck, Anthropic even gives you an email address you can contact:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Is it computationally efficient to send the same link over and over again in your user agent?

Probably not, but it makes it convenient for someone like me to figure out where the traffic is coming from and better decide if it is legitimate or not.

If you are getting spammed, inspect the user agent closely to see if it gives you a link straight to the source.
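
If you want to do that inspection across a pile of logs rather than by eye, here is a minimal Python sketch that pulls the "+link" or "+email" token out of a user agent string. The function name and the regular expression are my own assumptions for illustration, based only on the examples above; crawlers that format their user agent differently will not match.

```python
import re
from typing import Optional

# Assumed pattern: a "+" followed by either an http(s) URL or an email address,
# ending at whitespace, ";" or ")". Based on the example user agents above.
CONTACT_TOKEN = re.compile(r"\+((?:https?://|[\w.+-]+@)[^\s;)]+)")


def extract_bot_contact(user_agent: str) -> Optional[str]:
    """Return the docs URL or email advertised in a user agent, or None."""
    match = CONTACT_TOKEN.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(extract_bot_contact(ua))
```

A user agent with no contact token returns None, which is a useful signal in itself when you are sorting traffic.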

If you run a crawling service, consider putting a link to your docs in your user agent. It gives the websites you crawl a reason not to block you.
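
As a rough sketch of what that could look like on the crawler side, the Python below builds a user agent in the same shape as the examples above. The bot name, version, and docs URL are hypothetical placeholders, not a real service.

```python
def build_user_agent(bot_name: str, version: str, docs_url: str) -> str:
    """Build a user agent that advertises where the crawler is documented."""
    # Same layout as the Googlebot example: product token, then "+<docs link>".
    return f"Mozilla/5.0 (compatible; {bot_name}/{version}; +{docs_url})"


if __name__ == "__main__":
    # Hypothetical crawler details, for illustration only.
    print(build_user_agent("ExampleBot", "1.0", "https://example.com/bot"))
```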

I will point out that spoofing user agents is not rocket science, so don’t just trust the user agent. Ideally, these docs would give you a way to verify the user agent is legitimate.

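For example, Google gives you a really simple way to verify their bot via DNS: run a reverse DNS lookup on the requesting IP, check that the hostname belongs to Google, then run a forward lookup on that hostname and confirm it resolves back to the same IP.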
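
Here is a minimal Python sketch of that kind of DNS check, using only the standard library. Treat it as an illustration rather than Google's official method: the function name is mine, the hostname suffixes are an assumption you should confirm against Google's current documentation, and the default forward lookup only handles IPv4.

```python
import socket

# Assumed hostname suffixes for Google crawlers; confirm against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve the IP, check the hostname, then forward-confirm it."""
    # The lookups are injectable so the logic can be tested without a network.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # Anyone can set a PTR record that claims to be Google, so the
        # forward lookup must resolve back to the IP that made the request.
        return ip in forward_lookup(hostname)
    except OSError:
        # A failed lookup means the request could not be verified.
        return False
```

DNS lookups are slow compared to serving a request, so in practice you would likely cache the result per IP instead of checking on every hit.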

Question for you:

How are you determining what traffic is malicious or useless bot traffic vs legitimate traffic?

By the way, if you need help fending off malicious bot traffic, that is what we do at Schematical, so feel free to set up a time to chat with us here: https://schematical.com/consulting
