Can your LLM of choice run a food truck?


This is probably not the question most people are pondering, but it is an interesting LLM benchmark that is gaining in popularity.

You can even play the simulation yourself to see how you score on the benchmark.

As a business owner, I find this fascinating.

Of the top 14 models, only 4 avoided bankruptcy, and one of those 4 still ended up with a -31% ROI.

Now you might be wondering, are those winners possibly just a fluke?

Each model was run 5 times, so either they got really lucky 5 times in a row, or they actually have some skill.

Another interesting part is their “notable findings” section, where you can see that Gemini 3 Flash got stuck in an infinite reasoning loop and that every model that used the loan system went bankrupt.

In the end, I am not sure how useful this will be in real life, but I am amused watching these LLMs compete in simulations like these.

I realize running a business is more complicated than playing a video game, but at the same time, there might be some merit there.

Question for you:

Would the results of this simulation, or of similar simulations, influence which LLM you choose to help run your business?