Update on Matt’s Vibe Coding journey

Are you sick of hearing how good AI Agents are at programming?
I know I am, but it's my job to evaluate and document my various experiments with cutting-edge technology.
Recently, I tried using Hermes Agent.
I ran Claude Opus 4.6, which was the latest model available on AWS Bedrock at the time. Checking now, it looks like 4.7 is out.
The task I gave it was simple: just update the dependencies for Schematical.com.
The result? The local version of Schematical I gave it access to is pretty much bricked.
Despite having whatever browser tools Hermes Agent ships with, plus Playwright MCP and Apify, it seemed to get stuck in a doom loop of curling the local server.
Of course, there was the endless gaslighting where it told me things were working and that it had tested them.
I am not sure if it knows I have a complete list of every edit and tool call it made, but it clearly had NOT tested anything.
When I attempted to correct it, Hermes Agent tried to make a memory. I wish I had saved it for the purposes of this article, but it was immediately reverted because it was garbage.
Basically, instead of saving my requirement that it test its code and verify things before presenting them as fact, it created a rule to avoid frustrating me. Those are two very different things.
I guess they never solved that sycophancy issue.
Another issue I had was that during a single session, it forgot both the directory we were working in and the URL we were testing.
Hermes automatically started to compress our sessions after a while. Originally, I thought Hermes was sending the conversation off to the model of my choice (Claude Opus 4.6) and using whatever summary the model returned as the basis for the rest of the conversation.
In that case, this would be the fault of the model for not including the most basic of information required to do its job.
Upon further investigation of Hermes Agent's compression, it looks like there might be multiple phases, of which Phase 1 prunes old tool results with no LLM calls.
So it's possible Hermes Agent was the one that shot us in the foot on this one. Either way, it was indeed "frustrating".
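To make that concrete, here is a minimal sketch of what a no-LLM pruning pass could look like and how it could lose the working directory. This is not Hermes Agent's actual code. The `Message` class, the `pinned` flag, the `prune_old_tool_results` function, the path, and the URL are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str             # "user", "assistant", or "tool"
    content: str
    pinned: bool = False  # hypothetical flag: never prune this message


def prune_old_tool_results(messages, keep_last=2,
                           placeholder="[old tool output pruned]"):
    """Blank out all but the most recent `keep_last` unpinned tool results.

    No model is involved: this is a plain rule-based pass over the history.
    """
    tool_indexes = [
        i for i, m in enumerate(messages)
        if m.role == "tool" and not m.pinned
    ]
    if keep_last > 0:
        to_prune = set(tool_indexes[:-keep_last])
    else:
        to_prune = set(tool_indexes)
    return [
        Message(m.role, placeholder, m.pinned) if i in to_prune else m
        for i, m in enumerate(messages)
    ]


# Hypothetical session history; the path and URL are placeholders.
history = [
    Message("user", "Work in /srv/example-app and test http://localhost:3000",
            pinned=True),
    Message("tool", "cwd: /srv/example-app"),
    Message("tool", "npm outdated: 14 packages"),
    Message("tool", "curl http://localhost:3000 -> 500"),
]

pruned = prune_old_tool_results(history, keep_last=1)
for m in pruned:
    print(m.role, "|", m.content)
```

In this sketch the tool result that reported the working directory is one of the ones that gets blanked. If the directory and URL only ever lived in old tool output, a pass like this would drop them without any model being asked. Pinning the message that states them is one possible way to keep them around.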
From there, it had me switch from Turbopack back to webpack and back again about 5 times, each time more sure that this would fix the problem. Of course, it was wrong every time.
With all this said, I have some questions:
Am I dumb, or are the people hyping up these AI agents that can supposedly program better than John Carmack the same guys who were trying to sell us JPEGs for millions of dollars via NFTs a few years ago?
I am trying so hard not to dismiss this technology outright, but if it cannot do something as simple as updating npm packages without completely bricking what it's working on, then I really have to wonder.
Have any of you had any luck using this for more than vibe coding weekend projects? Like really maintaining codebases of significant sizes? For durations longer than a caffeine-fueled weekend hackathon?
Let me know if you have any tips or tricks, because so far this experiment has cost me a hundred or so bucks' worth of tokens and has wasted a lot more than a hundred bucks' worth of my time.