DAILY NEWS

Stay Ahead, Stay Informed – Every Day

The Open-Source Agent War of 2026: Hermes Agent vs AutoGPT vs OpenAI Agents vs CrewAI



Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent


The AI Agent Ecosystem Is Getting Crowded Fast

In the last two years, “AI agents” went from experimental repos to full ecosystems.

Now we have:

AutoGPT spawning autonomous loops
CrewAI orchestrating multi-agent teams
OpenAI Agents offering structured tool execution
Hermes Agent pushing persistent memory and system-level architecture

And suddenly, developers are asking a very real question:

Which agent framework should I actually use in production?

Because the reality is:

They are not interchangeable
They are not solving the same problem
And they are not built with the same philosophy

In this post, I break down the landscape in a practical, engineering-focused way.

No hype.

No marketing.

Just architecture, tradeoffs, and real-world fit.

The Four Major Players

Let’s define the contenders clearly.

1. Hermes Agent

Hermes Agent is designed as a persistent, memory-driven agent system.

Core ideas:

long-term memory as a first-class layer
skill-based execution model
multi-agent orchestration
workflow-driven automation
system-like architecture

It behaves less like a chatbot framework and more like an AI operating system layer.

2. AutoGPT

AutoGPT is one of the earliest autonomous agent experiments.

Core ideas:

goal-driven loops
self-prompting behavior
tool usage through iteration
minimal structure, high autonomy

It is best described as:

A recursive agent loop with tool access.

3. CrewAI

CrewAI focuses on structured multi-agent collaboration.

Core ideas:

role-based agents
task delegation
sequential and parallel workflows
human-defined orchestration

It is designed for:

“AI teams working together.”

4. OpenAI Agents

OpenAI Agents focus on production-grade tool execution and orchestration.

Core ideas:

structured tool calling
safety and reliability layers
API-first agent design
enterprise readiness

It is less experimental and more controlled.

Design Philosophy Comparison

Framework | Philosophy
Hermes Agent | AI as a persistent system
AutoGPT | Fully autonomous loop
CrewAI | Collaborative agent teams
OpenAI Agents | Controlled production agents

This philosophical difference explains almost everything else.

Core Feature Comparison

Feature | Hermes Agent | AutoGPT | CrewAI | OpenAI Agents
Open Source | Yes | Yes | Yes | Partial
Self-hosting | Yes | Yes | Yes | Limited
Persistent Memory | Strong | Weak | Medium | Limited
Multi-agent support | Native | Experimental | Core feature | Structured
Tool integration | Modular | Basic | Good | Excellent
Learning capability | Strong (memory-driven) | Low | Medium | Medium
Ease of setup | Medium | Medium | Easy | Easy
Production readiness | Medium | Low–Medium | Medium | High
Community support | Growing | Large | Growing | Large
Extensibility | High | Medium | High | Medium

Developer Experience Comparison

Hermes Agent

Requires architectural thinking
Powerful but opinionated
Best for long-running systems
Feels like building infrastructure

AutoGPT

Easy to experiment with
Hard to control in production
Often unpredictable
Great for prototypes

CrewAI

Very developer-friendly
Clear role definitions
Easy mental model
Good balance of structure and flexibility

OpenAI Agents

Smooth API experience
Strong documentation
Production-focused
Less flexible at system level

Architecture Comparison

Hermes Agent Architecture

flowchart TD

User --> HermesCore

HermesCore --> MemoryLayer
HermesCore --> SkillSystem
HermesCore --> WorkflowEngine
HermesCore --> SubAgents
HermesCore --> ToolLayer

SubAgents --> SharedMemory
SkillSystem --> MemoryLayer
WorkflowEngine --> SubAgents


Key idea:

Everything revolves around persistent memory + system execution.

AutoGPT Architecture

flowchart TD

Goal --> AgentLoop
AgentLoop --> LLM
LLM --> ToolUse
ToolUse --> Observation
Observation --> AgentLoop


Key idea:

Infinite loop driven by self-prompting.
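To make the loop in the diagram concrete, here is a minimal sketch of a goal-driven agent loop. It is not AutoGPT's actual code or API: `run_agent_loop`, `scripted_decide`, and the `tools` dictionary are hypothetical names, the "LLM" is a scripted stub, and a `max_steps` cap is added so the loop cannot run forever.

```python
def run_agent_loop(goal, decide, tools, max_steps=5):
    """Repeat decide -> tool use -> observation until the decider finishes.

    `decide` stands in for the LLM call (an assumption for illustration):
    it receives the goal plus all observations so far and returns an
    (action, argument) pair.
    """
    observations = []
    for _ in range(max_steps):
        action, argument = decide(goal, observations)
        if action == "finish":
            return argument, observations
        # Tool use: run the chosen tool and feed its output back into the loop.
        observations.append(tools[action](argument))
    # Step budget exhausted without a final answer.
    return None, observations


def scripted_decide(goal, observations):
    """Stub decider: search once, then finish."""
    if not observations:
        return "search", goal
    return "finish", f"Summary based on {len(observations)} observation(s)"


tools = {"search": lambda query: f"results for {query!r}"}
result, history = run_agent_loop("compare agent frameworks", scripted_decide, tools)
print(result)
print(history)
```

The step cap is the design choice worth noting: without it, a decider that never returns "finish" produces exactly the unbounded behavior described above.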

CrewAI Architecture

flowchart TD

Task --> ManagerAgent

ManagerAgent --> Worker1
ManagerAgent --> Worker2
ManagerAgent --> Worker3

Worker1 --> Output
Worker2 --> Output
Worker3 --> Output


Key idea:

Role-based collaboration.

OpenAI Agents Architecture

flowchart TD

UserRequest --> Orchestrator
Orchestrator --> ToolCalls
ToolCalls --> ExecutionLayer
ExecutionLayer --> Response


Key idea:

Structured tool execution pipeline.

Real-World Use Case Comparison

Scenario 1: Solo Developer

Best choice: CrewAI or Hermes Agent

CrewAI: easier setup, fast results
Hermes: better for long-term project memory

AutoGPT is too unstable for consistent use.

OpenAI Agents may feel too rigid.

Scenario 2: Startup Team

Best choice: Hermes Agent or OpenAI Agents

Hermes: evolving product knowledge + memory
OpenAI Agents: stable production workflows

CrewAI works well for internal coordination.

AutoGPT is not ideal.

Scenario 3: Enterprise

Best choice: OpenAI Agents

Why:

governance
reliability
safety controls
structured execution

Hermes Agent is promising but still maturing here.

Scenario 4: Research Lab

Best choice: Hermes Agent

Because:

persistent memory across experiments
evolving hypotheses tracking
multi-agent research pipelines

CrewAI also works well, but it lacks a deep memory layer.

Scenario 5: Personal Productivity

Best choice: CrewAI or AutoGPT

CrewAI: structured assistants
AutoGPT: experimental automation

Hermes Agent is powerful but heavier than needed for simple tasks.

Strengths and Weaknesses Breakdown

Hermes Agent

Strengths

Persistent memory
System-level architecture
Multi-agent coordination
Long-term reasoning support

Weaknesses

Complexity
Higher setup cost
Still evolving ecosystem

AutoGPT

Strengths

Simplicity of concept
Fully autonomous loops
Easy experimentation

Weaknesses

Unpredictable behavior
Weak production control
No real memory system

CrewAI

Strengths

Clean multi-agent model
Easy developer experience
Good structure for teams

Weaknesses

Limited long-term memory
Less system-level depth

OpenAI Agents

Strengths

Production-grade stability
Strong tool ecosystem
Excellent documentation

Weaknesses

Less open system design
Limited architectural flexibility
Dependency on platform constraints

When Hermes Agent Is the Wrong Choice

Hermes Agent is NOT ideal when:

you need quick one-off automation
you want zero-setup solutions
you are building simple chatbot flows
you require strict enterprise compliance out of the box
you don’t need long-term memory or state

In short:

If your problem is stateless, Hermes is overkill.

Decision Tree: Which Agent Framework Should You Choose?

Do you need persistent memory across time?
├── Yes → Hermes Agent
└── No → continue

Do you need production-grade tool reliability?
├── Yes → OpenAI Agents
└── No → continue

Do you need multi-agent teamwork structure?
├── Yes → CrewAI
└── No → continue

Do you want experimental autonomous behavior?
├── Yes → AutoGPT
└── No → CrewAI or OpenAI Agents
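
The decision tree above can also be written as a small function. This is only a sketch of the same four questions in the same order; the function name and parameter names are made up for illustration.

```python
def choose_framework(
    needs_persistent_memory: bool,
    needs_production_reliability: bool,
    needs_multi_agent_teams: bool,
    wants_experimental_autonomy: bool,
) -> str:
    """Walk the four questions in order and return the first match."""
    if needs_persistent_memory:
        return "Hermes Agent"
    if needs_production_reliability:
        return "OpenAI Agents"
    if needs_multi_agent_teams:
        return "CrewAI"
    if wants_experimental_autonomy:
        return "AutoGPT"
    # Every answer was "No": the tree falls back to the two general options.
    return "CrewAI or OpenAI Agents"


print(choose_framework(False, True, True, False))
```

Because the questions are checked in order, a project that needs both persistent memory and production reliability lands on Hermes Agent; reordering the checks would change the recommendation.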


Final Thoughts: Where This Is All Heading

We are still in the early phase of agent frameworks.

Right now, each system is optimizing a different axis:

AutoGPT → autonomy
CrewAI → collaboration
OpenAI Agents → reliability
Hermes Agent → persistence + system thinking

But over the next 2–3 years, these boundaries will blur.

We will likely see:

memory becoming standard
multi-agent systems becoming default
workflows becoming composable
agents becoming long-running systems, not sessions

And eventually:

Agent frameworks will stop being “tools for prompts” and become “operating layers for digital workforces.”

In that future, Hermes Agent’s direction — persistent, system-oriented intelligence — may become less of a niche idea and more of a baseline expectation.

The real competition won’t be between frameworks.

It will be between architectures.

And that shift is already starting.




Event Triggers on Garudust – DEV Community



Garudust’s core exposes a single basic primitive: agent.run(task). Every entry point, whether it is a chat message, a cron job, or a webhook call, ends up in the same call. This means that any external system that can send an HTTP POST can act as an event trigger for Garudust. This article explains how it currently works, the patterns that work in production, and concrete use cases.

How the Webhook Adapter Works

When Garudust is configured to use the webhook platform, it launches an Axum HTTP server and registers a POST endpoint at the path you specify. Incoming requests look like this:

{
  "text": "A new billing invoice has arrived from Acme Corp for $4,200.",
  "callback_url": "https://your-system.example.com/garudust/reply",
  "user_id": "billing-watcher",
  "session_key": "billing-acme-corp"
}

Field | Required | Description
text | ✅ | Task prompt that the agent will run
callback_url | ✅ | URL where Garudust will POST the response
user_id | optional | Used for role-based access control
session_key | optional | Pins conversation history; if not specified, it defaults to webhook:{callback_url}

Garudust wraps this information as an InboundMessage, sends it through GatewayHandler, and spawns agent.run(). When the agent finishes, Garudust POSTs the response back to callback_url:

{
  "text": "Invoice from Acme Corp for $4,200, categorised as SaaS/Infrastructure. Flagged for approval above $3,000 threshold. Draft approval request sent to #finance."
}

The immediate HTTP response to your POST is 202 Accepted; the agent works asynchronously.

Security

Garudust checks the HMAC-SHA256 signature of every incoming request. Set a shared secret in config and sign every outgoing POST with it.

Event source (email, calendar, DB, queue) → Webhook adapter
Filter / match logic → Your code (before POST)
Task description → agent.run(task)
Result handling → Handler at your callback_url

Your system owns the filter; Garudust owns the running agent. Neither side needs to know the other’s internal structure.

Use Cases

1. Billing Email Monitor

An email processing service captures emails from billing senders. When it finds a matching email, it extracts the subject, sender, and amount and triggers Garudust:

{
  "text": "New invoice received: Stripe, $1,840 for May 2026. Attach to this month's expense report and notify the finance channel if it exceeds the $1,500 alert threshold.",
  "callback_url": "https://your-ops.example.com/hooks/garudust",
  "session_key": "finance-inbox"
}

The agent uses its tools to read expense report files, add line items, and post to Slack. The email service just matches the sender and fires; it does not need to know anything about expense reports or Slack.

2. GitHub PR Review Gate

A GitHub Actions workflow calls Garudust after a PR is opened against the main branch. The workflow builds the payload from the GitHub context:

{
  "text": "PR #214 opened by @alice: 'feat: add OAuth2 PKCE flow'. Changed files: src/auth/oauth.rs, src/auth/pkce.rs, tests/auth_integration.rs. Diff summary attached. Review for security issues in the auth flow and post a summary comment.",
  "callback_url": "https://your-ci.example.com/garudust/pr-review",
  "session_key": "pr-214"
}

The flow is: the GitHub webhook launches the workflow, the workflow creates the task text, and Garudust reviews. Because the session_key is tied to the PR number, the next trigger (a new commit, a repeat review request) continues in the same conversation thread.

3. Database Anomaly Alert

A monitoring job queries the database on a schedule and checks an aggregate metric. When the metric crosses the threshold, the job fires Garudust instead of sending a static alert:

{
  "text": "Anomaly detected: orders table insert rate dropped 94% in the last 10 minutes (baseline 340/min, current 19/min). Last successful insert: 09:42 UTC. Investigate root cause and summarise for on-call.",
  "callback_url": "https://ops.example.com/garudust/incidents",
  "session_key": "incident-2026-05-23-orders"
}

The agent can use a terminal or database tool to run additional queries, check the latest deploy, and produce a structured incident summary. The monitoring job only checks for threshold breaches.

4. Calendar External-Attendee Watch

An integration layer polls Google Calendar (or receives push notifications) and fires Garudust when an event is created with an attendee whose domain does not match your organization:

{
  "text": "New calendar event: 'Q3 partnership discussion' on 2026-06-04 14:00 UTC. External attendees: jane@partner.com, bob@partner.com. Prepare a one-page briefing on Partner Corp using the CRM notes and recent email thread.",
  "callback_url": "https://your-system.example.com/garudust/calendar",
  "session_key": "meeting-prep-2026-06-04"
}

The calendar integration owns the “external attendee” filter logic; Garudust takes ownership of the briefing.

5. Queue Worker Trigger

A background worker pulls jobs from a task queue (SQS, Redis, RabbitMQ) and sends each one to Garudust. This suits workloads that vary and require the agent to handle each item in its own way:

{
  "text": "Customer support ticket #8821 (priority: high): User reports that export to CSV silently truncates rows above 10,000. Reproduce the scenario, identify the code path responsible, and draft a fix description for the engineering team.",
  "callback_url": "https://support.example.com/garudust/tickets",
  "session_key": "ticket-8821"
}

The queue worker dequeues the job, formats the task text, and fires the webhook. Multiple tickets can run as concurrent agent sessions.

Session Keys

session_key is what makes event triggers useful beyond traditional one-shot tasks. When you pin a key, all webhook calls that use the same key share a conversation history. This means:

The PR review trigger on commit 1 and the re-review trigger on commit 2 are the same conversation; the agent remembers what was said earlier.
An incident trigger and the follow-up question that the on-call engineer asks later use the same context.
A billing session accumulates invoices for a whole month from multiple triggers before creating a monthly summary.

If you want completely separate sessions (each event is independent), you do not need to specify a session_key. Garudust uses callback_url as the key instead, which gives each unique callback target its own context.

What This Pattern Doesn’t Cover

The webhook adapter only handles push: the external system must initiate the connection. If you want Garudust to pull data from the source itself (check an inbox, poll an API, watch files) without an external scheduler, you have to use a cron job that polls, or wait for a watch/filter primitive that does not exist yet. For use cases that are truly push-based (GitHub webhooks, queue workers, calendar push notifications, email routing services), the current architecture supports them all, and the division of duties is already clear.
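
As a caller-side illustration of the request fields, the HMAC-SHA256 check, and the session_key default described in this article, here is a minimal sketch using only the Python standard library. Several things are assumptions rather than Garudust's documented behavior: the helper names are hypothetical, the signature is assumed to be a hex digest over the raw request body, and the header that carries the signature is not named here because the article text does not preserve it.

```python
import hashlib
import hmac
import json


def build_trigger_body(text, callback_url, user_id=None, session_key=None):
    """Serialize the webhook fields described above into request-body bytes."""
    payload = {"text": text, "callback_url": callback_url}
    if user_id is not None:
        payload["user_id"] = user_id
    if session_key is not None:
        payload["session_key"] = session_key
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def sign_body(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 over the exact bytes that will be sent (hex is an assumption)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiver-side check; compare_digest avoids timing leaks."""
    return hmac.compare_digest(sign_body(secret, body), signature)


def effective_session_key(payload: dict) -> str:
    """Mirror the documented default: webhook:{callback_url} when no key is pinned."""
    return payload.get("session_key") or f"webhook:{payload['callback_url']}"


body = build_trigger_body(
    "PR #214 opened by @alice. Review for security issues.",
    "https://your-ci.example.com/garudust/pr-review",
    session_key="pr-214",
)
signature = sign_body(b"example-shared-secret", body)
print(verify_signature(b"example-shared-secret", body, signature))
print(effective_session_key(json.loads(body)))
```

The signature must be computed over the same bytes that go on the wire, so the sketch signs the serialized body rather than the dictionary.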




What drawing lines on a football pitch taught me about the future of human-AI collaboration



Foreword: Before we begin, I have to say that you gotta trust me. I’m a normal sports fan and don’t overanalyze the heck out of a goal like this. Honestly, I’m not even sure if I should be writing this 🙂

Trust me, my first reaction was: WOW! What a goal!
When I looked at the 12 second mark in the above clip, I understood how this goal was scored (like every other armchair football analyst).
For those not familiar with football, here’s what happens: Marquinhos blocks the initial shot attempt. Then, if you look closely from 0:12 onwards, you can see that Marquinhos’ initial momentum makes him rock back by a few inches, giving Luis Díaz an opening to cut right and create a new shooting opportunity.
This Luis Diaz goal vs PSG is not talked about enough. That through pass from Harry Kane from deep, that first touch by Luis Diaz into space, the fake shot to fool Marquinhos and finally slotting the ball pass the goalkeeper pic.twitter.com/tS32lyUGEl— R (@RodneyFCB) May 6, 2026

TL;DR
Here is the final result of how the goal was scored. Marquinhos rocks back ~0.6 ft and Luis Díaz cuts right ~8.9 ft, which opens the shot window from ~5.5 ft/8.6° to ~7.5 ft/22.0°.

The part I keep thinking about is this: if I had let Codex fully guide the project, I would have ended up with a confident but wrong result. My football intuition caught things that Codex did not know to care about. The full story of how we got there is below.
The Long Version
For everyone else continuing on this ride, let’s talk about the rest of the story.
Can Codex solve this?

I gave Codex the screenshots of the play and described what I wanted. It thought other angles might help, so I pulled YouTube screenshots from a few alternate camera angles seen below.

After a bit of back and forth, I decided that the following two frames best represented the change in distances we wanted to calculate (player movement, goal window) so I asked Codex to use these as the primary angle and other angles for validation.

Codex did what AI tools are very good at: it turned a fuzzy idea into a working direction. It suggested ways to measure distances, picked a Python image-processing stack, started detecting points, and generated annotated outputs. The first version looked convincing, which I later found out was a problem.
The problem Codex created
I do not know much about computer vision, but I do know football. In the first attempt, Codex claimed that the “goal window became 2.3x wider” which did not pass the smell test for me. I manually compared the open goal window to the right of the goalkeeper in pixels and it definitely didn’t seem like the new window was 2.3x wider.
Out of curiosity, I asked Codex to “show its work” by “visualizing” how it came up with the distance calculations for my review. As I looked at the following annotated images, I quickly found out how off the calculations were. Codex tried to find goal posts, player positions, field lines, and distances but each measurement looked erroneous.

Some points were not exactly on the bottom of the goal posts. Some foot markers were close, but not close enough. The shot cone angle points were not in the expected places. This was a problem because if the goal post marker was five pixels off, or the player foot marker was placed on a shadow instead of the boot, the final number inherited that error.
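
To show how a misplaced marker flows into the final number, here is a minimal sketch of a shot-cone angle calculation. It assumes the shooter and the two edges of the open window are already expressed as coordinates on the pitch plane (in feet); the function name and the sample coordinates are made up for illustration and are not the measurements from this project.

```python
import math


def shot_angle_deg(shooter, left_edge, right_edge):
    """Angle (degrees) at the shooter between the two edges of the open window."""
    ax, ay = left_edge[0] - shooter[0], left_edge[1] - shooter[1]
    bx, by = right_edge[0] - shooter[0], right_edge[1] - shooter[1]
    cross = ax * by - ay * bx  # proportional to sin(angle)
    dot = ax * bx + ay * by    # proportional to cos(angle)
    return abs(math.degrees(math.atan2(cross, dot)))


# Hypothetical layout: shooter 20 ft out, window edges 3 ft either side.
shooter = (0.0, 20.0)
measured = shot_angle_deg(shooter, (-3.0, 0.0), (3.0, 0.0))
# Same shot, but one edge marker placed half a foot too far inside.
shifted = shot_angle_deg(shooter, (-2.5, 0.0), (3.0, 0.0))
print(round(measured, 2), round(shifted, 2))
```

Every input to the angle is a clicked or detected point, so any offset in those points changes the result directly; there is no later step that averages the error away.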

What I discovered was that an agent can spend a lot of time “working” without that effort reducing its tendency to hallucinate.
Learning 1: Human domain expertise still matters
I asked Codex whether using the known football dimensions would simplify the measurement problem. Specifically, a regulation goal is 24 feet wide, 8 feet high. Similarly, the six-yard box, penalty area and penalty mark also have known dimensions. If the broadcast frame showed the goal mouth clearly, maybe we could use the known goal width to calibrate the shot window. Codex agreed that these would be “very useful” but I was left wondering why it had not suggested these originally.
Learning: The first insight came from me and not AI. I used my domain expertise to think through the known knowns to calculate the unknowns.
Learning 2: Human-in-the-loop still matters
After prompting Codex again and again with various versions of “Figure out better ways to detect objects more accurately”, I had an epiphany. Why was I building a fully automated pipeline? Why couldn’t I provide “human judgement” to make the detection better?
The automated pipeline kept placing the left goal post marker a few pixels inside the post rather than at the base. It was a small error that compounded through every downstream calculation. I figured if I could just click where I knew the base was, we could skip the guesswork entirely. So, I asked Codex if I could mark the important points and objects in the images used for calculations.
Codex built a manual distance workbench using HTML/CSS where I could load the frame, zoom in, place small points, drag endpoints, mark things as approved, and save the review data. Instead of treating automatic detections as truth, the system started treating my marks as the source of truth.
That helped as it turned the problem from “guess real-world distances from pixels” into something more grounded:

Mark the inside width of the goal.
Mark the open part of the goal.
Compare the two.
Convert the ratio into feet.
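
The four steps above amount to a single ratio. Here is a minimal sketch, assuming all four marks sit on the goal line and that perspective distortion along that line is small enough to ignore; the function name and the pixel coordinates are made up for illustration, and only the 24-foot goal width comes from the text.

```python
import math

GOAL_WIDTH_FT = 24.0  # regulation goal width, as stated earlier in the article


def open_window_feet(left_post, right_post, open_start, open_end):
    """Convert a marked open segment of the goal mouth from pixels to feet."""
    goal_px = math.dist(left_post, right_post)
    if goal_px == 0:
        raise ValueError("goal post marks must be two different points")
    open_px = math.dist(open_start, open_end)
    # The pixel ratio scales the known real-world width.
    return GOAL_WIDTH_FT * open_px / goal_px


# Hypothetical marks: goal spans 480 px, open part spans 120 px.
window_ft = open_window_feet((100, 400), (580, 400), (460, 400), (580, 400))
print(window_ft)  # 6.0
```

Using the goal itself as the ruler means the camera zoom cancels out: only the ratio of the two marked lengths matters.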

Here’s what that looked like in the workbench.

Learning: Codex will happily write more code before it suggests letting you help. You have to ask for the human-in-the-loop pipeline; it will not offer it.
Learning 3: Agents can name what you don’t know to ask
Calculating the player movement was harder. For movement, we needed to compare player foot positions across two different broadcast frames where the camera angle changes slightly. The players move across the pitch which is a flat plane, but the image is perspective-distorted. You cannot just subtract pixel coordinates and call it distance.
I described the perspective problem to Codex and it came back with “homography”, a word I had never heard before. In simple terms, homography lets you map points from one view of a flat surface to another view of that same surface. Here, the flat surface is the football pitch. The concept applies because the pitch is the same planar surface in both frames and the players’ feet are points on that plane.
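
To make the homography idea concrete, here is a minimal pure-Python sketch. It is not the code Codex wrote for this project: it maps image pixels onto pitch coordinates in feet by fitting a homography to four hand-marked landmarks. The function names and pixel coordinates are hypothetical, and the landmarks are assumed to be the corners of the six-yard box, which is 60 ft wide and 18 ft deep.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (three on one line?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_homography(src, dst):
    """Fit the 8 unknowns of a homography (last entry fixed to 1) from 4 point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs) + [1.0]


def apply_homography(h, point):
    """Map one image point onto the pitch plane."""
    x, y = point
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def pitch_distance_ft(h, p, q):
    """Distance in feet between two image points after mapping both to the pitch."""
    return math.dist(apply_homography(h, p), apply_homography(h, q))


# Hypothetical pixel marks for the four corners of the six-yard box.
box_px = [(200, 500), (900, 480), (1000, 650), (120, 690)]
box_ft = [(0, 0), (60, 0), (60, 18), (0, 18)]
H = fit_homography(box_px, box_ft)
print(pitch_distance_ft(H, box_px[0], box_px[1]))
```

Once each frame has its own fitted mapping, a foot position from the earlier frame and one from the later frame can both be expressed in pitch feet and subtracted, which is what raw pixel coordinates do not allow.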
This is where the collaboration got interesting. I steered the measurement process using domain intuition. I knew which points mattered as well as when a foot marker felt wrong. I knew that same-foot movement was a better metric than mixing left and right foot positions. Codex did the engineering work. It wrote the workbench server, the UI, the JSON review files, the geometry helpers, the tests, and the final HTML generator. As I saw the benefits of our initial collaboration, I kept getting ambitious and asking more and more of Codex. We kept building: a goal measurement tool, a movement tracker, a ghost alignment layer, and a final QA tool for placing labels directly on the shareable image.
Here’s what it ended up looking like and it was glorious!

Learning: Don’t be afraid to be ambitious! What AI can accomplish might just surprise you.
Learning 4: Your context unlocks better tools than the agent’s defaults
The final visual still needed to be understandable without reading a measurement report. I had another idea: what if we placed the players from the previous frame into the later frame at low opacity, like ghosts? That would show how far they had moved without relying only on arrows.
Codex first tried to cut out the players using Python. The result was not great. Here are a couple of early examples:

In the first example, the cutouts are basically ovals around the players in the previous frame. They also include the surrounding grass, so frankly they just look like eggs placed in the middle of the frame.
In the second example, the cutouts were too wide, included too much grass, and did not follow the body outlines closely enough.

Then I remembered we were on a Mac. Apple has Vision APIs for foreground extraction and segmentation. Out of curiosity, I asked Codex if we could use Apple Neural Engine (ANE) for this problem. Although the solution Codex came up with didn’t leverage ANE, Codex’s suggested Apple Vision APIs gave me a much better cutout as seen below.

The cutout script used Apple Vision’s VNGenerateForegroundInstanceMaskRequest to generate a foreground mask. The resulting player cutouts were much tighter around the bodies than the earlier Python attempt. I do not know whether this is the same underlying API Apple uses for iPhone camera app portrait mode but visually it felt like the right class of tool.
I then used the ghost alignment tool in the workbench to place the previous-frame cutout into the later frame.

When the ghost finally sat inside the later frame at the right scale, in the right position, and pointing the right direction, it was a pure Michael Scott happy moment for me. That was the instant the whole project felt real.

Learning: Codex reached for Python tools because those are its safe defaults. Only because I asked about macOS APIs did it try the Apple Vision APIs, which produced much cleaner cutouts. The agent will not ask what hardware or ecosystem you have. You have to offer that context yourself.
Conclusion
Yes, professional tracking systems exist. This exercise was never about building a better one. It was about seeing how far a curious non-engineer could get with just an agent and some domain knowledge.
My biggest learning from this exercise is that when I use AI in a domain I know well, I can push back. I can tell when something feels wrong and I can add missing context. My football intuition caught things that Codex did not know to care about.
But when I use AI in a domain I do not know, what am I failing to ask for? What assumptions am I accepting because the output looks polished/correct? What useful result am I leaving on the table because my prompt does not contain the domain nuance?
The most important lesson I learned was not that AI should replace human judgement but that AI gets much more useful when human judgement has a place to go.




