DAILY NEWS

Stay Ahead, Stay Informed – Every Day

Your AI’s tests pass. That doesn’t mean the code works.



You ask a coding agent to fix a bug. It writes the code, writes the tests, CI goes green, you merge. The bug’s still there.

The agent’s job was to turn the check green. The honest way to do that is to fix the code. The lazy way is to write a test that passes no matter what the code does. CI can’t tell those two apart. A green check means the tests passed, not that the code is right.

It’s easy to miss in review, because the test sits right there looking like proof:

test("parses the config", () => {
  const result = parseConfig(rawInput);
  // Only checks that something came back, not what it is.
  expect(result).toBeDefined();
});


That passes whether parseConfig works perfectly or returns nothing useful on every input. It checks nothing. Adding more tests like it just raises your coverage number, not your odds of catching a bad change.

So I built ClaimCheck (https://github.com/moonrunnerkc/claimcheck). Instead of trusting the agent’s tests, it tries to break them. If a test still passes after the supposedly fixed code is broken on purpose, the test was never really checking the fix, and it gets blocked. Same answer every time, no AI making the call. So far it’s caught every cheat in a set of twelve hand-built cases. Twelve is small, and there’s no public release yet, so treat that as a direction, not a finished result.

Some cheats slip through anyway. If the agent writes a real, solid test that locks in the wrong answer, every check passes. The only way to know the answer’s wrong is to already know the right one, and nothing in the pull request can tell you that except the agent you’re trying to catch. The one thing that helps is a clue from outside it, like a human-written bug report you can run the fix against.

There’s a second, wider tool, Swarm Orchestrator (https://github.com/moonrunnerkc/swarm-orchestrator). It flags suspicious changes and keeps a tamper-evident record for audits. The record-keeping is the solid part. The catching is not: on real pull requests its accuracy is still low, and that’s the half I’m hardening now.

The next step is comparing the old code’s behavior directly against the new code’s. The catch is that a wrong change and a harmless cleanup can look the same from the outside, and a tool that blocks good code is worse than one that lets a bad change through. That’s the part I’m still working out.
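As a rough sketch of what comparing behaviors could look like (an illustration only, not the tool’s design): run both versions over the same inputs and collect every input where they disagree. `diffBehavior`, `oldSum`, `cleanedSum`, and `wrongSum` are hypothetical.

```javascript
// Runs both versions on each input and records where their results differ.
// Thrown errors are captured as results, so "starts throwing" counts as a change.
function diffBehavior(oldFn, newFn, inputs) {
  const run = (fn, input) => {
    try {
      return { ok: true, value: fn(input) };
    } catch (err) {
      return { ok: false, value: String(err.message) };
    }
  };
  const differences = [];
  for (const input of inputs) {
    const before = run(oldFn, input);
    const after = run(newFn, input);
    // JSON comparison is a simplification that is good enough for plain values.
    if (JSON.stringify(before) !== JSON.stringify(after)) {
      differences.push({ input, before, after });
    }
  }
  return differences;
}

// Hypothetical old version: an explicit loop.
const oldSum = (xs) => { let t = 0; for (const x of xs) t += x; return t; };
// Harmless cleanup: same behavior on every input.
const cleanedSum = (xs) => xs.reduce((t, x) => t + x, 0);
// Wrong change: the initial value is gone, so an empty array now throws.
const wrongSum = (xs) => xs.reduce((t, x) => t + x);

const inputs = [[1, 2, 3], [5], []];
console.log(diffBehavior(oldSum, cleanedSum, inputs).length); // 0
console.log(diffBehavior(oldSum, wrongSum, inputs).length);   // 1 (the empty array)
```

Drop the empty array from `inputs` and the wrong change reports zero differences too. From the outside it is then indistinguishable from the cleanup, which is exactly the difficulty described above.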




Event Triggers on Garudust – DEV Community



Garudust’s core exposes a single basic primitive: agent.run(task). Every entry point, whether it’s a chat message, a cron job, or a webhook call, ends up in the same call. This means that any external system that can send an HTTP POST can be an event trigger for Garudust. This article explains how it currently works, the patterns that work in production, and concrete use cases.

How the Webhook Adapter Works

When Garudust is set up to use the webhook platform, it launches an Axum HTTP server and registers a POST endpoint at the path you specified. Incoming requests look like this:

{
  "text": "A new billing invoice has arrived from Acme Corp for $4,200.",
  "callback_url": "https://your-system.example.com/garudust/reply",
  "user_id": "billing-watcher",
  "session_key": "billing-acme-corp"
}

| Field | Required | Description |
| --- | --- | --- |
| text | ✅ | Task prompt that the agent will run |
| callback_url | ✅ | URL where Garudust will POST the response |
| user_id | optional | Used for role-based access control |
| session_key | optional | Pins conversation history; if not specified, it defaults to webhook:{callback_url} |

Garudust wraps this information in an InboundMessage, sends it through GatewayHandler, and spawns agent.run(). When the agent finishes, Garudust POSTs the response back to callback_url:

{
  "text": "Invoice from Acme Corp for $4,200, categorised as SaaS/Infrastructure. Flagged for approval above $3,000 threshold. Draft approval request sent to #finance."
}

The immediate HTTP response to your POST is 202 Accepted; the agent works asynchronously.

Security

Garudust checks the HMAC-SHA256 signature of every incoming request. Set a shared secret in config and sign every outgoing POST with it.

The division of responsibilities looks like this:

Event source (email, calendar, DB, queue) → Webhook adapter
Filter / match logic → Your code (before POST)
Task description → agent.run(task)
Result handling → Handler at your callback_url

Your system owns the filter; Garudust owns the running agent. Neither side needs to know the other’s internal structure.

Use Cases

1. Billing Email Monitor

An email processing service watches for emails from billing senders. When it finds a match, it extracts the subject, sender, and amount and triggers Garudust:

{
  "text": "New invoice received: Stripe, $1,840 for May 2026. Attach to this month's expense report and notify the finance channel if it exceeds the $1,500 alert threshold.",
  "callback_url": "https://your-ops.example.com/hooks/garudust",
  "session_key": "finance-inbox"
}

The agent uses its tools to read the expense report files, add line items, and post to Slack. The email service just matches the sender and fires; it doesn’t need to know anything about expense reports or Slack.

2. GitHub PR Review Gate

A GitHub Actions workflow calls Garudust after a PR is opened against the main branch. The workflow builds the payload from the GitHub context:

{
  "text": "PR #214 opened by @alice: 'feat: add OAuth2 PKCE flow'. Changed files: src/auth/oauth.rs, src/auth/pkce.rs, tests/auth_integration.rs. Diff summary attached. Review for security issues in the auth flow and post a summary comment.",
  "callback_url": "https://your-ci.example.com/garudust/pr-review",
  "session_key": "pr-214"
}

The flow is: the GitHub webhook launches the workflow → the workflow creates the task text → Garudust reviews. Because session_key is tied to the PR number, the next trigger (a new commit, a repeat review request) continues in the same conversation thread.

3. Database Anomaly Alert

A monitoring job queries the database on a schedule and checks an aggregate metric. When the metric crosses the threshold, it fires Garudust instead of sending a static alert:

{
  "text": "Anomaly detected: orders table insert rate dropped 94% in the last 10 minutes (baseline 340/min, current 19/min). Last successful insert: 09:42 UTC. Investigate root cause and summarise for on-call.",
  "callback_url": "https://ops.example.com/garudust/incidents",
  "session_key": "incident-2026-05-23-orders"
}

The agent can use its terminal or database tool to run additional queries, check the latest deploy, and produce a structured incident summary. The monitoring job only checks for threshold breaches.

4. Calendar External-Attendee Watch

An integration layer polls Google Calendar (or receives push notifications) and fires Garudust when an event is created with an attendee whose domain doesn’t match your organization:

{
  "text": "New calendar event: 'Q3 partnership discussion' on 2026-06-04 14:00 UTC. External attendees: jane@partner.com, bob@partner.com. Prepare a one-page briefing on Partner Corp using the CRM notes and recent email thread.",
  "callback_url": "https://your-system.example.com/garudust/calendar",
  "session_key": "meeting-prep-2026-06-04"
}

The calendar integration owns the "external attendee" filter logic; Garudust owns the briefing.

5. Queue Worker Trigger

A background worker pulls jobs from a task queue (SQS, Redis, RabbitMQ) and sends each one to Garudust. This suits workloads that vary and need the agent to handle each job in its own way:

{
  "text": "Customer support ticket #8821 (priority: high): User reports that export to CSV silently truncates rows above 10,000. Reproduce the scenario, identify the code path responsible, and draft a fix description for the engineering team.",
  "callback_url": "https://support.example.com/garudust/tickets",
  "session_key": "ticket-8821"
}

The queue worker dequeues the job, formats the task text, and fires the webhook. Multiple tickets can run as concurrent agent sessions.

Session Keys

session_key is what makes event triggers useful beyond traditional one-shot tasks. When you pin a key, all webhook calls that use the same key share a conversation history. This means:

A PR review trigger on commit 1 and a re-review trigger on commit 2 are the same conversation, so the agent remembers what was said earlier.

An incident trigger and a later “How is it going?” from the on-call engineer use the same context.

A billing session accumulates invoices for a whole month from multiple triggers before creating a monthly summary.

If you want completely separate sessions (each event is independent), you don’t need to specify a session_key. Garudust uses callback_url as the key instead, so each unique callback target gets its own context.

What This Pattern Doesn’t Cover

The webhook adapter is a push target: the external system must initiate the connection. If you want Garudust to pull data from the source itself (check an inbox, poll an API, watch files) without needing a scheduler, you have to use a cron job that polls, or wait for a watch/filter primitive that doesn’t currently exist. For use cases that are truly push-based (GitHub webhook, queue worker, calendar push notification, email routing service), the current architecture supports them all, and the division of responsibilities is already clear.
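To make the sending side of the webhook concrete, here is a minimal sketch of building and signing a trigger payload in Node. The payload fields and the `webhook:{callback_url}` default come from the article. Everything else is an assumption: the header name `X-Signature`, the hex encoding, and signing the raw JSON body are placeholders, so check the Garudust configuration for the real scheme.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Builds the JSON body for a trigger. Only text and callback_url are required.
function buildTrigger({ text, callbackUrl, userId, sessionKey }) {
  if (!text || !callbackUrl) throw new Error("text and callback_url are required");
  const payload = { text, callback_url: callbackUrl };
  if (userId) payload.user_id = userId;
  if (sessionKey) payload.session_key = sessionKey;
  return JSON.stringify(payload);
}

// Mirrors the documented default: with no session_key, the callback URL becomes the key.
function effectiveSessionKey(payload) {
  return payload.session_key ?? `webhook:${payload.callback_url}`;
}

// HMAC-SHA256 over the exact string that will be sent. Hex encoding is an assumption.
function sign(body, secret) {
  return createHmac("sha256", secret).update(body).digest("hex");
}

// Constant-time comparison, as a receiver would typically do it.
function verify(body, secret, signature) {
  const expected = Buffer.from(sign(body, secret), "hex");
  const given = Buffer.from(signature, "hex");
  return expected.length === given.length && timingSafeEqual(expected, given);
}

const secret = "replace-with-shared-secret";
const body = buildTrigger({
  text: "New invoice received: Stripe, $1,840 for May 2026.",
  callbackUrl: "https://your-ops.example.com/hooks/garudust",
  sessionKey: "finance-inbox",
});
const signature = sign(body, secret);

// Hypothetical header name; send `body` unchanged so the signature still matches.
const headers = { "Content-Type": "application/json", "X-Signature": signature };

console.log(verify(body, secret, signature));       // true
console.log(verify(body + " ", secret, signature)); // false
```

Signing the serialized string instead of the object matters: re-serializing the payload on either side can reorder keys or change whitespace and invalidate the signature.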




How I built a 6-node 12-GPU on-prem AI cluster running 1000+ agents


TL;DR — 6 machines, 12 GPUs, 1,000+ concurrent agents, P95 18 ms, voice

Why I built this

I’m Franck. Toulouse, France. Over 3 years I paid roughly €280,000 to Azure + OpenAI before doing the math properly:

Latency: 1.2s voice round-trip — incompatible with the voice-first UX I wanted.

Compliance: customer data on US servers. Not GDPR-native, just GDPR-compliant-on-paper.

Quotas: random throttling at the worst times.

Lock-in: Azure outage = my product offline.

I decided to rebuild everything on-prem. This is the result.

The cluster

6 machines, 3 tiers, 12 GPUs total.

Tier 1 — GPU compute (heavy inference)

M1 “La Créatrice” — Ryzen 5700X3D, 6× RTX 3080+, 46 GB RAM. Primary LLM node, runs qwen3.5-9b, qwen3.5-35b-a3b, deepseek-r1, the Claude 4.5/4.6 distillations, and the Whisper CUDA pipeline.

M2 “Le Forge” — multi-GPU NVIDIA, secondary inference, failover from M1 in 1.3s.

Tier 2 — CPU/RAM (orchestration, memory)

M3 “Le Cerveau” — high-RAM CPU node. PostgreSQL + Redis + Pinecone. Runs the orchestrator, the 3-quorum consensus engine (M1+M2+M3), and the analytics/monitoring agents.

Tier 3 — production / work

M4 “Bridge Windows” — Windows 11, 2 GPUs, trading bot live.

M5 “Interface Relay” — Linux i5-6500, 15 GB RAM. Dev interface, 15+ MCP servers, Claude Code.

M6 “Mobile Ops” — laptop. SSH + VPN. Client demos and on-site ops.

The 9 layers I added on top of Ubuntu

L9 — Vocal / conversational (Whisper CUDA STT, Piper TTS, wake word, 50+ languages)
L8 — Multi-agent orchestration (MCP-native, consensus engine)
L7 — Trading consensus engine (multi-model voting GPT/Gemini/Claude)
L6 — Browser + web automation (Chrome DevTools Protocol)
L5 — MCP tool registry (88+ handlers)
L4 — GPU cluster management (Docker Swarm, failover)
L3 — Domino pipeline engine (835 chains)
L2 — systemd service layer (98 units)
L1 — Linux boot integration (GRUB hooks, ZRAM, kernel params)
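The post doesn’t show how the consensus engine decides, so the following is only a generic majority-vote sketch of what a 3-node quorum could look like. `quorumVote` and the node answers are hypothetical, and the real engine may weigh or compare answers differently.

```javascript
// Returns the answer that at least `quorum` voters agree on,
// or null when no answer reaches quorum.
function quorumVote(votes, quorum) {
  const counts = new Map();
  for (const { answer } of votes) {
    counts.set(answer, (counts.get(answer) ?? 0) + 1);
  }
  for (const [answer, count] of counts) {
    if (count >= quorum) return answer;
  }
  return null;
}

// Hypothetical answers from the three quorum nodes.
const votes = [
  { node: "M1", answer: "approve" },
  { node: "M2", answer: "approve" },
  { node: "M3", answer: "reject" },
];

console.log(quorumVote(votes, 2)); // "approve"
console.log(quorumVote(votes, 3)); // null
```

With three voters and a quorum of two, at most one answer can win, so the result doesn’t depend on iteration order. Exact string matching is the weak point for model outputs, which is one reason a voting rule like this stays empirical.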

Real numbers

| Metric | Value |
| --- | --- |
| Concurrent agents | 1,000+ |
| P95 latency (cluster internal) | 18 ms |
| Voice pipeline end-to-end | |
| Aggregate throughput | 67 tok/s |
| Python lines | 280,741 |
| Public repos | 44 (all MIT) |

Cost comparison (1M tokens/day, team of 10)

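| Provider | €/month | P95 | Concurrent agents | Data residency |
| --- | --- | --- | --- | --- |
| Azure OpenAI | 1,500 | 800 ms to 3 s | ~20 | US |
| AWS Bedrock | 1,800 | 700 ms to 2.5 s | ~15 | US |
| Mistral Cloud | 800 | 400 to 800 ms | ~30 | EU |
| JARVIS OS | 0 | 18 ms | 1,000+ | Air-gapped |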

For a 50K€ turn-key deployment, break-even vs Azure is 7 months, and the marginal cost is zero after that.
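The 7-month figure appears to follow from the roughly €280,000 over 3 years quoted at the top of the post, not from the €1,500/month Azure row in the table. That reading is an inference, not something the post states. The arithmetic under both assumptions, with a hypothetical `breakEvenMonths` helper and running costs taken as zero, as the table does:

```javascript
// Months until an upfront cost is recovered by avoided monthly spend, rounded up.
function breakEvenMonths(upfrontEur, monthlySpendEur) {
  if (monthlySpendEur <= 0) throw new Error("monthly spend must be positive");
  return Math.ceil(upfrontEur / monthlySpendEur);
}

const upfront = 50_000;                 // turn-key deployment price from the post
const historicalMonthly = 280_000 / 36; // about 7,778 EUR/month (280,000 EUR over 3 years)
const tableMonthly = 1_500;             // Azure OpenAI row in the comparison table

console.log(breakEvenMonths(upfront, historicalMonthly)); // 7
console.log(breakEvenMonths(upfront, tableMonthly));      // 34
```

At the table’s 1M tokens/day workload the payback period is closer to three years, so the break-even claim depends heavily on which cloud bill is being replaced.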

What I sell now

JARVIS OS turn-key — 20K€ to 250K€ depending on scope.

62 PDF trainings — from €39, 293h of content based on production code (+48 private).

AI infra audit — €1,500, report in 48h.

1-to-1 mentorship — €250/h.

Fractional CTO — day rate (TJM) €1,000-1,150 / permanent contract (CDI) €85-95K. Toulouse / remote.

Honest weaknesses

Consensus voting is empirical. No formal verification of the agreement function.

Tier-2 failure (M3 down) is the weakest scenario — orchestrator dies, cluster keeps inferring but loses persistent memory.

MCP protocol bet — if Anthropic deprecates parts of MCP, I have 88 handlers to refactor.

kWh-per-token efficiency — cloud probably wins on aggregate watts/token, on-prem wins on marginal cost.



