software – Page 25

TECH & AI

Your AI database agent does not know what revenue means

jackminion May 14, 2026 0

The fastest way to get a wrong answer from an AI database agent is to ask a simple business question.

What was revenue last month?

That sounds easy.

The database has invoices, subscriptions, payments, refunds, credits, discounts, taxes, trials, failed charges, and test accounts.

The model sees tables.

Your business sees definitions.

If those definitions are not part of the system, the model has to guess.

Valid SQL can still be wrong

A table called payments may include failed attempts.

subscriptions may include trials.

amount may be gross, net, pre-tax, post-tax, or stored in cents.

created_at may mean invoice creation, payment capture, or customer signup.

An AI agent can write syntactically valid SQL against all of that and still answer the wrong question.

This is why natural-language SQL needs metric context, not just schema context.

Approved views beat clever prompts

A prompt can tell the model how to calculate MRR.

An approved view makes the definition executable.

Instead of exposing raw invoice and payment tables, expose something like:

reporting.monthly_recurring_revenue

Enter fullscreen mode

Exit fullscreen mode

with reviewed columns, tenant scope, time grain, currency assumptions, and test-account filtering already handled.

The model still helps users ask flexible questions.

But the business definition lives in infrastructure, not in a fragile instruction.

What should travel with the tool

For AI reporting, the MCP tool should carry context such as:

metric description
allowed dimensions
time zone and grain
exclusions
freshness timestamp
exact vs estimated status
scope and tenant boundaries
warnings the final answer must preserve

Otherwise the model may produce a confident answer while hiding the caveats that matter.

Longer version: Metric definitions for AI database agents

The practical rule:

If a metric is important enough for a leadership meeting, it is important enough to define before an agent calculates it.

Source link

TECH & AI

Six Claude Code Skills That Close the AI Agent Feedback Loop

jackminion May 14, 2026 0

AI agents write code that compiles, runs locally, and breaks the first time it touches your Kubernetes cluster. The cluster is full of state the model never sees: the env vars on the running pod, the schema in your real Postgres, the headers your upstream auth-service sends, the topics your consumer subscribes to. Without that context, the code an agent writes for your live infrastructure is informed guessing, whether you’re shipping a new feature or fixing a regression.

mirrord closes that gap. It runs a local process as if it were a real pod inside your cluster: real env vars, real DNS, real network, optionally real inbound traffic. A real example: Daylight Security pairs Cursor with mirrord for daily development. Their team cut their typical edit-test cycle from 5–8 minutes to about 5 seconds. The reason isn’t faster CPUs; it’s that the agent now operates against the real cluster the way a senior engineer would, instead of guessing from logs.

We recently shipped six Agent Skills that teach AI agents how and when to use mirrord. The whole bundle installs in one command.

# Claude Code
/plugin marketplace add metalbear-co/skills

Enter fullscreen mode

Exit fullscreen mode

# Any Agent Skills consumer
npx skills add metalbear-co/skills

Enter fullscreen mode

Exit fullscreen mode

Here’s what each skill does, with a concrete prompt that triggers it.

1. mirrord-quickstart

Zero-to-first-session for engineers (and agents) who have never used mirrord. Detects your OS, walks through CLI install or VS Code / IntelliJ setup, finds your target pod in the cluster, runs your first session. Your local process can now reach every service, database, and queue in the cluster.

Try: “I’m new to mirrord, help me run my Node app against my staging cluster.”

The agent installs mirrord, lists targets in your namespace, picks a likely match, and runs mirrord exec –target … — node server.js. No copy-paste from docs.

2. mirrord-config

Generates and validates mirrord.json, which tells mirrord what to do and where to do it. mirrord’s config surface is wide: traffic stealing vs mirroring, filesystem modes, env injection, target selection, database and queue behavior. The skill turns “I want X behavior” into valid config without you opening the docs.

Try: “Steal traffic from pod/api-server, but only requests carrying my baggage header so I don’t break anyone else’s session.”

The agent writes the right config, validates it against the schema, and explains what it does. The interesting part: the skill covers the full mirrord.json surface (target selection, traffic modes, env injection, file system hooks), not just filters. Filtered steal is one of the things that lets multiple developers share one cluster without colliding, but it’s only one of the patterns mirrord-config knows how to set up.

3. mirrord-operator

Sets up the mirrord Operator for teams. Mirroring traffic from a pod is concurrency-safe out of the box; you only need the operator when multiple developers want to steal the same pod’s traffic with different filters, share branched databases, or split a Kafka topic. The operator brokers session boundaries, RBAC, and the routing rules that make those interactions work without collisions.

Try: “Install the operator on our EKS cluster and configure RBAC so only the dev group can use it.”

monday.com runs 350+ engineers on a single shared staging cluster this way. The operator is what makes that scale work: concurrent filtered steal so multiple devs share one pod, queue splitting so they share one SQS topic, DB branching so they share one database, RBAC so they don’t touch workloads they shouldn’t, and the rest of the routing rules that let 350 developers work on the same cluster at the same time.

4. mirrord-ci

Run integration tests in CI in isolation against your staging cluster, instead of spinning up an ephemeral test environment for each PR. The service under test runs in the CI runner with mirrord; mirrord steals the cluster traffic destined for it and routes it to your build, so test traffic follows the same path it would in production, with only that one service swapped. That catches the integration bugs mocks miss, with one shared staging cluster instead of one ephemeral cluster per PR.

Try: “Set up GitHub Actions to run our integration tests against the staging cluster.”

The agent writes the workflow, injects your kubeconfig from a secret, sets MIRRORD_CI_API_KEY, and wires mirrord ci start around your service and mirrord ci stop in the cleanup hook.

5. mirrord-db-branching

Per-developer database branches. Copy-on-write Postgres (or any supported DB), so two engineers can develop against “the same” database without stepping on each other’s writes.

Try: “Give me an isolated DB branch off the staging Postgres for this feature.”

The agent provisions the branch via the operator, points your local process at the branch, and tears down when the session ends. No more “who deleted the test users?” Slack threads.

6. mirrord-kafka

Kafka queue splitting. Each developer gets a slice of the topic that only they consume, while the original consumer keeps running in the cluster. Lets you run a real Kafka workload locally without intercepting messages other people care about.

Try: “Set up queue splitting on the orders.created topic for my local consumer.”

The agent configures the operator’s Kafka splitter, gives your local process a per-developer consumer group, and confirms message routing.

Install

# Claude Code
/plugin marketplace add metalbear-co/skills

# Any Agent Skills consumer
npx skills add metalbear-co/skills

Enter fullscreen mode

Exit fullscreen mode

Repo: github.com/metalbear-co/skills. Issues and PRs welcome; we ship updates fast.

Source link

TECH & AI

Every AI coding assistant is shipping the same security bugs.

jackminion May 13, 2026 0

*Not a promo.. I mean why would anyone promote something free, actually looking to get some contributors to help us seal sone holes of ai-coded products and encourage founders of ai-written products to respect security and privacy.*

So, here it goes.. Nowadays many of us are building with Claude Code, Copilot, Cursor, Codex, Gemini, or any AI coding assistant, this is worth running against your project. – To be honest, I did think of building a tool around this, but it doesn’t sound nice to monetize on vulnerabilities for me, nor do I see much logic having a ‘blackbox’ that allegedly scans your projects. We’re talking about security here, so IMO such things should be open source and allow contributions.

And of course – my good friend AI helped me speed up the shipment of this repo 🙂

Some of most common things that appear :

JWT secrets set to “secret” or “changeme”

API keys in NEXT_PUBLIC_ env vars, fully exposed to the browser
User input going directly into system prompts via string interpolation
Vector databases using one shared namespace for all users — any user’s RAG query can
surface another user’s documents
Agents handed child_process access with no scope restrictions

These aren’t obscure edge cases, this is how most of AI-generated code comes out, if you allow it to produce HUGE chunks instead of targeted and controlled ai-coding. Even knowing tons about security and vulnerabilities, having AI write code might still expose you to some common cases.

The problem with existing references

OWASP, NIST, and CWE are good. They were written for a world where developers wrote most of their code by hand. They don’t cover MCP tool poisoning, cross-agent prompt injection, or what happens when your agent’s long-term memory accepts unsanitized writes. Ok, that’s not entirely true – today AI-generated code is allover the place, so we see more and more tools to review the code, etc, but many are paid and/or complicated which is an entry barrier for a vibe coder.

What I and few AIs shipped

A 258-item checklist across 17 categories, with a detection method for every item: static grep or AST pattern, runtime test, or config inspection. Severity rated. 33 items in Category 6 specifically cover LLM integration vulnerabilities that don’t appear elsewhere.

More usefully: a companion prompt.md that turns the full checklist into a structured codebase scan you can run in one command.

Running it

From your project root, with Claude Code installed:

claude “$(curl -s https://raw.githubusercontent.com/a-leks/genai-app-security-checklist/main/prompt.md)”

Enter fullscreen mode

Exit fullscreen mode

With Gemini CLI:

gemini “$(curl -s https://raw.githubusercontent.com/a-leks/genai-app-security-checklist/main/prompt.md)”

Enter fullscreen mode

Exit fullscreen mode

The model reads your codebase, runs all 258 checks, and returns a markdown report with severity, file path, line number, code snippet, and a specific remediation for each finding.

What the output looks like

### (6.1) Prompt injection — user input in system prompt
– Severity: Critical
– File: app/api/chat/route.ts
– Line: 34
– Snippet:
const systemPrompt = `You are a helpful assistant. User context: ${req.body.userBio}`
– Remediation: Move user-supplied content to the user message role, never system.
Strip prompt control characters before passing any user string to the model.

Enter fullscreen mode

Exit fullscreen mode

The LLM-specific items worth knowing

6.26 — MCP tool poisoning. If your agent uses third-party MCP servers, tool results from those servers enter the agent’s context as trusted input. An attacker who controls one of those servers can inject instructions through it.

6.27 — Agent memory poisoning. Whatever your agent writes to long-term memory gets read back in future sessions. If malicious content reaches that memory store, it executes next time the agent retrieves it.

6.30 — Cross-agent prompt injection. In multi-agent systems, output from Agent A becomes input to Agent B. If an attacker can influence Agent A’s output, Agent B processes the attack payload without knowing its origin is untrusted.

Where to find it

https://github.com/a-leks/genai-app-security-checklist

Apache 2.0. Contributions welcome — especially new LLM attack patterns with detection methods and real-world references.

Source link

DAILY NEWS