DAILY NEWS

Stay Ahead, Stay Informed – Every Day

The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin



OpenRouter is a service that provides access to most LLMs through a single API, which has become exceedingly useful of late given the rapid cadence of new LLM releases. Because the company sits as an intermediary between users and the LLM APIs, OpenRouter has robust, representative data on how users interact with LLMs, and it publishes this data on the AI Model Rankings page: a welcome deviation from the labs themselves, which generally keep this data secret for competitive reasons. Recently, I checked the OpenRouter rankings and noticed something peculiar.

(Figure: the OpenRouter rankings. Retrieved May 25, 2026.)

Two new models are now beating LLM darling Claude in terms of token usage, and by more than 50%? I’ve heard of DeepSeek V4 Flash: it’s an open-source release from DeepSeek that is not only fast and cheap but also performs close to the leading LLMs, so it’s no surprise that it’s incredibly popular. But what the heck is Hy3 preview? I’ve never heard of Hy3 or of anyone talking about it. Googling it returns an announcement from Chinese megacorp Tencent about Hy3’s open-source release: the model page itself on Hugging Face is sparse and includes oddly honest benchmark results that are not favorable for the model compared to other Chinese open-source models.

(Figure: Coding-oriented benchmark results for Hy3 from Tencent’s Hugging Face repo.)

A Hacker News search for Hy3 returned only a single submission that isn’t about Hy3, and Reddit discussion is more about the open-weights release. One Reddit thread also noted the rise of Hy3, but from May 6, when Hy3 was offered by OpenRouter for free; that free endpoint is no longer available, and therefore Hy3’s usage in the weekly rankings above is from paying users.

(Figure: Hy3 preview is apparently popular in domains outside of agentic coding as well. Retrieved May 25, 2026.)

Did I miss something?
After some nonscientific testing, the model quality is indeed on par with the other Chinese models indicated and not close to models such as Claude Opus 4.7 and GPT 5.5. It’s not a magic overlooked diamond-in-the-rough, so there has to be something else at play. Fortunately, OpenRouter has the data to narrow down possible explanations, but after checking the data I became more confused.

Hy3 preview is available from the OpenRouter API at a stated price of $0.066/1M tokens input, which is indeed cheaper than the current top-ranked model DeepSeek V4 Flash with a stated price of $0.10/1M tokens input. Given the drastically rising cost of LLMs and coding agents, it makes sense that a cheaper model would prevail, but only if it offered similar quality, and that doesn’t appear to be the case.

Here’s the chart of Hy3 preview model usage over time on OpenRouter from the model page:

Hy3 preview has no usage data before May 8, which implies that is when the model switched from the free SKU to the paid SKU. Usage has also been steady since then, with the initial rankings shown in this post being several weeks after launch, showing that the usage is at least organic (or very expensive to fake) and not a one-off outlier. Of note, if you do the math on the numbers presented here, the input-token-to-output-token breakdown on LLM API calls is now 98% input, 2% output in aggregate.

For the OpenRouter AI Model Rankings, there have historically been spikes caused by specific apps switching their default to a particular LLM, such as when Kilo Code offered Grok Code Fast 1 for free in September 2025, which rocketed it up in popularity.
That does not appear to be the case here because apps constitute only a very small part of Hy3 preview’s activity.

(Figure: The top 5 apps account for …)

OpenRouter’s value proposition is the ability to automatically route a given API request to different providers: for open-weight models such as DeepSeek V4 Flash, OpenRouter lists 13 providers, but Hy3 preview has only one provider despite its open weights: the Singapore-based SiliconFlow. Their usage page on OpenRouter shows that SiliconFlow had relatively little usage…until Hy3.

(Figure: The green area corresponds to free Hy3 usage while the blue area corresponds to paid Hy3 usage. OpenRouter does not differentiate them on mouseover, which I suspect is a bug.)

Coincidentally, that data visualization shows that usage didn’t drop drastically when Hy3 preview moved from free to paid, which in itself is interesting: if users were not getting value from the free model, they likely would have stopped using it once the costs hit their wallet.

What am I missing? Am I overthinking it, and the answer is really just “it’s the cheapest” plus sufficient loss-leader traction from the free period?

…but is Hy3 preview actually the cheapest LLM backed by a major company on OpenRouter? While I was double-checking some assumptions, I found that OpenRouter has data that shows Hy3 preview is not the cheapest well-performing LLM available: it’s actually DeepSeek V4 Flash, but with interesting caveats.

LLM Economics in 2026

So here are a few more notes about how LLM APIs work that aren’t often discussed.
LLM calls are still stateless, which means that after every turn (including user messages to the LLM asking questions), all of the tokens in the current conversation thread are reprocessed. In the case of agents, the count of input tokens therefore increases cumulatively with each successive message, which is one reason why starting new threads frequently as context fills up is encouraged for effective agent use.

(Figure: Reverse-chronological OpenRouter logs from one minute of Zed Agent use with DeepSeek V4 Flash selected.)

But even before agentic workflows, large inputs such as full PDFs bloated context similarly. As a result, most LLM providers implemented prompt caching, which reuses input tokens processed earlier in the conversation: this is a win-win that saves time and compute for the LLM provider, and the savings are passed on to the customer. Most LLM providers cache inputs automatically, including when accessed through OpenRouter: the disk-lightning-bolt symbol next to the cost indicates tokens were cached. The cache may not always be hit, especially if OpenRouter switches providers mid-thread. The odd API provider out is the Anthropic (Claude) API, which requires paying for a cache write first for some reason.

Typically, cache read costs are 10% of the input costs: this is the case for the latest models from the OpenAI API, Anthropic API, and Google Gemini API. For the 13 providers that serve DeepSeek V4 Flash, cache read costs are between 20% and 50% of input cost, which makes sense as they may not have the same economies of scale. There’s one DeepSeek V4 Flash provider that’s an exception, though:

That’s a 2% cache read cost! (multiply by 2, move decimal left 2 places) How are DeepSeek’s cache read prices so low? DeepSeek has implemented a new approach to KV caching starting with V4, and as the model’s creator it is best positioned to leverage its own innovations; as mentioned, the benefits are passed on to the customer.
The DeepSeek V4 Pro variant model, when served by DeepSeek, has a cache read cost of 0.83%! (use a calculator for that one)

Remember how I showed that 98% of the tokens in LLM API calls are now input tokens, which are aggressively cached? That means the “stated” prices of LLMs are now misleading, but unusually in a pro-customer way, because the effective price will be much cheaper! To counter this ambiguity, OpenRouter now has a table for effective prices on the model page, which accounts for the cost savings from cache hits. Here’s the effective pricing for DeepSeek V4 Flash via OpenRouter by provider, which is different for each provider as they have different cache read costs and cache hit rates:

(Figure: Retrieved May 25, 2026; these values update every hour.)

The prices are all over the place, but notice the second row where DeepSeek itself is the provider, which is priced at a whopping $0.018/1M input tokens! That 2% cache read really pays off. Comparing apples to apples with Hy3 preview, the effective pricing for Hy3 preview as noted on its model page from SiliconFlow (a whopping 44% cache read cost) is $0.034/1M: nearly double DeepSeek V4 Flash from DeepSeek! Of course, this is only applicable if DeepSeek is explicitly used as the provider, which some downstream OpenRouter clients/agents may not support: the OpenRouter prices match the prices directly from DeepSeek, so using a direct DeepSeek API key will work the same.

There is also an elephant in the room: DeepSeek is a China-based company, and some may not want, or may not legally be able, to give their payment processing information or LLM input data to a Chinese company that has set prompt training = true on its OpenRouter data policy information, which is a legitimate concern.

Yes, subscription-based LLM services such as Claude Code and Codex are still the best bang for your buck if you’re able to consistently exhaust the usage limits.
But the super-cheap DeepSeek V4 Flash via the API doesn’t lock you into a subscription, and if you need a bit more agentic compute to finish a project, it’s cheaper than paying for extra usage from the subscription services. At the least, it’s a microeconomic check against additional pricing shenanigans that will likely continue through 2026 as competition in agentic AI heats up.

Overall, I still don’t understand the popularity of Hy3 preview on OpenRouter. Given the available data and analysis above, my guess is that a single large app not affiliated with Tencent is indeed using Hy3 as its data-processing backbone, and this app isn’t solely an agentic coding app. But one of the advantages of OpenRouter is that it’s low-lift to switch models and providers: it wouldn’t surprise me if DeepSeek V4 Flash gets a spike in a few weeks once people catch on to its pricing.
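
The two pricing mechanics discussed above (stateless calls that resend the whole thread, and cache reads billed at a fraction of the input price) can be sketched in a few lines. This is a rough illustration only: the function names are made up for this example, the 85% cache hit rate is a hypothetical value rather than a figure from OpenRouter, and the model ignores output tokens and cache write fees. It is not necessarily how OpenRouter computes its effective-price table.

```typescript
// Prices are USD per 1M input tokens. cacheReadRatio is the cache read price
// as a fraction of the stated input price (for example 0.02 for a 2% cache read cost).
function effectiveInputPrice(
  statedPrice: number,
  cacheReadRatio: number,
  cacheHitRate: number,
): number {
  if (cacheHitRate < 0 || cacheHitRate > 1) {
    throw new RangeError("cacheHitRate must be between 0 and 1");
  }
  const uncachedShare = 1 - cacheHitRate;
  // Uncached tokens pay full price; cached tokens pay the discounted rate.
  return statedPrice * (uncachedShare + cacheHitRate * cacheReadRatio);
}

// With stateless calls, every turn resends the whole thread, so the input
// tokens processed per call grow with the running context size.
function cumulativeInputTokens(tokensAddedPerTurn: number[]): number {
  let context = 0;
  let processed = 0;
  for (const added of tokensAddedPerTurn) {
    context += added; // the thread grows by this turn's new tokens
    processed += context; // the full thread is sent as input again
  }
  return processed;
}

// Hypothetical hit rate, chosen only for illustration.
const HYPOTHETICAL_HIT_RATE = 0.85;

// Stated prices and cache read ratios are the ones quoted in the article.
const deepseekFlash = effectiveInputPrice(0.1, 0.02, HYPOTHETICAL_HIT_RATE);
const hy3Preview = effectiveInputPrice(0.066, 0.44, HYPOTHETICAL_HIT_RATE);

console.log(deepseekFlash.toFixed(4), hy3Preview.toFixed(4));
console.log(cumulativeInputTokens([1000, 1000, 1000])); // 6000
```

Under that assumed hit rate, the results land in the same neighborhood as the effective prices quoted above, which suggests the gap between the two models comes mostly from the cache read ratio rather than the stated price.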



Six Claude Code Skills That Close the AI Agent Feedback Loop



AI agents write code that compiles, runs locally, and breaks the first time it touches your Kubernetes cluster. The cluster is full of state the model never sees: the env vars on the running pod, the schema in your real Postgres, the headers your upstream auth-service sends, the topics your consumer subscribes to. Without that context, the code an agent writes for your live infrastructure is informed guessing, whether you’re shipping a new feature or fixing a regression.

mirrord closes that gap. It runs a local process as if it were a real pod inside your cluster: real env vars, real DNS, real network, optionally real inbound traffic. A real example: Daylight Security pairs Cursor with mirrord for daily development. Their team cut their typical edit-test cycle from 5–8 minutes to about 5 seconds. The reason isn’t faster CPUs; it’s that the agent now operates against the real cluster the way a senior engineer would, instead of guessing from logs.

We recently shipped six Agent Skills that teach AI agents how and when to use mirrord. The whole bundle installs in one command.

# Claude Code
/plugin marketplace add metalbear-co/skills

# Any Agent Skills consumer
npx skills add metalbear-co/skills

Here’s what each skill does, with a concrete prompt that triggers it.

1. mirrord-quickstart

Zero-to-first-session for engineers (and agents) who have never used mirrord. Detects your OS, walks through CLI install or VS Code / IntelliJ setup, finds your target pod in the cluster, runs your first session. Your local process can now reach every service, database, and queue in the cluster.

Try: “I’m new to mirrord, help me run my Node app against my staging cluster.”

The agent installs mirrord, lists targets in your namespace, picks a likely match, and runs mirrord exec --target … -- node server.js. No copy-paste from docs.

2. mirrord-config

Generates and validates mirrord.json, which tells mirrord what to do and where to do it. mirrord’s config surface is wide: traffic stealing vs mirroring, filesystem modes, env injection, target selection, database and queue behavior. The skill turns “I want X behavior” into valid config without you opening the docs.

Try: “Steal traffic from pod/api-server, but only requests carrying my baggage header so I don’t break anyone else’s session.”

The agent writes the right config, validates it against the schema, and explains what it does. The interesting part: the skill covers the full mirrord.json surface (target selection, traffic modes, env injection, file system hooks), not just filters. Filtered steal is one of the things that lets multiple developers share one cluster without colliding, but it’s only one of the patterns mirrord-config knows how to set up.
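
As a rough idea of what the filtered-steal config from the prompt above might look like, here is a small sketch that builds the object and prints it as JSON. The field names (target.path, feature.network.incoming.mode, http_filter.header_filter) follow the mirrord.json layout as I understand it and should be checked against the current schema; the helper name and the header value are hypothetical.

```typescript
// Assumed shape of the relevant slice of mirrord.json; verify against the schema.
interface MirrordConfig {
  target: { path: string };
  feature: {
    network: {
      incoming: {
        mode: "steal" | "mirror";
        http_filter?: { header_filter: string };
      };
    };
  };
}

// Hypothetical helper: steal only the requests whose headers match the regex,
// so traffic from other sessions keeps flowing to the pod in the cluster.
function buildFilteredStealConfig(
  targetPath: string,
  headerRegex: string,
): MirrordConfig {
  return {
    target: { path: targetPath },
    feature: {
      network: {
        incoming: {
          mode: "steal",
          http_filter: { header_filter: headerRegex },
        },
      },
    },
  };
}

// The baggage value below is a made-up example of a per-developer marker.
const config = buildFilteredStealConfig(
  "pod/api-server",
  "baggage: .*mirrord-session=alice.*",
);

console.log(JSON.stringify(config, null, 2));
```

Saving the printed JSON as mirrord.json would give the agent (or you) a starting point to validate, rather than a finished config.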

3. mirrord-operator

Sets up the mirrord Operator for teams. Mirroring traffic from a pod is concurrency-safe out of the box; you only need the operator when multiple developers want to steal the same pod’s traffic with different filters, share branched databases, or split a Kafka topic. The operator brokers session boundaries, RBAC, and the routing rules that make those interactions work without collisions.

Try: “Install the operator on our EKS cluster and configure RBAC so only the dev group can use it.”

monday.com runs 350+ engineers on a single shared staging cluster this way. The operator is what makes that scale work: concurrent filtered steal so multiple devs share one pod, queue splitting so they share one SQS queue, DB branching so they share one database, RBAC so they don’t touch workloads they shouldn’t, and the rest of the routing rules that let 350 developers work on the same cluster at the same time.

4. mirrord-ci

Run integration tests in CI in isolation against your staging cluster, instead of spinning up an ephemeral test environment for each PR. The service under test runs in the CI runner with mirrord; mirrord steals the cluster traffic destined for it and routes it to your build, so test traffic follows the same path it would in production, with only that one service swapped. That catches the integration bugs mocks miss, with one shared staging cluster instead of one ephemeral cluster per PR.

Try: “Set up GitHub Actions to run our integration tests against the staging cluster.”

The agent writes the workflow, injects your kubeconfig from a secret, sets MIRRORD_CI_API_KEY, and wires mirrord ci start around your service and mirrord ci stop in the cleanup hook.

5. mirrord-db-branching

Per-developer database branches. Copy-on-write Postgres (or any supported DB), so two engineers can develop against “the same” database without stepping on each other’s writes.

Try: “Give me an isolated DB branch off the staging Postgres for this feature.”

The agent provisions the branch via the operator, points your local process at the branch, and tears down when the session ends. No more “who deleted the test users?” Slack threads.

6. mirrord-kafka

Kafka queue splitting. Each developer gets a slice of the topic that only they consume, while the original consumer keeps running in the cluster. Lets you run a real Kafka workload locally without intercepting messages other people care about.

Try: “Set up queue splitting on the orders.created topic for my local consumer.”

The agent configures the operator’s Kafka splitter, gives your local process a per-developer consumer group, and confirms message routing.

Install

# Claude Code
/plugin marketplace add metalbear-co/skills

# Any Agent Skills consumer
npx skills add metalbear-co/skills

Repo: github.com/metalbear-co/skills. Issues and PRs welcome; we ship updates fast.



How I Use Claude to Build Full-Stack Apps in Under 4 Hours — The Complete Workflow



Three months ago, I spent 3 weeks building a SaaS dashboard. Last week, I built a more complex one in 3 hours and 42 minutes — using Claude as my co-pilot.

The difference wasn’t just “using AI.” It was a specific, repeatable workflow that eliminates the bottlenecks most developers hit when coding with AI.

Here’s exactly how I do it — step by step, with real prompts.

The Problem: Most People Use AI Wrong

I see developers making the same mistakes:

❌ Pasting entire codebases into Claude and hoping for the best
❌ Using vague prompts like “build me a dashboard”
❌ Not breaking down the problem before asking AI
❌ Copy-pasting AI output without understanding it
❌ Not using AI for the things it’s actually best at

The secret? AI is a junior developer that never sleeps, never gets bored, and has read every Stack Overflow answer ever written. But like any junior dev, it needs clear direction.

My 4-Hour Framework

I divide every project into 4 phases of ~1 hour each:

| Phase | Time | What AI Does | What I Do |
|---|---|---|---|
| 1. Blueprint | 60 min | Generates architecture, tech choices | Define requirements, review plan |
| 2. Scaffold | 60 min | Generates boilerplate, database schema | Set up repos, configure env |
| 3. Build | 60 min | Writes core feature code | Review, test, iterate |
| 4. Polish | 45 min | CSS, error handling, edge cases | Final review, deploy |

Let me walk through each phase.

Phase 1: Blueprint (60 Minutes)

Before writing a single line of code, I spend an hour planning with Claude. This is the most important phase and the one most people skip.

Step 1: Define the Problem

I start with a clear, structured prompt:

I’m building a SaaS product. Here’s what I need:

Product: A subscription analytics dashboard
Users: SaaS founders who want to track MRR, churn, and LTV
Data Source: Stripe API
Tech Stack: Next.js 14 (App Router), TypeScript, Prisma, PostgreSQL, TailwindCSS
Timeline: Need a working prototype today

Give me:
1. A complete database schema with all relationships
2. API route structure (REST endpoints)
3. Component hierarchy (what pages/components I need)
4. The order I should build things in (dependency graph)
5. Potential gotchas I might hit

Why this works: Claude generates a concrete plan. No more “I’ll figure it out as I go.” You get a roadmap.

Step 2: Generate the Database Schema

Then I drill into each part:

Based on the schema you generated, write:
1. Complete Prisma schema with all models, relations, and indexes
2. Seed data (at least 20 records per model) that looks realistic
3. Migration SQL if needed

Format as a single `schema.prisma` file I can copy directly.

Step 3: API Contract

For each API route, give me:
1. The endpoint path and HTTP method
2. Request body/params type (TypeScript interface)
3. Response type (TypeScript interface)
4. Authentication requirement
5. Brief description of what it does

Format as a TypeScript file with all types exported.

Phase 1 output: You now have a complete spec — database schema, API types, component list, and build order. This would take 2-3 days to produce manually.
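
To make the API contract step concrete, here is a minimal sketch of what one entry in such a types file could look like. The endpoint, the type names, and the fields are all hypothetical, invented for illustration; the real output depends on the schema Claude generated for your project.

```typescript
// Hypothetical contract for a single endpoint: GET /api/metrics/mrr
// Auth: assumed to require a signed-in user (not enforced in this sketch).

// Generic wrapper so every route returns the same envelope.
export interface ApiResponse<T> {
  data: T | null;
  status: number;
  message: string;
}

// Query params: ISO date strings bounding the reporting window.
export interface MrrQuery {
  from: string;
  to: string;
}

// One data point per month, with money stored as integer cents.
export interface MrrPoint {
  month: string;
  mrrCents: number;
}

export type MrrResponse = ApiResponse<MrrPoint[]>;

// Small helper that builds the envelope from its three parts.
export function apiResponse<T>(
  data: T | null,
  status: number,
  message: string,
): ApiResponse<T> {
  return { data, status, message };
}

const example: MrrResponse = apiResponse(
  [{ month: "2026-01", mrrCents: 125000 }],
  200,
  "ok",
);

console.log(example.status, example.data?.length);
```

Keeping request, response, and envelope types in one file gives later prompts a single artifact to reference.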

Phase 2: Scaffold (60 Minutes)

Now let AI generate all the boring stuff.

Generate Project Structure

Set up a Next.js 14 project with:
- App Router (not Pages Router)
- TypeScript strict mode
- TailwindCSS with these custom colors: (your palette)
- Prisma with PostgreSQL
- NextAuth.js for authentication (GitHub + email)
- shadcn/ui component library

Give me the exact commands to run and the folder structure.

Generate Type Definitions

Create a complete `types/index.ts` file that includes:
- All database model types (from our schema)
- All API request/response types
- All component prop types
- Utility types (pagination, API response wrapper, etc.)

Make it fully typed. No `any` allowed.

Generate Utility Functions

Write these utility functions:
1. `apiResponse(data, status, message)` — standardized API response
2. `validateRequest(schema, body)` — Zod validation wrapper
3. `paginate(query, page, limit)` — cursor-based pagination
4. `formatCurrency(amount, currency)` — i18n currency formatting
5. `calculateMRR(subscriptions)` — Monthly Recurring Revenue calc
6. `calculateChurn(subscriptions, period)` — Churn rate calc

Each function should be production-ready with proper error handling.

Phase 2 output: A complete project skeleton with types, utils, auth, and database — ready to build features on top of.
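
For a sense of what three of the utilities requested above might look like, here is a minimal sketch. The Subscription shape is an assumption made for this example (it is not the schema from the article), yearly plans are normalized by dividing by 12, and churn is computed as the share of subscriptions active at the start of the period that were canceled before its end. Other definitions of MRR and churn exist.

```typescript
// Assumed subscription shape; a real one would come from the Prisma schema.
interface Subscription {
  amountCents: number;
  interval: "month" | "year";
  status: "active" | "canceled";
  startedAt: Date;
  canceledAt?: Date;
}

// MRR in cents: sum active subscriptions, normalizing yearly plans to monthly.
function calculateMRR(subscriptions: Subscription[]): number {
  return subscriptions
    .filter((s) => s.status === "active")
    .reduce(
      (sum, s) =>
        sum + (s.interval === "year" ? s.amountCents / 12 : s.amountCents),
      0,
    );
}

// Churn rate for [periodStart, periodEnd): canceled during the period,
// divided by the number active when the period began.
function calculateChurn(
  subscriptions: Subscription[],
  periodStart: Date,
  periodEnd: Date,
): number {
  const start = periodStart.getTime();
  const end = periodEnd.getTime();
  const activeAtStart = subscriptions.filter(
    (s) =>
      s.startedAt.getTime() < start &&
      (s.canceledAt === undefined || s.canceledAt.getTime() >= start),
  );
  if (activeAtStart.length === 0) return 0; // avoid dividing by zero
  const churned = activeAtStart.filter(
    (s) => s.canceledAt !== undefined && s.canceledAt.getTime() < end,
  );
  return churned.length / activeAtStart.length;
}

// Locale-aware currency formatting from integer cents.
function formatCurrency(
  amountCents: number,
  currency = "USD",
  locale = "en-US",
): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(
    amountCents / 100,
  );
}

const sampleSubs: Subscription[] = [
  { amountCents: 1000, interval: "month", status: "active", startedAt: new Date("2025-12-01") },
  { amountCents: 12000, interval: "year", status: "active", startedAt: new Date("2025-12-01") },
  { amountCents: 500, interval: "month", status: "canceled", startedAt: new Date("2025-12-01"), canceledAt: new Date("2026-01-15") },
];

console.log(formatCurrency(calculateMRR(sampleSubs))); // $20.00
console.log(calculateChurn(sampleSubs, new Date("2026-01-01"), new Date("2026-02-01")));
```

Storing money as integer cents and converting only at display time avoids most floating-point surprises, which is worth stating explicitly in the prompt.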

Phase 3: Build (60 Minutes)

This is where the magic happens. I build features one at a time, using a specific prompt pattern.

The Feature Prompt Pattern

For every feature, I use this template:

Build me the (FEATURE NAME) feature.

Context:
- Tech stack: Next.js 14, TypeScript, Prisma, TailwindCSS, shadcn/ui
- Database schema: (paste relevant models)
- API types: (paste relevant types)

Requirements:
1. (Specific requirement 1)
2. (Specific requirement 2)
3. (Specific requirement 3)

Give me:
1. The API route code (app/api/…)
2. The React component code
3. Any Prisma queries needed
4. Test cases for edge cases

Important rules:
- Use Server Components by default, Client Components only when needed
- Handle loading states and errors
- Use optimistic updates where appropriate

Example: Building the Dashboard Page

Build me the main dashboard page.

It should show:
1. Revenue chart (line chart, last 12 months) — use Recharts
2. Current MRR card with % change from last month
3. Active subscribers count
4. Churn rate card
5. Top 5 plans by revenue (horizontal bar chart)
6. Recent transactions table (last 10, with pagination)

Layout:
- Top row: 3 stat cards
- Middle row: Revenue chart (span 2/3), top plans chart (span 1/3)
- Bottom row: Recent transactions table (full width)

Use shadcn/ui Card, Table, and Badge components.

The key here is specificity. I tell Claude:

Exactly which UI components to use
The exact layout I want
The exact data sources

Vague prompts = vague output. Specific prompts = production-ready code.

Phase 4: Polish (45 Minutes)

The last phase is where good apps become great apps.

Error Handling

Go through all API routes and add:
1. Input validation with Zod
2. Proper error responses (400, 401, 403, 404, 500)
3. Error logging
4. Rate limiting considerations

Also add a global error handler for unhandled exceptions.

Edge Cases

For the dashboard, handle these edge cases:
1. No data yet (empty state with helpful message)
2. Very large numbers (format as K/M/B)
3. Negative growth (red indicators)
4. Stale data (show “last updated” timestamp)
5. Loading states for every async component
6. Mobile responsiveness (stack cards vertically on small screens)

CSS Polish

Polish the dashboard UI:
1. Add subtle animations (fade-in for cards, chart animations)
2. Consistent spacing and border radius
3. Hover effects on interactive elements
4. Loading skeletons for all data components
5. Dark mode support (use CSS variables or Tailwind dark: prefix)

Phase 4 output: A polished, production-ready app that handles errors gracefully and looks professional.
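
One of the edge cases listed in this phase, formatting very large numbers as K/M/B, is small enough to sketch directly. This is one possible hand-rolled approach, with a function name chosen for this example; Intl.NumberFormat with compact notation is an alternative if locale-aware output matters more than exact control.

```typescript
// Format a number with a K/M/B suffix and at most one decimal place.
// Values near a boundary can round up (999,950 becomes "1000K"); a
// production version may want to promote those to the next unit.
function formatCompact(value: number): string {
  const units = [
    { limit: 1e9, suffix: "B" },
    { limit: 1e6, suffix: "M" },
    { limit: 1e3, suffix: "K" },
  ];
  const magnitude = Math.abs(value);
  for (const unit of units) {
    if (magnitude >= unit.limit) {
      // Keep one decimal, but drop a trailing ".0" so 2000000 reads as "2M".
      const scaled = (value / unit.limit).toFixed(1).replace(/\.0$/, "");
      return scaled + unit.suffix;
    }
  }
  return String(value);
}

console.log(formatCompact(1500)); // 1.5K
console.log(formatCompact(2_000_000)); // 2M
console.log(formatCompact(-2500)); // -2.5K
console.log(formatCompact(999)); // 999
```

Negative values keep their sign, which pairs naturally with the red negative-growth indicators in the same list.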

The Results

Using this workflow, here’s what I’ve shipped:

| Project | Time | Features | Would’ve Taken (Manual) |
|---|---|---|---|
| SaaS Analytics Dashboard | 3h 42m | Charts, tables, auth, CRUD | 2-3 weeks |
| Blog Platform | 4h 15m | CMS, auth, comments, SEO | 1-2 weeks |
| E-commerce Admin | 5h 10m | Inventory, orders, analytics | 3-4 weeks |
| Task Management App | 3h 55m | Kanban, real-time, teams | 2 weeks |

The key insight: I’m not asking Claude to build the entire app at once. I’m using it as a force multiplier in each phase, giving it clear, specific tasks.

5 Tips That Made the Biggest Difference

1. Never Ask AI to “Build an App”

Instead, ask it to build one feature at a time. “Build me a login page” works. “Build me a SaaS” doesn’t.

2. Always Generate Types First

Types are the contract between you and AI. Generate them in Phase 1, reference them in every prompt. This dramatically reduces hallucinations.

3. Use Claude Projects

Claude Projects let you attach files (schema, types, utils) that persist across conversations. This means you never have to re-paste context.

4. Review, Don’t Just Accept

AI will write code that works but might not be ideal. Always review:

Security (auth, input validation)
Performance (N+1 queries, unnecessary re-renders)
Accessibility (keyboard nav, screen readers)

5. Iterate with Specific Feedback

Instead of “this doesn’t look right,” say:

“The cards should be 1/3 width on desktop, full width on mobile”
“Add a subtle blue left border to the stat cards”
“The chart tooltip should show the exact date and amount”

Common Mistakes & How to Avoid Them

| Mistake | Fix |
|---|---|
| Pasting 2000 lines of code | Share files via Claude Projects instead |
| “Fix this bug” with no context | Include error message, expected behavior, relevant code |
| Building everything at once | One feature, one prompt, one PR at a time |
| Ignoring AI warnings | Read every warning, investigate red flags |
| Not testing | Run code after every major generation, test edge cases |

The Bottom Line

Claude (and AI in general) isn’t a magic wand. It’s a force multiplier that works best when you:

Plan first — Spend time on the blueprint before coding

Be specific — Detailed prompts = detailed output

Iterate fast — Small, focused tasks over big, vague ones

Review carefully — You’re the senior dev, AI is the junior

Use the right tools — Claude Projects, shadcn/ui, Prisma, etc.

With this workflow, I’ve gone from multi-week projects to multi-hour projects — without sacrificing quality.

What’s your AI coding workflow? I’d love to hear what’s working for you in the comments.

If you found this helpful, follow me for more AI developer content. I write about practical AI workflows, not hype.


