DAILY NEWS

Stay Ahead, Stay Informed – Every Day

The Hidden 43% — How Teams Are Wasting Almost Half Their LLM API Budget



You look at your provider dashboard and see one number: the total bill. It’s like getting an electricity bill that just says “$5,000” with no breakdown of whether it was the AC, the fridge, or someone leaving the lights on all month.

Tbh, most AI startups are flying blind right now. We recently looked into the cost breakdown for several teams and found something crazy: almost 43% of their LLM API spend was completely wasted. The problem isn’t paying for usage; it’s paying for bad architecture.

Here’s where the leaks are actually happening:

Retry Storms (34% of waste): Your agent fails to parse a JSON response, so it retries. And retries. Sometimes 5-10 times in a loop. You aren’t just paying for the failure; you are paying for the massive context window sent every single time.
Duplicate Calls (85% of apps have this issue): Multiple users asking the exact same question, or internal systems running the same RAG pipeline on the same document. Without caching at the provider level, you’re paying OpenAI to generate the identical tokens twice.
Context Bloat: Sending the entire 50-page document history when the user just asked “what’s the summary of page 2”. RAG is great, but shoving everything into the prompt “just in case” is burning your runway.
Wrong Model Selection: Using GPT-4o or Claude 3 Opus for simple classification tasks when Haiku or GPT-3.5-turbo would do it for a fraction of the cost.
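The first two leaks can be plugged with very little code. Here is a minimal sketch, assuming a hypothetical `llm(model, prompt)` callable that stands in for whatever provider client you use; the retry cap of 3 and the in-memory dict cache are illustrative choices, not recommendations from the teams mentioned above.

```python
import hashlib
import json

MAX_ATTEMPTS = 3  # assumed cap; tune per workload


def cache_key(model: str, prompt: str) -> str:
    """Identical (model, prompt) pairs hash to the same key."""
    return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()


def call_json(llm, model: str, prompt: str, cache: dict) -> dict:
    """Return parsed JSON, reusing cached answers and capping retries.

    `llm` is a hypothetical callable (model, prompt) -> str.
    """
    key = cache_key(model, prompt)
    if key in cache:
        return cache[key]  # duplicate question: no new tokens billed
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        raw = llm(model, prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
            continue  # every retry resends the full prompt, so keep the cap low
        cache[key] = parsed
        return parsed
    raise ValueError(f"no valid JSON after {MAX_ATTEMPTS} attempts") from last_error
```

In a real service the dict would probably be a shared store with an expiry, and the exception would be logged with the tenant and model so the retry storm shows up somewhere other than the invoice.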

You can’t fix what you can’t see. That’s exactly why I built LLMeter (https://llmeter.org?utm_source=devto&utm_medium=article&utm_campaign=hidden-43-percent-llm-waste). It’s an open-source dashboard that gives you per-customer and per-model cost tracking. Stop guessing who or what is draining your API budget.

Fwiw, just setting up basic budget alerts and seeing the breakdown by tenant usually drops a team’s bill by 20% in the first week. Give it a try: it’s open source (AGPL-3.0), and you can self-host or use the free tier.
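To make “breakdown by tenant” and “budget alerts” concrete, here is a rough sketch of the bookkeeping involved. This is not LLMeter’s code or API; the class name, the model names, and the prices are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K = {"large-model": 0.01, "small-model": 0.001}


class CostLedger:
    """Attributes token spend to (tenant, model) pairs and flags budget overruns."""

    def __init__(self, budgets: dict):
        self.budgets = budgets            # tenant -> USD limit
        self.spend = defaultdict(float)   # (tenant, model) -> USD spent

    def record(self, tenant: str, model: str, tokens: int) -> None:
        self.spend[(tenant, model)] += tokens / 1000 * PRICE_PER_1K[model]

    def by_tenant(self) -> dict:
        totals = defaultdict(float)
        for (tenant, _model), usd in self.spend.items():
            totals[tenant] += usd
        return dict(totals)

    def over_budget(self) -> list:
        """Tenants whose total spend exceeds their limit (no limit means never)."""
        totals = self.by_tenant()
        return sorted(
            tenant for tenant, usd in totals.items()
            if usd > self.budgets.get(tenant, float("inf"))
        )
```

Calling `record` once per API response, with the token count the provider reports, is enough to answer “who is draining the budget” without touching the prompts themselves.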




Stop Writing Endpoints. Start Defining Systems.



For a long time, I thought building APIs meant writing endpoints.

You know the pattern:

Define a route
Validate input
Query the database
Transform the result
Send a response

Do that over and over again.

Different routes. Same structure.

The Illusion of Control

Writing endpoints feels productive.

You’re in control of everything:

The logic
The validation
The data flow

But after a while, something becomes obvious:

You’re not building systems.

You’re repeating patterns.

The Real Problem

Most APIs look like this:

app.get('/users/:id', async (req, res) => {
  const id = req.params.id;

  if (!id) {
    return res.status(400).json({ error: 'Missing id' });
  }

  const user = await db.users.findById(id);

  if (!user) {
    return res.status(404).json({ error: 'Not found' });
  }

  return res.json(user);
});


Now multiply that by:

Dozens of endpoints
Multiple resources
Different validation rules
Slight variations in logic

You end up with:

Repeated code
Inconsistent patterns
Hard-to-maintain systems

You’re Not Writing Logic. You’re Rewriting Structure.

Look closer at most endpoints.

They follow the same shape:

Extract input
Validate input
Execute query
Handle errors
Return response

The structure doesn’t change.

Only the details do.

So why are we rewriting the structure every time?

The Shift: Define, Don’t Rewrite

Instead of writing endpoints…

Define them.

What if your API looked like this instead?

get:
  user:
    GetUserById:
      input:
        id: number
      where:
        id: $param.id
      response:
        id: number
        name: string
        email: string


No route handler.

No repeated boilerplate.

Just a definition.

What This Changes

When you define systems instead of writing endpoints:

Structure becomes consistent
Validation becomes automatic
Queries become predictable
Behavior becomes visible

You’re no longer guessing how something works.

You can read it directly.
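“Validation becomes automatic” is easier to believe with a small example. The sketch below assumes the definition has been loaded into a Python dict and that `number` and `string` are the only type names; both the dict layout and the `validate` and `shape` helpers are hypothetical, not part of any particular framework.

```python
# Hypothetical definition mirroring the GetUserById example, as a Python dict.
DEFINITION = {
    "input": {"id": "number"},
    "response": {"id": "number", "name": "string", "email": "string"},
}

# Assumed mapping from the definition's type names to Python types.
TYPES = {"number": (int, float), "string": (str,)}


def validate(spec: dict, data: dict) -> list:
    """Return a list of error strings; an empty list means the data is valid."""
    errors = []
    for field, type_name in spec.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[field], bool) or not isinstance(data[field], TYPES[type_name]):
            errors.append(f"{field} must be a {type_name}")
    return errors


def shape(spec: dict, row: dict) -> dict:
    """Keep only the fields the response definition declares."""
    return {field: row[field] for field in spec}
```

Because `shape` only copies declared fields, a column such as a password hash never reaches the response unless someone adds it to the definition on purpose.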

From Endpoints to Systems

Traditional approach:

Every endpoint is custom
Logic is scattered
Behavior is implicit

System-driven approach:

Endpoints follow a pattern
Logic is structured
Behavior is explicit

You move from “code-first” to “contract-first.”

Where the Code Goes

This doesn’t eliminate code.

It moves it.

Instead of writing endpoint logic repeatedly…

You write:

A compiler that reads definitions
A pipeline that executes them
A system that enforces rules

Code becomes the engine.

Not the repetition.

Example Flow

With a system-driven approach, a request might flow like this:

Request → Parse Definition → Validate → Build Query → Execute → Format Response


The difference is:

The flow is constant
The behavior is defined in configuration
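As a rough illustration of a constant flow driven by configuration, here is one possible engine in a few lines, using SQLite from the standard library. The definition layout (`table`, `input`, `where`, `response`) and the `handle` function are assumptions made for this sketch; a real engine would parse the definition file and do far more.

```python
import sqlite3

# Hypothetical definition; table and column names are illustrative only.
GET_USER_BY_ID = {
    "table": "users",
    "input": {"id": int},
    "where": {"id": "id"},  # column -> input field
    "response": ["id", "name", "email"],
}


def handle(definition: dict, params: dict, conn: sqlite3.Connection):
    """Run one request through the fixed stages; returns (status, body)."""
    # Validate: coerce each declared input to its declared type.
    try:
        args = {name: cast(params[name]) for name, cast in definition["input"].items()}
    except (KeyError, ValueError, TypeError):
        return 400, {"error": "Invalid input"}
    # Build query: identifiers come from the trusted definition, values are bound.
    columns = ", ".join(definition["response"])
    where = " AND ".join(f"{column} = ?" for column in definition["where"])
    sql = f"SELECT {columns} FROM {definition['table']} WHERE {where}"
    values = [args[field] for field in definition["where"].values()]
    # Execute.
    row = conn.execute(sql, values).fetchone()
    if row is None:
        return 404, {"error": "Not found"}
    # Format response.
    return 200, dict(zip(definition["response"], row))
```

Adding a second endpoint means adding a second dict, not a second copy of the 400 and 404 handling.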

Why This Matters

Without this approach:

Every developer writes endpoints differently
Bugs are repeated across routes
Refactoring becomes painful

With this approach:

Patterns are enforced
Behavior is predictable
Systems scale cleanly

“Isn’t This Less Flexible?”

Yes.

And that’s the point.

Unlimited flexibility leads to:

Inconsistency
Complexity
Fragile systems

Constraints lead to:

Consistency
Clarity
Scalability

Where This Fits

This kind of system works best when:

You have repeated CRUD patterns
You want consistent APIs
You care about long-term maintainability

It doesn’t replace every use case.

But it replaces most of the boring, repetitive ones.

The Bigger Idea

This isn’t just about APIs.

It’s about how we build software.

Instead of:

Writing everything manually
Repeating patterns
Hoping for consistency

We can:

Define systems
Enforce structure
Let the engine handle execution

Final Thought

Writing endpoints feels like control.

But it’s often just repetition.

Defining systems feels restrictive at first.

But it leads to something better:

Clarity.

Consistency.

Scalability.

That’s why I stopped writing endpoints…

…and started defining systems.




PCIe Device Passthrough: NIC Name Instability and MAC Pinning



My Proxmox node rebooted, and suddenly the host was unreachable via SSH. I had to plug in a physical monitor and keyboard only to find that my primary network interface, which had been enp4s0 for months, had decided to rename itself to enp5s0.

Because my /etc/network/interfaces file was explicitly tied to enp4s0, the bridge didn’t come up, the IP wasn’t assigned, and I was locked out of my own hardware.

What I expected

I expected the Linux kernel to consistently enumerate my PCIe devices. In a static hardware environment where nothing has moved, the PCI bus address should be deterministic. If the NIC is plugged into the same slot and the BIOS hasn’t changed, enp4s0 should stay enp4s0 forever. This is the “happy path” most documentation assumes.

What actually happened

The reality is that PCIe enumeration is not always a constant. I’m using a mix of onboard NICs and a PCIe expansion card. I also have a GPU passed through to a VM.

The surprise here is how predictable network interface naming, which systemd-udevd applies on top of the kernel’s PCIe enumeration, interacts with the PCIe topology. When I added a new PCIe device and tweaked some BIOS settings for IOMMU, the bus numbers assigned to the physical slots changed, and the names derived from them changed too. A slight shift in how the PCIe switch reported the devices caused the index to jump.

This isn’t just a “one-time fluke.” If you’re running a multi-node cluster or using GPUs that might move addresses (something I’ve documented before in GPU PCI Address Instability), you’ll find that the kernel is surprisingly flexible with where it puts things.

The root cause is that enp4s0 is a name derived from the PCI location: in this naming scheme, p4 is the PCI bus number and s0 is the slot. If the location changes, even by one digit, the name changes. If your network config depends on that name, your system is one reboot away from a blackout.
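To see how much of the PCI address is baked into the name, here is a small sketch that pulls the numbers back out. It is a simplification: the pattern only covers the path-based form (optional domain, bus, slot, optional function) and treats the numbers as decimal, so other schemes such as eno1, ens3, or MAC-based names simply return None.

```python
import re

# Simplified pattern for PCI path-based names such as enp4s0 or enp4s0f1.
# Assumption: only the en/wl prefix plus optional domain, bus, slot, and
# function parts are handled; extra suffixes and other schemes are not.
PCI_NAME = re.compile(r"^(en|wl)(?:P(\d+))?p(\d+)s(\d+)(?:f(\d+))?$")


def parse_pci_name(name: str):
    """Return the PCI location encoded in a name, or None if it is not path-based."""
    match = PCI_NAME.match(name)
    if match is None:
        return None
    _prefix, domain, bus, slot, function = match.groups()
    return {
        "domain": int(domain or 0),
        "bus": int(bus),
        "slot": int(slot),
        "function": int(function or 0),
    }
```

Running it on enp4s0 and enp5s0 shows the only difference is the bus number, which is exactly the part that moves when another device is enumerated ahead of the NIC.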

The Fix: MAC Pinning

The only way to stop this is to stop relying on the PCI slot location and start relying on the hardware’s unique identifier: the MAC address.

I decided to use systemd .link files. This allows me to tell udev: “I don’t care where this device is on the PCIe bus; if it has this MAC address, call it eth0.”

1. Identify the MAC address

First, I had to find the actual MAC of the problematic NIC while I had console access.

ip link show


I looked for the interface that was currently named enp5s0 (the “wrong” name) and copied the link/ether value.
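If you have more than a couple of NICs, copying values by eye gets error-prone. A minimal sketch of extracting the name-to-MAC mapping from the command’s text output follows; the sample text is made up to resemble what `ip link show` prints, and the regular expressions are my own assumption about that layout rather than a stable interface.

```python
import re

# Sample text in the shape `ip link show` prints; interface and MAC are made up.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
"""

HEADER = re.compile(r"^\d+:\s+([^:@\s]+)")
ETHER = re.compile(r"^\s+link/ether\s+([0-9a-fA-F:]{17})")


def macs_by_interface(text: str) -> dict:
    """Map interface name -> MAC for every interface that has a link/ether line."""
    result, current = {}, None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        ether = ETHER.match(line)
        if ether and current:
            result[current] = ether.group(1).lower()
    return result
```

Feeding it real output (for example via `subprocess.run(["ip", "link", "show"], capture_output=True, text=True).stdout`) gives a dict you can save before the next hardware change.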

2. Create the .link file

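I created a custom link file in /etc/systemd/network/. I chose the name 10-lan.link because .link files are processed in lexical order and the first match wins, so the low prefix puts it ahead of systemd’s default 99-default.link.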

# /etc/systemd/network/10-lan.link
[Match]
MACAddress=00:11:22:33:44:55

[Link]
Name=eth0


(Note: I’ve anonymized the MAC address above. Use your actual hardware MAC here.)
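For anyone pinning several hosts, generating the file is less typo-prone than hand-editing it. This is a small sketch with a hypothetical helper name; it writes the section headers in square brackets, as systemd configuration files require, and rejects names longer than the 15 characters Linux allows for an interface.

```python
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def render_link_file(mac: str, name: str) -> str:
    """Return the text of a systemd .link file that pins `name` to `mac`."""
    mac = mac.strip().lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    # Linux interface names are limited to 15 characters (IFNAMSIZ minus the NUL).
    if not name or len(name) > 15 or "/" in name or any(c.isspace() for c in name):
        raise ValueError(f"not a usable interface name: {name!r}")
    return f"[Match]\nMACAddress={mac}\n\n[Link]\nName={name}\n"
```

Writing the result to /etc/systemd/network/10-lan.link is left to the caller, since that needs root and is worth doing deliberately.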

3. Update the network configuration

Once the interface was pinned to eth0, I had to update the Proxmox network configuration to match. I edited /etc/network/interfaces to replace the volatile enp4s0 with the stable eth0.

# Example snippet from /etc/network/interfaces
auto eth0
iface eth0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 10.0.0.x/24
    gateway 10.0.0.1
    bridge-ports eth0
    bridge-stp off
    bridge-fd 0
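An easy mistake at this step is renaming the iface stanza but leaving the old name in bridge-ports. Here is a quick lint for that, as a heuristic sketch only: it compares names inside the file and knows nothing about which devices actually exist on the host, and the function name is hypothetical.

```python
def undeclared_bridge_ports(interfaces_text: str) -> list:
    """Return bridge-ports entries that have no matching `iface` stanza."""
    declared, ports = set(), []
    for raw in interfaces_text.splitlines():
        words = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if not words:
            continue
        if words[0] == "iface" and len(words) > 1:
            declared.add(words[1])
        elif words[0] == "bridge-ports":
            ports.extend(word for word in words[1:] if word != "none")
    return [port for port in ports if port not in declared]
```

An empty list means every bridge port is declared in the file; anything else is a name worth double-checking before you reboot a headless box.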


4. Apply and verify

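Because .link files are applied by udev when a device is first detected, restarting a network service is not enough. I rebooted, since I was already at the console, and verified the name with ip a. On Debian-based hosts such as Proxmox, it may also be necessary to run update-initramfs -u first so that the initramfs copy of udev sees the new file. The NIC was now consistently eth0, regardless of whether the PCIe bus shifted.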

Why this matters

If you’re just running a single VM on a desktop, this is a minor annoyance. But if you’re building a production-grade homelab, this is a critical failure point.

You’ll hit this specifically in these scenarios:

Adding/Removing PCIe Hardware: Adding a new NVMe drive or a GPU can shift the enumeration of other devices on the same root complex.

BIOS Updates: A BIOS update often resets PCIe lane bifurcation or IOMMU settings, which can completely reorder how the kernel sees your NICs.

Using PCIe Switches: Some high-end motherboards or riser cables use PCIe switches that can report different topologies depending on the power state of the devices.

The Tradeoff

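The tradeoff here is that you’re moving away from the “modern” predictable naming convention back to the “old” ethX style. Some people find eth0 ugly or outdated, but in a headless server environment, “ugly” is better than “unreachable.” One caveat: the systemd.link documentation warns that names the kernel itself might assign, such as eth0, can race with the kernel’s own naming, so on a host with several NICs a name the kernel would not pick on its own is the safer choice.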

I’ve also seen people try to fix this using udev rules in /etc/udev/rules.d/. While that works, .link files are the native systemd way to handle this and are generally cleaner to maintain.

Lessons Learned

The biggest lesson here is that documentation for Proxmox and Debian assumes your hardware topology is a constant. It isn’t.

When you’re doing complex things like PCIe passthrough—which I’ve detailed in my GPU Passthrough Gotcha Guide—you are intentionally messing with the PCI bus. You’re telling the host kernel to ignore certain devices so the VM can claim them. This volatility is a side effect of that power.

If you are passing through NICs or GPUs, do not trust the default interface names. Pin your critical management interfaces to their MAC addresses immediately. It takes five minutes to set up and saves you from a midnight trip to the server rack because a reboot decided your network card now lives at enp6s0.

For those of you managing larger fleets or complex AI agent infrastructure, this kind of hardware-level stability is the foundation. You can’t build a reliable multi-agent AI pipeline if the underlying Kubernetes worker nodes are randomly losing their network identity.

Next time you’re configuring a new node, don’t just copy the enpXsX name from the GUI. Take the extra step to pin it. Your future self will thank you when the next BIOS update doesn’t break your entire cluster.


