distributedsystems – DAILY NEWS

TECH & AI

I built a free system design whiteboard for engineering interviews

jackminion, Jun 21, 2026

I bombed a system design interview last year — not because I didn’t know the architecture, but because I spent the first 5 minutes fighting Excalidraw.

So I built SystemDesignBoard — a free, keyboard-first whiteboard specifically for system design interviews.

What it does

You open it, press a key, and start drawing. No account, no onboarding, no drag-from-a-sidebar friction.

R → place a Service node

C → place a Database/Cache/Queue

A → connect two nodes

N → open the scratchpad for scale math
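As an illustration of the kind of scale math a scratchpad like this is meant for, here is a minimal back-of-the-envelope sketch. The user count, request rate, and peak factor are made-up assumptions, not figures from the tool.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users, requests_per_user_per_day):
    """Average queries per second, assuming traffic is spread evenly over a day."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


# Assumed inputs for illustration only.
avg = average_qps(daily_active_users=10_000_000, requests_per_user_per_day=20)
peak = avg * 3  # assumed peak-to-average ratio

print(f"average: {avg:.0f} QPS, peak: {peak:.0f} QPS")
```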

The features I’m most proud of

Animated connectors that show communication type

Instead of just drawing arrows, connectors visually encode how services talk:

⇄ sync — paired dashes (request + ACK)

≋ stream — near-solid fast line with glow (continuous pipeline)

This matters in interviews — your interviewer can glance at your diagram and immediately understand the communication pattern.

Cloud provider badges

Tag any node as AWS (EC2, Lambda, RDS, S3), GCP (GKE, Cloud Run, Firestore), or Azure. Each subtype has its own icon.

Trade-off logging

Right-click any node → Log Trade-offs → attach your CAP theorem stance, consistency level, and scaling strategy directly to the component.

Diagram-as-Code

Type:

(Mobile App) -> (API Gateway)
(API Gateway) -> (Auth Service)
(Auth Service) -> (Users DB)
(Feed Service) -> (Posts DB x3)
(Feed Service) -> (Redis Cache)

Hit Apply, and it auto-lays out the whole architecture in seconds.
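For a sense of how arrow notation like this can be turned into a graph, here is a rough sketch. It assumes a grammar of one "(Source) -> (Target)" edge per line; the tool's real parser and layout engine are not described in the post, so `parse_edges` and `to_adjacency` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed grammar: exactly one "(Source) -> (Target)" edge per line.
EDGE_RE = re.compile(r"^\s*\(([^()]+)\)\s*->\s*\(([^()]+)\)\s*$")


def parse_edges(text):
    """Return a list of (source, target) pairs, one per non-empty line."""
    edges = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        match = EDGE_RE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: expected '(A) -> (B)', got {line!r}")
        edges.append((match.group(1).strip(), match.group(2).strip()))
    return edges


def to_adjacency(edges):
    """Group targets by source node, keeping first-seen order and dropping duplicates."""
    graph = defaultdict(list)
    for source, target in edges:
        if target not in graph[source]:
            graph[source].append(target)
    return dict(graph)


SPEC = """
(Mobile App) -> (API Gateway)
(API Gateway) -> (Auth Service)
(Auth Service) -> (Users DB)
(Feed Service) -> (Posts DB x3)
(Feed Service) -> (Redis Cache)
"""

graph = to_adjacency(parse_edges(SPEC))
```

An adjacency map like `graph` is the usual input to a layered layout pass, which is where the "auto-layout" step would pick up.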

Export to animated GIF

Export your diagram as a GIF that shows live traffic flow animations. Great for sharing after an interview or in a design doc.

Tech stack

React + TypeScript + Vite

@xyflow/react (ReactFlow v12) for the canvas

Zustand + Immer for state with full undo/redo

html-to-image + gifshot for PNG/GIF export

It’s free and open

No signup required. Works entirely in the browser. Free during beta.

👉 systemdesignboard.com

Would love feedback — especially from anyone who’s done system design interviews recently. What’s missing? What’s annoying? Drop a comment below.


The Hidden Networking Problem Behind AI Agent Failures

jackminion, May 20, 2026

AI agents are being built as if the network is a perfect, low-latency, lossless abstraction, but it isn't. As these systems scale, the real failures won't come from model quality; they will come from latency, packet loss, protocol behavior, and the messy reality of distributed systems. If we want agents that actually work in production, networking has to become a first-class design concern again.

The Part of the AI Conversation That’s Missing

Right now, the AI world is tightly focused on bigger models, longer context windows, agent frameworks, orchestration layers, and clever prompting. That's fine, and all of it is interesting. But none of it matters if the network underneath can't reliably deliver data.

AI agents all run across:

Even so, most agent architectures are designed as if the network were a solved problem. It isn't, and it never was.

The Actual Failure Modes Aren’t “AI Issues”, They’re Network Problems

Here are the patterns that continue to show up in modern distributed systems, now amplified by AI workloads:

Latency Amplification

Agents that depend on synchronous calls to remote inference endpoints collapse whenever RTT spikes. A small jump, say from 40 ms to 120 ms, can turn a responsive agent into a stalled one.
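A quick worked example shows why a small RTT jump hurts so much when calls are chained. This is a simplified model that assumes every call waits for the previous one and ignores server-side processing by default; the figure of 20 sequential calls per task is an assumption for illustration.

```python
def chain_latency_ms(rtt_ms, sequential_calls, service_time_ms=0.0):
    """Total wall-clock time for calls that must run one after another."""
    return sequential_calls * (rtt_ms + service_time_ms)


CALLS = 20  # assumed number of sequential remote calls in one agent task

baseline = chain_latency_ms(40, CALLS)    # 40 ms RTT
degraded = chain_latency_ms(120, CALLS)   # 120 ms RTT
added = degraded - baseline

print(f"{CALLS} calls: {baseline:.0f} ms at 40 ms RTT, {degraded:.0f} ms at 120 ms RTT")
```

Under these assumptions an 80 ms increase per round trip becomes 1.6 seconds of extra wall-clock time per task, which is why batching or parallelizing calls matters more as RTT grows.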

Retry Storms

Agents retry because they assume the service is slow, when the real problem is the network. Multiply that across dozens of agents, and you get a self-inflicted outage.
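One common way to keep retries from snowballing is to combine exponential backoff with jitter and a retry budget. The sketch below is one possible shape, not a reference implementation: `RetryBudget`, `backoff_delay`, and `call_with_retries` are hypothetical names, and the 10% budget ratio is an assumed default.

```python
import random
import time


class RetryBudget:
    """Allow retries only while they stay under a fraction of total requests."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_spend(self):
        """Consume one retry if the budget allows it; return False otherwise."""
        if self.retries + 1 > self.ratio * self.requests:
            return False
        self.retries += 1
        return True


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random):
    """Full-jitter backoff: a uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(operation, budget, max_attempts=4, sleep=time.sleep, rng=random):
    """Run operation(), retrying on ConnectionError while attempts and budget remain."""
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            out_of_attempts = attempt == max_attempts - 1
            if out_of_attempts or not budget.try_spend():
                raise
            sleep(backoff_delay(attempt, rng=rng))
```

The budget is shared across calls, so when many requests fail at once most of them give up quickly instead of adding load to a network that is already struggling.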

Partial observability

Your dashboard can say that everything is green while your packet capture says otherwise. Retransmits, duplicate ACKs, and microbursts, the signals that actually explain the behavior, rarely show up in Layer-7-only observability.

Protocol mismatch

HTTP/2 and gRPC work fine until you introduce:

MTU fragmentation
middleboxes
head-of-line blocking
asymmetric routing

Then your ‘fast’ protocol becomes bottlenecked.

Edge constraints

Everyone wants ‘AI at the edge,’ but nobody talks about:

Agents can’t reliably count on shipping huge context windows or raw telemetry upstream.

Practical Advice for Anyone Deploying Agents

If you’re designing or deploying agents, this is the minimum for reliability:

Measure at the packet level, not only at the application level.
Design for variable latency, not just ideal latency.
Use protocols that can degrade gracefully.
Implement real backpressure instead of simple retries.
Cache intelligently, especially embeddings and model outputs.
Stream context in prioritized chunks.
Instrument NIC/PHY telemetry, rather than just HTTP metrics.
Test under real network conditions, including loss, jitter, and reordering.

If your agent’s architecture can’t handle the network at its worst, it won’t survive the real world.
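To make "real backpressure" from the list above concrete, here is a minimal sketch of a bounded inbox that refuses work when it is full instead of queueing without limit. `BoundedInbox` is a hypothetical name and the capacity of 3 is arbitrary; a production system would also need to propagate the rejection signal to upstream callers.

```python
import queue


class BoundedInbox:
    """A fixed-size work queue that pushes back instead of growing without limit."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, item):
        """Return True if accepted, False if the caller should slow down or shed load."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest item; raises queue.Empty if there is none."""
        return self._queue.get_nowait()

    def depth(self):
        return self._queue.qsize()


inbox = BoundedInbox(capacity=3)
results = [inbox.offer(f"task-{i}") for i in range(5)]
print(results, "rejected:", inbox.rejected)
```

The point of the design is that the producer learns about overload immediately through the `False` return value, rather than discovering it later as a timeout and retrying.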

Observability Has to Go Below Layer 7 Again

Modern observability stacks are great at logs, traces, and service metrics. But they’re blind to the things that actually break distributed systems:

packet loss
bufferbloat
link flaps
retransmit storms
NIC queue saturation

What is MTU?
Maximum Transmission Unit (MTU) is the size of the largest protocol data unit that can be communicated in a single network layer transaction. Context-window payloads are far larger than this, so they are split across many packets; when a hop on the path has a smaller MTU and fragmentation or path MTU discovery fails, the oversized packets are silently dropped and you see “mysterious” packet loss.
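To put rough numbers on the MTU note, the sketch below counts how many TCP segments a payload needs. It assumes IPv4 and TCP headers of 20 bytes each with no options; the 1 MB payload and the 1400-byte tunnel MTU are illustrative assumptions.

```python
import math


def tcp_segments(payload_bytes, mtu=1500, ip_header=20, tcp_header=20):
    """Number of TCP segments needed to carry a payload at a given MTU."""
    mss = mtu - ip_header - tcp_header  # maximum segment size per packet
    return math.ceil(payload_bytes / mss)


PAYLOAD = 1_000_000  # assumed 1 MB of context data

standard = tcp_segments(PAYLOAD)            # typical 1500-byte Ethernet MTU
tunneled = tcp_segments(PAYLOAD, mtu=1400)  # assumed smaller MTU, e.g. inside a tunnel

print(f"{standard} segments at MTU 1500, {tunneled} segments at MTU 1400")
```

Every one of those segments has to fit through the narrowest hop on the path, which is why a single misconfigured tunnel or middlebox can stall large transfers while small requests keep working.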

If you want agents that behave predictably, you need visibility into the layers where unpredictability thrives.

This doesn’t mean you have to capture full PCAPs everywhere; even lightweight NIC counters and synthetic probes can surface most of these problems.
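As one example of a lightweight counter source, Linux exposes per-interface statistics in /proc/net/dev. The sketch below parses that text format; the `SAMPLE` data is invented for illustration, and `parse_proc_net_dev` and `drop_delta` are hypothetical helper names.

```python
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_proc_net_dev(text):
    """Parse Linux /proc/net/dev text into {interface: {counter_name: value}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        numbers = [int(v) for v in values.split()]
        if len(numbers) < 16:
            continue  # skip lines that do not have the expected 16 counters
        counters = {f"rx_{k}": v for k, v in zip(RX_FIELDS, numbers[:8])}
        counters.update({f"tx_{k}": v for k, v in zip(TX_FIELDS, numbers[8:16])})
        stats[name.strip()] = counters
    return stats


def drop_delta(before, after, iface):
    """Receive and transmit drops that occurred between two snapshots."""
    return (
        after[iface]["rx_drop"] - before[iface]["rx_drop"],
        after[iface]["tx_drop"] - before[iface]["tx_drop"],
    )


# Invented sample in the /proc/net/dev layout, for illustration only.
SAMPLE = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  500000    4000    2    7    0     0          0         0   300000    3500    1    3    0     0       0          0
"""

stats = parse_proc_net_dev(SAMPLE)
print(stats["eth0"]["rx_drop"], stats["eth0"]["tx_errs"])
```

On a Linux host the same function can be fed the contents of /proc/net/dev directly, and taking two snapshots a few seconds apart with `drop_delta` gives a cheap drop-rate signal without any packet capture.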

Why Rust Keeps Showing Up in These Conversations

Rust isn’t just a “fast” language; its core concepts make you think like a systems engineer:

ownership
memory layout
buffer lifetimes
concurrency (without data races)

That mindset is essential whenever you’re building telemetry collectors, edge inference runtimes, protocol parsers, or agent‑side networking components.

Rust gives you the tools to build small, reliable pieces of infrastructure that agents depend on.

Where This Is All Heading

Here’s what I expect to see over the next few years:

Network‑aware agents will outperform everything else out there.
Observability will shift down the stack, closer to the packet and NIC levels.
Hybrid inference (local and remote) will become the default.
Protocol engineering will matter again, and efficiency will beat sheer force.

The teams that understand networking will create the agents that thrive.

Final Thought

If you want AI agents that are reliable and useful, make networking a primary design concern. Treat the network as critical infrastructure. Start now: audit your agent architecture for network assumptions, and engineer proactively for real-world environments.

The future of AI belongs to those who prioritize improved networking for their product. Actively invest in understanding (and solving) your network challenges. Your agents’ success depends on it.

Have you run into an ‘AI problem’ that turned out to be a networking issue in disguise? I’d love to hear your stories (and how you debugged them) in the comments below.
