
TECH & AI

DESIGN.md Anatomy: How Tokens and Prose Work Together

jackminion, Jun 27, 2026

A DESIGN.md has two parts: YAML front matter with machine-readable design tokens, and a markdown body with human-readable rationale. Tokens give an agent exact values; prose gives it the rules. Pairing them is the format’s core insight.

The front matter: tokens

The front matter holds your colors, typography, spacing, rounded corners and components as typed values. It opens and closes with three dashes:

---
colors:
  primary: "#1A1C1E"
  surface: "#FFFFFF"
spacing:
  sm: 8px
  md: 16px
---


This gives the agent precise values to use directly.
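As a rough illustration of how a tool might separate the two parts, here is a minimal sketch using only the Python standard library. The function names `split_design_md` and `parse_flat_tokens` are hypothetical, and the parser only handles the simple two-level `group:` / `key: value` layout shown above. It is not a YAML parser; a real tool would likely use a YAML library.

```python
def split_design_md(text: str) -> tuple[str, str]:
    """Split a DESIGN.md string into (front_matter, body).

    Assumes the file starts with a line of three dashes and that the
    front matter ends at the next line of three dashes.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # no front matter: everything is body
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            front = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return front, body
    raise ValueError("front matter opened with '---' but never closed")


def parse_flat_tokens(front_matter: str) -> dict[str, dict[str, str]]:
    """Naive parser for two-level 'group:' / 'key: value' mappings only."""
    tokens: dict[str, dict[str, str]] = {}
    group = None
    for raw in front_matter.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, value = raw.strip().partition(":")
        if not raw.startswith((" ", "\t")):
            group = key            # unindented line starts a new group
            tokens[group] = {}
        elif group is not None:
            tokens[group][key.strip()] = value.strip().strip('"')
    return tokens


DOC = """---
colors:
  primary: "#1A1C1E"
  surface: "#FFFFFF"
spacing:
  sm: 8px
  md: 16px
---
## Overview
A calm, focused reading environment.
"""

front, body = split_design_md(DOC)
tokens = parse_flat_tokens(front)
```

Here `tokens["colors"]["primary"]` holds the exact hex string, while `body` carries the prose the agent reads for rules.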

The body: prose

Below the front matter is the rationale, in canonical sections:

## Overview
A calm, focused reading environment. The UI recedes so content leads.

## Colors
Warm neutrals carry the interface. The accent is reserved strictly
for interactive elements – never decorative.


Why pair them?

A hex value tells the agent what a color is. Only prose can tell it that this color is the sole interaction driver and must never be used decoratively.

# A tokens-only file: just data, no rules.
# A prose-only file: intent, but no precise values.
# DESIGN.md: both, so the agent applies the system correctly.


The canonical sections

Overview – the personality

Colors – roles and rules

Typography – the job of each style

Layout – grid and spacing

Elevation & Depth – how hierarchy is built

Do’s and Don’ts – hard guardrails
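A small check for the sections listed above could look like the sketch below. It assumes each section appears as a level-two markdown heading (`## Name`) with exactly the names in the list. Whether the format requires every section, or tolerates different heading levels, is not stated here, so treat that as an assumption. `missing_sections` is a hypothetical helper name.

```python
import re

# Section names taken from the list above (apostrophes normalized to ASCII).
CANONICAL_SECTIONS = [
    "Overview",
    "Colors",
    "Typography",
    "Layout",
    "Elevation & Depth",
    "Do's and Don'ts",
]


def missing_sections(body: str) -> list[str]:
    """Return canonical section names that have no matching '## ' heading."""
    found = {
        match.group(1).strip().replace("\u2019", "'")  # curly to straight apostrophe
        for match in re.finditer(r"^##\s+(.+)$", body, flags=re.MULTILINE)
    }
    return [name for name in CANONICAL_SECTIONS if name not in found]


sample_body = "## Overview\nA calm reading environment.\n\n## Colors\nWarm neutrals.\n"
missing = missing_sections(sample_body)
```

For the two-section sample, `missing` lists the four sections that still need writing, in canonical order.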

FAQ

Is the front matter required? In principle optional, but in practice it is the heart of the file.

Can I use only prose? You can, but you lose precise values. The format is designed for both.

Bottom line

Tokens are the values; prose is the meaning. A DESIGN.md works because it carries both in one file the agent reads together.

Free starter: The format, a complete annotated example, and the core idea are on a free cheat sheet: DESIGN.md Quick-Start Cheat Sheet

Go deeper: The full guide covers the entire format — the token schema, the CLI in depth, accessibility, Tailwind and DTCG export, agent integration, and a complete walkthrough: DESIGN.md: The Complete Guide to Design Systems for AI Agents

Do you write rationale in your design docs, or just list the tokens? Curious how teams handle the ‘why’ in the comments.


Connection architectures for WordPress maintenance tools — mapping four products on a two-axis grid

jackminion, Jun 26, 2026

While writing the comparison pages for ManageWP, MainWP, WP Umbrella, and InfiniteWP on our landing page, I tried to line up the four products under a single “connection method” column and got stuck. The same ManageWP gets called a “Hosted SaaS tool” in one source, a “Worker plugin tool” in another, and a “self-hosted” tool in yet another.

On closer look, these labels are mixing two distinct axes into one column. Once you separate them, the four products fit cleanly into a two-axis, six-cell grid. This post lays out that grid and walks through what each cell means for day-to-day operations.

Axis 1 — What gets installed on the client site

The first axis asks what the maintenance tool installs on the WordPress sites it manages. There are two answers.

A. Worker / Child plugin: Each managed site gets a dedicated plugin from the maintenance tool. ManageWP Worker, MainWP Child, WP Umbrella, InfiniteWP Client — the names differ, but they all play the same role: a “gateway plugin” that exposes a REST/HTTP endpoint the dashboard talks to.

B. Direct SSH + WP-CLI: Nothing gets installed on the site. The tool logs in via SSH and invokes WP-CLI directly on the server.

The difference shows up as how invasive the tool is on the client side. The plugin route carries the cost of “a vulnerability in the gateway plugin cascades to every managed site” and “you owe the client an explanation for the extra plugin.” The SSH route carries the constraint of “hosts that disallow SSH are out of reach” and “operators need basic SSH literacy.”
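To make the SSH route concrete, here is a minimal sketch of how a tool could build a remote WP-CLI call. It is not how any of the named products works internally. The host, user, and WordPress path are hypothetical, and `wp_cli_over_ssh` is an illustrative helper. `wp plugin list --format=json` and the global `--path` parameter are standard WP-CLI.

```python
import shlex


def wp_cli_over_ssh(host: str, user: str, wp_path: str, wp_args: list[str]) -> list[str]:
    """Build an argv list that runs a WP-CLI command on a remote host via ssh."""
    remote = ["wp", *wp_args, f"--path={wp_path}"]
    # ssh hands its trailing argument to the remote shell as one command line,
    # so each part is quoted for that shell.
    remote_cmd = " ".join(shlex.quote(part) for part in remote)
    return ["ssh", f"{user}@{host}", remote_cmd]


# Hypothetical connection details, for illustration only.
cmd = wp_cli_over_ssh(
    host="example.com",
    user="deploy",
    wp_path="/var/www/html",
    wp_args=["plugin", "list", "--format=json"],
)
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Nothing is installed on the site, but the call fails outright on hosts that disallow SSH.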

Axis 2 — Where the dashboard lives

The second axis asks where the dashboard itself runs. There are three answers.

① Hosted SaaS: The dashboard runs on infrastructure operated by the vendor. Site credentials live in their cloud — a trust model where the vendor holds your secrets.

② Self-hosted: You stand up your own WordPress site or server and install the dashboard there. The data stays in your control, but you also inherit the operational burden of maintaining that dashboard platform itself.

③ Desktop app: The dashboard runs on your local PC. Data stays local — no cloud component, no server to operate.

Axis 2 determines where the data lives and where the responsibility line is drawn. Who holds the credentials? Who is on the hook when something goes wrong? That cell choice shapes the risk structure your team takes on.

The two axes overlaid — a six-cell grid

| Dashboard ↓ / Connection → | Worker / Child plugin | Direct SSH + WP-CLI |
|---|---|---|
| ① Hosted SaaS | ManageWP, WP Umbrella | (industry gap) |
| ② Self-hosted | MainWP, InfiniteWP | (industry gap) |
| ③ Desktop app | (industry gap) | WP Maintenance Manager |

The four major products cluster in the top two cells of the “plugin” column. The “SSH” column is almost entirely blank. Let’s walk through the three occupied cells first.
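The grid is small enough to write down as data. The sketch below encodes the five placements from the table and derives the empty cells. The variable and function names are illustrative.

```python
from itertools import product

DASHBOARDS = ["Hosted SaaS", "Self-hosted", "Desktop app"]
CONNECTIONS = ["Worker / Child plugin", "Direct SSH + WP-CLI"]

# Placements as given in the grid above: product -> (dashboard, connection).
PRODUCTS = {
    "ManageWP": ("Hosted SaaS", "Worker / Child plugin"),
    "WP Umbrella": ("Hosted SaaS", "Worker / Child plugin"),
    "MainWP": ("Self-hosted", "Worker / Child plugin"),
    "InfiniteWP": ("Self-hosted", "Worker / Child plugin"),
    "WP Maintenance Manager": ("Desktop app", "Direct SSH + WP-CLI"),
}


def build_grid() -> dict[tuple[str, str], list[str]]:
    """Map every (dashboard, connection) cell to the products that sit in it."""
    grid = {cell: [] for cell in product(DASHBOARDS, CONNECTIONS)}
    for name, cell in PRODUCTS.items():
        grid[cell].append(name)
    return grid


grid = build_grid()
empty_cells = [cell for cell, names in grid.items() if not names]
```

Keeping the two axes as separate fields is what prevents the single-column confusion: a product is never just "SaaS" or just "plugin-based", it is always a pair.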

Cell 1 — Hosted SaaS × plugin (ManageWP, WP Umbrella)

The industry’s most classic configuration: browser access from anywhere, the vendor handles infrastructure updates and redundancy, and uptime monitoring fits naturally. WP Umbrella’s EU-based hosting, which leans on data-residency as a selling point, is a story this cell makes possible.

The trade-off is that client credentials get handed to a third-party cloud. A breach at the vendor cascades to every managed site, and your operations are tied to the vendor’s business continuity. Pricing tends to layer “per-site × add-on” charges on top of the monthly base.

Cell 2 — Self-hosted × plugin (MainWP, InfiniteWP)

For teams that don’t want credentials in a third-party cloud but still want the compatibility of plugin-based connections. Full data ownership and affinity with one-time-purchase and annual-license models are the upsides.

The trade-off is that the burden of maintaining the dashboard platform falls on you. MainWP requires you to keep updating and securing the very WordPress site that runs the dashboard; InfiniteWP’s panel server needs the same kind of care. A meta-recursion of “you maintain the tool that maintains your tools” is built into this cell. If the dashboard platform is compromised, the credentials for every connected site can leak from one place.

Cell 3 — Desktop × SSH (WP Maintenance Manager)

The dashboard runs as a desktop app on your PC, and the tool connects to sites directly over SSH. Three properties stand out: data lives only on the local PC, there is no infrastructure to operate, and nothing is installed on client sites. The cascading-vulnerability risk of a gateway plugin structurally doesn’t exist.

The trade-off is that continuous uptime monitoring isn’t a natural fit — when the PC is asleep, the monitoring loop isn’t running. Hosts that disallow SSH also fall outside the supported set, so the hosting coverage is narrower than the other cells.

Why the three empty cells stay empty

The remaining cells have essentially no products. There are structural reasons.

Hosted SaaS × SSH struggles because handing SSH private keys to a cloud vendor is a trust model the market resists. Plugin-based connections keep credentials on the site side; storing and using SSH keys inside a third-party cloud raises the audit bar significantly.

Self-hosted × SSH is technically possible, but layers two costs at once: running your own dashboard platform and absorbing the SSH connectivity constraints. Teams that opt into self-hosting rarely want to give up plugin-route compatibility on top of that.

Desktop × plugin fights a fundamental mismatch: some gateway plugins assume continuous push communication to the dashboard, which doesn’t reconcile with a desktop app that only runs when the user’s PC is on.

Closing — how to choose a cell

Cell selection is a trade-off between acceptable costs and where the responsibility line sits. Three questions tend to do most of the work during evaluation.

Credentials in a third-party cloud, or kept locally? (Axis 2)

A maintenance plugin on every client site, or none? (Axis 1)

Operating an infrastructure tier yourself, or not? (Axis 2, only the self-hosted row adds this)

The industry has consolidated into “Hosted SaaS × plugin” and “Self-hosted × plugin” largely for historical reasons — SSH-route constraints (host coverage and the SSH knowledge requirement on operators) made industry-wide adoption hard. Choosing the SSH column structurally avoids the client-side invasiveness and gateway-plugin cascade risk that the plugin column carries.

There’s no single “correct” architecture. Operating style, client contracts, and data-handling requirements all shift the right cell. Carrying this six-cell map in your head makes it easier to ask the right comparison questions when picking a maintenance tool — and to see which trade-offs actually matter for your own operations.


My trading bot said it was trading for four days… he was lying

jackminion, Jun 26, 2026

Twenty-five days on Hyperliquid. Sixty-five closed trades. P&L: -$9.21.

Turns out that was the smallest wrong thing about it.

The landing page showed -$7.72 because it uses a different P&L formula and excludes two open positions. Either number is small. Both numbers were also wrong about what they were telling me.

I spent yesterday auditing every trade. The audit produced three findings I did not expect. Each one was a different kind of wrong.

This is the first post in a series about ziom trader, my small AI-assisted crypto trading bot. “Ziom” is Polish for buddy, mate, or dude depending on who’s talking. The name is unserious on purpose. The system is not.

This is not a “watch me print money” series. The number is negative. Good.

The point of the series is to track what happens when an LLM-assisted trading system moves from backtests and dashboards into live execution: where the bot is wrong, where the dashboard is wrong, where I am wrong, and which layer gets to prove it.

Frame

The natural first read of -$9.21 is “the strategy is losing money.” That read assumes the displayed P&L is attributable to the strategy. It is not.

The number that shows up at the surface is the sum of at least three different layers: the strategy itself, the execution wrapper around it, and the monitoring layer that observes both. Each layer can author its own kind of failure. The displayed number compresses all three into a single dollar figure and loses the attribution on the way up.

The framing that landed for me, from Daniel Nevoigt, is that methodology overview without forward-correlation disclosure is a log with good intentions. Same applies to P&L: total P&L without layer-attribution disclosure is a log with good intentions. You see the number. You do not see where it came from.

Here is what I found when I forced the attribution.

Layer 1: Shadow does not equal live

Before deploying any lane, the system runs against backtested data. The shadow says “this strategy returns X over Y trades.” The deploy decision is taken when the shadow looks healthy. The live then runs and produces a different number.

The label for that difference is not “the strategy disappointed.” The shadow is one authority. The live is a different authority. The market authored the failure criterion, not the strategy.

This is the version of the seam Christopher Maher named: the bite check did not catch itself, a different rail caught it. Shadow data cannot author its own failure. Only the live market can. And the live market does not tell you which part of the gap is variance, which part is regime drift, and which part is a parameter you forgot to tune.

In this window the funding_divergence_long lane had a shadow edge of +0.355%/trade across n=660 backtested trades, CI95 (+0.085, +0.625). The live result for the same lane was -1.10%/trade across 29 live trades. The gap is 1.46 percentage points. At sigma of about 2% per trade and n=29, the standard error is about 0.37 percentage points, so that gap is 3.9 standard errors: a statistically significant shortfall.
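The arithmetic behind that figure can be reproduced in a few lines. This sketch uses the numbers quoted above and treats the shadow mean as a fixed reference, which is a simplification: it ignores the backtest's own sampling uncertainty.

```python
import math

shadow_mean = 0.355   # % per trade, from the backtest
live_mean = -1.10     # % per trade, from the live lane
sigma = 2.0           # % per trade, the rough per-trade volatility quoted above
n_live = 29           # live trades in the window

gap = shadow_mean - live_mean                 # percentage points
standard_error = sigma / math.sqrt(n_live)    # standard error of the live mean
z_score = gap / standard_error

print(f"gap={gap:.2f} pp, SE={standard_error:.2f} pp, z={z_score:.1f}")
```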

That does not prove the strategy is broken. It proves the shadow and the live disagreed by more than variance would explain. Three explanations remain in play, and the audit can narrow but not resolve them:

June 15 ADA outlier was -$2.25, -5.64%, which is 3.6 sigma from shadow mean. One trade is doing structural work in a small sample.
Edge is not durable across this BTC window. June saw recovery to reversal.
Exit configuration choices let losers run.

50 to 100 more trades are needed to separate these. I am not separating them today. The label for this section is AMBIGUOUS and I am pinning it to that label until the sample doubles.

Layer 2: Live displayed does not equal strategy true

Inside the -$9.21, 60% is not strategy. It is system overhead with git commit refs.

The breakdown:

| Cause | Trades | Loss | Commit ref |
|---|---|---|---|
| oi_surge LONG with no regime gate, ran in bear | 3 | -$1.45 | gate added 2d10e326 Jun 11 |
| whale lane missing max_per_coin cap | 6 | -$0.95 | cap added 5bd9eaaf Jun 9 |
| whale_footprint as dead lane before disarm | 26 | -$2.71 | disarmed 18d937aa Jun 13 |
| oi_surge LONG as dead lane, 1 trade Jun 12 | 1 | -$0.38 | not explicitly disarmed in this window |

Total system overhead: -$5.49 across 36 trades, 60% of the loss.
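The totals can be checked directly from the four rows above. The sketch uses `Decimal` so that cent amounts add exactly. The shortened row labels are mine.

```python
from decimal import Decimal

# (cause, trades, loss in USD) for each overhead row in the breakdown.
overhead = [
    ("oi_surge LONG, no regime gate", 3, Decimal("-1.45")),
    ("whale lane, no max_per_coin cap", 6, Decimal("-0.95")),
    ("whale_footprint dead lane", 26, Decimal("-2.71")),
    ("oi_surge LONG dead lane", 1, Decimal("-0.38")),
]
displayed_pnl = Decimal("-9.21")

total_trades = sum(trades for _, trades, _ in overhead)
total_loss = sum(loss for _, _, loss in overhead)
share = total_loss / displayed_pnl   # fraction of the displayed loss

print(f"{total_trades} trades, {total_loss} USD, {share:.0%} of displayed loss")
```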

Sixty percent of the loss has an audit trail. Most of it has a git commit. All of it is a different kind of wrong than “the signal failed.”

Each line has either a commit hash that closes the gap or a seam that the audit made visible. None of it is the strategy in the sense of “the signal was wrong.” All of it is the system in the sense of “the rail that would have stopped this did not exist yet.”

Sean Burn names it right: show the seam, do not hide it. Show that 60% of this loss is closed by commits that exist now and did not exist on June 6. Do not collapse “system” and “strategy” into one bucket called “the bot lost money.” They are different authors of the same dollar.

The remaining 40% is funding_divergence_long (-$4.15 across 32 trades) and oi_surge_fade (+$0.13 across 2 trades). The funding_long line is the one with the shadow-vs-live gap from Layer 1. Without the ADA outlier and without the execution gap I will describe next, the lane runs at -$1.47 across 28 trades, or -$0.05 / trade. That is noise floor for this sample size, not strategy quality. Treat it that way.

Layer 3: Visible live does not equal what the driver attempted

The third finding had no warning. The first two were inventory work. This one was structural.

Between June 18 10:01 UTC and June 22 16:01 UTC, the funding_divergence_long driver was armed. The run_summary events in the database show armed=true, placed=1 for the entire 4-day window, roughly 20 to 30 cycles. The positions table for the same window shows zero new fills. The events table shows zero execution_error events.

The dashboard read placed=1. The exchange acknowledgement layer wrote placed_ok=0. The error path that would have written an execution_error row never ran, because the exception was caught somewhere upstream without incrementing the error counter.

For four days, the driver said it was trading. The exchange said it was not. The events table said nothing.

The audit trail itself was lying.
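A reconciliation check for this kind of gap can be small. The sketch below is an illustration, not the bot's actual code: `CycleReport` and its `fills` and `error_events` fields are hypothetical, and it assumes the three sources (run summaries, the positions table, the events table) can be joined per cycle.

```python
from dataclasses import dataclass


@dataclass
class CycleReport:
    cycle_id: int
    placed: int        # orders the driver believes it placed (run_summary)
    fills: int         # new rows in the positions table for this cycle
    error_events: int  # execution_error rows for this cycle


def silent_gaps(reports: list[CycleReport]) -> list[int]:
    """Return cycle ids where claimed placements have neither fills nor errors."""
    return [
        report.cycle_id
        for report in reports
        if report.placed > report.fills + report.error_events
    ]


reports = [
    CycleReport(cycle_id=1, placed=1, fills=1, error_events=0),  # healthy
    CycleReport(cycle_id=2, placed=1, fills=0, error_events=1),  # failed loudly
    CycleReport(cycle_id=3, placed=1, fills=0, error_events=0),  # failed silently
]
flagged = silent_gaps(reports)
```

Only the third cycle is flagged. A placement that failed with a recorded error is a normal state; a placement with no fill and no error is the state described above.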

The framing from L. Cordero applies: trust retrieval, verify recall. The placed=1 counter was the system retrieving its own belief. The actual position state was the recall, and the recall path was broken. The two layers diverged silently, and the dashboard was reading the wrong one.

The framing from Todd Hendricks applies: big number, wrong metric. placed=1 is a big number. placed_ok=0 is the meaningful one. The system displayed the big one. I deployed the wrong dashboard.

The fix landed today, after the audit, after a peer who runs a different read-the-chain product confirmed independently that the seam between an attempted read and a verified read is where this class of bug lives. His phrase for the right default: incomplete by default. Anything not explicitly classified as a verified result is unknown, not zero. Zero and unknown render visually distinct. The pipeline carries the distinction all the way to the surface.
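One way to encode "incomplete by default" is to make UNKNOWN the fall-through result of classification, so that only an explicitly recognized acknowledgement counts as verified. This is a sketch under assumptions: the acknowledgement shape (a dict with a `status` key of `"ok"` or `"err"`) and the names `PlacementStatus` and `classify_ack` are hypothetical, not the bot's real schema.

```python
from enum import Enum


class PlacementStatus(Enum):
    CONFIRMED = "confirmed"
    REJECTED = "rejected"
    UNKNOWN = "unknown"


def classify_ack(ack: object) -> PlacementStatus:
    """Classify an exchange acknowledgement; anything unrecognized is UNKNOWN."""
    if isinstance(ack, dict):
        status = ack.get("status")
        if status == "ok":
            return PlacementStatus.CONFIRMED
        if status == "err":
            return PlacementStatus.REJECTED
    # Missing, malformed, or unexpected responses are never counted as success
    # and never counted as zero: they stay unknown.
    return PlacementStatus.UNKNOWN
```

With this shape, a swallowed exception that leaves `ack` as `None` surfaces as UNKNOWN instead of disappearing into a success counter.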

Impact ESTIMATED: 20 to 30 missed signals, ~$15 notional each. If the shadow edge held, plus or minus $1 to $1.50 in either direction, gain or loss, invisible to the displayed P&L. The honest label is ESTIMATED because I cannot know which way the missed trades would have gone.

What the audit changes

The displayed loss is -$9.21. The strategy contribution to that loss, after subtracting system overhead and the execution gap and the single 3.6-sigma outlier, is approximately -$1.47 across 28 trades, or -$0.05 per trade. That is noise. The sample is too small to call the strategy good or bad. Forward-test budget: 50 to 100 more trades before any strategy-quality verdict.

The system overhead is closed. The commits exist. The next 50 to 100 trades will run with the regime gate, the max_per_coin cap, the disarmed dead lanes, the corrected verification rail, and the current active lane configuration. If those run and the lane is still -$0.10/trade or worse, the strategy is the problem, not the rails. If they run and the lane comes in at +$0.05/trade or better, the shadow edge held and the previous loss was the rails.

I am locking the test budget in advance: if the next 50 trades come in at -$0.10/trade or worse, I retract the post-fix optimism in this post. The bet is on the rails being the issue, not the signal. I will publish the next breakdown either way.
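Written down as a rule, the pre-registered test looks roughly like this. The thresholds and the 50-trade minimum come from the text above. The function name and the "inconclusive" label for results between the two thresholds are my assumptions, since the post does not name that middle case.

```python
def forward_test_verdict(pnl_per_trade: float, n_trades: int, min_trades: int = 50) -> str:
    """Apply the pre-registered thresholds to a forward-test sample."""
    if n_trades < min_trades:
        return "insufficient sample"
    if pnl_per_trade <= -0.10:
        return "strategy is the problem"
    if pnl_per_trade >= 0.05:
        return "shadow edge held"
    return "inconclusive"  # assumed label for the band between the thresholds
```

Fixing the rule before the data arrives is the point: the verdict is a function of the sample, not of how the sample feels.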

Post-audit check

Added 2026-06-25 around 19:15 CEST, roughly 12 hours after the audit opened. I checked.

The first post-audit window did not reproduce the previous failure pattern.

The oi_surge_fade_live SHORT lane produced approximately +$1.38 across 12 post-audit trades, with 10 of 12 green.

That includes AVAX, UNI, ADA, ATOM, FIL, and TIA. The important part is not that the number is green. The important part is that the result came after the audit separated attempted placement from exchange-confirmed placement.

The early read is positive, but narrow.

This is not “the fixes worked.” It is “the first post-audit window did not immediately repeat the old bug shape, and the active lane produced a green early window under the new reporting rail.”

Those are different claims.

I am only making that narrow claim.

What this is not

This is not a how-I-made-money post. The number is negative. It is not large. The strategy is unverified. The audit caught real bugs with commit refs but did not prove the strategy works.

This is also not a how-AI-coded-my-bot post. Claude Code wrote large parts of this system. The audit found multiple places where the same author, me with model assistance, wrote both the action layer and the layer that was supposed to verify the action. Single-author audit trails lie. That part is on the system design, not on the model.

What this is, is the breakdown that should sit underneath any small displayed number from any algorithmic trading or autonomous agent system. Three different kinds of wrong. Three different authors of the same dollar. The displayed number is one of them. The other two are invisible by default.

Series contract

This series will track ziom trader as a live system, not as a performance claim.

I will publish the boring parts: small losses, missed fills, broken counters, stale assumptions, dashboard lies, audit fixes, and retractions when the next sample contradicts the previous read.

No alpha claims. No “the bot works” until the forward sample earns that sentence. No hiding the layer that authored the failure.

Peer credits

The vocabulary that made this audit possible came from people writing about adjacent problems in adjacent domains.

None of these people were writing about trading bots. Some were writing about incident reports, some about agent systems, one about a read-chain product.

The overlap was not planned. That’s the point.

Daniel Nevoigt: “methodology overview without forward-correlation disclosure is a log with good intentions”
Christopher Maher: “the bite check did not catch itself, a different rail caught it”
L. Cordero: “trust retrieval, verify recall”
Sean Burn: “show the seam, do not hide it”
Todd Hendricks: “big number, wrong metric”
TxDesk, ratifying the placed=1/placed_ok=0 framing in a different domain this morning: “incomplete by default”

That is why I am leaving the credits in the post. The vocabulary did not decorate the audit. It changed what the audit could see.

What you can take from this

If you run a live system, look for the layer where your own code writes both the action and the verification. That is where this class of bug lives. The fix is not only better testing. The fix is making the action layer and the verification layer be authored by different code paths, ideally by different authors, with the verification path explicitly classifying anything it did not see as incomplete by default.

Render the difference, not the success. Five attempted and three succeeded is a normal display state. Five attempted and unknown succeeded is the state your dashboard probably hides today.
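A display helper that keeps zero and unknown apart might look like the sketch below. `render_placements` is a hypothetical name, and `None` stands in for "the verification path did not see a result".

```python
from typing import Optional


def render_placements(attempted: int, succeeded: Optional[int]) -> str:
    """Render attempted vs. confirmed counts, keeping 'unknown' distinct from 0."""
    if succeeded is None:
        return f"{attempted} attempted, ? confirmed (unverified)"
    return f"{attempted} attempted, {succeeded} confirmed"


print(render_placements(5, 3))     # normal state
print(render_placements(5, 0))     # verified zero
print(render_placements(5, None))  # unknown, rendered differently from zero
```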

That is the line the audit drew.

If you are the bot, you do not get to be the auditor.
