Twenty-five days on Hyperliquid. Sixty-five closed trades. P&L: -$9.21.
Turns out that was the smallest wrong thing about it.
The landing page showed -$7.72 because it uses a different P&L formula and excludes two open positions. Either number is small. Both numbers were also wrong about what they were telling me.
I spent yesterday auditing every trade. The audit produced three findings I did not expect. Each one was a different kind of wrong.
This is the first post in a series about ziom trader, my small AI-assisted crypto trading bot. “Ziom” is Polish for buddy, mate, or dude depending on who’s talking. The name is unserious on purpose. The system is not.
This is not a “watch me print money” series. The number is negative. Good.
The point of the series is to track what happens when an LLM-assisted trading system moves from backtests and dashboards into live execution: where the bot is wrong, where the dashboard is wrong, where I am wrong, and which layer gets to prove it.
Frame
The natural first read of -$9.21 is “the strategy is losing money.” That read assumes the displayed P&L attributes to the strategy. It does not.
The number that shows up at the surface is the sum of at least three different layers: the strategy itself, the execution wrapper around it, and the monitoring layer that observes both. Each layer can author its own kind of failure. The displayed number compresses all three into a single dollar figure and loses the attribution on the way up.
The framing that landed for me, from Daniel Nevoigt, is that methodology overview without forward-correlation disclosure is a log with good intentions. Same applies to P&L: total P&L without layer-attribution disclosure is a log with good intentions. You see the number. You do not see where it came from.
Here is what I found when I forced the attribution.
Layer 1: Shadow does not equal live
Before deploying any lane, the system runs against backtested data. The shadow says “this strategy returns X over Y trades.” The deploy decision is taken when the shadow looks healthy. The live then runs and produces a different number.
The label for that difference is not “the strategy disappointed.” The shadow is one authority. The live is a different authority. The market authored the failure criterion, not the strategy.
This is the version of the seam Christopher Maher named: the bite check did not catch itself, a different rail caught it. Shadow data cannot author its own failure. Only the live market can. And the live market does not tell you which part of the gap is variance, which part is regime drift, and which part is a parameter you forgot to tune.
In this window the funding_divergence_long lane had a shadow edge of +0.355%/trade across n=660 backtested trades, CI95 (+0.085, +0.625). The live for the same lane was -1.10% / trade across 29 live trades. The gap is 1.46 percentage points. At sigma about 2% per trade and n=29, that gap is 3.9 standard errors. Statistically significant negative.
That does not prove the strategy is broken. It proves the shadow and the live disagreed by more than variance would explain. Three explanations remain in play, and the audit can narrow but not resolve them:
June 15 ADA outlier was -$2.25, -5.64%, which is 3.6 sigma from shadow mean. One trade is doing structural work in a small sample.
Edge is not durable across this BTC window. June saw recovery to reversal.
Exit configuration choices let losers run.
50 to 100 more trades are needed to separate these. I am not separating them today. The label for this section is AMBIGUOUS and I am pinning it to that label until the sample doubles.
Layer 2: Live displayed does not equal strategy true
Inside the -$9.21, 60% is not strategy. It is system overhead with git commit refs.
The breakdown:
Cause
Trades
Loss
Commit ref
oi_surge LONG with no regime gate, ran in bear
3
-$1.45
gate added 2d10e326 Jun 11
whale lane missing max_per_coin cap
6
-$0.95
cap added 5bd9eaaf Jun 9
whale_footprint as dead lane before disarm
26
-$2.71
disarmed 18d937aa Jun 13
oi_surge LONG as dead lane, 1 trade Jun 12
1
-$0.38
not explicitly disarmed in this window
Total system overhead: -$5.49 across 36 trades, 60% of the loss.
Sixty percent of the loss has an audit trail. Most of it has a git commit. All of it is a different kind of wrong than “the signal failed.”
Each line has either a commit hash that closes the gap or a seam that the audit made visible. None of it is the strategy in the sense of “the signal was wrong.” All of it is the system in the sense of “the rail that would have stopped this did not exist yet.”
Sean Burn names it right: show the seam, do not hide it. Show that 60% of this loss is closed by commits that exist now and did not exist on June 6. Do not collapse “system” and “strategy” into one bucket called “the bot lost money.” They are different authors of the same dollar.
The remaining 40% is funding_divergence_long (-$4.15 across 32 trades) and oi_surge_fade (+$0.13 across 2 trades). The funding_long line is the one with the shadow-vs-live gap from Layer 1. Without the ADA outlier and without the execution gap I will describe next, the lane runs at -$1.47 across 28 trades, or -$0.05 / trade. That is noise floor for this sample size, not strategy quality. Treat it that way.
Layer 3: Visible live does not equal what the driver attempted
The third finding had no warning. The first two were inventory work. This one was structural.
Between June 18 10:01 UTC and June 22 16:01 UTC, the funding_divergence_long driver was armed. The run_summary events in the database show armed=true, placed=1 for the entire 4-day window, roughly 20 to 30 cycles. The positions table for the same window shows zero new fills. The events table shows zero execution_error events.
The dashboard read placed=1. The exchange acknowledgement layer wrote placed_ok=0. The error path that would have written an execution_error row never ran, because the code that throws the exception was caught somewhere upstream without incrementing the error counter.
For four days, the driver said it was trading. The exchange said it was not.The events table said nothing.
The audit trail itself was lying.
The framing from L. Cordero applies: trust retrieval, verify recall. The placed=1 counter was the system retrieving its own belief. The actual position state was the recall, and the recall path was broken. The two layers diverged silently, and the dashboard was reading the wrong one.
The framing from Todd Hendricks applies: big number, wrong metric. placed=1 is a big number. placed_ok=0 is the meaningful one. The system displayed the big one. I deployed the wrong dashboard.
The fix landed today, after the audit, after a peer who runs a different read-the-chain product confirmed independently that the seam between an attempted read and a verified read is where this class of bug lives. His phrase for the right default: incomplete by default. Anything not explicitly classified as a verified result is unknown, not zero. Zero and unknown render visually distinct. The pipeline carries the distinction all the way to the surface.
Impact ESTIMATED: 20 to 30 missed signals, ~$15 notional each. If the shadow edge held, plus or minus $1 to $1.50 in either direction, gain or loss, invisible to the displayed P&L. The honest label is ESTIMATED because I cannot know which way the missed trades would have gone.
What the audit changes
The displayed loss is -$9.21. The strategy contribution to that loss, after subtracting system overhead and the execution gap and the single 3.6-sigma outlier, is approximately -$1.47 across 28 trades, or -$0.05 per trade. That is noise. The sample is too small to call the strategy good or bad. Forward-test budget: 50 to 100 more trades before any strategy-quality verdict.
The system overhead is closed. The commits exist. The next 50 to 100 trades will run with the regime gate, the max_per_coin cap, the disarmed dead lanes, the corrected verification rail, and the current active lane configuration. If those run and the lane is still -$0.10/trade or worse, the strategy is the problem, not the rails. If they run and the lane comes in at +$0.05/trade or better, the shadow edge held and the previous loss was the rails.
I am locking the test budget in advance: if the next 50 trades come in at -$0.10/trade or worse, I retract the post-fix optimism in this post. The bet is on the rails being the issue, not the signal. I will publish the next breakdown either way.
Post-audit check
Added 2026-06-25 around 19:15 CEST, roughly 12 hours after the audit opened. I checked.
The first post-audit window did not reproduce the previous failure pattern.
The oi_surge_fade_live SHORT lane produced approximately +$1.38 across 12 post-audit trades, with 10 of 12 green.
That includes AVAX, UNI, ADA, ATOM, FIL, and TIA. The important part is not that the number is green. The important part is that the result came after the audit separated attempted placement from exchange-confirmed placement.
The early read is positive, but narrow.
This is not “the fixes worked.” It is “the first post-audit window did not immediately repeat the old bug shape, and the active lane produced a green early window under the new reporting rail.”
Those are different claims.
I am only making that narrow claim.
What this is not
This is not a how-I-made-money post. The number is negative. It is not large. The strategy is unverified. The audit caught real bugs with commit refs but did not prove the strategy works.
This is also not a how-AI-coded-my-bot post. Claude Code wrote large parts of this system. The audit found multiple places where the same author, me with model assistance, wrote both the action layer and the layer that was supposed to verify the action. Single-author audit trails lie. That part is on the system design, not on the model.
What this is, is the breakdown that should sit underneath any small displayed number from any algorithmic trading or autonomous agent system. Three different kinds of wrong. Three different authors of the same dollar. The displayed number is one of them. The other two are invisible by default.
Series contract
This series will track ziom trader as a live system, not as a performance claim.
I will publish the boring parts: small losses, missed fills, broken counters, stale assumptions, dashboard lies, audit fixes, and retractions when the next sample contradicts the previous read.
No alpha claims. No “the bot works” until the forward sample earns that sentence. No hiding the layer that authored the failure.
Peer credits
The vocabulary that made this audit possible came from people writing about adjacent problems in adjacent domains.
None of these people were writing about trading bots. Some were writing about incident reports, some about agent systems, one about a read-chain product.
The overlap was not planned. That’s the point.
Daniel Nevoigt: “methodology overview without forward-correlation disclosure is a log with good intentions”
Christopher Maher: “the bite check did not catch itself, a different rail caught it”
L. Cordero: “trust retrieval, verify recall”
Sean Burn: “show the seam, do not hide it”
Todd Hendricks: “big number, wrong metric”
TxDesk, ratifying the placed=1/placed_ok=0 framing in a different domain this morning: “incomplete by default”
That is why I am leaving the credits in the post. The vocabulary did not decorate the audit. It changed what the audit could see.
What you can take from this
If you run a live system, look for the layer where your own code writes both the action and the verification. That is where this class of bug lives. The fix is not only better testing. The fix is making the action layer and the verification layer be authored by different code paths, ideally by different authors, with the verification path explicitly classifying anything it did not see as incomplete by default.
Render the difference, not the success. Five attempted and three succeeded is a normal display state. Five attempted and unknown succeeded is the state your dashboard probably hides today.
That is the line the audit drew.
If you are the bot, you do not get to be the auditor.





Leave a Reply