DAILY NEWS

Stay Ahead, Stay Informed – Every Day

I’m a high-school student and I built a free app to stop forgetting everything over the summer



Hey everyone,

The school year is almost over, and every summer I hit the same problem: by September I’ve forgotten most of what I learned. It happens with everything. I feel confident with math formulas while I’m using them, but after a few months away they look like I’ve never seen them. Same with languages — my English and German feel solid at the end of the year or after finishing a book, and a few weeks later my vocabulary and grammar have basically evaporated.

So I built a small app to fight that, called Revise: https://revise-o1t7.onrender.com/

The idea is simple: give yourself regular practice on the things you’ve actually studied, so they stick. You can practise math, history, languages, chemistry and biology — it generates exercises, reading texts (which you can translate word-by-word right in the app) and flashcards, all on a spaced-repetition schedule so things come back just as you’re about to forget them.

It works with whatever AI you already use: you copy a generated prompt into ChatGPT, Claude or Gemini (the free versions are fine), and paste the reply back. No API key, no subscription. It’s free, and you can install it to your phone from the menu.

I’ve been using it for a while and it’s genuinely helped me remember things, not just learn them once. I’d love your feedback — what’s confusing, what’s missing, what you’d want next (there’s a feedback button in your account settings).

Thanks for reading, and I hope it helps someone! 🙂




Most Repos Look Fine. Until They Don’t.



You’ve been there.

You clone a repo. The README looks solid. There’s a Dockerfile. Maybe a docker-compose.yml. Everything appears set up.

Then you spend the next three hours chasing a missing config variable, an outdated base image, or a local development assumption that only makes sense if you’ve worked on the project for six months.

No one documents these things properly. They live in team memory, Slack threads, and that one engineer who “just knows.”

That’s the kind of engineering pain nobody tracks, and everybody absorbs.

The Hidden Tax on Every Team

Let’s be honest about what’s really happening.

Most engineering teams have gotten good at the visible stuff: code reviews, test coverage, deployment pipelines. But there’s a layer below all of that, repository readiness, that almost nobody validates with the same rigor.

Can a new developer clone this repo and actually run it?
Does the Docker setup reflect how the project actually works today?
Has the repo drifted from the workflow the team thinks it has?
Is there tribal knowledge baked into the setup that stays invisible until something breaks?

These aren’t dramatic problems. That’s exactly why they survive so long.

The cost doesn’t show up in a postmortem. It shows up as:

onboarding that takes days instead of hours
“works on my machine” becoming a running joke
CI failures no one can explain right away
setup bugs being rediscovered by every new hire

None of that gets assigned a ticket. It just quietly eats time.

Why I Built dockgate

I got tired of watching this happen.

Not just to me, but to every developer stuck in the gap between “the repo exists” and “the repo actually works.” That gap has a real cost, and most of it is preventable.

So I built dockgate, a CLI that sits squarely in that gap and does one thing well:

It tells you whether a repository is actually ready to run, maintain, and trust.

Not whether the code is clean.
Not whether the tests pass.
Whether the operational layer (the Docker setup, project conventions, and environment assumptions) reflects reality.

What dockgate Actually Does

1. It detects what kind of project it’s dealing with

A Node.js repo, a Python repo, and a multi-service project should not be evaluated against identical expectations. dockgate starts with project detection so its checks stay relevant instead of noisy.
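To make the idea concrete, here is a minimal Python sketch of detection based on marker files. It is an illustration only, not dockgate's actual implementation: the `MARKERS` table, the `detect_project_types` function, and the labels are all hypothetical.

```python
from pathlib import Path

# Hypothetical marker files per project label.
# dockgate's real detection logic may look nothing like this.
MARKERS = {
    "node": ["package.json"],
    "python": ["pyproject.toml", "requirements.txt", "setup.py"],
    "docker": ["Dockerfile"],
    "compose": ["docker-compose.yml", "compose.yaml"],
}


def detect_project_types(repo_root):
    """Return the sorted labels whose marker files exist at the repo root."""
    root = Path(repo_root)
    found = set()
    for label, filenames in MARKERS.items():
        if any((root / name).is_file() for name in filenames):
            found.add(label)
    return sorted(found)
```

A repo containing both `package.json` and a `Dockerfile` would come back as `["docker", "node"]` under these assumptions, and later checks could be filtered by those labels.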

2. It uses a rules engine, not guesswork

This is where the tool stops being a script and starts becoming infrastructure.

Instead of scanning for random files and printing generic advice, dockgate uses a structured rule catalog. That makes evaluations more consistent, repeatable, and extensible.

You can use it for:

onboarding triage
repository audits
drift detection over time
validating Docker setup before handoff

Rules evolve as standards evolve. That’s where the leverage comes from.
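As a rough picture of what a structured rule catalog can look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the `Rule` shape, the rule IDs, and the two example checks are hypothetical and are not taken from dockgate's real catalog.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Rule:
    rule_id: str
    description: str
    applies_to: FrozenSet[str]      # project labels the rule is relevant for
    check: Callable[[Path], bool]   # returns True when the repo passes


def has_dockerignore(root: Path) -> bool:
    return (root / ".dockerignore").is_file()


def has_env_example(root: Path) -> bool:
    return (root / ".env.example").is_file()


# Hypothetical catalog: adding a rule means adding an entry, not editing logic.
CATALOG = [
    Rule("docker-001", "Dockerfile is accompanied by .dockerignore",
         frozenset({"docker"}), has_dockerignore),
    Rule("env-001", "Required environment variables are documented",
         frozenset({"node", "python"}), has_env_example),
]


def evaluate(repo_root, project_types, catalog=CATALOG):
    """Run only the rules relevant to the given labels; return (rule_id, passed) pairs."""
    root = Path(repo_root)
    labels = set(project_types)
    return [(rule.rule_id, rule.check(root))
            for rule in catalog if rule.applies_to & labels]
```

Keeping rules as data is what makes results repeatable: the same catalog run against the same repo gives the same answer, and the catalog can be versioned alongside the team's standards.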

3. It doesn’t just diagnose, it points forward

A lot of tools are good at listing what’s wrong. Fewer are designed to help you move toward a better state.

dockgate includes setup-oriented workflows so it can be part of actual remediation, not just diagnosis.

4. It fits how developers already work

It’s a CLI. It runs in the terminal. It works with hooks, audit scripts, and existing shell workflows.

That matters.

Good developer tools don’t ask people to change how they work just to get value.

What Makes It Different

There’s no shortage of linting tools, repo templates, and CI validators out there.

But dockgate focuses on something most of them skip: the setup layer. More specifically, the distance between “this repo exists” and “this repo is actually ready.”

That difference shows up in:

Docker support that works on paper but breaks in practice
README instructions that were accurate six months ago
environment assumptions only one team member still understands
local setups that quietly diverge from production

When the setup layer is unclear, the team pays for it every time someone new joins, every time a CI assumption breaks, and every time somebody has to reverse-engineer how the project is supposed to run.

dockgate makes that invisible layer visible.

On Shipping It Like a Real Tool

One thing I felt strongly about from the start was this:

there’s a huge graveyard of useful scripts that never became useful tools because they never crossed the gap between “works on my machine” and “someone else can install and trust this.”

I wanted dockgate to cross that gap deliberately.

That meant doing the less glamorous work too:

npm package support
PyPI wrapper support for Python environments
a changelog
a release checklist
a proper license
regression fixtures
a GitHub Actions publishing workflow

It also meant treating mixed-language teams as real users. dockgate is fundamentally an npm package, but developer teams rarely live in a single ecosystem. A PyPI wrapper lowers friction, and in developer tooling, accessibility often decides whether something gets tried at all.

The Lesson I Didn’t Expect

Building this reinforced something I keep coming back to:

some of the highest-value engineering work lives in problems people dismiss as small.

Setup friction looks small.
Repository drift looks small.
Docker inconsistency looks small.

Until it’s slowing down every sprint and nobody can fully explain why.

That’s the thing about infrastructure drag: it rarely announces itself. It accumulates. And it is always cheaper to catch early.

What’s Next

Right now, dockgate focuses on repository readiness and Docker validation. But the direction is bigger than that.

I can see it growing into:

stronger standards-driven validation
better drift detection over time
richer project profiles and baselines
more actionable remediation workflows

The foundation is rules-driven and extensible, which means it can grow with the teams that use it.

One Last Thought

A repository is not just a folder of code.

It’s an operational interface for every developer who touches it. When that interface is confusing, fragile, or full of hidden assumptions, the team pays for it whether they acknowledge it or not.

“It looks fine” is one of the most expensive things a repository can say.

dockgate is built to stop teams from taking that at face value.

Try dockgate on npm




Why Your Gemini Bill Doesn’t Match the Model Names



tl;dr – Across roughly 3,300 paired skill-eval runs, Gemini 3.5 Flash cost $1.05 per task against Gemini 3.1 Pro’s $0.66, for scores that were effectively identical: 88.6 versus 87.9.

The pricing is even stranger when you look at the actual task costs. Gemini 3 Flash Preview and Gemini 3.5 Flash are separated by almost 8× in per-task cost, while Gemini 3.1 Pro comes in cheaper than Gemini 3.5 Flash despite the Pro name. The invoice does not appear to follow the naming hierarchy.

Where the numbers come from

The benchmark ran every task twice, once with the relevant skill applied and once without, across four Gemini models in OpenHands, totaling roughly 800 tasks per model. Rather than relying on dashboard estimates, we pulled per-call token counts directly from agent session logs and computed costs using Google’s published per-token prices. We then compared the resulting per-task costs across models.
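The costing step described above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the benchmark's actual script: the `CallUsage` log schema and the `Prices` structure are hypothetical, and the rates in the usage example are placeholders rather than Google's published prices.

```python
from dataclasses import dataclass


@dataclass
class CallUsage:
    """Token counts for one model call (hypothetical log schema)."""
    uncached_input: int
    cached_input: int
    output: int  # assumed to include thinking tokens


@dataclass
class Prices:
    """USD per million tokens; fill these in from the provider's price list."""
    uncached_input: float
    cached_input: float
    output: float


def task_cost(calls, prices):
    """Sum the cost of every call in a task, in USD."""
    total = 0.0
    for call in calls:
        total += call.uncached_input * prices.uncached_input
        total += call.cached_input * prices.cached_input
        total += call.output * prices.output
    return total / 1_000_000


# Placeholder rates and counts, for illustration only.
example_prices = Prices(uncached_input=2.0, cached_input=0.5, output=10.0)
example_calls = [CallUsage(40_000, 0, 1_200), CallUsage(5_000, 40_000, 900)]
print(f"${task_cost(example_calls, example_prices):.4f}")
```

Pricing each call separately matters because the cached and uncached shares of the input change from turn to turn.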

The headline data

| Model | $/task (w/ skill) | Score | Pts per $ | Input tokens | Turns | List $/Mtok |
|---|---|---|---|---|---|---|
| 3.1 Flash Lite | $0.035 | 70.2 | 2,006 | 0.31M | 17 | $0.25 |
| 3 Flash Preview | $0.135 | 85.4 | 633 | 0.63M | 24 | $0.50 |
| 3.1 Pro Preview | $0.66 | 87.9 | 132 | 0.65M | 26 | $2.00 |
| 3.5 Flash | $1.05 | 88.6 | 85 | 1.41M | 39 | $1.50 |

A few things stand out from this data.

Cost order and name order do not line up. Gemini 3.1 Pro is cheaper per task than Gemini 3.5 Flash despite carrying a higher per-token list price, while Gemini 3.5 Flash and Gemini 3.1 Flash Lite, which both carry the Flash name, differ dramatically in actual spend ($1.05 versus $0.035 per task). Model names describe intended positioning, but they are a poor guide to real-world agent costs.
Scores do improve with each model generation, which is a genuine positive trend and a good reason to track releases, but capability gains do not automatically translate to cost reductions.
Finally, the practical value pick is Gemini 3 Flash Preview, which lands within about three points of the leading models at roughly one-fifth of Pro's per-task cost and about one-eighth of 3.5 Flash's, making it the most efficient option for workloads where a score in the 85 range is acceptable.
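The "Pts per $" column is simply score divided by per-task cost. The short Python check below recomputes it from the rounded figures in the table. The function name is ours, and the results for the two most expensive models land about a point away from the published column, presumably because the published costs are rounded.

```python
# (model, $/task with skill, score), copied from the table above
ROWS = [
    ("3.1 Flash Lite", 0.035, 70.2),
    ("3 Flash Preview", 0.135, 85.4),
    ("3.1 Pro Preview", 0.66, 87.9),
    ("3.5 Flash", 1.05, 88.6),
]


def points_per_dollar(cost_per_task, score):
    """Benchmark points bought per dollar of per-task spend."""
    return score / cost_per_task


for model, cost, score in ROWS:
    print(f"{model:16s} {points_per_dollar(cost, score):8.1f}")
```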

Why volume beats unit price

The cost of an agentic task is the product of two variables:

`Task cost = price-per-token × tokens the model decides to spend`


Model names establish the first variable. The second is determined at runtime by the model’s behavior on the specific task, and it only becomes visible after you read your session logs.

For Gemini 3.5 Flash, the per-task cost breaks down as follows:

Non-cached input: $0.72

Cache-read input: $0.14

Output (including thinking): $0.19

The dominant driver is input volume. Gemini 3.5 Flash sent 1.41 million tokens of context across 39 agent turns per task. Pro sent roughly half that volume across 26 turns, and even at its higher list price of $2.00 per million tokens, its lower volume resolves to a lower total bill.

A model with a cheaper per-token rate that takes more turns to reach an answer will erode its own discount. It is also worth noting that 63-75% of input across these runs was cache-read, which means the effective sensitivity to turn count is even higher than raw list prices suggest: the multiplier is accumulating in your session logs, not on your pricing page.
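The inversion can be reproduced with back-of-the-envelope arithmetic. The Python sketch below uses only figures quoted in this post; the "ceiling" is our own simplification that bills every input token at the full list price and ignores both cache discounts and output tokens, so it overstates the real bill.

```python
# Per-task cost components for Gemini 3.5 Flash (USD), from the breakdown above.
FLASH_35_COMPONENTS = {
    "non_cached_input": 0.72,
    "cache_read_input": 0.14,
    "output_incl_thinking": 0.19,
}


def input_cost_ceiling(input_mtok, list_price_per_mtok):
    """Input cost if every token were billed at the full list price."""
    return input_mtok * list_price_per_mtok


total = sum(FLASH_35_COMPONENTS.values())
input_share = (FLASH_35_COMPONENTS["non_cached_input"]
               + FLASH_35_COMPONENTS["cache_read_input"]) / total

# Input volume (million tokens per task) and list price ($/Mtok) from the table.
flash_ceiling = input_cost_ceiling(1.41, 1.50)
pro_ceiling = input_cost_ceiling(0.65, 2.00)

print(f"3.5 Flash total per task: ${total:.2f}, input share {input_share:.0%}")
print(f"Input ceiling, 3.5 Flash: ${flash_ceiling:.3f}")
print(f"Input ceiling, 3.1 Pro:   ${pro_ceiling:.3f}")
```

Even on this crude measure the cheaper-per-token model comes out more expensive, roughly $2.1 against $1.3, because its input volume is more than double. The measured input cost for 3.5 Flash ($0.86) sits well below its ceiling, which is the cache discount at work.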

Skills move cost by tier

Adding a relevant skill to each run changed per-task cost in opposite directions depending on which model ran it:

Pro saw cost drop $0.20 per task (-23%) while the score gained 20 points. The model used fewer turns and less exploratory backtracking, which suggests it was able to act on the structured guidance directly rather than discovering the solution path through iteration.
3.5 Flash was essentially flat, with cost shifting by less than $0.03 in either direction.
3 Flash Preview and Flash Lite each spent slightly more tokens for marginal score gains (+$0.03 and +$0.01 respectively).

The underlying pattern is consistent: a skill compresses the solution path for a model capable of following structured guidance precisely, reducing turn count and therefore total cost. For a model still resolving ambiguity through exploration, the same skill adds context to process rather than a shortcut to apply, and the cost holds steady or rises marginally. A skill is a shortcut for a capable model and overhead for a weaker one.

In practical terms, this produces two clear operating points. Pro with a relevant skill at $0.66 per task is the most cost-efficient route to top-tier performance. Gemini 3 Flash Preview with a skill at $0.135 per task delivers roughly five times the score-per-dollar of Pro and more than seven times that of 3.5 Flash, for a score about three points lower, which is a reasonable trade for many workloads.
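The Pro figures quoted above can be checked with a few lines of arithmetic. The Python sketch below assumes that the $0.20 drop is measured against the no-skill run and that $0.66 is the with-skill cost. The no-skill baseline of $0.86 is derived here and is not a number reported in the post.

```python
def baseline_cost(cost_with_skill, delta):
    """Recover the no-skill cost from the with-skill cost and the change (negative = cheaper)."""
    return cost_with_skill - delta


def relative_change(cost_with_skill, delta):
    """Cost change as a fraction of the no-skill baseline."""
    return delta / baseline_cost(cost_with_skill, delta)


pro_with_skill = 0.66   # $/task, from the table
pro_delta = -0.20       # $/task change when the skill is added

print(f"Implied no-skill cost: ${baseline_cost(pro_with_skill, pro_delta):.2f}")
print(f"Relative change: {relative_change(pro_with_skill, pro_delta):.1%}")
```

Under those assumptions the relative change rounds to the -23% quoted for Pro.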

Measure, don’t assume

Four takeaways from this data that apply beyond this specific benchmark:

1/ Do not budget from the rate card. Cost your workload based on measured tokens and turns on your specific tasks, with your specific prompts, in your specific agent harness. Per-token list prices are a useful first filter for ordering candidates, not a reliable predictor of relative spend.

2/ Read cost at the session layer. Aggregate dashboards can show $0 while spend accumulates in the background. Token usage needs to come from raw API responses or agent session logs to be trusted for budgeting purposes.

3/ Watch turn count first. The 39-versus-26 turn gap between 3.5 Flash and Pro is the primary cause of the price inversion observed here, and turn count is the variable most commonly absent from observability tooling. It is the multiplier on everything else in the cost equation.

4/ Re-measure when models update. Gemini 3.5 Flash is a newer release than Gemini 3 Flash Preview and scores higher, but it costs roughly eight times more in this agentic context. Capability improvements and cost improvements are independent variables, and any cost benchmark needs to be re-run with each version update rather than assumed to hold.
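For the second and third takeaways, the per-task numbers worth tracking can be pulled from session logs with very little code. The Python sketch below assumes a hypothetical log format with one record per model call carrying a `task_id` and an `input_tokens` field, and it treats each call as one turn. Real OpenHands logs may be structured differently.

```python
from collections import defaultdict
from statistics import mean


def summarize_sessions(records):
    """Aggregate per-call records into per-task turn and input-token averages.

    `records` is an iterable of dicts with the hypothetical keys
    'task_id' and 'input_tokens', one dict per model call.
    """
    turns = defaultdict(int)
    tokens = defaultdict(int)
    for rec in records:
        turns[rec["task_id"]] += 1
        tokens[rec["task_id"]] += rec["input_tokens"]
    return {
        "tasks": len(turns),
        "mean_turns": mean(turns.values()),
        "mean_input_tokens": mean(tokens.values()),
    }


# Made-up records, for illustration only.
sample = [
    {"task_id": "a", "input_tokens": 10_000},
    {"task_id": "a", "input_tokens": 20_000},
    {"task_id": "b", "input_tokens": 10_000},
]
print(summarize_sessions(sample))
```

Running the same summary for each candidate model on the same task set gives the turn and volume figures that a rate card cannot.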

Caveats

These results come from a single agent harness (OpenHands), a single benchmark with explicit skill-relevance disclosure, and a specific sample window. Different tasks, prompt structures, and turn-length patterns will shift the absolute numbers and may shift the relative rankings. The finding to carry forward is not a specific model recommendation but a methodology: in agentic settings, cost rankings are not derivable from per-token rates alone, and the ranking that applies to your workload depends on that workload’s specific behavioral profile.

A model name is a pricing tier, not a cost forecast. In agentic workflows, the deciding variable is how many tokens the model chooses to spend to reach an answer, a figure visible only after you run the work and read the logs. The rate card gives you one of the two inputs; only measurement gives you both.

Next: which skills actually earn their tokens? In these runs, 42% produced significant performance gains while 5% were net overhead. We’ll follow up on this analysis in the next post.


