DAILY NEWS


I Built an AI Agent Orchestrator Where Gemma 4 Only Knows What You Teach It



Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

GemmaOrch is a skill-based AI agent orchestrator: you define what an agent knows by dropping Markdown files into a folder, assign those skills to a named agent, and chat with it. The agent, powered by Gemma 4, answers only within the boundaries of those files. It refuses anything out of scope with a fixed phrase rather than hallucinating expertise it wasn’t given.

The core idea: agent behavior lives in .md files, not in code. No prompts hardcoded in the application. No domain logic baked into the service layer. The skill files are the agent.

What it solves: building specialized AI assistants usually means either fine-tuning a model (expensive, slow to iterate) or writing complex prompt engineering into your codebase (brittle, hard to maintain). GemmaOrch separates the two concerns — the orchestration logic stays in Java, the expertise lives in plain Markdown that anyone can read and edit.

Key features:

Skill-driven agents — each agent’s system prompt is built entirely from its assigned skill files at runtime.
GitHub skill importer — paste a public GitHub folder URL and GemmaOrch fetches every .md file recursively, creating the skill locally.
Streaming chat — token-by-token streaming via Spring WebFlux, rendered as Markdown client-side.
MCP server — every agent is automatically exposed as a JSON-RPC 2.0 tool on POST /mcp, callable from Claude Code, Cursor, or any MCP-compatible IDE.
REST API — POST /api/chat/{agentId} for integrating agents into external services, with a one-click “Copy curl” button in the UI.
Zero infrastructure — H2 file-based database, no external services required beyond the AI Studio API key.
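To make the "skill-driven" idea concrete, here is a minimal Java sketch of how a system prompt could be assembled from a folder of Markdown files at runtime. It is not taken from the GemmaOrch repository: the class name, the section label format, and the default `skills` folder are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not the project's actual implementation.
final class SkillPromptSketch {

    // Collects every .md file under skillDir (recursively) in a stable, sorted order.
    static List<Path> findSkillFiles(Path skillDir) {
        try (Stream<Path> paths = Files.walk(skillDir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".md"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Joins the files into one prompt, labelling each section with its relative path.
    static String buildSystemPrompt(Path skillDir) {
        StringBuilder prompt = new StringBuilder();
        for (Path file : findSkillFiles(skillDir)) {
            try {
                prompt.append("## Skill file: ")
                      .append(skillDir.relativize(file))
                      .append('\n')
                      .append(Files.readString(file, StandardCharsets.UTF_8).strip())
                      .append("\n\n");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return prompt.toString();
    }

    public static void main(String[] args) {
        // "skills" is an assumed default folder name.
        Path dir = Path.of(args.length > 0 ? args[0] : "skills");
        if (!Files.isDirectory(dir)) {
            System.err.println("No such skill folder: " + dir);
            return;
        }
        System.out.print(buildSystemPrompt(dir));
    }
}
```

Because the prompt is rebuilt from disk on each call, editing a Markdown file changes the agent's knowledge without touching any Java code, which is the separation the project is aiming for.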

Built with: Java 25 · Spring Boot 3.5 · Spring AI 1.1.5 · Thymeleaf · HTMX 2.0

Demo

The app runs locally — see the Quick Start in the repo or the Docker section to spin it up in two commands.

Dashboard — 3-panel layout (skills · main · agents):

Creating and importing skills:

Chatting with a skill-scoped agent:

API Access panel — copy the curl command directly from the agent detail view:
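For callers that prefer Java over curl, the same endpoint can be reached with the JDK's built-in HTTP client. This is only a sketch: the post documents the path POST /api/chat/{agentId}, but the base URL `http://localhost:8080`, the agent id `1`, and the `message` field in the JSON body are assumptions, so check the copied curl command for the real request shape.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client, not part of the GemmaOrch codebase.
final class ChatApiSketch {

    // Minimal JSON string escaping for characters likely to appear in a chat message.
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t") + "\"";
    }

    // Builds the POST request; the "message" field name is an assumption.
    static HttpRequest chatRequest(String baseUrl, String agentId, String message) {
        String body = "{\"message\":" + jsonString(message) + "}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/chat/" + agentId))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires the application to be running locally on the assumed port.
        HttpRequest request = chatRequest("http://localhost:8080", "1", "What can you help with?");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```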

Agents as MCP tools in the IDE:
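For readers curious what an IDE sends to POST /mcp, the sketch below builds a JSON-RPC 2.0 request using the standard MCP `tools/call` method. The tool name `java-reviewer` and the `question` argument are hypothetical; the real tool names and argument schema are whatever the server advertises for each agent.

```java
// Hypothetical request builder, shown only to illustrate the message shape.
final class McpCallSketch {

    // Minimal JSON string escaping (backslash and double quote only).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a JSON-RPC 2.0 "tools/call" request body.
    static String toolsCall(int id, String toolName, String question) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"tools/call\",\"params\":{\"name\":" + quote(toolName)
                + ",\"arguments\":{\"question\":" + quote(question) + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(toolsCall(1, "java-reviewer", "Is this loop thread-safe?"));
    }
}
```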

Code

Repository: Bzaid94/gemmorch-agents

How I Used Gemma 4

I used the gemma-4-31b-it model — the 31B dense instruction-tuned variant — via Google AI Studio through Spring AI’s spring-ai-starter-model-google-genai.

Why the 31B dense, specifically:

The project enforces a hard constraint: agents must refuse anything outside their assigned skills and must do so with an exact phrase. This is a correctness requirement, not a quality preference — if the constraint breaks, the product doesn’t work.

I tested smaller variants first. The 4B model followed the constraint most of the time, but would occasionally drift: offering “related” information outside its skills, or partially revealing the system prompt when directly asked. With the 31B dense, these failures essentially disappeared. The constraint held reliably across multi-turn conversations and adversarial inputs.
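Because the refusal has to match an exact phrase, drift of this kind is easy to detect mechanically. The following is a minimal sketch of such a check, not the test harness actually used: the refusal phrase is a placeholder (the post does not quote the real one), and the replies are assumed to come from out-of-scope questions sent to an agent.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical checker for exact-phrase refusals.
final class RefusalCheckSketch {

    // Placeholder phrase; the phrase GemmaOrch actually uses is not given in the post.
    static final String REFUSAL = "This is outside my assigned skills.";

    // A reply passes only if it is exactly the refusal phrase, ignoring surrounding whitespace.
    static boolean isExactRefusal(String reply) {
        return reply != null && reply.strip().equals(REFUSAL);
    }

    // Returns the replies that drifted, e.g. by adding "related" information.
    static List<String> findDrift(List<String> repliesToOutOfScopeQuestions) {
        return repliesToOutOfScopeQuestions.stream()
                .filter(reply -> !isExactRefusal(reply))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> replies = List.of(
                REFUSAL,
                REFUSAL + " However, here is some related information.");
        System.out.println("Drifted replies: " + findDrift(replies).size());
    }
}
```

Running a fixed set of out-of-scope and adversarial questions through a check like this for each model variant gives a simple pass/fail count to compare them on.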

Two specific things the 31B unlocked that smaller models couldn’t deliver consistently:

Long-context constraint adherence. A single agent’s system prompt can carry 10,000+ tokens of skill content (multiple skill files, each with reference documents). The 31B model kept the opening STRICT CONSTRAINTS block in effect even with extensive context following it, whereas smaller models would silently “forget” early instructions as the context grew.
Role disambiguation. Many skill files written for Claude Code or agentic CLI tools contain dispatch instructions like “invoke subagent X” or “request tool Y.” Injected directly into a system prompt, smaller models would sometimes output those templates literally. The 31B correctly understood the meta-instruction — “you are the agent being invoked, not the orchestrator invoking agents” — and applied the skill knowledge directly instead of outputting workflow templates.
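The two points above suggest a prompt layout in which the constraints and the role note come first and the skill content follows. Here is a hedged Java sketch of that layout. Only the STRICT CONSTRAINTS heading and the quoted meta-instruction come from the post; the remaining constraint wording, the `SKILL:` labels, and the class itself are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; the constraint wording is assumed, not copied from GemmaOrch.
final class PromptLayoutSketch {

    // Quoted from the post's description of the meta-instruction.
    static final String ROLE_NOTE =
            "You are the agent being invoked, not the orchestrator invoking agents.";

    static String assemble(String refusalPhrase, Map<String, String> skills) {
        StringBuilder sb = new StringBuilder();
        // Constraints go first so they precede the potentially long skill content.
        sb.append("STRICT CONSTRAINTS\n");
        sb.append("- Answer only from the skills below.\n");
        sb.append("- If a question is out of scope, reply exactly: ")
          .append(refusalPhrase).append('\n');
        sb.append("- ").append(ROLE_NOTE).append('\n');
        sb.append("- Apply skill knowledge directly; do not output dispatch or workflow templates.\n\n");
        // Skills are appended in insertion order, each under its own label.
        for (Map.Entry<String, String> skill : skills.entrySet()) {
            sb.append("SKILL: ").append(skill.getKey()).append('\n')
              .append(skill.getValue().strip()).append("\n\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> skills = new LinkedHashMap<>();
        skills.put("code-review", "Review Java pull requests for correctness and style.");
        System.out.println(assemble("This is outside my assigned skills.", skills));
    }
}
```

Whether a given model keeps honoring a block placed this early is exactly what the comparison between the 4B and 31B variants was probing; the layout alone does not guarantee it.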

Why not the 26B MoE? The MoE variant optimizes for throughput across concurrent requests. GemmaOrch is a single-tenant orchestrator where precision per response matters more than requests-per-second. The dense model’s full parameter activation per token is worth the inference cost for this use case.

Why not the 4B? For a general assistant or creative tool, the 4B is genuinely capable and would be my first choice to keep costs and latency low. But when “breaking the constraint” is a correctness failure — not just a quality degradation — the extra capacity of the 31B is justified.

The open-weights advantage: Gemma 4 is open. The application is architected so the model is an environment variable — swap AI Studio for a local Ollama instance and nothing else changes. For users with sensitive skill content (internal knowledge bases, proprietary processes), self-hosting is a real deployment path, not a future promise.

Switch from AI Studio to self-hosted in one line:

spring.ai.google.genai.chat.options.model=gemma-4-31b-it

Or run locally with Ollama:

ollama run gemma4:31b
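As an illustration of treating the model as an environment variable, the sketch below resolves the model name from the environment and falls back to the AI Studio default. The variable name `GEMMA_MODEL` is hypothetical and not taken from the repository; the two model identifiers are the ones quoted above.

```java
import java.util.Map;

// Hypothetical configuration helper; GEMMA_MODEL is an assumed variable name.
final class ModelConfigSketch {

    static final String DEFAULT_MODEL = "gemma-4-31b-it";

    // Returns the configured model name, or the default when the variable is unset or blank.
    static String resolveModel(Map<String, String> env) {
        String value = env.get("GEMMA_MODEL");
        return (value == null || value.isBlank()) ? DEFAULT_MODEL : value.strip();
    }

    public static void main(String[] args) {
        System.out.println(resolveModel(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the lookup easy to test; with `GEMMA_MODEL=gemma4:31b` set, the same code would select the local Ollama tag instead.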

Source: https://github.com/Bzaid94/gemma-agents-orchestrator.git · License: Apache 2.0




Join the Gemma 4 Challenge: $3,000 prize pool for TEN winners!



Local AI is having a moment, and we want you to be part of it!

Running through May 24, the Gemma 4 Challenge invites you to explore open models. With the release of Gemma 4, Google’s most capable open model family yet, we now have access to native multimodal capabilities, advanced reasoning, a 128K context window, and models that range from running on a Raspberry Pi, to phones, to powering large-scale deployments.

Whether you love to build or love to write, there’s a prompt for you and a $3,000 prize pool up for grabs!

Read on to learn more.

Our Prompts

Build With Gemma 4

Your mandate is to build something useful or creative with any Gemma 4 model. The scope is wide open — you can build anything from an IoT integration to a multimodal tool to a long-context reasoning app. What matters is that Gemma 4 is doing real work at the heart of your project. 

Build With Gemma 4 Submission Template

 

The most compelling submissions will make a clear case for why you chose the model you did and what that model unlocked.

Write About Gemma 4

Your mandate is to publish a post about Gemma 4 that educates, inspires, or sparks curiosity. There’s no single right format — what matters is that your post offers something genuine and useful to the community.

Not sure what to write about? Here are some ideas:

How-to guide: Walk through setting up and running a Gemma 4 model locally, fine-tuning it for a specific task, or integrating it into a real project

Comparison piece: Break down the three Gemma 4 model variants and help readers decide which one is right for their use case

Personal essay or opinion piece: Share your experience building with Gemma 4, or make a case for something — what does a model this capable running locally mean for the future of AI?

Deep technical breakdown: Explore a specific capability like multimodal input, the 128K context window, or reasoning mode

 

Write About Gemma 4 Submission Template

 

Need inspiration? Check out how Google’s own team fine-tuned Gemma 4 with Cloud Run Jobs. That kind of hands-on, shareable knowledge is exactly what we’re looking for.

Note: If you are primarily showing off a project, please submit to the Build with Gemma 4 prompt instead! Each submission is eligible for only one of the two categories.

Which Model Is Right for You? 🤔

The Gemma 4 model family spans three distinct architectures tailored to specific hardware requirements:

Small Sizes: 2B and 4B effective parameter models built for ultra-mobile, edge, and browser deployment (e.g., Pixel).

Dense: A powerful 31B parameter dense model that bridges the gap between server-grade performance and local execution.

Mixture-of-Experts: A highly efficient 26B MoE model designed for high-throughput, advanced reasoning.

No matter which you choose, judges will be looking for intentional model selection — show us why your model was the right tool for the job.

Prizes 🏆

Five winners from the “Build with Gemma 4” prompt will receive:

$500 USD cash prize
DEV++ Membership
Exclusive DEV Badge

Five winners from the “Write About Gemma 4” prompt will receive:

$100 USD cash prize
DEV++ Membership
Exclusive DEV Badge

All participants with a valid submission will receive a completion badge on their DEV profile.

Judging Criteria

Build With Gemma 4 submissions will be evaluated for:

Intentional and effective use of the chosen Gemma 4 model
Technical implementation and code quality
Creativity and originality
Usability and user experience

Write About Gemma 4 submissions will be evaluated for:

Clarity and depth of explanation
Originality of perspective or insight
Practical value to the community
Quality of writing

Getting Started With Gemma 4 🚀

Gemma 4 is open — you have several ways to get started, including completely free options:

Gemini API via Google AI Studio: Access Gemma 4 through the Gemini API.

Run locally (free, no credit card required): Download any Gemma 4 model directly from Hugging Face or Kaggle. The E2B model runs on high-end phones — or even a Raspberry Pi 5.

OpenRouter (free tier available): Access Gemma 4 31B via OpenRouter’s free tier — no credit card required.

Important Dates

May 6: Gemma 4 Challenge begins!

May 24: Submissions due at 11:59 PM PDT

June 4: Winners Announced

We can’t wait to see what you build and write! Questions about the challenge? Drop them in the comments below.

Good luck and happy coding!


