{"id":2879,"date":"2026-05-11T05:04:27","date_gmt":"2026-05-10T22:04:27","guid":{"rendered":"https:\/\/daiilynews.cu.ma\/i-built-an-ai-agent-orchestrator-where-gemma-4-only-knows-what-you-teach-it\/"},"modified":"2026-05-11T05:04:27","modified_gmt":"2026-05-10T22:04:27","slug":"i-built-an-ai-agent-orchestrator-where-gemma-4-only-knows-what-you-teach-it","status":"publish","type":"post","link":"https:\/\/daiilynews.cu.ma\/?p=2879","title":{"rendered":"I Built an AI Agent Orchestrator Where Gemma 4 Only Knows What You Teach It"},"content":{"rendered":"<p><br \/>\n<br \/>\n                Gemma 4 Challenge: Build With Gemma 4 Submission<\/p>\n<pre><code>            This is a submission for the Gemma 4 Challenge: Build with Gemma 4\n<\/code><\/pre>\n<p>What I Built<\/p>\n<p>GemmaOrch is a skill-based AI agent orchestrator: you define what an agent knows by dropping Markdown files into a folder, assign those skills to a named agent, and chat with it. The agent powered by Gemma 4 will only answer within the boundaries of those files \u2014 it refuses anything outside scope with a precise phrase, never hallucinates expertise it wasn&#8217;t given.<\/p>\n<p>The core idea: agent behavior lives in .md files, not in code. No prompts hardcoded in the application. No domain logic baked into the service layer. The skill files are the agent.<\/p>\n<p>What it solves: building specialized AI assistants usually means either fine-tuning a model (expensive, slow to iterate) or writing complex prompt engineering into your codebase (brittle, hard to maintain). 
GemmaOrch separates the two concerns \u2014 the orchestration logic stays in Java, the expertise lives in plain Markdown that anyone can read and edit.<\/p>\n<p>Key features:<\/p>\n<p>Skill-driven agents \u2014 each agent&#8217;s system prompt is built entirely from its assigned skill files at runtime.<br \/>\nGitHub skill importer \u2014 paste a public GitHub folder URL and GemmaOrch fetches every .md file recursively, creating the skill locally.<br \/>\nStreaming chat \u2014 token-by-token streaming via Spring WebFlux, rendered as Markdown client-side.<br \/>\nMCP server \u2014 every agent is automatically exposed as a JSON-RPC 2.0 tool on POST \/mcp, callable from Claude Code, Cursor, or any MCP-compatible IDE.<br \/>\nREST API \u2014 POST \/api\/chat\/{agentId} for integrating agents into external services, with a one-click &#8220;Copy curl&#8221; button in the UI.<br \/>\nZero infrastructure \u2014 H2 file-based database, no external services required beyond the AI Studio API key.<\/p>\n<p>Built with: Java 25 \u00b7 Spring Boot 3.5 \u00b7 Spring AI 1.1.5 \u00b7 Thymeleaf \u00b7 HTMX 2.0<\/p>\n<p>Demo<\/p>\n<p>The app runs locally \u2014 see the Quick Start in the repo or the Docker section to spin it up in two commands.<\/p>\n<p>Dashboard \u2014 3-panel layout (skills \u00b7 main \u00b7 agents):<\/p>\n<p>Creating and importing skills:<\/p>\n<p>Chatting with a skill-scoped agent:<\/p>\n<p>API Access panel \u2014 copy the curl command directly from the agent detail view:<\/p>\n<p>Agents as MCP tools in the IDE:<\/p>\n<p>Code<\/p>\n<p>Repository: Bzaid94\/gemmorch-agents<\/p>\n<p>How I Used Gemma 4<\/p>\n<p>I used the gemma-4-31b-it model \u2014 the 31B dense instruction-tuned variant \u2014 via Google AI Studio through Spring AI&#8217;s spring-ai-starter-model-google-genai.<\/p>\n<p>Why the 31B dense, specifically:<\/p>\n<p>The project enforces a hard constraint: agents must refuse anything outside their assigned skills and must do so with an exact phrase. 
This is a correctness requirement, not a quality preference \u2014 if the constraint breaks, the product doesn&#8217;t work.<\/p>\n<p>I tested smaller variants first. The 4B model followed the constraint most of the time, but would occasionally drift: offering &#8220;related&#8221; information outside its skills, or partially revealing the system prompt when directly asked. With the 31B dense, these failures essentially disappeared. The constraint held reliably across multi-turn conversations and adversarial inputs.<\/p>\n<p>Two specific things the 31B unlocked that smaller models couldn&#8217;t deliver consistently:<\/p>\n<p>Long-context constraint adherence. A single agent&#8217;s system prompt can carry 10,000+ tokens of skill content (multiple skill files, each with reference documents). The 31B model kept the opening STRICT CONSTRAINTS block in effect even with extensive context following it \u2014 smaller models would silently &#8220;forget&#8221; early instructions as context grew.<br \/>\nRole disambiguation. Many skill files written for Claude Code or agentic CLI tools contain dispatch instructions like &#8220;invoke subagent X&#8221; or &#8220;request tool Y.&#8221; Injected directly into a system prompt, smaller models would sometimes output those templates literally. The 31B correctly understood the meta-instruction \u2014 &#8220;you are the agent being invoked, not the orchestrator invoking agents&#8221; \u2014 and applied the skill knowledge directly instead of outputting workflow templates.<\/p>\n<p>Why not the 26B MoE? The MoE variant optimizes for throughput across concurrent requests. GemmaOrch is a single-tenant orchestrator where precision per response matters more than requests-per-second. The dense model&#8217;s full parameter activation per token is worth the inference cost for this use case.<\/p>\n<p>Why not the 4B? 
For a general assistant or creative tool, the 4B is genuinely capable and would be my first choice to keep costs and latency low. But when &#8220;breaking the constraint&#8221; is a correctness failure \u2014 not just a quality degradation \u2014 the extra capacity of the 31B is justified.<\/p>\n<p>The open-weights advantage: Gemma 4 is open. The application is architected so the model is an environment variable \u2014 swap AI Studio for a local Ollama instance and nothing else changes. For users with sensitive skill content (internal knowledge bases, proprietary processes), self-hosting is a real deployment path, not a future promise.<\/p>\n<p>Switch from AI Studio to self-hosted in one line:<\/p>\n<p>spring.ai.google.genai.chat.options.model=gemma-4-31b-it<\/p>\n<p>Or run locally with Ollama:<\/p>\n<p>ollama run gemma4:31b<\/p>\n<p>Source: https:\/\/github.com\/Bzaid94\/gemma-agents-orchestrator.git \u00b7 License: Apache 2.0<\/p>\n<p><br \/>\n<br \/><a href=\"https:\/\/dev.to\/bzaid94\/gemma-agents-orchestrator-8cm\">Source link <\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Gemma 4 Challenge: Build With Gemma 4 Submission This is a submission for the Gemma 4 Challenge: Build with Gemma 4 What I Built GemmaOrch is a skill-based AI agent orchestrator: you define what an agent knows by dropping Markdown files into a folder, assign those skills to a named agent, and chat with it. 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2880,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[676],"tags":[761,765,921,762,763,923,922,764,760],"class_list":["post-2879","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-tech-ai","tag-coding","tag-community","tag-devchallenge","tag-development","tag-engineering","tag-gemma","tag-gemmachallenge","tag-inclusive","tag-software"],"_links":{"self":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/posts\/2879","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=2879"}],"version-history":[{"count":0,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/posts\/2879\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=\/wp\/v2\/media\/2880"}],"wp:attachment":[{"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=2879"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=2879"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/daiilynews.cu.ma\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=2879"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}