llm – Page 3 – DAILY NEWS

TECH & AI

Building a Terminal AI Agent (v9)

jackminion May 24, 2026 0

Building AI agents that operate directly in the terminal offers huge productivity gains for developers. In this guide, we will provide hands-on, hands-on instructions on how to build a custom CLI AI agent based on a local LLM. 1. Analysis of CLI AI Agent Ecosystem Currently, there are several solutions in the CLI AI agent market: Key tools: Aider: GitHub Copilot-based, real-time code modification function Continue.dev: VSCode-based, handle complex tasks OpenCode: Open source, simple coding help Custom Scripts: Customize with your own script Current problems: Most tools rely on cloud API Poor performance when running locally Lack of complex tooling features Cost issues 2. Setting up local LLM API endpoint locally To run LLM, follow these steps: 1. Install LM Studio: # macOS brew install lm-studio # or download directly wget https://github.com/lmstudio-ai/LMStudio/releases/latest/download/LMStudio-MacOS.dmg Enter fullscreen mode Exit fullscreen mode 2. Run local model: # Download model and run lm-studio –model “Nous-Hermes-2-Mistral-7B-DPO.Q4_K_M.gguf” Enter fullscreen mode Exit fullscreen mode 3. API server settings: # Install Ollama curl -fsSL https://ollama.com/install.sh | sh # Download model ollama pull mistral # Run API server ollama serve Enter fullscreen mode Exit fullscreen mode 3. Building a simple Python CLI agent Now let’s create a basic CLI agent: # ai_agent.py import os import json import subprocess from typing import Dict, List, Any import openai class TerminalAIAgent: def __init__(self, model=”ollama/mistral”): self.model = model self.client = openai.OpenAI( base_url=”http://localhost:11434/v1″, api_key=”ollama” ) self.conversation_history = () def run_command(self, command: str) -> str: “””Execute the command and return the result””” try: result = subprocess.run( command, shell=True, capture_output=True, text=True, timeout=30 ) return result.stdout + result.stderr except Exception as e: return f”Error: {str(e)}” def get_context(self) -> str: “””Collect current working directory information””” pwd = os.getcwd() files = os.listdir(pwd) return f”Working directory: {pwd}\nFiles: {‘, ‘.join(files(:10))}” def chat(self, user_input: str) -> str: “””Conversation with AI””” self.conversation_history.append({ “role”: “user”, “content”: user_input }) # System prompt system_prompt = “”” You are a helpful AI assistant that helps developers with coding tasks. Your responses should be concise and actionable. You can execute shell commands and modify files. “”” messages = ( {“role”: “system”, “content”: system_prompt}, {“role”: “user”, “content”: f”Context: {self.get_context()}”}, ) + self.conversation_history response = self.client.chat.completions.create( model=self.model, messages=messages, temperature=0.3 ) ai_response = response.choices(0).message.content self.conversation_history.append({ “role”: “assistant”, “content”: ai_response }) return ai_response # Usage if __name__ == “__main__”: agent = TerminalAIAgent() print(“Start AI Agent (type ‘exit’ to exit)”) while True: user_input = input(“\n> “) if user_input.lower() == ‘exit’: break response = agent.chat(user_input) print(response) Enter fullscreen mode Exit fullscreen mode 4. Integrate with tmux Integrate with terminal multiplexers to improve workflow: # Create a tmux session tmux new-session -d -s ai_agent # Run the agent within a session tmux send-keys -t ai_agent “python ai_agent.py” Enter Enter fullscreen mode Exit fullscreen mode tmux script: # tmux_ai.sh #!/bin/bash SESSION=”ai_agent” # Check whether session exists if ! tmux has-session -t $SESSION 2>/dev/null; then tmux new-session -d -s $SESSION tmux send-keys -t $SESSION “python ai_agent.py” Enter fi tmux attach -t $SESSION Enter fullscreen mode Exit fullscreen mode 5. Custom tool development Code search tool: # tools/code_search.py import os import re from typing import List, Dict class CodeSearchTool: def __init__(self, root_dir: str = “.”): self.root_dir = root_dir def search_in_files(self, pattern: str, file_extensions: List(str) = None) -> List(Dict): “””Search for a pattern within a file””” results = () if file_extensions is None: file_extensions = (‘.py’, ‘.js’, ‘.ts’, ‘.java’, ‘.cpp’) for root, dirs, files in os.walk(self.root_dir): for file in files: if any(file.endswith(ext) for ext in file_extensions): file_path = os.path.join(root, file) try: with open(file_path, ‘r’, encoding=’utf-8′) as f: content = f.read() matches = re.finditer(pattern, content) for match in matches: results.append({ ‘file’: file_path, ‘line’: content(:match.start()).count(‘\n’) + 1, ‘context’: self.get_context(content, match.start()) }) except Exception: continue return results def get_context(self, content: str, position: int, context_lines: int = 3) -> str: “””Context Extraction””” lines = content.split(‘\n’) line_num = content(:position).count(‘\n’) start = max(0, line_num – context_lines) end = min(len(lines), line_num + context_lines + 1) return ‘\n’.join(lines(start:end)) # Usage example search_tool = CodeSearchTool() results = search_tool.search_in_files(r”def\s+(\w+)\s*\(“) print(json.dumps(results, indent=2)) Enter fullscreen mode Exit fullscreen mode Git tools: # tools/git_tool.py import subprocess import json from typing import Dict, List class GitTool: def get_status(self) -> Dict: “””Check Git status””” try: result = subprocess.run((‘git’, ‘status’, ‘–porcelain’), capture_output=True, text=True) return { ‘status’: result.stdout.strip(), ‘has_changes’: bool(result.stdout.strip()) } except Exception as e: return {‘error’: str(e)} def get_branch_info(self) -> Dict: “””Branch information””” try: branch = subprocess.run((‘git’, ‘branch’, ‘–show-current’), capture_output=True, text=True).stdout.strip() return {‘branch’: branch} except Exception as e: return {‘error’: str(e)} def commit_changes(self, message: str) -> Dict: “””Commit changes””” try: subprocess.run((‘git’, ‘add’, ‘.’), capture_output=True) result = subprocess.run((‘git’, ‘commit’, ‘-m’, message), capture_output=True, text=True) return { ‘success’: True, ‘output’: result.stdout, ‘error’: result.stderr } except Exception as e: return {‘success’: False, ‘error’: str(e)} # Usage example git_tool = GitTool() status = git_tool.get_status() print(json.dumps(status, indent=2)) Enter fullscreen mode Exit fullscreen mode 6. Context window management Context management for handling large code bases: python # context_manager.py import os import hashlib from typing import List, Dict, Set class ContextManager: def __init__(self, max_tokens: int = 8000): self.max_tokens = max_tokens self.token_cache = {} def calculate_tokens(self, text: str) — 📥 **Get the full guide on Gumroad**: https://gumroad.com/l/auto ($5) Enter fullscreen mode Exit fullscreen mode

Source link

TECH & AI

How I built a 6-node 12-GPU on-prem AI cluster running 1000+ agents

jackminion May 20, 2026 0

TL;DR — 6 machines, 12 GPUs, 1,000+ concurrent agents, P95 18 ms, voice

Why I built this

I’m Franck. Toulouse, France. Over 3 years I paid roughly €280,000 to Azure + OpenAI before doing the math properly:

Latency: 1.2s voice round-trip — incompatible with the voice-first UX I wanted.

Compliance: customer data on US servers. Not GDPR-native, just GDPR-compliant-on-paper.

Quotas: random throttling at the worst times.

Lock-in: Azure outage = my product offline.

I decided to rebuild everything on-prem. This is the result.

The cluster

6 machines, 3 tiers, 12 GPUs total,

Tier 1 — GPU compute (heavy inference)

M1 “La Créatrice” — Ryzen 5700X3D, 6× RTX 3080+, 46 GB RAM. Primary LLM node, runs qwen3.5-9b, qwen3.5-35b-a3b, deepseek-r1, the Claude 4.5/4.6 distillations, and the Whisper CUDA pipeline.

M2 “Le Forge” — multi-GPU NVIDIA, secondary inference, failover from M1 in 1.3s.

Tier 2 — CPU/RAM (orchestration, memory)

M3 “Le Cerveau” — high-RAM CPU node. PostgreSQL + Redis + Pinecone. Runs the orchestrator, the 3-quorum consensus engine (M1+M2+M3), and the analytics/monitoring agents.

Tier 3 — production / work

M4 “Bridge Windows” — Windows 11, 2 GPUs, trading bot live.

M5 “Interface Relay” — Linux i5-6500, 15 GB RAM. Dev interface, 15+ MCP servers, Claude Code.

M6 “Mobile Ops” — laptop. SSH + VPN. Client demos and on-site ops.

The 9 layers I added on top of Ubuntu

L9 — Vocal / conversational (Whisper CUDA STT, Piper TTS, wake word, 50+ languages)
L8 — Multi-agent orchestration (MCP-native, consensus engine)
L7 — Trading consensus engine (multi-model voting GPT/Gemini/Claude)
L6 — Browser + web automation (Chrome DevTools Protocol)
L5 — MCP tool registry (88+ handlers)
L4 — GPU cluster management (Docker Swarm, failover
L3 — Domino pipeline engine (835 chains)
L2 — systemd service layer (98 units)
L1 — Linux boot integration (GRUB hooks, ZRAM, kernel params)

Real numbers

Metric
Value

Concurrent agents
1,000+

P95 latency (cluster internal)
18 ms

Voice pipeline end-to-end

Aggregate throughput
67 tok/s

Python lines
280,741

Public repos
44 (all MIT)

Cost comparison (1M tokens/day, team of 10)

Provider
€/month
P95
Concurrent agents
Data residency

Azure OpenAI
1,500
800ms-3s
~20
US

AWS Bedrock
1,800
700ms-2.5s
~15
US

Mistral Cloud
800
400-800ms
~30
EU

JARVIS OS
0
18 ms
1,000+
Air-gapped

For a 50K€ turn-key deployment, break-even vs Azure is 7 months, and the marginal cost is zero after that.

What I sell now

JARVIS OS turn-key — 20K€ to 250K€ depending on scope.

62 PDF trainings — from €39, 293h of content based on production code (+48 private).

IA infra audit — €1,500, report in 48h.

1-to-1 mentorship — €250/h.

Fractional CTO — TJM €1,000-1,150 / CDI €85-95K. Toulouse / remote.

Honest weaknesses

Consensus voting is empirical. No formal verification of the agreement function.

Tier-2 failure (M3 down) is the weakest scenario — orchestrator dies, cluster keeps inferring but loses persistent memory.

MCP protocol bet — if Anthropic deprecates parts of MCP, I have 88 handlers to refactor.

kWh-per-token efficiency — cloud probably wins on aggregate watts/token, on-prem wins on marginal cost.

Links

Source link