DAILY NEWS

Stay Ahead, Stay Informed – Every Day

Gartner Says 40% of AI Agents Will Be Decommissioned by 2027. The Kill Switch Is Why.



Gartner predicts that by 2027, 40% of enterprises will demote or decommission autonomous AI agents due to governance gaps identified only after production incidents occur.

The instinct when something goes wrong: kill it. Revoke access. Freeze the wallet. Shut it down.

Cerbos published the counter-argument that CISOs are now adopting: “Allow or revoke. Deploy or kill. That works in a lab. It does not work in a hospital, a bank, a payments network, or any environment where the agent is doing something a human used to do, and stopping it instantly creates a different incident than the one you were trying to prevent.”

The kill switch creates a second incident. The industry needs a dimmer switch.

Why Binary Stop Creates Cascading Failure

An AI agent processing payments is not a standalone program. It is embedded in a workflow. Other agents depend on its outputs. Downstream systems expect its responses. Customers are mid-transaction.

# What happens when you kill an agent mid-workflow:

# Agent: procurement_bot (handles vendor payments)
# Status: anomaly detected (unusual vendor, high amount)
# Instinct: KILL IT

kill_switch_consequences = {
    "in_flight_transactions": 12,       # Now orphaned
    "downstream_agents_waiting": 3,     # Will timeout and retry
    "vendor_expectations": 4,           # Payments promised, never delivered
    "reconciliation_gap": "$14,200",    # Money left in limbo
    "sla_violations": 2,                # Customer-facing deadlines missed
    "recovery_time": "4-8 hours",       # Manual intervention required
    "second_incident_severity": "P2",   # The kill caused its own incident
}

# The kill switch “solved” a suspicious $800 transaction
# But created $14,200 in orphaned transactions + 2 SLA violations
# Net result: worse than the original anomaly

mintmcp documented the gap: “Most organizations can monitor what their AI agents are doing but the majority cannot stop them when something goes wrong.” The organizations that CAN stop them discover that stopping creates its own damage.

The Dimmer Switch Pattern

Instead of binary on/off, production agent governance needs graduated response:

from rosud_pay import Governance, DimmerSwitch

# Production-grade agent control (not binary kill):
governance = Governance.configure(
    agent="procurement_bot",
    control=DimmerSwitch(
        # Level 5: Full autonomy (normal operation)
        level_5={
            "daily_limit": 5000,
            "per_tx_max": 1000,
            "categories": "all_authorized",
            "approval_required": False
        },

        # Level 4: Reduced autonomy (first sign of anomaly)
        level_4={
            "daily_limit": 2000,  # Reduced
            "per_tx_max": 500,    # Reduced
            "categories": "existing_vendors_only",
            "approval_required": False,
            "trigger": "anomaly_score > 0.3"
        },

        # Level 3: Supervised (confirmed anomaly)
        level_3={
            "daily_limit": 500,
            "per_tx_max": 100,
            "categories": "pre_approved_list",
            "approval_required": "above_50",  # Human approves > $50
            "trigger": "anomaly_score > 0.6"
        },

        # Level 2: Restricted (investigation active)
        level_2={
            "daily_limit": 0,                 # No new spending
            "existing_commitments": "honor",  # Finish in-flight
            "approval_required": "all",
            "trigger": "security_team_escalation"
        },

        # Level 1: Frozen (confirmed breach)
        level_1={
            "all_transactions": "blocked",
            "in_flight": "graceful_complete_or_refund",
            "notification": "all_downstream_agents",
            "trigger": "confirmed_compromise"
        }
    )
)

# Result: anomaly detected → Level 5 to Level 4 in 50ms
# No orphaned transactions. No SLA violations. No second incident.
# Investigation proceeds while agent continues at reduced capacity.
# If confirmed malicious: gradual freeze, not instant kill.
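
To make the graduated response concrete without depending on any particular library, here is a minimal, self-contained sketch of the score-driven part of the pattern (Levels 5 to 3). It is an illustration only, not the rosud_pay API: the names Level, level_for_score and authorize are hypothetical, and the limits and thresholds are simply copied from the configuration above. Levels 2 and 1 are left out because they are triggered by people, not by the anomaly score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Level:
    name: str
    daily_limit: float
    per_tx_max: float
    approval_above: Optional[float]  # None means no human approval needed


# Hypothetical table mirroring Levels 5, 4 and 3 from the configuration above.
LEVELS = {
    5: Level("full_autonomy", 5000, 1000, None),
    4: Level("reduced_autonomy", 2000, 500, None),
    3: Level("supervised", 500, 100, 50),
}


def level_for_score(anomaly_score: float) -> int:
    """Map an anomaly score to a control level using the 0.3 / 0.6 triggers."""
    if anomaly_score > 0.6:
        return 3
    if anomaly_score > 0.3:
        return 4
    return 5


def authorize(amount: float, spent_today: float, anomaly_score: float) -> str:
    """Return 'allow', 'needs_approval' or 'deny' for a single payment."""
    level = LEVELS[level_for_score(anomaly_score)]
    if amount > level.per_tx_max:
        return "deny"
    if spent_today + amount > level.daily_limit:
        return "deny"
    if level.approval_above is not None and amount > level.approval_above:
        return "needs_approval"
    return "allow"


if __name__ == "__main__":
    # The same $800 payment is allowed at Level 5 but denied once the
    # anomaly score pushes the agent down to Level 4.
    print(authorize(800, 0, anomaly_score=0.1))
    print(authorize(800, 0, anomaly_score=0.4))
```

The point of the sketch is that the agent is never switched off: a rising score only narrows what it may do, so small routine payments keep flowing while the suspicious one is held back.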

The 40% Decommission Problem

Gartner’s 40% prediction is not about agent capability. It is about governance response. When the only response to a production incident is “turn it off,” organizations conclude the agent is too risky to operate.

builtin documented the pattern: enterprises now treat AI agents as first-class identities requiring JIT (just-in-time) access and instant kill switches. But the kill switch alone is insufficient. What they actually need:

# What enterprises discover after decommissioning agents:

decommission_reasons = {
    "governance_gap_discovered_after_incident": 0.65,    # 65%
    "no_graduated_response_available": 0.52,             # 52%
    "kill_switch_caused_secondary_damage": 0.38,         # 38%
    "could_not_prove_agent_was_safe_to_restart": 0.44,   # 44%
    "audit_trail_insufficient_for_root_cause": 0.41,     # 41%
}

# The path from “decommission” to “keep running safely”:
from rosud_pay import AgentLifecycle

lifecycle = AgentLifecycle.configure(
    agent="procurement_bot",
    governance={
        # Graduated response (not binary)
        "response_levels": 5,
        "auto_escalation": True,
        "auto_de_escalation": True,  # Return to normal after resolution

        # Prove safety for restart
        "restart_criteria": {
            "root_cause_identified": True,
            "fix_deployed": True,
            "governance_gap_closed": True,
            "audit_trail_complete": True
        },

        # Continuous governance (not point-in-time)
        "monitoring": "real_time",
        "anomaly_detection": "behavioral_baseline",
        "budget_enforcement": "per_transaction",

        # The key differentiator: DIMMER, not SWITCH
        "on_anomaly": "reduce_autonomy",      # Not "kill"
        "on_resolution": "restore_autonomy"   # Automated recovery
    }
)
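
The "prove safety for restart" idea can be sketched as a small gate that refuses to restore autonomy until every criterion is met. This is a hypothetical illustration, not the rosud_pay implementation: missing_criteria and next_level are made-up names, and restoring one level at a time is an assumption of this sketch, not something the configuration above specifies.

```python
# Criteria names are taken from the "restart_criteria" block above.
RESTART_CRITERIA = (
    "root_cause_identified",
    "fix_deployed",
    "governance_gap_closed",
    "audit_trail_complete",
)

MAX_LEVEL = 5  # Level 5 = full autonomy


def missing_criteria(status: dict) -> list:
    """Return the restart criteria that are not yet satisfied."""
    return [name for name in RESTART_CRITERIA if not status.get(name, False)]


def next_level(current_level: int, status: dict) -> int:
    """De-escalate by one level only when every restart criterion is met.

    Assumption of this sketch: autonomy is restored step by step rather
    than jumping straight back to Level 5.
    """
    if missing_criteria(status):
        return current_level
    return min(current_level + 1, MAX_LEVEL)


if __name__ == "__main__":
    status = {"root_cause_identified": True, "fix_deployed": True}
    print(missing_criteria(status))  # what still blocks the restart
    print(next_level(3, status))     # stays at the current level
```

Recording which criteria are still missing also gives the audit trail a concrete answer to "why was this agent not restored yet?".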

The Business Case for Graduated Control

lumenova documented the shift: AI governance maturity is now treated like a credit rating. Institutional clients demand proof of model lineage, hallucination rates, and governance capabilities before granting mandates.

The organizations that decommission agents lose the investment. The organizations with graduated control keep agents running safely through incidents:

Incident detected: reduce autonomy (not kill)
Investigation proceeds: agent continues at restricted level
Root cause found: fix deployed, autonomy restored
No second incident. No orphaned transactions. No SLA violations.
Agent stays in production. Investment preserved.

The Bottom Line

The kill switch is the reason 40% of agents will be decommissioned. Not because agents are dangerous. Because the only response to danger is destruction. That is not governance. That is giving up.

rosud-pay provides the dimmer switch for agent spending. Five levels of graduated response. Automatic escalation on anomaly detection. Automatic de-escalation on resolution. In-flight transaction protection. Zero orphaned payments. Zero secondary incidents.

Keep your agents running safely through incidents. Do not kill them and call it governance.

Implement graduated agent control: rosud.com/docs



My Side Project Security Audit Results — I’m Embarrassed to Share



I recently did a security audit of all the side projects I’m running: a FastAPI backend, a Telegram bot, a PWA, a Streamlit app and more. I thought, “I built them with some care, so they’ll be okay.” Wrong.

I’ll honestly share each problem I found, why I built it that way, and how I fixed it. This is not a theoretical checklist; these are bugs that I actually deployed to production.

1. Authentication bypass due to empty secret (Critical)

# My code
_API_SECRET = os.environ.get("API_SECRET_KEY", "")

def verify_api_key(x_api_key: str = Header(default="")):
    if _API_SECRET and x_api_key != _API_SECRET:  # ← Bug here
        raise HTTPException(status_code=401)

Look at the condition that starts with if _API_SECRET and. If there is no API_SECRET_KEY environment variable on the server, _API_SECRET becomes an empty string, which is falsy, and the entire condition is skipped. All requests pass as if authenticated.

Why was it designed like this?

I tried to “handle it gracefully” so that the server would not crash even if environment variables were not set during local development. The problem is that the “graceful handling” made it to production, and the moment API_SECRET_KEY was not set on the server, the entire API was open.

How to fix

import os
import secrets

from fastapi import Header, HTTPException

_API_SECRET = os.environ.get("API_SECRET_KEY", "")

def verify_api_key(x_api_key: str = Header(default="")):
    if not _API_SECRET:
        # Missing configuration is a hard failure, not open access
        raise HTTPException(status_code=500, detail="API_SECRET_KEY not configured")
    # Compare as bytes: compare_digest() rejects non-ASCII str arguments
    if not secrets.compare_digest(x_api_key.encode(), _API_SECRET.encode()):
        raise HTTPException(status_code=401, detail="Unauthorized")

No secret = 500 error, not open access. secrets.compare_digest() is also applied to prevent timing attacks.

Lesson: Don’t make authentication conditional on whether a secret is set. A missing setting should be a hard failure, not an open door.

2. Secret committed in the Git history (Critical)

Although it is not in the current code, the API key that I committed for a “quick test” a few months ago was still in the Git history.

# How to check
git log --all -p | grep -E "sk-ant-api03-[A-Za-z0-9_-]{20,}"
git log --all -p | grep -E "AIzaSy[A-Za-z0-9]{20,}"

Why does this happen?

To test quickly in the beginning, I hardcoded the key and committed it. I thought I had “fixed it” by later moving it to .env. But git remembers every commit forever. When a repo is made public or a new team member joins, anyone can retrieve keys from past commits.

How to fix

# Remove a specific file from the entire history
pip install git-filter-repo
git-filter-repo --path .env --invert-paths --force
# git-filter-repo removes the origin remote by default; re-add it before pushing
git push --force-with-lease origin main

And the exposed key must be revoked and reissued immediately. Cleaning up the git history does not undo exposure that has already occurred.

Lesson: Assume that any key committed to git even once has already been stolen, and reissue it.

3. Debug endpoint deployed to production (High)

This endpoint was deployed on the production server:

@app.get("/debug/config")
async def debug_config():
    return {
        "supabase_url": settings.supabase_url,
        "environment": settings.env,
        "connected_services": [...],
    }

Why does this happen?

Debug endpoints are really convenient during development. After fixing whatever was blocking me, I forgot to remove it. Since nothing errors, no one tells you.

How to fix: Delete it. If runtime debugging is necessary, put the endpoint behind authentication or write to logs instead.

# Pre-deployment check
grep -rn '@app.get.*debug\|@app.post.*debug' app/

Lesson: Add “Check removal of debug endpoints” to the deployment checklist. Better still, don’t create them in the first place.

4. Internal information exposed in an error message (High)

# My code
try:
    ...
except Exception as e:
    return JSONResponse({"error": str(e)}, status_code=500)

If you do this, messages like these are sent to the client:

FATAL: password authentication failed for user "postgres"
(Errno 2) No such file or directory: '/home/ubuntu/app/config.json'
Module 'xyz' version 1.2.3 has no attribute 'connect'

An attacker can use this information to work out the infrastructure layout, the libraries in use, and known vulnerabilities by version.

Why was it designed like this?

This was also for development convenience. It is handy when testing because str(e) shows the cause of the error immediately. The problem was that there was no layer between the internal error and the HTTP response.

How to fix

import logging

logger = logging.getLogger(__name__)

try:
    ...
except Exception as e:
    logger.error(f"Error: {e}", exc_info=True)  # Only in server log
    return JSONResponse({"error": "internal server error"}, status_code=500)

Everything goes to the log and nothing goes to the HTTP response.

Lesson: Server logs are for me, HTTP error responses are for the client. These two must be completely separated.

5. XSS (High)

Front-end code using innerHTML without escaping:

articles.forEach(article => {
  container.innerHTML += `${article.title} ${article.summary} <a href="${article.url}">More</a>`;
});

When a title containing a script payload is stored in the DB, it is executed in every user’s browser.

Why does this happen?

Template literals feel like string formatting. When you write ${article.title}, it doesn’t feel like you’re rendering HTML. However, the browser parses the result as HTML and executes it.

How to fix

const esc = s => String(s || "")
  .replace(/&/g, "&amp;")
  .replace(/</g, "&lt;")
  .replace(/>/g, "&gt;")
  .replace(/"/g, "&quot;");

const safeUrl = u => /^https?:\/\//.test(u || "") ? u : "#";

// The URL is escaped as well, so a quote in it cannot break out of the attribute
container.innerHTML += `${esc(article.title)} ${esc(article.summary)} <a href="${esc(safeUrl(article.url))}" rel="noopener noreferrer">More</a>`;

Lesson: Every time you use innerHTML, mentally read it as “I’m executing arbitrary code.” Then it’s hard to forget the escaping.

6. No rate limit on AI endpoints (High)

@app.post("/analyze")
async def analyze(item: Item, _: None = Depends(verify_api_key)):
    result = await ai_client.messages.create(...)  # Cost per call
    return result

There is no rate limit. If someone calls this endpoint in a loop, you are charged a lot of money in an instant.

How to fix

@app.post("/analyze")
@limiter.limit("10/minute")
async def analyze(request: Request, item: Item, _: None = Depends(verify_api_key)):
    ...

Lesson: Authentication prevents unauthorized access. Rate limits prevent authorized but abusive access.

7. CORS wildcard in production (Medium)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allow all origins…
)

Why is it dangerous even if there is an API key?

CORS is a browser-level firewall. If the API key is in the front-end JavaScript, an API call using that key can be made in the user’s browser through an XSS vulnerability on another site.

How to fix

import os

ALLOWED_ORIGINS = os.environ.get("ALLOWED_ORIGINS", "*").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=ALLOWED_ORIGINS,
    allow_methods=["GET", "POST"],
    allow_headers=["X-API-Key", "Content-Type"],
)

# production .env
ALLOWED_ORIGINS=https://myapp.vercel.app

Lesson: allow_origins=["*"] is for local development only. Never deploy it.

8. Temporary files not deleted (Medium)

with tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False) as tmp:
    tmp.write(uploaded_file.read())
    tmp_path = tmp.name

process_file(tmp_path)  # If an exception occurs here, the temporary file remains forever

If process_file() raises an exception, the temporary file is never deleted. On a long-running server these files accumulate, and if a file contains sensitive user data, that data stays on the disk.

How to fix

tmp_path = None
with tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False) as tmp:
    tmp.write(uploaded_file.read())
    tmp_path = tmp.name

try:
    process_file(tmp_path)
finally:
    if tmp_path and os.path.exists(tmp_path):
        os.unlink(tmp_path)

Lesson: The code path that creates the file is also responsible for deleting it. The finally block always runs, even if an exception is raised.

After this audit, I created a checklist that is enforced on all projects:

Before writing code:
( ) Create .gitignore (.env, *.key, sessions/, credentials.json)
( ) Create .env.example (template without actual values)

All endpoints:
( ) Add authentication (500 error, not bypass, if no secret)
( ) Error response is a generic message only (str(e) prohibited)
( ) Rate limit on AI/cost-generating endpoints

Frontend:
( ) Escape all uses of innerHTML
( ) Verify that URLs start with https://
( ) Use rel="noopener noreferrer" for external links

Before commit:
git diff --cached | …

These bugs kept shipping because security was always treated as a separate step after feature development: adding a TODO comment that says “let’s clean it up later”, then deploying “temporary” code as is.

The only solution I’ve found is to build security checks into natural checkpoints: before commit and before deployment. Fixing the bugs themselves afterwards is tedious.

If you are running a backend, I recommend checking your own code for the pattern if _SECRET and key != _SECRET. This authentication bypass is much more common than you think.
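
One way to make the “before commit” check automatic is a small scanner that looks for the two key patterns from item 2 in the staged diff. The sketch below is hypothetical and is not the author’s script: find_secrets, staged_diff and the pattern labels are made-up names, and the two regular expressions are the ones used in the git log commands of item 2.

```python
import re
import subprocess

# Patterns taken from the git log checks in item 2; the labels are arbitrary.
SECRET_PATTERNS = {
    "sk_ant": re.compile(r"sk-ant-api03-[A-Za-z0-9_-]{20,}"),
    "aiza": re.compile(r"AIzaSy[A-Za-z0-9]{20,}"),
}


def find_secrets(text: str) -> list:
    """Return (line_number, pattern_label) for every line that matches."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def staged_diff() -> str:
    """Return the staged changes of the current repository as text."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

In a pre-commit hook, one would call find_secrets(staged_diff()) and abort the commit if it returns any hits. A scanner like this only catches the patterns it knows about, so it complements the rule of never hardcoding keys and does not replace it.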



Bluetooth Channel Sounding: precise BLE ranging for embedded IoT



Most BLE proximity features start with RSSI. That is fine when the product only needs a rough “near or far” signal.

It becomes fragile when distance affects security, access control, asset tracking or industrial behavior.

Bluetooth Channel Sounding changes that by adding a standardized ranging capability to Bluetooth LE. Instead of relying only on received signal strength, two compatible devices exchange radio measurements that can be used to estimate distance more reliably.

Why RSSI is not enough

RSSI is easy to read, but it is not a stable distance sensor.

The value changes with antenna orientation, enclosure design, reflections, the user’s body, walls, metallic objects, interference and multipath. Two devices at the same physical distance can report very different RSSI values.

That is acceptable for simple beacons. It is not ideal for:

Digital keys and secure access
Indoor asset tracking
Smart proximity features
Find-my devices
Industrial maintenance workflows
Distance-aware IoT products
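
To get a feel for how fragile RSSI is as a distance sensor, here is a small sketch using the common log-distance path-loss model. The model, the reference value of -59 dBm at 1 m and the path-loss exponent of 2.0 are assumptions chosen for illustration; they are not taken from the Bluetooth specification or from any particular device.

```python
def rssi_to_distance(rssi_dbm: float,
                     ref_rssi_dbm: float = -59.0,
                     path_loss_exponent: float = 2.0) -> float:
    """Estimate distance in metres with the log-distance path-loss model.

    ref_rssi_dbm is the (assumed) RSSI measured at 1 m, and
    path_loss_exponent is about 2 in free space and higher indoors.
    """
    return 10 ** ((ref_rssi_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


if __name__ == "__main__":
    # The same device, with the reading shifted by 6 dB at each step.
    for rssi in (-59.0, -65.0, -71.0, -77.0):
        print(f"{rssi:6.1f} dBm -> {rssi_to_distance(rssi):5.2f} m")
```

With these assumptions, every 6 dB drop roughly doubles the estimated distance, and a swing of that size can come from a hand over the antenna or a change in orientation, with no change in the real distance.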

What Channel Sounding adds

Bluetooth Channel Sounding was introduced with Bluetooth Core 6.0 and refined further in Bluetooth Core 6.3.

A procedure involves two roles:

Initiator: starts the measurement

Reflector: responds to the measurement sequence

The devices exchange signals across multiple Bluetooth LE channels. The system can then estimate distance using methods such as:

Phase-Based Ranging, based on phase changes across frequencies

Round-Trip Timing, based on signal travel time between devices
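
The physics behind both methods fits in a few lines. The sketch below is an idealized model, not the procedure defined in the Bluetooth specification: it assumes a single direct path, no noise and perfectly calibrated hardware, and the function names are made up for illustration. For round-trip timing, the distance is half the travel time multiplied by the speed of light. For phase-based ranging, a two-way phase measurement at frequency f is modeled as -4πfd/c, so the distance follows from the slope of phase against frequency.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def distance_from_rtt(round_trip_s: float, reply_delay_s: float) -> float:
    """Distance from round-trip time, after removing the known reply delay."""
    return C * (round_trip_s - reply_delay_s) / 2.0


def unwrap(phases_rad: list) -> list:
    """Remove 2*pi jumps so the phase changes smoothly from channel to channel."""
    out = [phases_rad[0]]
    for p in phases_rad[1:]:
        delta = math.remainder(p - out[-1], 2.0 * math.pi)
        out.append(out[-1] + delta)
    return out


def distance_from_phase(freqs_hz: list, phases_rad: list) -> float:
    """Distance from the least-squares slope of two-way phase vs frequency."""
    phi = unwrap(phases_rad)
    f_mean = sum(freqs_hz) / len(freqs_hz)
    p_mean = sum(phi) / len(phi)
    num = sum((f - f_mean) * (p - p_mean) for f, p in zip(freqs_hz, phi))
    den = sum((f - f_mean) ** 2 for f in freqs_hz)
    slope = num / den  # radians per Hz, equal to -4*pi*d/c in this model
    return -C * slope / (4.0 * math.pi)


if __name__ == "__main__":
    # Simulated measurement: 40 channels, 1 MHz apart, target at 7.5 m.
    true_d = 7.5
    freqs = [2.402e9 + k * 1e6 for k in range(40)]
    phases = [math.remainder(-4.0 * math.pi * f * true_d / C, 2.0 * math.pi)
              for f in freqs]
    print(distance_from_phase(freqs, phases))
```

With 1 MHz steps, the unwrapping in this simplified model stays unambiguous up to c / (4 × 1 MHz), about 75 m. Real environments add multipath and noise, which is why the filtering, calibration and validation work described later matters.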

The useful part is that this is not a proprietary trick layered on top of BLE. It is part of the Bluetooth specification, which matters for interoperability and long-term product design.

Where it fits

Channel Sounding is interesting when distance becomes part of the product logic.

For example:

A smart lock should know whether the authorized phone is really close to the door.
An industrial cabinet may allow access only when the technician is physically present.
A warehouse gateway may estimate how close a tag is to an anchor.
A wearable or tracker can guide the user with more useful distance feedback.
A machine can enable local configuration only when the operator is nearby.

That is different from “I can hear a BLE device somewhere nearby”. The product is now asking “how close is it, and can I trust that measurement enough to act on it?”

Architecture impact

A Channel Sounding product is not just a firmware flag. The whole embedded architecture is involved.

Area: What to verify

SoC: Real Channel Sounding support, not only generic BLE support

Bluetooth stack: Initiator, Reflector, HCI and SDK support

RF design: Antenna, layout, enclosure, ground plane and multipath behavior

Algorithm: Filtering, calibration, outlier handling and acceptance thresholds

Firmware: States, timeouts, fallback behavior and diagnostics

Security: Pairing, identity, secure boot, signed OTA and debug policy

Validation: Lab and field tests with motion, obstacles, angles and interference
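
The algorithm and firmware areas can be illustrated with a small decision function that filters outliers and refuses to decide when the samples disagree. This is a hypothetical sketch, not a recommended algorithm: the names robust_distance and decide and all thresholds (0.5 m deviation, 5 samples, 1.5 m unlock distance) are assumptions picked for the example.

```python
from statistics import median


def robust_distance(samples_m: list,
                    max_dev_m: float = 0.5,
                    min_samples: int = 5):
    """Return a filtered distance estimate, or None if the data is unreliable.

    Samples further than max_dev_m from the median are treated as outliers.
    If fewer than min_samples remain, no estimate is returned.
    """
    if len(samples_m) < min_samples:
        return None
    med = median(samples_m)
    kept = [s for s in samples_m if abs(s - med) <= max_dev_m]
    if len(kept) < min_samples:
        return None
    return median(kept)


def decide(samples_m: list, unlock_within_m: float = 1.5) -> str:
    """Classify a burst of ranging samples as 'near', 'far' or 'uncertain'."""
    estimate = robust_distance(samples_m)
    if estimate is None:
        return "uncertain"  # fallback path: do not act on this measurement
    return "near" if estimate <= unlock_within_m else "far"


if __name__ == "__main__":
    print(decide([1.0, 1.1, 0.9, 1.05, 6.0, 1.0]))  # one outlier is dropped
    print(decide([1.0, 3.0, 5.0, 0.2, 7.0]))        # samples disagree
```

Returning an explicit "uncertain" result gives the firmware a defined fallback state, and logging the rejected samples makes field diagnostics far easier.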

This is where many projects get surprised. Ranging performance is not only about the Bluetooth version printed on a datasheet. It depends on the radio, stack, antenna, enclosure, firmware and test process working together.

Channel Sounding vs RSSI, AoA/AoD and UWB

RSSI is still useful for simple presence and low-cost beacon behavior.

AoA/AoD can be useful for localization systems that can afford antenna arrays and infrastructure.

UWB remains excellent for high-precision ranging and advanced digital-key systems, but it adds hardware, power and integration cost.

Bluetooth Channel Sounding sits in an interesting middle ground: more distance-aware than RSSI, inside the BLE ecosystem, and potentially simpler than adding a separate UWB path in products that already depend on Bluetooth.

Practical checklist

Before choosing Channel Sounding, I would check:

( ) Does the product really need distance, or is generic proximity enough?
( ) Have the RSSI failure modes been measured in the real environment?
( ) Is UWB, AoA/AoD or GNSS a better fit for the accuracy target?
( ) Does the selected SoC, controller, host stack and SDK actually support Channel Sounding?
( ) Is the antenna strategy compatible with the enclosure and installation conditions?
( ) How often will the product measure distance, and what is the battery impact?
( ) What happens when the measurement is uncertain?
( ) Are secure boot, signed OTA and debug-access policy already part of the architecture?
( ) Can field logs explain why the device trusted or rejected a distance measurement?
( ) Has the validation plan included multipath, movement, obstacles and edge cases?

Final takeaway

Bluetooth Channel Sounding is not simply “better RSSI”.

It gives embedded teams a standardized way to make Bluetooth LE devices more distance-aware. That can unlock better secure access, asset tracking, smart proximity and industrial workflows.

But it has to be designed as a system feature: hardware, RF, stack, firmware, security, power budget and validation all matter.

When those pieces are handled early, Channel Sounding can turn Bluetooth from a connectivity feature into a useful distance and trust signal for embedded IoT products.

Canonical source: Bluetooth Channel Sounding: precise and secure distance measurement for embedded IoT

Silicon LogiX helps teams design embedded, firmware and IoT architectures when prototypes need to become maintainable products.


