DAILY NEWS

Stay Ahead, Stay Informed – Every Day

Learning Linux from Scratch: My First Week in DevOps


Hello everyone! 👋

Welcome to Week 1 of my DevOps learning journey.

As a final-year BCA student specializing in Cloud Computing, I have decided to document my journey toward becoming a DevOps Engineer. This blog series will cover everything I learn, from Linux fundamentals to cloud computing, containers, CI/CD, Kubernetes, and monitoring tools.

Since Linux is the backbone of most DevOps environments, I decided to start my journey by learning Linux fundamentals and essential commands.

Why Linux Matters in DevOps

Most servers in cloud environments run Linux. Whether you’re working with AWS EC2 instances, Docker containers, Kubernetes clusters, or CI/CD pipelines, Linux knowledge is essential.

As a DevOps Engineer, you’ll frequently:

Manage Linux servers
Deploy applications
Troubleshoot issues
Monitor system performance
Automate tasks using shell scripts

That’s why Linux is often considered the first skill every DevOps engineer should master.

What I Learned This Week

Understanding the Linux File System

One of the first things I learned was how Linux organizes files and directories.

Some important directories include:

/ – Root directory

/home – User home directories

/etc – Configuration files

/var – Logs and variable data

/tmp – Temporary files

/usr – System-wide installed programs, libraries, and documentation

Understanding the file system structure helps navigate servers more effectively.
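To see these directories on a real machine, a short loop can print one line per directory. This is only a minimal sketch: show_top_level_dirs is a name I made up for illustration, and the exact output depends on the system.

```shell
#!/bin/bash
# Print a one-line listing for each directory discussed above.
# show_top_level_dirs is an illustrative name, not a standard command.
show_top_level_dirs() {
    for dir in / /home /etc /var /tmp /usr; do
        if [ -d "$dir" ]; then
            # -l shows permissions and owner; -d lists the directory itself, not its contents
            ls -ld "$dir"
        else
            echo "$dir is not present on this system"
        fi
    done
}

show_top_level_dirs
```

Each line of output starts with the permission string, followed by the owner and group of the directory.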

Basic Navigation Commands

I practiced several commands used daily by Linux administrators.

Check Current Directory

pwd

List Files and Directories

ls
ls -l
ls -la

Change Directory

cd /home

Create Directory

mkdir project

Remove an Empty Directory

rmdir project
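The navigation commands can be practiced together in a throwaway directory. The sketch below assumes mktemp is available (it is on most Linux distributions) and uses a temporary directory so nothing important is touched. It also shows one detail that is easy to miss: rmdir only removes empty directories.

```shell
#!/bin/bash
# Practice run in a throwaway directory so nothing important is touched.
workdir=$(mktemp -d)      # create a unique temporary directory
cd "$workdir" || exit 1

mkdir project             # create the directory
cd project || exit 1      # move into it
pwd                       # print where we are now
cd ..                     # go back up one level
rmdir project             # succeeds because project is empty

# rmdir refuses to delete a directory that still has files in it.
mkdir project
touch project/notes.txt
rmdir project 2>/dev/null || echo "rmdir failed: project is not empty"
rm -r project             # rm -r removes a directory and its contents

cd / && rmdir "$workdir"  # clean up the temporary directory
```

Be careful with rm -r on a real server, because it deletes everything under the given path without asking.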

File Management Commands

Working with files is a common task in Linux.

Create a File

touch file.txt

Copy Files

cp file.txt backup.txt

Move Files

mv file.txt documents/

Delete Files

rm file.txt

View File Content

cat file.txt
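Run one after another, the file commands above depend on each other: mv file.txt documents/ fails unless the documents directory already exists, and cat file.txt fails once the file has been moved or deleted. Here is a small sketch of one order that works, again in a temporary directory (the workdir variable is just for this example).

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "hello" > file.txt        # touch creates an empty file; echo gives it content
cp file.txt backup.txt         # copy: both files now exist
mkdir -p documents             # mv into documents/ fails unless the directory exists
mv file.txt documents/         # move: file.txt now lives at documents/file.txt
cat documents/file.txt         # prints: hello
rm backup.txt                  # delete the copy

ls -R                          # show what is left

# Remove the practice directory when finished:
# rm -r "$workdir"
```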

Viewing System Information

I also learned how to check system details.

Check CPU Information

lscpu

Check Memory Usage

free -h

Check Disk Usage

df -h

Check Running Processes

ps -ef

Monitor Processes in Real Time

top
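These checks can be combined into one quick snapshot. The function below is a rough sketch, and system_snapshot is an illustrative name of my own. It guards lscpu and free with command -v because minimal images sometimes ship without them.

```shell
#!/bin/bash
# Print a short system snapshot. system_snapshot is an illustrative name.
system_snapshot() {
    echo "== CPU =="
    if command -v lscpu >/dev/null 2>&1; then
        lscpu | head -n 5          # first few lines: architecture and core count
    else
        echo "lscpu not available"
    fi

    echo "== Memory =="
    if command -v free >/dev/null 2>&1; then
        free -h                    # -h prints human-readable sizes
    else
        echo "free not available"
    fi

    echo "== Disk (root filesystem) =="
    df -h /

    echo "== Process count =="
    ps -e | wc -l                  # the count includes one header line
}

system_snapshot
```

Unlike top, this prints once and exits, which makes it easier to save to a file or compare between servers.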

Understanding Permissions

Linux uses permissions to control access to files and directories.

I learned about:

Read (r)
Write (w)
Execute (x)

Viewing permissions:

ls -l

Changing permissions:

chmod 755 script.sh

Changing ownership:

chown user:user file.txt

Permissions are critical for maintaining system security.
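The number in chmod 755 is three octal digits, one each for the owner, the group, and everyone else. Each digit is the sum of read (4), write (2), and execute (1). To make that concrete, here is a small sketch that turns a numeric mode into the rwx string that ls -l shows. The function names digit_to_rwx and octal_to_symbolic are my own, not standard commands.

```shell
#!/bin/bash
# Translate one octal digit (0-7) into its rwx triplet.
digit_to_rwx() {
    case "$1" in
        0) echo "---" ;;
        1) echo "--x" ;;
        2) echo "-w-" ;;
        3) echo "-wx" ;;
        4) echo "r--" ;;
        5) echo "r-x" ;;
        6) echo "rw-" ;;
        7) echo "rwx" ;;
        *) echo "invalid digit: $1" >&2; return 1 ;;
    esac
}

# Translate a three-digit mode such as 755 into owner/group/others triplets.
octal_to_symbolic() {
    mode="$1"
    owner=$(digit_to_rwx "$(echo "$mode" | cut -c1)") || return 1
    group=$(digit_to_rwx "$(echo "$mode" | cut -c2)") || return 1
    others=$(digit_to_rwx "$(echo "$mode" | cut -c3)") || return 1
    echo "${owner}${group}${others}"
}

octal_to_symbolic 755   # rwxr-xr-x
octal_to_symbolic 644   # rw-r--r--
```

So 755 means the owner can read, write, and execute, while the group and others can only read and execute.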

Service Management

One interesting topic was managing services using systemctl.

Check Service Status

systemctl status nginx

Start a Service

systemctl start nginx

Stop a Service

systemctl stop nginx

Restart a Service

systemctl restart nginx

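This is especially useful when managing web servers and applications. Starting, stopping, and restarting a service usually requires root privileges, so prefix those commands with sudo when logged in as a regular user.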

Viewing Logs

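Logs help identify issues and troubleshoot systems. The main system log path depends on the distribution: /var/log/messages on RHEL-family systems, /var/log/syslog on Debian and Ubuntu.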

tail -f /var/log/messages

journalctl -xe

Understanding logs is one of the most important skills for troubleshooting production systems.
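When a log is long, counting or filtering the lines that matter is often the first step. The sketch below counts lines mentioning "error" in a file. count_errors is a helper name I made up, and the sample log lines are invented purely so the example runs anywhere.

```shell
#!/bin/bash
# Count lines in a log file that mention "error" (case-insensitive).
# count_errors is an illustrative helper, not a standard command.
count_errors() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 1
    fi
    grep -ci "error" "$logfile"
}

# Demo on a small made-up sample file so the sketch runs anywhere.
sample=$(mktemp)
printf '%s\n' \
    "Jan 01 10:00:00 web nginx: started" \
    "Jan 01 10:05:12 web nginx: ERROR upstream timed out" \
    "Jan 01 10:06:40 web app: error: connection refused" > "$sample"

count_errors "$sample"    # prints 2
rm -f "$sample"
```

On a real server the same function can be pointed at a system log file, which may need sudo to read.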

My First Bash Script

I wrote a simple script to check whether Nginx is running.

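#!/bin/bash

# Report whether the nginx service is currently active.
# --quiet suppresses output so only the exit status is used.
if systemctl is-active --quiet nginx
then
    echo "Nginx is running"
else
    echo "Nginx is not running"
fi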

This helped me understand:

if-else conditions
shell scripting basics
automation concepts

It was my first step toward infrastructure automation.
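A natural next step is to turn the check into a function that accepts any service name and returns an exit status other scripts can act on. This is a sketch rather than a finished tool; check_service is an illustrative name, and it assumes a systemd-based system where systemctl is available.

```shell
#!/bin/bash
# Report whether a systemd service is active.
# check_service is an illustrative name; the service defaults to nginx.
check_service() {
    service="${1:-nginx}"
    if systemctl is-active --quiet "$service"; then
        echo "$service is running"
        return 0
    else
        echo "$service is not running"
        return 1
    fi
}

# Example usage (commented out so the sketch has no side effects when sourced):
# check_service nginx || echo "consider: sudo systemctl start nginx"
```

Returning 0 or 1 means the function can be chained with && and ||, which is how most shell automation decides what to do next.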

Challenges I Faced

During my learning, I encountered a few challenges:

Understanding Linux permissions
Navigating directories quickly
Reading system logs
Writing Bash scripts correctly

After practicing commands repeatedly and experimenting on AWS EC2 instances, these concepts became much clearer.

Key Takeaways

This week taught me that Linux is much more than just a command-line operating system.

I learned:

✅ Linux file system structure

✅ Essential Linux commands

✅ File and directory management

✅ System monitoring basics

✅ Service management

✅ Log analysis

✅ Bash scripting fundamentals

Most importantly, I realized that strong Linux fundamentals make learning DevOps tools much easier.

What’s Next?

In Week 2, I plan to learn:

Advanced Linux Commands
User and Group Management
Networking Basics
SSH and Remote Access
Package Management
More Bash Scripting

Final Thoughts

Every DevOps Engineer starts somewhere, and Linux is the perfect place to begin.

This week gave me a solid foundation and increased my confidence in working with servers and cloud environments. I’m excited to continue learning and sharing my progress through this blog series.

If you’re also starting your DevOps journey, feel free to connect and share your experiences.

See you in Week 2! 🚀



You Can’t Govern the AI You Can’t See



AI governance starts with visibility: a policy, a budget, or a guardrail can only act on the AI traffic a team can actually see. This guide explains why so much AI use stays out of IT’s view, why that gap stops governance before it starts, and how the Bifrost AI gateway and Bifrost Edge close it by making endpoint AI both visible and governable.

Every AI governance control an organization owns, from budgets and access rules to guardrails and audit trails, can only act on the AI traffic it can actually see. That ability to see what AI is running and what it is sending, often called AI visibility, is the precondition for everything else. The trouble is that most AI used at work now runs on the endpoint, inside desktop apps, browser tabs, and coding agents that reach a model provider directly, so the activity never reaches the systems security teams watch. A request that leaves a laptop for a third-party model without crossing a monitored path is, for governance purposes, a request that did not happen. The gap is wide, as a 2025 Gartner survey of cybersecurity leaders found that 69 percent have evidence or suspicion that employees are using public generative AI at work, which is exactly the usage most teams cannot account for.

Why you can’t govern what you can’t see

Governance is a chain of steps, and visibility is the first link. To act on an AI request, a system has to see it, attach an identity and a policy to it, enforce limits on it, and record what happened. When the first step is missing, none of the steps after it can run, because a control that never observes a request has nothing to act on.

This plays out the same way across every control a security or platform team relies on. A data guardrail that never inspects a prompt cannot redact the secret inside it. A budget that never counts a call cannot cap spending on it. A policy that never sees a tool cannot decide whether the tool is allowed. The result is not weak governance but absent governance, applied with confidence to the fraction of AI traffic that happens to be visible while the rest moves untouched.

Where AI goes out of view

AI goes out of view wherever it runs close to the user and connects straight to a provider, which describes most of where it now runs. Four blind spots account for the bulk of it:

Desktop assistants such as the ChatGPT app or Claude Desktop, signed in with personal accounts the organization does not manage.
Browser AI, including in-page assistants and extensions that an employee turns on without review.
Coding agents such as Claude Code, Codex, and Cursor, which read source code and call external services from the developer’s machine.
MCP servers wired into those tools, which can read files, call APIs, and act on a user’s behalf with standing access.

The list of tools an IT team can name is routinely a fraction of what employees actually use, because every new app, browser feature, and MCP server is one more thing to find, and discovery has no natural endpoint. The tools no one tracks are not necessarily malicious; they are simply outside anyone’s view, which is what places them beyond the reach of any control. Gartner has predicted that by 2030, more than 40 percent of organizations will experience security or compliance incidents tied to the use of unauthorized AI, a direct consequence of governing only the share of activity a team can see.

Why traditional tools don’t close the gap

Traditional controls do not close the visibility gap because they were built to watch the network, while endpoint AI mostly avoids the network they watch. Network proxies and data loss prevention systems inspect what crosses the corporate perimeter, yet a large share of AI traffic leaves the device for a provider directly, over an encrypted connection that resembles ordinary web browsing and that often never passes through a corporate proxy at all.

Three gaps recur across these approaches:

Network filtering and data loss prevention sit on the corporate network path, so requests sent straight from a device to a provider, including from machines off that network, never reach them.
Blocklists work from a known list of destinations, and new apps, browser features, and MCP servers appear faster than any list is updated.
SaaS and expense audits catch tools that bill the company, but they miss free tiers, personal accounts, and anything installed locally.

Each of these methods produces a partial list at a single moment, while the real usage is continuous and changes by the day. Closing the gap calls for visibility at the point where the AI actually runs, which is the endpoint itself.

How the Bifrost AI gateway and Bifrost Edge make AI visible and governable

Making AI governable takes two things in sequence: a place where AI traffic can be seen and governed, and a way to route the AI on every machine into that place. Bifrost, the open-source AI gateway built by Maxim AI, is that place, and Bifrost Edge is what brings the endpoint into it.

On the gateway, every request that passes through is recorded by built-in observability, which captures the prompt, the response, the model, the token counts, the cost, and the latency for each call, with no change to the application. The same gateway holds the virtual keys, budgets, and rate limits that tie usage to a person or project, along with the guardrail profiles that inspect prompts and responses. The limit, until now, has been reach: the gateway could see and govern only the traffic that something had already pointed at it.

Bifrost Edge closes that reach by routing all supported AI traffic on a machine through Bifrost rather than letting it go straight to the provider. The AI that used to leave the laptop unseen now appears in the same logs, under the same policies, as the rest of an organization’s AI. The division of labor is straightforward: Edge supplies the sight by inventorying endpoint AI and routing it through the gateway, and the gateway supplies the governance by recording, inspecting, and enforcing on the traffic it can now see. The gateway stays the single control plane, and Edge becomes its reach to the endpoint, so there is no separate visibility tool and no second policy model to maintain.

See what is running across the fleet

Visibility begins with knowing what is present. Bifrost Edge discovers the MCP servers configured in each app and the AI applications in use on every machine, then assembles a live view across the fleet of which assistants and which servers are running, on which apps, and on how many devices. New apps and servers surface as they appear rather than during a periodic audit, and each one can be allowed or denied from a single console, with the decision enforced on the device.

Govern and record the traffic you can now see

Once endpoint AI is visible, the same controls that protect gateway traffic apply to it. The guardrail profiles configured in Bifrost run before a prompt reaches a model and before a response returns, so secrets and personal data are caught or redacted before they leave the machine. Virtual keys and budgets tie each request to a person and a limit, while an administrative audit trail records who changed which policy and when, signed and retained for later review.

Roll it out and keep it current

Bifrost Edge deploys through the device management platforms an organization already runs, including Jamf, Microsoft Intune, Kandji, Omnissa Workspace ONE, and JumpCloud, across macOS, Windows, and Linux. Identity and keys come from the user’s single sign-on, so no secrets sit on the device, and central changes to policy and routing reach the fleet on their own once a machine is signed in.

Common questions about AI visibility

What is AI visibility?

AI visibility is the ability to see which AI tools, models, and services are in use across an organization, and to see the individual requests they send and receive. Without it, governance controls have nothing to act on, which is why visibility is treated as the first step rather than a report generated at the end.

How do you discover shadow AI?

Shadow AI is discovered by observing AI activity where it originates. Because most of it runs on endpoints, an agent on the device, such as Bifrost Edge, can inventory the apps and MCP servers in use and route their traffic through a gateway, which turns a guess about what employees might be using into a current list of what they actually use.

Can you get visibility without blocking AI?

Visibility does not have to mean blocking AI. Routing endpoint AI through the Bifrost gateway makes each request visible and subject to guardrails and budgets while the tools keep working normally, so an organization can approve and govern AI rather than ban it. Blocking remains available for tools a team decides to disallow, but that is a policy choice rather than a side effect of gaining visibility.

Visibility first, then governance

Shadow AI is, at its core, a visibility problem before it is a policy problem, because the strongest policy in the world cannot reach a request no one can see. The organizations that handle it well start by making endpoint AI visible, then apply the controls they already trust to the usage that visibility reveals.

Pairing the Bifrost AI gateway with Bifrost Edge gives security and platform teams both halves at once: the gateway records, inspects, and enforces, and Edge, currently in alpha, brings the AI on every machine into view so those controls have something to act on. Teams working through their own visibility gap can see how the combined approach fits together on the Bifrost Edge overview and register there for alpha access.



Junkyard Computing: The Engineering Case for Building Server Clusters from Dead Smartphones


TL;DR

A cluster of discarded smartphones can match the cost and performance profile of cloud server instances for a defined, bounded class of workloads: bursty, latency-tolerant, horizontally scalable services such as microservices, dev environments, and educational platforms. This isn’t a sustainability thought experiment. A 2023 prototype (10 Pixel 3A phones) ran real end-to-end microservice benchmarks at roughly 1/40th the three-year cost of an equivalent AWS instance. A 2024 follow-up deployed the same architecture for live university coursework. And in June 2026, Google backed a production-scale version of this exact design: a 2,000-phone cluster at UC San Diego, replacing the compute equivalent of ~50 traditional servers, launching Fall 2026.

The rest of this post derives why that conclusion holds, not by appeal to e-waste statistics but from the underlying compute economics. The carbon numbers show up as evidence, not motivation.

Four terms, defined precisely

Before building the argument, four terms need precise definitions, because the entire case rests on a metric most performance benchmarks ignore.

Embodied carbon: emissions incurred manufacturing a device, paid once, upfront, regardless of how long the device is used.

Operational carbon: emissions incurred running a device, accrued continuously over its service life.

Computational Carbon Intensity (CCI): a metric proposed in the foundational research, defined as total lifetime CO2e (embodied + operational + networking) divided by total lifetime operations performed. Lower is better. Critically: for a device that is reused rather than newly manufactured, embodied carbon is treated as already paid, i.e., C_M = 0.

Cloudlet: a small, localized cluster of compute nodes; in this case, a set of networked smartphones functioning as a single addressable compute resource.

CCI is the metric that makes the rest of this argument possible. Power Usage Effectiveness (PUE), the industry-standard datacenter efficiency metric, only measures operational overhead. It says nothing about whether the underlying hardware needed to be manufactured at all. A datacenter can have excellent PUE and still have a poor carbon footprint if it churns through new servers fast enough. CCI is the metric that catches that.
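Written out, the CCI definition above looks roughly like the following. Only C_M (embodied, i.e. manufacturing, carbon) is a symbol used in this post; C_O for operational carbon, C_N for networking carbon, and N_ops for lifetime operations are placeholder names assumed here for illustration, and the source papers may use different notation.

```latex
\[
\mathrm{CCI} = \frac{C_M + C_O + C_N}{N_{\mathrm{ops}}}
\qquad\text{and, for a reused device with } C_M = 0,\qquad
\mathrm{CCI}_{\mathrm{reuse}} = \frac{C_O + C_N}{N_{\mathrm{ops}}}
\]
```

PUE, by contrast, only describes overhead within the operational term, which is why it cannot register the effect of setting C_M to zero.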

Three measurements this argument stands on

Everything that follows is built from three things that have actually been measured, not assumed and not estimated for effect. Each is independently checkable, sourced from device-level benchmarking and published life-cycle assessments (LCAs).

Manufacturing dominates smartphone lifecycle emissions. Published LCAs put manufacturing at 70-90% of a smartphone’s total lifetime carbon footprint. Operational energy (the electricity used while running the device) is a minority contributor.

Modern smartphone compute already clears the performance bar for a defined class of cloud workloads. GeekBench data across the top five Android phones released each year since 2013 shows multi-core throughput and memory capacity for recent devices meeting or exceeding AWS T4g burstable instances, the instance class AWS explicitly markets for microservices, small databases, and dev environments. This is a performance-floor claim, not a peak-performance claim: it does not extend to GPU-bound or HPC-class workloads.

Reused hardware carries zero marginal embodied carbon. If a device has already been manufactured and would otherwise sit idle or be discarded, its embodied carbon cost is sunk. Any additional compute extracted from it is amortized against zero new manufacturing.

The rest of this post is just what happens when you combine those three facts and follow them through.

Reuse beats new procurement on both cost and carbon, and it’s not close

For workloads that fall inside a phone’s performance envelope, reusing one strictly outperforms buying new, on both dollars and carbon. Put the first and third facts above together: a repurposed device’s carbon-per-operation math loses its largest term, manufacturing, entirely. A purpose-built server’s math keeps it. Hold throughput roughly comparable (the second fact, within the defined workload class), and the repurposed device comes out ahead by construction, not by luck.

This isn’t theoretical. The empirical result: a 10-device Pixel 3A cloudlet running DeathStarBench’s HotelReservation and SocialNetwork applications (real, end-to-end microservice stacks, not synthetic benchmarks) handled up to 4,000 queries/second within a 50ms median / 100ms tail latency budget, comparable to an AWS c5.9xlarge instance. Three-year cost: $1,028 for the phone cluster versus $40,404 for the equivalent EC2 instance. Carbon efficiency: 9.8×–18.9× better per request, depending on workload mix.
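As a quick sanity check on the "roughly 1/40th" figure, the ratio of the two three-year costs can be computed directly. The snippet below only uses the two dollar amounts quoted above; the variable names are arbitrary.

```shell
#!/bin/bash
# Three-year cost figures quoted above, in US dollars.
phone_cluster_cost=1028
ec2_cost=40404

# awk handles the floating-point division; the shell itself only does integers.
cost_ratio=$(awk -v a="$ec2_cost" -v b="$phone_cluster_cost" 'BEGIN { printf "%.1f", a / b }')
echo "EC2 costs ${cost_ratio}x the phone cluster over three years"
```

The result, about 39.3, is consistent with the 1/40th summary in the TL;DR.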

Note what’s doing the work in that result: it is not that phones are faster. They aren’t. It’s that the device doesn’t have to absorb a new manufacturing cost, in carbon or in dollars, before it’s even started doing useful work.

The bottleneck was never the chip

The binding constraint on junkyard clusters is thermal, network, and power management, not compute. Here’s why that has to be true: if reuse is strictly favorable, as established above, the only reason this isn’t already universal practice is that something else is hard. Three failure modes were identified and independently characterized:

Thermal. Phones throttle at 40-50°C and hard-shutdown at 60-70°C; they were never designed for sustained, rack-density operation. Measured thermal output, however, came in low: ~2.6 W/device under 100% CPU load, ~1.2 W/device under realistic mixed workload. Extrapolated to a 256-device cluster at full load, that’s ~666 W total, coolable with two off-the-shelf 500 W server fans. The per-device throttling behavior functions as a built-in, distributed thermal governor; no centralized cooling control logic is required to keep the cluster from cascading into shutdown.

Network. Co-located WiFi clustering was tested and found to degrade past ~30 devices due to interference. The proposed mitigation for small/edge deployments is a tree topology (phones grouped in cells of five, one device hotspotting to LTE, the rest bridging over its WiFi AP), which caps per-device throughput at ~18.5 Mbit/s. At true datacenter scale, this constraint is resolved trivially by reverting to wired Ethernet, the same way any rack of stripped-down nodes would be networked. Network is a real constraint, but not a hard one.

Power. This is the constraint unique to phone-based clusters. Smartphone batteries degrade after ~2,500 charge cycles. Under light-medium load, that works out to roughly 2.3 years of service for a Pixel-class battery before replacement, which means non-trivial, recurring physical maintenance at scale (~9 hours of labor per 2 years for a 54-device cluster, by direct measurement). The battery cuts both ways: it doubles as a built-in UPS, and it enables smart charging (deferring charge cycles to low-carbon-intensity grid windows), which measured ~7% additional carbon reduction on a Pixel 3A, but it is also the single component most likely to require physical intervention.
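The back-of-envelope numbers in the thermal and power items can be reproduced with a few lines of arithmetic. This is a sketch using only the figures quoted above. The function names are made up for illustration, and the "implied charge cycles per day" value is derived here from the 2,500-cycle and 2.3-year figures, not stated in the source.

```shell
#!/bin/bash
# Total heat for a cluster: devices x watts per device.
cluster_heat_watts() {
    awk -v n="$1" -v w="$2" 'BEGIN { printf "%.1f", n * w }'
}

# Implied charge cycles per day: cycle budget / (service years x 365).
# This is a derived figure, not one reported in the source.
cycles_per_day() {
    awk -v c="$1" -v y="$2" 'BEGIN { printf "%.1f", c / (y * 365) }'
}

# Maintenance minutes per device: labor hours x 60 / devices.
labor_minutes_per_device() {
    awk -v h="$1" -v d="$2" 'BEGIN { printf "%.1f", h * 60 / d }'
}

echo "Heat at full load:  $(cluster_heat_watts 256 2.6) W"
echo "Heat at mixed load: $(cluster_heat_watts 256 1.2) W"
echo "Implied charge cycles per day: $(cycles_per_day 2500 2.3)"
echo "Battery labor per device per 2 years: $(labor_minutes_per_device 9 54) min"
```

The full-load figure of 665.6 W matches the ~666 W quoted above, and the labor figure works out to about ten minutes per device every two years.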

None of these three are compute problems. All three are solvable with conventional infrastructure engineering. That’s the load-bearing claim here: the barrier to junkyard computing was never the silicon.

The software barrier closed in three generations, and that’s why 2026 happened

The remaining barrier, software, has closed measurably across three design generations, and that trajectory is what predicts the 2026 production deployment. Trace the actual implementation history:

Generation 1 (2023): OS replacement. Android removed entirely, replaced with Ubuntu Touch; kernel patched to add filesystem modules (BTRFS) required for Docker. Functional, but operationally fragile: every device requires manual OS surgery before joining the cluster.

Generation 2 (2024): Native virtualization. Android 14+ shipped KVM in the stock kernel. The redesigned architecture runs an Ubuntu VM inside unmodified Android, with a Kubernetes pod inside that VM. Setup dropped to a scriptable handful of terminal commands. No OS replacement required.

Generation 3 (2026, production): Hardware reduction. Per the Google-backed UCSD deployment, phones are physically stripped to the bare motherboard (display, battery, camera, and chassis removed), and the SoC/RAM/storage run plain Linux directly, orchestrated with Kubernetes, indistinguishable to a scheduler from any other commodity node.

Each generation removed friction without changing the underlying economics laid out above. That’s the pattern that makes the trajectory predictable rather than coincidental: the compute case for junkyard clusters was sound in 2023; what changed by 2026 was that the engineering overhead of standing one up dropped enough for an organization like Google to commit production resources to it.

Where this stops applying

No argument built this way is honest without stating where it stops holding.

This does not extend to: GPU/AI-training workloads (measured 15–22× throughput gap against a GTX 1080 Ti on FP32/INT32 in the same research lineage), latency-critical applications (inter-device network hops add measurable tail latency), or memory-bound workloads exceeding ~12GB per node (current high-end smartphone RAM ceilings).

It does extend to: containerized microservices, CI/dev environments, educational platforms (autograders, notebook hosting, coursework infrastructure), and any workload class characterized by burstiness and loose latency SLAs, which is precisely the workload class Google and UCSD are targeting for the Fall 2026 deployment.

Where this series goes next

This post establishes the why. The next posts in this series go device-by-device through the how:

How the thermal and network constraints above are actually engineered around at cluster scale
The full software stack evolution from Generation 1 to Generation 3, including the Kubernetes scheduling layer
A teardown of the CCI formula and how to apply it to your own infrastructure decisions

Sources: Switzer et al., “Junkyard Computing: Repurposing Discarded Smartphones to Minimize Carbon,” ASPLOS 2023; Switzer et al., “Reducing the Carbon Footprint of EdTech with Repurposed Devices,” 2024; Google Research / UC San Diego phone cluster computing project coverage, June 2026.


