DAILY NEWS

Stay Ahead, Stay Informed – Every Day

Stop Leaking Medical Data! Build a Privacy-First Skin Cancer Classifier with Federated Learning & PySyft 🩺🛡️



Data is the new oil, but in healthcare, data is more like plutonium—extremely valuable but incredibly dangerous if handled incorrectly. If you are building AI for medical use cases, you’ve likely hit the “Data Silo” wall. Hospitals can’t just ZIP up patient records and DM them to you because of GDPR, HIPAA, and basic human ethics.

So, how do we train a high-performing Skin Lesion Classification model without ever actually seeing the raw medical images? Welcome to the world of Federated Learning (FL) and Privacy-Preserving AI. In this guide, we’ll explore how to use PySyft and PyTorch to train models on decentralized data while keeping sensitive information exactly where it belongs: with the patient.

We will focus on Federated Learning, Differential Privacy, and Secure Multi-Party Computation (SMPC) to build a robust, privacy-first pipeline.

The Architecture: Move the Code, Not the Data

In traditional Machine Learning, we bring data to the model. In Federated Learning, we flip the script: we bring the model to the data.

graph TD
    subgraph "Central Server (Aggregator)"
        A(Global Model v1.0) -->|Distribute Weights| B{Encrypted Aggregator}
        B -->|Updated Global Model| A
    end

    subgraph "Hospital A (Edge Node)"
        C(Local Data: Skin Images) --> D(Local Training)
        D -->|Trained Gradients| B
    end

    subgraph "Hospital B (Edge Node)"
        E(Local Data: Skin Images) --> F(Local Training)
        F -->|Trained Gradients| B
    end

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style C fill:#bbf,stroke:#333
    style E fill:#bbf,stroke:#333


As shown in the flow above, the raw images never leave the hospitals. Only the “learnings” (gradients/weights) are sent back to the central server.
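One common way an aggregator combines the updates it receives is federated averaging (FedAvg): each hospital's weights are averaged, weighted by how many local samples it trained on. The following is a minimal plain-Python sketch of that idea, not PySyft's API; the function name, the parameter names, and the sample counts are all hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]  # parameter name -> flat list of values


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count."""
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("at least one client must contribute samples")

    averaged: Weights = {}
    for weights, n_samples in updates:
        share = n_samples / total_samples  # this client's fraction of all data
        for name, values in weights.items():
            if name not in averaged:
                averaged[name] = [0.0] * len(values)
            for i, value in enumerate(values):
                averaged[name][i] += share * value
    return averaged


# Hypothetical updates from two hospitals: (local weights, number of local images)
hospital_a = ({"linear.weight": [0.2, 0.4], "linear.bias": [0.1]}, 300)
hospital_b = ({"linear.weight": [0.6, 0.0], "linear.bias": [0.5]}, 100)

global_weights = federated_average([hospital_a, hospital_b])
print(global_weights)
```

Weighting by sample count means a hospital with more images has proportionally more influence on the global model; an unweighted mean is the special case where every client reports the same count.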

Prerequisites

Before we dive into the code, ensure you have the following stack ready:

PyTorch: The backbone for our neural networks.

PySyft: The secret sauce for federated and private learning.

Differential Privacy (Opacus): To prevent “membership inference attacks.”

Step 1: Setting Up Virtual Workers

In a real-world scenario, these would be physical servers in different hospitals. For this tutorial, we will simulate two hospitals (Alice and Bob) using PySyft’s virtual workers.

import torch
import syft as sy

# Hook PyTorch so tensors gain PySyft's remote-execution methods (.send(), .get()).
# Note: TorchHook and VirtualWorker belong to the legacy PySyft 0.2.x API.
hook = sy.TorchHook(torch)

# Create two remote 'hospitals'
hospital_alice = sy.VirtualWorker(hook, id="alice")
hospital_bob = sy.VirtualWorker(hook, id="bob")

print(f"Nodes initialized: {hospital_alice.id}, {hospital_bob.id} 🏥")


Step 2: Distributing the Dataset

Imagine we have a dataset of skin lesion images (like the HAM10000 dataset). We split it and “send” it to our hospitals. In reality, the data would already exist there; we are simply gaining pointers to it.

# Simulated skin lesion data (features = pixels, targets = cancer type)
data = torch.tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]], requires_grad=True)
target = torch.tensor([[0.0], [0.0], [1.0], [1.0]])

# Distribute data to hospitals
# In a real app, data stays local; here we simulate the 'silo'
data_alice = data[0:2].send(hospital_alice)
target_alice = target[0:2].send(hospital_alice)

data_bob = data[2:4].send(hospital_bob)
target_bob = target[2:4].send(hospital_bob)

# Each entry pairs a pointer to remote features with a pointer to remote targets
datasets = [(data_alice, target_alice), (data_bob, target_bob)]


Step 3: The Federated Training Loop

Now for the magic. We define a simple linear model (a stand-in for a real CNN) and send it to each remote location in turn for training.

from torch import nn, optim

# A simple model for skin lesion classification
model = nn.Linear(2, 1)

def train(epochs=5):
    optimizer = optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(epochs):
        for data, target in datasets:
            # 1. Send model to the hospital node that holds this data
            model.send(data.location)

            # 2. Normal training step (executed remotely via pointers)
            optimizer.zero_grad()
            output = model(data)
            loss = ((output - target) ** 2).sum()
            loss.backward()
            optimizer.step()

            # 3. Get the updated model back (the data stays behind!)
            model.get()

            print(f"Epoch {epoch} complete at {data.location.id}. Loss: {loss.get().item():.4f}")

train()


Step 4: Adding Differential Privacy (DP)

Even if we don’t see the data, a clever attacker could theoretically reverse-engineer the gradients to see what the training images looked like. To prevent this, we add Differential Privacy. This injects controlled “noise” into the gradients.
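To make the "controlled noise" idea concrete, here is a minimal plain-Python sketch of the clip-and-noise step used in DP-SGD-style training: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the result is averaged. This is not the Opacus or PySyft API; the function names, `max_norm`, and `noise_multiplier` values are illustrative assumptions, and the sketch does not track the resulting privacy budget.

```python
import math
import random
from typing import List


def clip_to_norm(grad: List[float], max_norm: float) -> List[float]:
    """Scale a gradient down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(
    per_example_grads: List[List[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


rng = random.Random(0)  # fixed seed only to make the demo repeatable
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients
noisy_grad = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_grad)
```

Clipping bounds how much any single image can move the model, and the noise hides whatever influence remains; a larger `noise_multiplier` gives stronger privacy at the cost of accuracy.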

Pro-Tip: If you’re looking for production-grade patterns on how to implement Differential Privacy at scale or want to explore hardware-level security like TEEs (Trusted Execution Environments), I highly recommend checking out the advanced research articles over at WellAlly Tech Blog. They cover the intersection of AI and privacy in much greater depth! 🥑

The Result: Privacy is a Feature, Not a Bug

By the end of this process, you have a model that has learned the features of skin cancer from multiple sources without the raw images ever leaving the hospitals.

Why this matters:

Compliance: Keeping raw data on-site supports GDPR/HIPAA compliance through Privacy by Design, although federated learning alone does not make a system compliant.
Data Diversity: You can train on data from a hospital in New York and a clinic in London simultaneously, creating a more generalized and less biased model.
Security: Even if your central server is breached, the attacker finds no raw patient data, only model weights (which is why Step 4 adds Differential Privacy to limit what those weights can leak).

Conclusion 🚀

Federated Learning is transforming how we think about sensitive data. We no longer need to choose between AI Innovation and User Privacy. With tools like PySyft and PyTorch, the “Privacy-First” approach is becoming the industry standard.

Are you ready to build the future of secure AI? If you enjoyed this “Learning in Public” session, drop a comment below! What’s your biggest challenge with medical data? Let’s discuss! 👇




My Side Project Security Audit Results — I’m Embarrassed to Share



I recently ran a security audit of all the side projects I'm running: a FastAPI backend, a Telegram bot, a PWA, a Streamlit app, and more. I thought, "I built these with some care, so they'll be fine." Wrong. Here I honestly share each problem I found, why I built it that way, and how I fixed it. This is not a theoretical checklist; these are bugs that I actually deployed to production.

1. Authentication bypass due to empty secret (Critical)

My code:

_API_SECRET = os.environ.get("API_SECRET_KEY", "")

def verify_api_key(x_api_key: str = Header(default="")):
    if _API_SECRET and x_api_key != _API_SECRET:  # ← Bug here
        raise HTTPException(status_code=401)

Look at the condition that starts with if _API_SECRET and. If there is no API_SECRET_KEY environment variable on the server, _API_SECRET becomes an empty string, which is falsy, and the entire check is skipped. All requests pass as if authenticated.

Why was it designed like this? I wanted to "handle it gracefully" so that the server would not crash when environment variables were not set during local development. The problem is that this "graceful handling" made it to production, and the moment API_SECRET_KEY was not set on the server, the entire API was open.

How to fix:

import secrets

_API_SECRET = os.environ.get("API_SECRET_KEY", "")

def verify_api_key(x_api_key: str = Header(default="")):
    if not _API_SECRET:
        # Missing configuration is a hard failure, never open access
        raise HTTPException(status_code=500, detail="API_SECRET_KEY not configured")
    if not secrets.compare_digest(x_api_key, _API_SECRET):
        raise HTTPException(status_code=401, detail="Unauthorized")

No secret = 500 error, not open access. secrets.compare_digest() is also used to prevent timing attacks.

Lesson: Don't make authentication conditional on whether a secret is set. A missing setting should be a hard failure, not an open door.

2. Secret committed in the Git history (Critical)

Although it was no longer in the current code, an API key that I committed for a "quick test" a few months ago was still in the Git history.

# How to check
git log --all -p | grep -E "sk-ant-api03-[A-Za-z0-9_-]{20,}"
git log --all -p | grep -E "AIzaSy[A-Za-z0-9]{20,}"

Why does this happen? To test quickly early on, I hardcoded the key and committed it. I thought I had "fixed it" by later moving it to .env. But Git remembers every commit forever. When the repo is made public or a new team member joins, anyone can retrieve keys from past commits.

How to fix:

# Remove a specific file from the entire history
pip install git-filter-repo
git-filter-repo --path .env --invert-paths --force
# filter-repo removes the 'origin' remote as a safety measure; re-add it before pushing if needed
git push --force-with-lease origin main

Then immediately revoke and reissue the exposed key. Cleaning up the Git history does not undo exposure that has already occurred.

Lesson: Assume that any key that has been committed to Git even once has already been stolen, and reissue it.

3. Debug endpoint deployed to production (High)

This endpoint was deployed on the production server:

@app.get("/debug/config")
async def debug_config():
    return {
        "supabase_url": settings.supabase_url,
        "environment": settings.env,
        "connected_services": […]
    }

Why does this happen? Debug endpoints are really convenient during development. Once the problem is solved, I forget to remove them. Since nothing errors, no one tells you.

How to fix: Delete it.
If runtime debugging is necessary, put it behind authentication or write to logs instead.

# Pre-deployment check
grep -rn '@app.get.*debug\|@app.post.*debug' app/

Lesson: Add "verify that debug endpoints are removed" to the deployment checklist. Better yet, don't create them in the first place.

4. Internal information exposed in error messages (High)

# My code
except Exception as e:
    return JSONResponse({"error": str(e)}, status_code=500)

If you do this, messages like these are sent to the client:

FATAL: password authentication failed for user "postgres"
[Errno 2] No such file or directory: '/home/ubuntu/app/config.json'
Module 'xyz' version 1.2.3 has no attribute 'connect'

An attacker can use this information to work out the infrastructure layout, the libraries in use, and known vulnerabilities for those versions.

Why was it designed like this? This was also for development convenience. It is handy when testing because str(e) shows the cause of the error immediately. The problem was that there was no layer between the internal error and the HTTP response.

How to fix:

import logging
logger = logging.getLogger(__name__)

except Exception as e:
    logger.error(f"Error: {e}", exc_info=True)  # Only in server log
    return JSONResponse({"error": "internal server error"}, status_code=500)

Everything goes to the log and nothing goes to the HTTP response.

Lesson: Server logs are for me; HTTP error responses are for the client. These two must be completely separated.

5. XSS (High)

Front-end code using innerHTML without escaping:

articles.forEach(article => {
  container.innerHTML += `
    ${article.title} ${article.summary}
    <a href="${article.url}">More</a>
  `;
});

If a title containing HTML or script is stored in the DB, it is executed in every user's browser.

Why does this happen? Template literals feel like string formatting. When you write ${article.title}, it doesn't feel like you're rendering HTML. However, the browser parses the result as HTML and executes it.

How to fix:

const esc = s => String(s || "")
  .replace(/&/g, "&amp;")
  .replace(/</g, "&lt;")
  .replace(/>/g, "&gt;")
  .replace(/"/g, "&quot;");

const safeUrl = u => /^https?:\/\//.test(u || "") ? u : "#";

container.innerHTML += `
  ${esc(article.title)} ${esc(article.summary)}
  <a href="${esc(safeUrl(article.url))}" rel="noopener noreferrer">More</a>
`;

Lesson: Every time you use innerHTML, mentally read it as "I'm executing arbitrary code." Then it's hard to forget the escaping.

6. No rate limit on AI endpoints (High)

@app.post("/analyze")
async def analyze(item: Item, _: None = Depends(verify_api_key)):
    result = await ai_client.messages.create(…)  # Cost per call
    return result

There is no rate limit. Unlimited calls can run up a large bill in an instant.

How to fix:

@app.post("/analyze")
@limiter.limit("10/minute")
async def analyze(request: Request, item: Item, _: None = Depends(verify_api_key)):
    …

Lesson: Authentication prevents unauthorized access. Rate limits prevent authorized but abusive access.

7. CORS wildcard in production (Medium)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allow all origins
    …
)

Why is it dangerous even if there is an API key? CORS is a browser-level access control: it decides which origins' scripts may call your API from a user's browser. If the API key lives in the front-end JavaScript, an XSS vulnerability on another site can make API calls with that key from the user's browser.

How to fix:

import os

ALLOWED_ORIGINS = os.environ.get("ALLOWED_ORIGINS", "*").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=ALLOWED_ORIGINS,
    allow_methods=["GET", "POST"],
    allow_headers=["X-API-Key", "Content-Type"],
)

# production .env
ALLOWED_ORIGINS=https://myapp.vercel.app

Lesson: allow_origins=["*"] is for local development only. Never deploy it.

8. Temporary files not deleted (Medium)

with tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False) as tmp:
    tmp.write(uploaded_file.read())
    tmp_path = tmp.name

process_file(tmp_path)  # If an exception occurs here, the temporary file will remain forever

If process_file() raises an exception, the temporary file is never deleted. On a long-running server these files accumulate, and if a file contains sensitive user data, that data remains on the disk.
How to fix:

tmp_path = None
with tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False) as tmp:
    tmp.write(uploaded_file.read())
    tmp_path = tmp.name

try:
    process_file(tmp_path)
finally:
    # Runs whether or not process_file() raised
    if tmp_path and os.path.exists(tmp_path):
        os.unlink(tmp_path)

Lesson: The code path that creates a file is also responsible for deleting it. The finally block always runs, even when there is an exception.

After this audit, I created a checklist that is enforced on all projects:

Before writing code:
[ ] Create .gitignore (.env, *.key, sessions/, credentials.json)
[ ] Create .env.example (template without actual values)

All endpoints:
[ ] Add authentication (500 error, not bypass, if no secret)
[ ] Error responses are generic messages only (str(e) prohibited)
[ ] Rate limit on AI/cost-generating endpoints

Frontend:
[ ] Escape all uses of innerHTML
[ ] Verify that URLs start with https://
[ ] rel="noopener noreferrer" on external links

Before commit:
git diff --cached |

These bugs slipped through because security was always treated as a separate step after feature development: add a TODO comment saying "let's clean it up later," then deploy the "temporary" code as is. The only solution I've found is to build security checks into natural checkpoints: before commit and before deployment. Fixing the bugs themselves is just tedious work.

If you are running a backend, I recommend checking your own code for the pattern above:

if _SECRET and key != _SECRET

This authentication bypass is much more common than you think.
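The "missing settings should be a hard failure" lesson from items 1 and 7 can be applied once at startup rather than per request. Below is a minimal sketch of a fail-closed settings loader; `API_SECRET_KEY` and `ALLOWED_ORIGINS` come from the examples above, while `ConfigError`, `load_security_settings`, and the `APP_ENV` variable are hypothetical names introduced only for illustration.

```python
from typing import Dict, List, Mapping


class ConfigError(RuntimeError):
    """Raised when a security-critical setting is missing or unsafe."""


def load_security_settings(env: Mapping[str, str]) -> Dict[str, object]:
    """Validate security settings and refuse to continue if any are unsafe."""
    secret = env.get("API_SECRET_KEY", "").strip()
    if not secret:
        raise ConfigError("API_SECRET_KEY is not set")

    raw_origins = env.get("ALLOWED_ORIGINS", "")
    origins: List[str] = [o.strip() for o in raw_origins.split(",") if o.strip()]
    if not origins:
        raise ConfigError("ALLOWED_ORIGINS is not set")

    # APP_ENV is a hypothetical variable; it defaults to the strict setting.
    if "*" in origins and env.get("APP_ENV", "production") == "production":
        raise ConfigError("wildcard CORS origin is not allowed in production")

    return {"api_secret": secret, "allowed_origins": origins}


# Example with an in-memory mapping; a real app would pass os.environ instead.
settings = load_security_settings({
    "API_SECRET_KEY": "example-secret",
    "ALLOWED_ORIGINS": "https://myapp.vercel.app",
})
print(settings["allowed_origins"])
```

Calling this before the app starts serving means a misconfigured deployment crashes immediately instead of quietly running with authentication disabled or CORS wide open.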


