coding – DAILY NEWS

TECH & AI

Stop Leaking Medical Data! Build a Privacy-First Skin Cancer Classifier with Federated Learning & PySyft 🩺🛡️

jackminion Jul 4, 2026 0

Data is the new oil, but in healthcare, data is more like plutonium—extremely valuable but incredibly dangerous if handled incorrectly. If you are building AI for medical use cases, you’ve likely hit the “Data Silo” wall. Hospitals can’t just ZIP up patient records and DM them to you because of GDPR, HIPAA, and basic human ethics.

So, how do we train a high-performing Skin Lesion Classification model without ever actually seeing the raw medical images? Welcome to the world of Federated Learning (FL) and Privacy-Preserving AI. In this guide, we’ll explore how to use PySyft and PyTorch to train models on decentralized data while keeping sensitive information exactly where it belongs: with the patient.

We will focus on Federated Learning, Differential Privacy, and Secure Multi-Party Computation (SMPC) to build a robust, privacy-first pipeline.

The Architecture: Move the Code, Not the Data

In traditional Machine Learning, we bring data to the model. In Federated Learning, we flip the script: we bring the model to the data.

graph TD
    subgraph "Central Server (Aggregator)"
        A(Global Model v1.0) -->|Distribute Weights| B{Encrypted Aggregator}
        B -->|Updated Global Model| A
    end

    subgraph "Hospital A (Edge Node)"
        C(Local Data: Skin Images) --> D(Local Training)
        D -->|Trained Gradients| B
    end

    subgraph "Hospital B (Edge Node)"
        E(Local Data: Skin Images) --> F(Local Training)
        F -->|Trained Gradients| B
    end

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style C fill:#bbf,stroke:#333
    style E fill:#bbf,stroke:#333

As shown in the flow above, the raw images never leave the hospitals. Only the “learnings” (gradients/weights) are sent back to the central server.
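
The aggregation step in the diagram can be sketched in plain Python. This is a minimal illustration of weighted averaging (often called federated averaging), not PySyft's API: the function name `federated_average`, the dictionary layout, and the sample counts are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: parameter name -> flat list of values
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average each parameter across nodes, weighted by local sample count."""
    total_samples = sum(count for _, count in updates)
    if total_samples == 0:
        raise ValueError("at least one node must report samples")

    first_weights = updates[0][0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        averaged[name] = [
            sum(weights[name][i] * count for weights, count in updates) / total_samples
            for i in range(len(values))
        ]
    return averaged


# Illustrative numbers only: two hospitals with different dataset sizes
hospital_a = ({"weight": [0.2, 0.4], "bias": [0.1]}, 300)
hospital_b = ({"weight": [0.6, 0.0], "bias": [0.5]}, 100)

global_weights = federated_average([hospital_a, hospital_b])
print(global_weights)
```

Weighting by sample count keeps a small clinic from pulling the global model as hard as a large hospital would. Only the weight dictionaries cross the network; the images behind them do not.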

Prerequisites

Before we dive into the code, ensure you have the following stack ready:

PyTorch: The backbone for our neural networks.

PySyft: The secret sauce for federated and private learning.

Differential Privacy (Opacus): To prevent “membership inference attacks.”

Step 1: Setting Up Virtual Workers

In a real-world scenario, these would be physical servers in different hospitals. For this tutorial, we will simulate two hospitals (Alice and Bob) using PySyft’s virtual workers.

import torch
import syft as sy

# Note: TorchHook and VirtualWorker belong to the PySyft 0.2.x API;
# later PySyft releases removed them.

# Hooking PyTorch to add extra privacy features
hook = sy.TorchHook(torch)

# Create two remote 'hospitals'
hospital_alice = sy.VirtualWorker(hook, id="alice")
hospital_bob = sy.VirtualWorker(hook, id="bob")

print(f"Nodes initialized: {hospital_alice.id}, {hospital_bob.id} 🏥")

Step 2: Distributing the Dataset

Imagine we have a dataset of skin lesion images (like the HAM10000 dataset). We split it and “send” it to our hospitals. In reality, the data would already exist there; we are simply gaining pointers to it.

# Simulated skin lesion data (Features = Pixels, Targets = Cancer Type)
data = torch.tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]], requires_grad=True)
# Targets are shaped (4, 1) so they line up with the model's (N, 1) output
target = torch.tensor([[0.0], [0.0], [1.0], [1.0]])

# Distribute data to hospitals
# In a real app, data stays local; here we simulate the 'silo'
data_alice = data[0:2].send(hospital_alice)
target_alice = target[0:2].send(hospital_alice)

data_bob = data[2:4].send(hospital_bob)
target_bob = target[2:4].send(hospital_bob)

datasets = [(data_alice, target_alice), (data_bob, target_bob)]

Step 3: The Federated Training Loop

Now for the magic. We define a simple linear model (standing in for a real CNN) and send it to each remote location for training.

from torch import nn, optim

# A simple linear model standing in for a real skin lesion classifier
model = nn.Linear(2, 1)

def train(epochs=5):
    optimizer = optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(epochs):
        for data, target in datasets:
            # 1. Send model to the hospital node
            model.send(data.location)

            # 2. Normal training step (runs remotely on the node)
            optimizer.zero_grad()
            output = model(data)
            loss = ((output - target) ** 2).sum()
            loss.backward()
            optimizer.step()

            # 3. Get the updated model back (the data stays behind!)
            model.get()

            print(f"Epoch {epoch} complete at {data.location.id}. Loss: {loss.get().item():.4f}")

train()

Step 4: Adding Differential Privacy (DP)

Even if we don’t see the data, a clever attacker could theoretically reverse-engineer the gradients to see what the training images looked like. To prevent this, we add Differential Privacy. This injects controlled “noise” into the gradients.
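
To make the noise injection concrete, here is a minimal sketch of the clip-then-add-Gaussian-noise step used in DP-SGD-style training, written in plain Python rather than with Opacus. The names `clip_gradient` and `private_mean_gradient` and the parameters `max_norm` and `noise_multiplier` are assumptions for illustration. The sketch does not track a privacy budget (epsilon), which a dedicated library would handle for you.

```python
import math
import random
from typing import List


def clip_gradient(grad: List[float], max_norm: float) -> List[float]:
    """Scale a per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(
    per_sample_grads: List[List[float]],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]

    # Noise scale is tied to the clipping bound, so no single sample
    # can dominate the noisy sum.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Illustrative per-sample gradients (made-up values)
rng = random.Random(42)
grads = [[3.0, 4.0], [0.0, 0.5]]
noisy_update = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_update)
```

Clipping bounds how much any one patient's image can move the update, and the noise hides whatever influence remains. A larger `noise_multiplier` gives stronger privacy at the cost of accuracy.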

Pro-Tip: If you’re looking for production-grade patterns on how to implement Differential Privacy at scale or want to explore hardware-level security like TEEs (Trusted Execution Environments), I highly recommend checking out the advanced research articles over at WellAlly Tech Blog. They cover the intersection of AI and privacy in much greater depth! 🥑

The Result: Privacy is a Feature, Not a Bug

By the end of this process, you have a model that has learned from multiple sources while the raw patient data never left the hospitals that hold it.

Why this matters:

Compliance: Keeping raw data on site supports Privacy by Design and makes GDPR/HIPAA compliance easier to argue, though it does not make you compliant automatically.
Data Diversity: You can train on data from a hospital in New York and a clinic in London in the same run, which can yield a more generalized and less biased model.
Security: Even if your central server is breached, the attacker finds no raw patient data, only model weights (which is why the Differential Privacy step matters).

Conclusion 🚀

Federated Learning is transforming how we think about sensitive data. We no longer need to choose between AI Innovation and User Privacy. With tools like PySyft and PyTorch, the “Privacy-First” approach is becoming the industry standard.

Are you ready to build the future of secure AI? If you enjoyed this “Learning in Public” session, drop a comment below! What’s your biggest challenge with medical data? Let’s discuss! 👇

Integration with Workable public jobs API

jackminion Jul 3, 2026 0

Workable is an ATS with a public careers layer you can read without authentication. The widget-style JSON endpoint powers Workable-hosted career sites and embeddable job widgets.

This post shows how to list published jobs for one Workable account, request full descriptions, and normalize location and remote fields. For other ATS public feeds, see the Ashby, Greenhouse, and Lever posts.

Workable’s authenticated REST API v3 (https://{subdomain}.workable.com/spi/v3/) requires a bearer token and is meant for HR integrations, not public job aggregation.

Prerequisites

Node.js version 26
A company’s Workable account slug (see below)
No API key required for the public widget endpoint

Find the account slug

Workable career pages use URLs like https://apply.workable.com/{account_slug}/ or legacy https://{account_slug}.workable.com/. The slug is the account identifier in API paths.

Examples: huggingface for Hugging Face, flosum for Flosum.

API overview

| Item | Value |
| --- | --- |
| Jobs (with details) | GET https://www.workable.com/api/accounts/{slug}?details=true |
| Locations | GET …/accounts/{slug}/locations |
| Departments | GET …/accounts/{slug}/departments |
| Auth | None |
| Format | JSON |

Node fetch may follow a redirect to https://apply.workable.com/api/v1/widget/accounts/{slug}?details=true; both URLs return the same payload.

Set details=true to include description and full_description on each job. Without it you get summary fields only.

Common fields on each job in jobs[]:

| Field | Description |
| --- | --- |
| title | Job title |
| url, shortlink | Apply links |
| location, locations | Structured location objects |
| experience | Seniority label (for example Mid-Senior level) |
| published_on, created_at | Timestamps |
| state | On public feeds this is often a region name, not listing status; prefer published_on to detect live roles |

Basic integration

const accountSlug = process.env.WORKABLE_ACCOUNT_SLUG ?? 'huggingface';
const url = new URL(
  `https://www.workable.com/api/accounts/${encodeURIComponent(accountSlug)}`,
);
url.searchParams.set('details', 'true');

const response = await fetch(url);

if (!response.ok) {
  throw new Error(`Workable API ${response.status}: ${response.statusText}`);
}

const data = await response.json();

for (const job of data.jobs ?? []) {
  console.log(job.title, '-', job.location?.location_str, '-', job.url);
}

Keep only published listings:

function isPublished(job) {
  if (job.published_on?.trim()) return true;
  return !job.state || job.state === 'published';
}

const publicJobs = (data.jobs ?? []).filter(
  (job) => job.title && (job.url || job.shortlink) && isPublished(job),
);

Locations and remote detection

Prefer location.location_str when present. Otherwise build a label from structured fields:

function stringifyLocation(entry) {
  const city = entry.city?.trim();
  const region = entry.state_code?.trim() || entry.region?.trim();
  const country = entry.country_name?.trim() || entry.country?.trim();

  if (city && region && country) return `${city}, ${region}, ${country}`;
  if (city && country) return `${city}, ${country}`;
  return country || city || '';
}

function resolveLocation(job) {
  const primary = job.location?.location_str?.trim();
  if (primary) return primary;

  const parts = (job.locations ?? [])
    .map(stringifyLocation)
    .filter(Boolean);

  return parts.join(' / ') || 'Unknown';
}

function isRemoteJob(job, locationLabel) {
  const structured =
    Boolean(job.location?.telecommuting) ||
    job.location?.workplace_type?.toLowerCase() === 'remote' ||
    (job.locations ?? []).some(
      (loc) =>
        Boolean(loc.telecommuting) ||
        loc.workplace_type?.toLowerCase() === 'remote',
    );

  return structured || /remote/i.test(locationLabel);
}

Normalize to a stable shape

function normalizeWorkableJob(job, companyName) {
  const location = resolveLocation(job);
  const description = `${job.description ?? ''} ${job.full_description ?? ''}`.trim();

  return {
    id: job.id ?? job.shortcode,
    title: job.title.trim(),
    company: companyName,
    location,
    isRemote: isRemoteJob(job, location),
    url: job.url || job.shortlink,
    postedAt: job.created_at
      ? new Date(job.created_at)
      : job.published_on
        ? new Date(job.published_on)
        : null,
    experience: job.experience ?? null,
    description,
  };
}

Optional companion calls enrich filters:

const [locationsRes, departmentsRes] = await Promise.all([
  fetch(`https://www.workable.com/api/accounts/${accountSlug}/locations`),
  fetch(`https://www.workable.com/api/accounts/${accountSlug}/departments`),
]);

const { locations } = await locationsRes.json();
const { departments } = await departmentsRes.json();

When (and when not) to inline images as Base64

jackminion Jul 3, 2026 0

Base64 image data URIs are one of those web techniques that look like a magic shortcut the first time you use them.

Instead of referencing an external file:

<img src="/logo.png" alt="Logo">

you can put the image bytes directly in the document as text:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUg…" alt="Logo">

That can be useful. It can also make a page slower, harder to cache, and more annoying to maintain.

Here is the practical rule: inline images as Base64 when self-containment matters more than caching. Keep normal image files when the browser should be able to cache, resize, lazy-load, or optimize them independently.

What a Base64 image actually is

An image file is binary data. Base64 rewrites that binary data as plain text using a limited character set. To make the browser treat the text as an image, you wrap it in a data URI:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUg…

The first part tells the browser the MIME type. The second part tells it the data is Base64 encoded. The long tail is the image itself.

Base64 is not compression. It is not encryption. It is just a text representation of the same bytes.
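
As a rough illustration of that structure, the sketch below assembles a data URI from raw bytes using Python's standard `base64` module. The helper name `to_data_uri` is an assumption for this example, and the input is just the 8-byte PNG file signature rather than a complete image, which is why the output shares its opening characters with the truncated strings shown in this post.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Wrap raw image bytes in a data URI: MIME type, 'base64' marker, payload."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte signature that every PNG file starts with (not a full image)
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

uri = to_data_uri(PNG_SIGNATURE)
print(uri)

# Decoding the payload returns exactly the original bytes
payload = uri.split(",", 1)[1]
print(base64.b64decode(payload) == PNG_SIGNATURE)
```

Because the encoding is fully reversible without a key, anyone who can read the string can recover the image.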

When inlining an image is worth it

1. Tiny icons and UI assets

For very small images, removing an extra HTTP request can be worth the extra bytes. This is especially true for small icons, logos, placeholders, simple UI sprites, or tiny transparent PNGs.

Modern HTTP/2 and HTTP/3 make extra requests cheaper than they used to be, so this is not an automatic win. But for a one-off tiny asset inside a small page or widget, a data URI can still be a clean choice.

2. Single-file deliverables

Sometimes the point is not raw page speed. Sometimes you need one file that carries everything with it:

an HTML report
an email template
a CodePen or demo snippet
a CMS block where you cannot upload assets
a test fixture that should not depend on external hosting

In those cases, Base64 is useful because the image travels with the HTML, CSS, JSON, or JavaScript.

3. Prototypes and throwaway snippets

If you are testing a layout, writing a bug reproduction, or pasting a minimal example into a ticket, a data URI can save time. You do not need to set up static hosting just to show one image.

4. Local-only conversion workflows

If the image is private, it is nice to avoid uploading it to a random converter. Browser APIs can generate a Base64 data URI locally, so the file never leaves your device.

When you should not inline the image

1. Large photos and hero images

Base64 usually makes the encoded data about 33% larger than the original binary file. That is because Base64 stores every 3 bytes as 4 text characters.

For a large JPG, PNG, or WebP, that extra size is not a rounding error. Keep big images as normal files.
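
The 3-bytes-to-4-characters rule is easy to check with a little arithmetic. The sketch below is only an illustration: the helper name `base64_length` and the file sizes are assumptions, and it counts the Base64 payload only, not the `data:…;base64,` prefix.

```python
import base64


def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 output: every 3 input bytes become 4 characters."""
    return 4 * ((n_bytes + 2) // 3)


# Hypothetical file sizes: a tiny icon, a small logo, a hero photo
for size in (3, 1_000, 250_000):
    encoded = base64_length(size)
    overhead = (encoded - size) / size * 100
    print(f"{size:>7} bytes -> {encoded:>7} chars (+{overhead:.1f}%)")

# Cross-check the formula against the real encoder
print(len(base64.b64encode(bytes(1_000))) == base64_length(1_000))
```

The percentage stays close to one third at every size, so the absolute cost grows in step with the image.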

2. Images reused across pages

An external image can be cached once and reused across page views. An inlined image is bundled into every document or stylesheet that contains it.

If the same logo appears on 20 pages, inlining it 20 times is usually worse than letting the browser cache one file.

3. Responsive images

Normal image files can use srcset, sizes, lazy loading, CDN transforms, format negotiation, and caching headers.

<img
  src="/hero-800.webp"
  srcset="/hero-400.webp 400w, /hero-800.webp 800w, /hero-1600.webp 1600w"
  sizes="100vw"
  loading="lazy"
  alt="Product screenshot"
>

That is much harder to preserve when the image is baked into a string.

4. Anything you expect humans to edit

Base64 strings are unpleasant to review in Git diffs, easy to truncate by accident, and noisy inside templates. If designers, marketers, or other engineers need to update the image regularly, use a normal asset file.

How to generate a Base64 data URI in the browser

The simplest browser-native path is FileReader.readAsDataURL().

The result will look like this:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUg…

You can use that string directly in HTML:

<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUg…" alt="Logo">

or in CSS:

.logo {
  background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUg…");
}

A simple checklist

Inline the image if:

it is small
it is not reused across many pages
self-contained delivery matters
you do not need responsive image behavior
the string will not make your source files painful to maintain

Keep it as a normal file if:

it is a photo or large graphic
it should be cached separately
it appears on many pages
it needs srcset, lazy loading, CDN resizing, or image optimization
non-developers need to replace it often

Tiny tool note

I built a small free tool for this workflow: PNG to Base64 converter. It runs entirely in the browser with FileReader, so the PNG is not uploaded, and it gives you the raw Base64 string plus ready-to-paste HTML and CSS snippets.

There is also a general image to Base64 converter for JPG, SVG, WebP, GIF, and other image formats.

Use Base64 as a packaging tool, not a default image strategy. When the image is tiny or the deliverable must be self-contained, it can be perfect. When performance, caching, and responsive delivery matter, boring old image files are still the better answer.
