DAILY NEWS

Stay Ahead, Stay Informed – Every Day

Advertisement
Integration with Workable public jobs API



Workable is an ATS with a public careers layer you can read without authentication. The widget-style JSON endpoint powers Workable-hosted career sites and embeddable job widgets.

This post shows how to list published jobs for one Workable account, request full descriptions, and normalize location and remote fields. For other ATS public feeds, see the Ashby, Greenhouse, and Lever posts.

Workable’s authenticated REST API v3 (https://{subdomain}.workable.com/spi/v3/) requires a bearer token and is meant for HR integrations, not public job aggregation.

Prerequisites

Node.js version 26
A company’s Workable account slug (see below)
No API key required for the public widget endpoint

Find the account slug

Workable career pages use URLs like https://apply.workable.com/{account_slug}/ or legacy https://{account_slug}.workable.com/. The slug is the account identifier in API paths.

Examples: huggingface for Hugging Face, flosum for Flosum.

API overview

Item
Value

Jobs (with details)
GET https://www.workable.com/api/accounts/{slug}?details=true

Locations
GET …/accounts/{slug}/locations

Departments
GET …/accounts/{slug}/departments

Auth
None

Format
JSON

Node fetch may follow a redirect to https://apply.workable.com/api/v1/widget/accounts/{slug}?details=true; both URLs return the same payload.

Set details=true to include description and full_description on each job. Without it you get summary fields only.

Common fields on each job in jobs():

Field
Description

title
Job title

url, shortlink

Apply links

location, locations

Structured location objects

experience
Seniority label (for example Mid-Senior level)

published_on, created_at

Timestamps

state
On public feeds this is often a region name, not listing status – prefer published_on to detect live roles

Basic integration

const accountSlug = process.env.WORKABLE_ACCOUNT_SLUG ?? ‘huggingface’;
const url = new URL(
`https://www.workable.com/api/accounts/${encodeURIComponent(accountSlug)}`,
);
url.searchParams.set(‘details’, ‘true’);

const response = await fetch(url);

if (!response.ok) {
throw new Error(`Workable API ${response.status}: ${response.statusText}`);
}

const data = await response.json();

for (const job of data.jobs ?? ()) {
console.log(job.title, ‘-‘, job.location?.location_str, ‘-‘, job.url);
}

Enter fullscreen mode

Exit fullscreen mode

Keep only published listings:

function isPublished(job) {
if (job.published_on?.trim()) return true;
return !job.state || job.state === ‘published’;
}

const publicJobs = (data.jobs ?? ()).filter(
(job) => job.title && (job.url || job.shortlink) && isPublished(job),
);

Enter fullscreen mode

Exit fullscreen mode

Locations and remote detection

Prefer location.location_str when present. Otherwise build a label from structured fields:

function stringifyLocation(entry) {
const city = entry.city?.trim();
const region = entry.state_code?.trim() || entry.region?.trim();
const country = entry.country_name?.trim() || entry.country?.trim();

if (city && region && country) return `${city}, ${region}, ${country}`;
if (city && country) return `${city}, ${country}`;
return country || city || ”;
}

function resolveLocation(job) {
const primary = job.location?.location_str?.trim();
if (primary) return primary;

const parts = (job.locations ?? ())
.map(stringifyLocation)
.filter(Boolean);

return parts.join(‘ / ‘) || ‘Unknown’;
}

function isRemoteJob(job, locationLabel) {
const structured =
Boolean(job.location?.telecommuting) ||
job.location?.workplace_type?.toLowerCase() === ‘remote’ ||
(job.locations ?? ()).some(
(loc) =>
Boolean(loc.telecommuting) ||
loc.workplace_type?.toLowerCase() === ‘remote’,
);

return structured || /remote/i.test(locationLabel);
}

Enter fullscreen mode

Exit fullscreen mode

Normalize to a stable shape

function normalizeWorkableJob(job, companyName) {
const location = resolveLocation(job);
const description = `${job.description ?? ”} ${job.full_description ?? ”}`.trim();

return {
id: job.id ?? job.shortcode,
title: job.title.trim(),
company: companyName,
location,
isRemote: isRemoteJob(job, location),
url: job.url || job.shortlink,
postedAt: job.created_at
? new Date(job.created_at)
: job.published_on
? new Date(job.published_on)
: null,
experience: job.experience ?? null,
description,
};
}

Enter fullscreen mode

Exit fullscreen mode

Optional companion calls enrich filters:

const (locationsRes, departmentsRes) = await Promise.all((
fetch(`https://www.workable.com/api/accounts/${accountSlug}/locations`),
fetch(`https://www.workable.com/api/accounts/${accountSlug}/departments`),
));

const { locations } = await locationsRes.json();
const { departments } = await departmentsRes.json();

Enter fullscreen mode

Exit fullscreen mode

Need help with your project?

Get personalized advice on your architecture, code, or career in a 45-minute 1-on-1 consultation.

→ Book a consultation



Source link

The CTO Playbook for AI Agent Data Analysis on a Budget



So here’s what happened: the CTO Playbook for AI Agent Data Analysis on a Budget

Six months ago my engineering team was burning roughly $14,000 a month on a single AI agent data pipeline. The model was great. The latency was fine. The output quality was honestly impressive. But the bill was eating our runway, and I had to make a call that would have felt absurd a year earlier: rip out a perfectly working stack and rebuild it from scratch.

This is the story of how I did it, what I learned shipping AI agent data analysis at scale, and why I now treat model choice the same way I treat database choice — as a strategic decision, not a default.

The Wake-Up Call

We had built our analytics agent on GPT-4o. It is a phenomenal model. I will not pretend otherwise. But the moment we crossed about 8 million tokens per day of production traffic, the math stopped working. At $2.50 per million input tokens and $10.00 per million output tokens, every new customer we onboarded was a net loss on infrastructure for the first three months.

I remember staring at the dashboard one Tuesday morning. Throughput was fine. The model was hitting the benchmarks we cared about. Our NPS was climbing. And yet finance was flagging the line item every week. That is the moment every startup CTO dreads: when the thing that is working is also the thing that is going to kill you if you do not change it.

So I started asking the questions I should have asked on day one. Which models are actually production-ready for our workload? What is the real cost gap between flagship models and the new generation of leaner ones? And critically, can I switch providers without rewriting my entire application?

That last question is the one nobody talks about. Vendor lock-in in the LLM space is real, and it is sneakier than cloud lock-in. When your prompt engineering, your evaluation harness, your retry logic, and your observability all assume one provider’s API shape, switching costs are not just financial — they are engineering hours you do not have.

The Cost Numbers That Made Me Switch

Once I started looking at the market seriously, the gap was jaw-dropping. Global API currently lists 184 models, with prices ranging from $0.01 to $3.50 per million tokens depending on tier. That spread is not academic. For an analytics agent, where input tokens dominate (because you are shoving tables, schemas, and prior context into every prompt), the input price is what actually moves your P&L.

Here is the comparison I built for my board deck:

Model
Input ($/M)
Output ($/M)
Context

DeepSeek V4 Flash
0.27
1.10
128K

DeepSeek V4 Pro
0.55
2.20
200K

Qwen3-32B
0.30
1.20
32K

GLM-4 Plus
0.20
0.80
128K

GPT-4o
2.50
10.00
128K

Look at GLM-4 Plus. $0.20 input, $0.80 output, 128K context window. For a large slice of our agent traffic — the follow-up questions, the structured summarization calls, the routing layer — the quality delta against GPT-4o was inside the noise floor of our human eval set. The cost delta was 12x.

That is when I knew. We were not paying for quality. We were paying for the logo on the box.

The Architecture I Actually Shipped

I am going to walk you through the production-ready setup we landed on, because I think it is the right shape for almost any team running AI agent data analysis at scale.

The core insight is that “AI agent data analysis” is not one workload. It is at least four:

Routing and intent classification — tiny prompts, high volume, must be cheap and fast.

Schema and tool selection — moderate context, structural reasoning.

Heavy analytical reasoning — the flagship call, where quality actually matters.

Verification and self-critique — another model call, where consistency matters more than peak brilliance.

Each of those workloads has a different price-quality sweet spot. Treating them as one homogeneous workload is how teams end up with $14,000 monthly bills for what should be a $3,000 service.

My routing logic now looks at the incoming query, classifies it (using GLM-4 Plus, which is dirt cheap), and then dispatches to one of three model tiers. The flagship calls — maybe 15% of total volume — still hit a top-tier model. The other 85% lands on leaner, faster, dramatically cheaper endpoints.

The result: a 40-65% cost reduction against our previous all-GPT-4o stack, with our internal quality benchmarks moving by less than 2 percentage points. That is the kind of ROI your CFO actually notices.

The Code

Here is the base client setup we use everywhere. I am showing the Python version because that is what our data team writes, but the same shape works in Node and Go.

import os
from openai import OpenAI

# when we swap providers — the entire point of routing through
# a unified API surface.
client = OpenAI(
base_url=”https://global-apis.com/v1″,
api_key=os.environ(“GLOBAL_API_KEY”),
)

def classify_query(user_query: str) -> str:
“””Cheap intent classification. GLM-4 Plus is plenty for this.”””
response = client.chat.completions.create(
model=”z-ai/glm-4-plus”,
messages=(
{
“role”: “system”,
“content”: “Classify the user’s analytics query as: simple, structured, or deep. Reply with one word only.”,
},
{“role”: “user”, “content”: user_query},
),
temperature=0.0,
max_tokens=4,
)
return response.choices(0).message.content.strip().lower()

def run_agent(user_query: str, context: str) -> str:
“””Dispatch to the right model tier based on query complexity.”””
tier = classify_query(user_query)

if tier == “deep”:
# Flagship tier — only for the hard stuff.
model = “deepseek-ai/DeepSeek-V4-Pro”
elif tier == “structured”:
# Mid tier — schema reasoning, tool calls.
model = “deepseek-ai/DeepSeek-V4-Flash”
else:
# Default tier — follow-ups, summarization, simple Q&A.
model = “Qwen3-32B”

response = client.chat.completions.create(
model=model,
messages=(
{“role”: “system”, “content”: “You are a senior data analyst. Reason step by step.”},
{“role”: “user”, “content”: f”Context:\n{context}\n\nQuestion: {user_query}”},
),
temperature=0.2,
)
return response.choices(0).message.content

Enter fullscreen mode

Exit fullscreen mode

Notice the base_url. That single line is the reason I am not locked into any one provider. If a better-priced model drops next quarter, or if a provider has a regional outage, I change the model string and move on. My application code, my prompt library, my eval harness — none of it changes. That is vendor lock-in avoidance as a feature, not as an afterthought.

For streaming responses on the deep tier, here is a second snippet that has saved us a lot of perceived latency complaints:

def stream_agent(user_query: str, context: str):
“””Stream the flagship tier for time-to-first-token gains.”””
response = client.chat.completions.create(
model=”deepseek-ai/DeepSeek-V4-Pro”,
messages=(
{“role”: “system”, “content”: “You are a senior data analyst.”},
{“role”: “user”, “content”: f”Context:\n{context}\n\nQuestion: {user_query}”},
),
stream=True,
temperature=0.2,
)
for chunk in response:
delta = chunk.choices(0).delta.content
if delta:
yield delta

Enter fullscreen mode

Exit fullscreen mode

Streaming shaved roughly 800ms off perceived response time on our longest-tail queries. At scale, that is the difference between a user thinking “this feels fast” and “this feels slow.”

What Actually Broke (And What I Learned)

I would be lying if I said the migration was clean. A few things bit us, and I want to be honest about them because the marketing material never is.

Tokenization differences. When you swap models, token counts do not transfer 1:1. The same English prompt can be 10-15% more tokens on one model than another. We had to rebuild our cost forecasting model from scratch. I am embarrassed how long I assumed tokenization was standard.

Latency variance. The 1.2s average latency number is real, but averages lie. We saw p99 latency spike on two of the cheaper models during US evening hours. We solved it with a simple fallback chain: if a call does not return inside 4 seconds, retry once on the next tier up. Costs us a few percent. Saves us a lot of angry customers.

Quality variance on edge cases. Our flagship model caught a subtle statistical error in about 95% of cases. The mid-tier model caught it in about 82%. That sounds small, but in a data analysis product, a silent miscalculation is a brand-destroyer. We added a verification call (using a different model family to avoid correlated errors) on any answer that involves numbers. The 84.6% average benchmark score we see is the blended result across all tiers.

Cache behavior. I cannot stress this enough: cache aggressively. We saw a 40% hit rate on our analysis queries within the first week, because analysts ask the same questions in slightly different ways. That 40% is pure margin. If you are not caching at the prompt-similarity level, you are leaving money on the table.

The Vendor Lock-In Question

This deserves its own section because it is the part of the conversation I think most CTOs avoid.

When you build on a single provider’s API, you are not just buying tokens. You are buying into their SDK conventions, their rate limit semantics, their error envelope, their deprecation policy, and their pricing roadmap. The moment any of those change in a way you do not like, you are stuck. And in the current LLM market, pricing has been dropping roughly 10x per year for equivalent capability. Locking in at last year’s prices is a real cost.

Routing through a unified API surface like Global API does not magically fix this, but it shifts the dependency from “the model vendor” to “the routing layer.” That is a much better place to be, because the routing layer has an economic incentive to keep you portable. Your model vendor does not.

We also run a quarterly exercise I call the “swap drill.” I take one of our production endpoints, switch it to a different model for a week, and measure the quality and cost delta. It is two engineer-days of work. It keeps us honest, and it means that if any provider raises prices or has a reliability incident, we are not scrambling — we are executing a playbook we have already rehearsed.



Source link

One API Call to Audit Any Domain’s Email Security



You know the drill. A customer complains their transactional emails land in spam. Or a B2B trial signup uses a throwaway address. Or someone asks “do we have DMARC set up correctly?” and you open ten browser tabs to find out.

I built MailSec to replace that entire workflow with one API call.

The problem

Email infrastructure is deceptively complex:

SPF has a hard 10-lookup limit that silently breaks when you add one too many include:

DMARC with p=none does literally nothing — but most teams ship it and assume they’re protected

DKIM selectors vary by provider (google, selector1, k1, s1) and you need to guess which one to check

Spamhaus listings can tank your deliverability for days before anyone notices

DNSSEC is either there or it isn’t, and most tools make you check separately

The information is all in DNS, but it’s scattered across different record types, different query tools, and different mental models. You end up juggling dig, MXToolbox, Spamhaus lookup, and a DMARC analyzer — just to answer “is this domain’s email OK?”

One request, full picture

curl https://prod.api.market/api/v1/fivetag-systems/mailsec/v1/audit/cloudflare.com \
-H “x-api-market-key: YOUR_KEY”

Enter fullscreen mode

Exit fullscreen mode

Response:

{
“domain”: “cloudflare.com”,
“spf”: {
“present”: true,
“valid”: true,
“record”: “v=spf1 ip4:199.15.212.0/22 ip4:173.245.48.0/20 include:_spf.google.com include:spf1.mcsv.net include:spf.mandrillapp.com include:mail.zendesk.com include:stspg-customer.com include:_spf.salesforce.com -all”,
“lookupCount”: 7
},
“dmarc”: {
“present”: true,
“valid”: true,
“record”: “v=DMARC1; p=reject; pct=100; rua=mailto:…@dmarc-reports.cloudflare.net,mailto:rua@cloudflare.com”,
“policy”: “reject”,
“subdomainPolicy”: “reject”,
“pct”: 100,
“rua”: (
“mailto:…@dmarc-reports.cloudflare.net”,
“mailto:rua@cloudflare.com”
)
},
“dkim”: { “present”: true, “selector”: “k1”, “valid”: true },
“dnssec”: { “enabled”: true, “valid”: true },
“mx”: {
“present”: true,
“redundant”: true,
“records”: (
{ “host”: “mxa-canary.global.inbound.cf-emailsecurity.net.”, “priority”: 5 },
{ “host”: “mxb-canary.global.inbound.cf-emailsecurity.net.”, “priority”: 5 },
{ “host”: “mxa.global.inbound.cf-emailsecurity.net.”, “priority”: 10 },
{ “host”: “mxb.global.inbound.cf-emailsecurity.net.”, “priority”: 10 }
)
},
“score”: 100,
“grade”: “A”,
“blacklists”: { “dblListed”: false, “zenListed”: false },
“verdict”: “READY”,
“mtaSts”: {
“present”: false,
“issues”: (“mta-sts: no DNS record found”)
},
“tlsRpt”: {
“present”: false,
“issues”: (“tlsrpt: no record found”)
}
}

Enter fullscreen mode

Exit fullscreen mode

Cloudflare scores 100/A. SPF with 7 lookups (under the limit of 10), DMARC at reject with full reporting, DKIM present, DNSSEC valid, redundant MX, clean blacklists. Verdict: READY.

Now try a domain that doesn’t have its act together and you’ll see the score drop, issues appear, and the verdict shift to CAUTION or BLOCKED.

What’s behind the score

The audit scores five components out of 100:

Check
Max points
What it measures

SPF
20
Valid record, all mechanism present, lookup count under 10

DMARC
30
Present, enforced (quarantine/reject), reporting configured

DKIM
20
Key found at a known selector

DNSSEC
20
DS record present, chain of trust valid

MX
10
MX records exist, redundant hosts

Grades: A (90+), B (70+), C (50+), D (30+), F (

DMARC is weighted heaviest because it’s the single biggest factor in whether spoofed mail gets through. A domain with p=none is essentially unprotected — MailSec won’t call that “ready.”

MTA-STS, TLS-RPT, and BIMI are included in the audit response for visibility but are informational only — they don’t affect the score. Adoption is still too low to penalize domains without them.

Beyond the full audit

You don’t always need everything. Each check has its own endpoint:

# Just SPF
GET /v1/spf/{domain}

# Just DMARC policy
GET /v1/dmarc/{domain}

# DKIM — auto-probes common selectors, or pass ?selector=google
GET /v1/dkim/{domain}

# MTA-STS — DNS record + HTTPS policy file (RFC 8461)
GET /v1/mta-sts/{domain}

# TLS-RPT — reporting URIs for TLS failures (RFC 8460)
GET /v1/tlsrpt/{domain}

# Is this a throwaway email domain?
GET /v1/email/disposable/{domain}

# Full email validation: syntax + DNS + disposable check
GET /v1/email/validate?email=user@example.com

# Deliverability verdict without DNSSEC (focused on inbox placement)
GET /v1/deliverability/{domain}

Enter fullscreen mode

Exit fullscreen mode

Real use cases

1. Validate B2B signups

Before provisioning a trial, check if the domain is real, has working email, and isn’t disposable:

curl …/v1/email/validate?email=cto@acme-corp.com

Enter fullscreen mode

Exit fullscreen mode

{
“email”: “cto@acme-corp.com”,
“syntaxValid”: true,
“domainExists”: true,
“mxPresent”: true,
“disposable”: false,
“deliverable”: true
}

Enter fullscreen mode

Exit fullscreen mode

Block mailinator.com, guerrillamail.com, and 100k+ other throwaway domains automatically. The disposable check does suffix-walking, so anything.mailinator.com is caught too.

2. Pre-flight transactional sends

About to send a welcome email, invoice, or password reset? Check the recipient’s domain first:

curl …/v1/deliverability/their-domain.com

Enter fullscreen mode

Exit fullscreen mode

If verdict is BLOCKED, that domain is in Spamhaus — your email probably won’t arrive. If CAUTION, their SPF/DMARC is misconfigured and replies/bounces may behave unexpectedly. Only send with confidence when verdict is READY.

3. Customer onboarding — “Check my domain” button

Building a SaaS that sends email on behalf of customers? Give them a one-click domain health check in your onboarding flow. Hit /v1/audit/{domain} and render the results:

“Your DMARC policy is set to none — this means spoofed emails from your domain won’t be blocked. Change it to quarantine or reject to protect your brand.”

4. Monitor your own domains

Run a daily cron against /v1/audit/bulk with your company’s domains. Alert when:

Score drops below a threshold
DMARC policy changes from reject to none

A new Spamhaus listing appears
SPF lookup count crosses 8 (getting close to the limit of 10)

5. Audit third-party vendors

Before integrating with a partner who’ll send email on your behalf, check their domain. A vendor with p=none DMARC and no DKIM is a phishing risk to your customers.

Performance

Live DNS lookups on every request (no stale scrapes)
In-process cache respects each record’s TTL — repeat queries are
Full audit fans out all checks in parallel — cold lookups typically 200-800ms
Bulk endpoint audits up to 10 domains in a single request

Get started

MailSec is available on api.market. Sign up, grab your API key, and start auditing domains in minutes.

Try it now — pick any domain you’re curious about and see what comes back. You might be surprised by your own.



Source link