What 15,000 Job Ads Told Me About the Australian ML Job Market

Posted on Sat 04 July 2026 in Data Science • 14 min read

A long time ago I set up a saved search in Gmail, a filter that quietly swept every job-alert email into a folder labelled Alerts/Jobs and marked it read. It was a tidy little pipeline for keeping my inbox uncluttered: every so often I would open the folder, see the pile had grown, and delete the lot without reading a single one.

This time I did something different. I was poking around my own mailbox from the command line (I have been working with Google Workspace from the terminal lately, through a small CLI I wrote about elsewhere), and before clearing the folder out of habit, I ran a count on it. The number stopped me: 2,916 job alerts, entirely unread. My finger was over the delete key, as always. Instead, I got curious. What was actually in there?

Here is the reframe that changed how I saw that folder: it was never clutter. It was a longitudinal dataset I had been passively collecting for half a year without noticing. Nobody designed it; it was just a saved search running on autopilot. And yet it was large, consistent, and aimed at exactly one slice of the world I find interesting: who is hiring in machine learning, and for what. It is also aimed at one place in particular. Nine in ten of these roles are in Australia, most of them in Melbourne and Sydney, so read everything below as a picture of the Australian ML market, not the global one.

This post is two stories braided together. The first is the craft of turning a mess of email into clean data — which turned out to be the genuinely interesting engineering. The second is what the data said once I could finally read it.

The accidental dataset

Let me be honest about the scale first, because it sets up everything else. This is not big data — a few thousand emails and a few thousand web pages, all of which fits on a laptop. The whole project is a couple of hundred lines of pure Python — the standard library plus an HTTP client — and it cost essentially nothing to run.

That last point is worth dwelling on, because it cuts against the reflex of the moment. The obvious 2026 move would be to dump every email into a large language model and ask it to "analyse the job market." I deliberately did not. Every description went to disk as plain text and stayed there; the counting, de-duplication, and skill matching are all ordinary string processing and arithmetic. Total spend on model tokens: about zero.

That is a small argument in its own right, not just a footnote. Reaching for an LLM is not free, not always more accurate, and rarely more reproducible. A short function that counts how many descriptions mention "Kubernetes" gives the same answer every time, and I can read the code that produced it. For counting, not reasoning, the boring tool is the better tool.

Here is the funnel, from inbox clutter to something I could actually analyse:

2,916 alert emails, accumulated over roughly six months
14,685 individual job ads parsed out of them (each alert email lists several roles)
5,598 unique jobs after de-duplication — the same job showed up in about 2.6 different alerts on average
5,404 full job descriptions successfully archived (96.5%; the missing 3.5% were roles already deleted at the source by the time I fetched them)
3,758 of those, after cleaning, were real openings at real employers

Funnel from 14,685 job mentions down to 3,758 real employer roles
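The first hop in that funnel, from alert email to individual job links, is ordinary standard-library work. Here is a minimal sketch, assuming each alert is available as a raw RFC 822 message string with an HTML body, and assuming job links can be recognised by a /job/ segment in the URL path (a hypothetical rule; every platform formats its links differently). LinkCollector and job_links are illustrative names, not the code behind the numbers above.

```python
import email
from email import policy
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def job_links(raw_message):
    """Return job links from an alert's HTML body (hypothetical rule: '/job/' in the path)."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("html",))
    if body is None:
        return []  # no HTML part: nothing to parse in this sketch
    collector = LinkCollector()
    collector.feed(body.get_content())
    # Filtering on the path keeps unsubscribe and footer links out of the results.
    return [link for link in collector.links if "/job/" in urlsplit(link).path]
```

Each alert typically yields several links, which is why 2,916 emails turn into 14,685 mentions before any de-duplication.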

The craft is in the archiving

The analysis was the easy part. The engineering that made it possible — reliably pulling several thousand job descriptions off the public web without being a nuisance and without the whole run falling over halfway through — is where I actually learned something. Four principles carried the work.

1. The data is usually already in the HTML. Before reaching for a headless browser to "render" a page, look at what the server actually sends. Platforms hide content in surprisingly different ways: some show a login wall on the page a human visits, yet serve the same description through a plain public path; others look like fully JavaScript-rendered apps but ship the complete job text inside the initial HTML — right there in "view source," before a line of JavaScript runs. Browser automation is slow, fragile, and resource-hungry. Most of the time you do not need it. Check the raw response first.
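One common way the full text ships in the initial HTML is a schema.org JobPosting object inside a <script type="application/ld+json"> tag, present before any JavaScript runs. The post does not say which mechanism each platform used, so treat this as a sketch under that assumption: a standard-library parser that pulls JSON-LD blocks out of the raw response and returns the first JobPosting it finds. JsonLdExtractor and job_posting are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # data may arrive in chunks, so append


def job_posting(html):
    """Return the first schema.org JobPosting object in the raw HTML, or None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed block: skip rather than crash the run
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "JobPosting":
                return item
    return None
```

If this returns None for a page that looks JavaScript-rendered, that is the point to open "view source" and look for other embedded data before reaching for a browser. Some sites also nest the object under an @graph key, which this sketch does not handle.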

2. Politeness is engineering, not manners. Collecting thousands of pages is a privilege, and you avoid being a problem by building restraint into the code. Identify yourself honestly and consistently in your request headers. Space requests out with randomised delays. When a server signals "slow down," back off exponentially, waiting longer after each successive rejection, and after a bounded number of retries, give up gracefully. None of this is about evading anyone; it is about being the kind of automated visitor that does not degrade the service for the humans using it.
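A minimal sketch of the back-off part, assuming a caller-supplied fetch function that raises a RateLimited exception (hypothetical, defined here) when the server answers with something like HTTP 429, optionally carrying a Retry-After value already converted to seconds. It uses "full jitter", a random wait between zero and the exponential ceiling, which is one common variant rather than necessarily the one used for this project, and it never waits less than the server asked for.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by a fetch function when the server says 'slow down'."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds the server asked for, if it said


def backoff_delay(attempt, base=2.0, cap=300.0, retry_after=None, rng=random):
    """Full-jitter exponential back-off, never shorter than the server's Retry-After."""
    delay = rng.uniform(0, min(cap, base * 2 ** attempt))
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay


def fetch_politely(fetch, url, max_retries=5, sleep=time.sleep):
    """Call fetch(url), retrying on RateLimited; return None once retries run out."""
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except RateLimited as exc:
            sleep(backoff_delay(attempt, retry_after=exc.retry_after))
    return None  # give up gracefully; a later run can pick this job up again
```

Passing sleep in as a parameter is a design choice for testability: the retry logic can be checked without actually waiting.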

3. Big-but-not-fragile means idempotent and resumable. A job that fires five thousand network requests will be interrupted — a token expires, wifi hiccups, the laptop sleeps. If a five-thousand-request run cannot resume, it is a run that never finishes. So every step checks whether it has already done its work and skips it if so. If authentication drops mid-run, I re-authenticate and pick up exactly where I stopped, having wasted nothing. The shape of the loop is simple:

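import random
import time
from pathlib import Path

MAX_RETRIES = 5    # bounded retries, then give up gracefully
BASE_DELAY = 2.0   # seconds

class RateLimited(Exception):
    """Raised by fetch() when the server signals 'slow down' (e.g. HTTP 429)."""

def archive(jobs, cache_dir: Path, fetch):
    for job in jobs:
        target = cache_dir / f"{job.platform}-{job.id}.html"
        if target.exists():
            continue                                  # already have it: never re-fetch
        for attempt in range(MAX_RETRIES):
            try:
                html = fetch(job.url)                 # one polite request
            except RateLimited:
                time.sleep(BASE_DELAY * 2**attempt)   # exponential back-off
                continue
            partial = target.with_suffix(".part")
            partial.write_text(html, encoding="utf-8")  # to disk immediately
            partial.replace(target)                   # rename only once complete
            time.sleep(BASE_DELAY + random.random())  # jittered pause
            break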

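That if target.exists(): continue line is the whole trick. It is what lets you stop and restart a huge job as many times as you need, provided a file only lands in the cache once it is complete (write to a temporary name, then rename), so an interrupted write is never mistaken for a finished one. A surprising share of production data-pipeline failures trace back to the same omission: not planning for the run to be interrupted.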

4. De-duplicate on a stable key, not the URL. This is the subtle one. The same job arrives in multiple alerts, and the naive fix — "drop duplicate URLs" — fails, because those URLs are stuffed with tracking parameters that differ every time; one role can wear a dozen URLs. So you dedup on something that does not change: the platform name plus the job's own numeric id. Do that, and 14,685 raw mentions collapse cleanly to 5,598 distinct jobs. Get it wrong, and every downstream count is inflated by roughly 2.6x.
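A sketch of de-duplicating on a stable key, assuming two platforms whose job URLs carry the numeric id in the path, shaped like LinkedIn's /jobs/view/<id> and SEEK's /job/<id> links. The ID_PATTERNS table is hypothetical; a real run needs one rule per platform that appears in the alerts. URLs that match no rule are returned separately for review rather than silently dropped or guessed at.

```python
import re
from urllib.parse import urlsplit

# Hypothetical per-platform rules: hostname fragment -> regex capturing the job id in the path.
ID_PATTERNS = {
    "linkedin": re.compile(r"/jobs/view/(?:[^/]*-)?(\d+)"),
    "seek": re.compile(r"/job/(\d+)"),
}


def stable_key(url):
    """Return (platform, job_id), ignoring the query string and its tracking parameters."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    for platform, pattern in ID_PATTERNS.items():
        if platform in host:
            match = pattern.search(parts.path)
            if match:
                return (platform, match.group(1))
    return None


def dedupe(urls):
    """Collapse URLs to one per stable key; return (unique, unmatched)."""
    unique, unmatched = {}, []
    for url in urls:
        key = stable_key(url)
        if key is None:
            unmatched.append(url)  # no rule for this URL: review by hand
        elif key not in unique:
            unique[key] = url      # keep the first URL seen for each job
    return unique, unmatched
```

Four tracked URLs for two jobs collapse to two keys here; URL-level de-duplication would have kept all four.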

What the corpus said

With the archived descriptions in hand, the findings came in four beats.

The noise tax. The first result needed no clever analysis: about one in three "ML job" alerts is not a real employer opening. Sorting by who is actually behind each posting gave a clean split — 3,758 real employer roles (70%), 1,222 gig or micro-task listings (23%), and 424 recruiter reposts that hide the true employer (8%). The gig bucket is mostly AI data-labelling marketplaces; two names alone, Alignerr (571) and Mercor (477), were 19% of the whole corpus. A fine part of the AI economy — but labelling data for a marketplace is not a salaried ML role, and mixing the two distorts the picture before you begin. Hence the rule that governs everything below: de-noise before you conclude. Every number that follows is from the 3,758 real roles.

Donut: 1 in 3 alerts is noise — employer 70%, gig 23%, aggregator 8%
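The post does not spell out how each posting was assigned to a bucket, so this is only a rule-based sketch of the idea: match the advertised company name against a hand-curated set of gig platforms and a few recruiter markers, default everything else to "employer", and report percentage shares. The name lists are hypothetical (the two gig names are the ones mentioned above); a real list would need curating by hand from the corpus.

```python
from collections import Counter

# Hypothetical, hand-curated lists; a real run would build these from the corpus.
GIG_PLATFORMS = {"alignerr", "mercor"}
RECRUITER_MARKERS = ("recruitment", "recruiting", "staffing")


def source_bucket(company):
    """Classify who is behind a posting: 'gig', 'recruiter', or 'employer'."""
    name = company.strip().lower()
    if name in GIG_PLATFORMS:
        return "gig"
    if any(marker in name for marker in RECRUITER_MARKERS):
        return "recruiter"
    return "employer"


def bucket_shares(companies):
    """Percentage of postings in each bucket, rounded to whole numbers."""
    counts = Counter(source_bucket(c) for c in companies)
    total = sum(counts.values())
    return {bucket: round(100 * n / total) for bucket, n in counts.items()}
```

Defaulting to "employer" is the lenient choice; anything misclassified that way inflates the real-role count, so the curated lists are where the care goes.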

The most-demanded skills are not technical. For the 3,704 real jobs with usable text, I counted how many descriptions mentioned each skill — document frequency, the share naming a skill at least once (breadth of demand, not depth). The top of the list surprised me, then on reflection did not:

Communication — 57%
Python — 56%
Stakeholder management — 49%
Leadership / mentoring — 45%
Machine learning — 43%

Three of the top four are not technical at all — communication, working with stakeholders, leading people. Python is the only hard skill in the tier, one point behind communication; "machine learning" itself sits in fewer than half the postings. I do not read this as employers not wanting people who build models. I read it as: building the model is assumed, and what they select on — in their own words — is whether you can explain the work, tie it to the business, and bring others along.

Bar chart of top skills by demand; communication skills lead
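Document frequency is a one-hit-per-document count, so a description that says "Python" ten times still counts once. A minimal sketch, assuming a hand-built dictionary that maps each canonical skill to a regular expression of accepted spellings; the three entries here are illustrative, not the ~70 tracked for the chart.

```python
import re

# Hypothetical skill dictionary: canonical name -> accepted spellings as a regex.
SKILLS = {
    "Python": r"\bpython\b",
    "Communication": r"\bcommunicat(?:e|es|ion|ing)\b",
    "Kubernetes": r"\b(?:kubernetes|k8s)\b",
}
PATTERNS = {name: re.compile(rx, re.IGNORECASE) for name, rx in SKILLS.items()}


def document_frequency(descriptions):
    """Share of descriptions mentioning each skill at least once (breadth, not depth)."""
    if not descriptions:
        return {name: 0.0 for name in PATTERNS}
    counts = dict.fromkeys(PATTERNS, 0)
    for text in descriptions:
        for name, pattern in PATTERNS.items():
            if pattern.search(text):  # one hit per description, however often it repeats
                counts[name] += 1
    return {name: count / len(descriptions) for name, count in counts.items()}
```

The word boundaries matter: without them a short skill name can match inside unrelated words, which quietly inflates its share.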

Two kinds of skill: table-stakes vs role-defining. Demand alone misleads, because a skill can be everywhere and still say nothing about a specific role. To separate the two I used lift — which is really just a conditional probability, expressed relative to the market. Take the share of a role's ads that ask for a skill, P(skill | role), and divide it by the skill's overall rate, P(skill). The result is a plain multiplier: 1× means the role wants it exactly as often as the market average (no signal), 3× means three times as often (a strong fingerprint). Laid out as every skill (rows) against every role (columns), it draws a map of who wants what:

Heatmap of skill demand by role; warm cells are role-defining skills, a flat pale band marks table-stakes

Two things jump out. Every role has a signature — the warm cells cluster tightly: GenAI (LLM, RAG, agents) under AI Engineer, the deep-learning frameworks under ML Engineer, Kubernetes/Terraform/CI-CD under MLOps, the data stack (ETL, Spark, Snowflake) under Data Engineer, Power BI and Tableau under Analyst. You can almost read a role off its column. And the bottom band is flat and pale — Python, communication, stakeholder management, leadership all sit near 1× in every column. Those are the table-stakes: asked for everywhere, so they place you nowhere in particular. Everything warmer is role-defining. One honest caveat: lift measures size, not certainty — cells on common skills rest on thousands of ads and are rock-solid, but a big multiplier on a rare skill rides on far fewer, so read the boldest exotic cells as directional. The practical read for a CV: keep the table-stakes to a compact summary line, and lead your bullets with the warm, role-specific skills — matched to the column you are actually aiming at.
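As a worked example with made-up numbers: if 60% of MLOps ads mention Kubernetes and 15% of all ads do, the lift is 0.60 / 0.15 = 4×. A minimal sketch of the whole table, assuming each ad has already been reduced to a (role, set of skills) pair. The min_skill_ads threshold is a hypothetical knob reflecting the caveat above: it drops skills too rare for their multipliers to be trusted.

```python
from collections import Counter


def lift_table(ads, min_skill_ads=30):
    """ads: list of (role, set_of_skills) pairs, one per job ad.

    Returns {(skill, role): P(skill | role) / P(skill)}. Cells with no co-mentions
    are omitted (their lift is 0), as are skills mentioned in fewer than
    min_skill_ads ads overall, whose multipliers rest on too little data.
    """
    n = len(ads)
    role_n = Counter(role for role, _ in ads)
    skill_n = Counter(skill for _, skills in ads for skill in skills)
    joint = Counter((skill, role) for role, skills in ads for skill in skills)
    table = {}
    for (skill, role), k in joint.items():
        if skill_n[skill] < min_skill_ads:
            continue
        p_skill_given_role = k / role_n[role]
        p_skill = skill_n[skill] / n
        table[(skill, role)] = p_skill_given_role / p_skill
    return table
```

A skill present in every ad gets a lift of exactly 1.0 in every role it appears in, which is the flat, pale table-stakes band in the heatmap.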

Skills come in bundles. Those skills also travel together. When I ran community detection on which skills co-occur in the same descriptions, the ~70 I tracked fell into six clean clusters, each pointing squarely at a role: a classic-DS core (Python, ML, SQL, statistics), a cloud-and-MLOps stack, a GenAI stack (LLM, agents, RAG), a modern data-engineering stack (ETL, Databricks, Snowflake), a deep-learning group, and the general software languages. Employers do not ask for skills at random; they ask for coherent stacks. Span one bundle cleanly and you read as a fit for its role.

Grouped bars: six skill bundles, each mapped to a role
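The post does not name the community-detection algorithm, so this sketch swaps in one of the simplest, label propagation, run on a skill graph whose edges are weighted by Jaccard similarity (ads mentioning both skills divided by ads mentioning either). Unlike a raw co-occurrence count, Jaccard does not reward a skill simply for being everywhere, and the min_jaccard cut-off, a hypothetical choice, prunes weak edges. A library method such as Louvain would be the more usual choice at scale; this version sticks to the standard library.

```python
from collections import defaultdict
from itertools import combinations


def skill_graph(docs_skills, min_jaccard=0.2):
    """docs_skills: one set of skills per description. Returns {skill: {neighbour: weight}}."""
    df = defaultdict(int)    # descriptions mentioning each skill
    both = defaultdict(int)  # descriptions mentioning each pair of skills
    for skills in docs_skills:
        for skill in skills:
            df[skill] += 1
        for a, b in combinations(sorted(skills), 2):
            both[(a, b)] += 1
    graph = {skill: {} for skill in df}
    for (a, b), n_ab in both.items():
        jaccard = n_ab / (df[a] + df[b] - n_ab)
        if jaccard >= min_jaccard:
            graph[a][b] = graph[b][a] = jaccard
    return graph


def label_propagation(graph, max_iters=20):
    """Each skill repeatedly adopts the label carrying the most edge weight among its neighbours."""
    labels = {skill: skill for skill in graph}  # every skill starts in its own community
    for _ in range(max_iters):
        changed = False
        for skill in sorted(graph):  # fixed order keeps results reproducible
            scores = defaultdict(float)
            for neighbour, weight in graph[skill].items():
                scores[labels[neighbour]] += weight
            if not scores:
                continue  # isolated skill keeps its own label
            best = max(sorted(scores), key=scores.__getitem__)  # ties go to the smallest label
            if best != labels[skill]:
                labels[skill], changed = best, True
        if not changed:
            break
    bundles = defaultdict(set)
    for skill, label in labels.items():
        bundles[label].add(skill)
    return list(bundles.values())
```

Label propagation is fast and needs no target cluster count, but it is not guaranteed to converge or to find the best partition, so treat its bundles as a first pass to sanity-check by eye.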

A trend, honestly caveated. The corpus also lets me watch skills move — carefully. From January to June 2026, demand for AI agents roughly doubled, from 12% to 22% of employer descriptions, with RAG and MLOps drifting up alongside. The tempting one-liner: hiring is tilting towards people who can build and operate agents. But six months shows drift, not a trend — and it cannot separate a real shift from an annual hiring cycle, which needs at least two years spanning the same months twice. Two more caveats: alert volume is not market volume (it is what got emailed to me, not the true population of roles), and document frequency counts mention, not importance. The direction is suggestive; the magnitude I would not bet on.

Line chart: AI-agent demand rose 12% to 22% over six months
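One way to see how much of a month-to-month change could be sampling noise alone is to put an interval around each monthly share. A minimal sketch, assuming each description is tagged with its alert month and its set of matched skills, using the Wilson score interval (a standard choice for proportions, not necessarily what produced the chart):

```python
import math
from collections import defaultdict


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k / n."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def monthly_share(records, skill):
    """records: iterable of (month, set_of_skills). Returns {month: (share, low, high)}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for month, skills in records:
        totals[month] += 1
        hits[month] += skill in skills  # True counts as 1
    return {
        m: (hits[m] / totals[m], *wilson_interval(hits[m], totals[m]))
        for m in sorted(totals)
    }
```

The interval only captures sampling noise. At roughly 600 usable descriptions a month on average (3,704 over six months), a move from 12% to 22% is comfortably larger than that noise would produce if months are roughly even. What no interval can speak to are the bigger caveats above: alert volume is not market volume, and six months cannot separate a shift from a seasonal cycle.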

The titles, though, barely moved. Here is the honest complement to that drift. Tracked month by month, the role mix (the share of postings that were Data Scientist vs AI Engineer vs Analyst) is strikingly flat across the six months. The skill content churned (agents up), but the titles did not: existing roles absorbed the new tools rather than new role types appearing. And again, six months cannot reveal a hiring season either way; the month-to-month wiggle is most likely noise.

Line chart: monthly role mix stays roughly flat January to June 2026

Roles cluster, and that is your mobility. One last cut: comparing roles by the similarity of their required-skill mixes shows the titles are not nine islands. They collapse into two families, an engineering group (AI, ML, MLOps, Software, Data Engineer) and an analytical one (Data Scientist, Quant, Analyst, Research), and within a family, neighbouring roles are almost interchangeable: AI Engineer and ML Engineer score 0.90 on a 0-to-1 similarity scale. That is not trivia. It means one well-built profile can credibly target several adjacent roles at once.

Heatmap: role families cluster into an engineering group and an analytical group
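The post does not say which similarity measure produced the 0.90, so this is a sketch under an assumption: describe each role by the share of its ads mentioning each skill, P(skill | role), and compare two roles by cosine similarity. Because every share is non-negative, the score lands between 0 and 1, matching the scale quoted above. role_profile and cosine are hypothetical helpers.

```python
import math


def role_profile(ads, role, skills):
    """P(skill | role) for each skill in a fixed order; ads are (role, set_of_skills) pairs."""
    role_ads = [skill_set for r, skill_set in ads if r == role]
    return [sum(skill in s for s in role_ads) / len(role_ads) for skill in skills]


def cosine(u, v):
    """Cosine similarity; 1.0 means the same skill mix in the same proportions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Computing cosine for every pair of role profiles gives the matrix behind a heatmap like the one above; grouping roles into families is then a separate clustering step on that matrix.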

If you're doing your own job search

I started this out of curiosity, not a job hunt — but a few practical lessons fall out cleanly enough to be worth sharing. If you are actually searching, here is what I would take from the corpus:

De-noise before you conclude. A third of "ML" alerts are not real employer roles. Filter the gig platforms and recruiter reposts first, or your sense of the market will be off by a third before you start.
The market rewards communication as much as code. Three of the four most-requested skills are about people, not tools. Do not just list "communication" — show it. Point to a time your explanation actually changed a decision.
Separate your table-stakes from your role-defining skills. Put the ubiquitous ones (Python, communication) in a compact summary line. Lead your bullet points with the distinctive, high-lift skills that match the specific role.
Target your role's neighbours. Adjacent titles ask for nearly the same skills (see the map above), so one strong profile can cover several. Know which roles are your neighbours; it widens the search without diluting it.
Read the whole description, not the title. Titles are noisy and inconsistent across companies. The required-skills section is the real signal about what a role actually is.

What I actually learned

When I finished, the thing I kept turning over was not any single number. It was that the whole exercise contained almost no machine learning at all.

There was no model, no LLM. The insight, such as it is, came from taking something boring — a folder of email I had been ignoring — seriously enough to treat it as data, then doing the unglamorous work carefully: fetch politely, store everything, de-duplicate on the right key, count honestly, and caveat harder than felt comfortable. The same discipline I try to bring to production ML turned out to matter just as much for a weekend curiosity: measure, do not guess; make it reproducible; know the limits of what you built.

That is also why I want to be clear-eyed about what this is: a six-month snapshot of one person's saved search — a keyhole, not a window. What it is not, yet, is a picture of the market over time, because you cannot see a cycle in half of one. So the pipeline is still running, skipping the emails it has already seen and quietly archiving the new ones. In a year or two there will be enough to say something honest about how demand moves across seasons, rather than guessing from a single arc.

I like that the most valuable part was the least sophisticated. Anyone with an email account and a little Python could do a version of this on whatever corner of the world lands in their own inbox. The tools were free. What it took was noticing that the mess was worth reading.

And if you are the person these postings are actually for — refreshing the inbox with a knot in your stomach — one last thing. It is easy to read charts like these and feel measured by them, as if your worth were a document-frequency count. It is not. What this corpus really shows is how noisy the whole process is: a third of the alerts are not even real jobs, the monthly mix barely moves, and half of what looks like a trend is just variance. A rejection at 11pm feels like a clean signal about you; it almost never is. The search is genuinely hard, it is not a verdict on you, and the people who come out the other side are usually just the ones who kept going a little past the point that felt reasonable. Be patient with yourself, and keep going.