There’s a paper that’s been making the rounds for weeks among people who take applied AI seriously in business. It’s called “AI Self-preferencing in Algorithmic Hiring,” and while it sounds academic, it raises a problem every business owner should understand before automating any decision about people, vendors, or customers.
The idea, in one sentence: LLMs favor content written by LLMs, and above all content written by themselves. When you put them in charge of decisions, that preference has measurable consequences.
What the study found (and why it matters)
The researchers Jiannan Xu, Gujie Li, and Jane Yi Jiang asked a simple question: as LLMs become involved on both sides of decision-making processes — from hiring to content moderation — do they systematically favor content that looks like their own outputs?
The experiment is elegant. They took 2,245 human-written resumes from a professional platform, collected before the mass adoption of generative AI. For each one, they generated counterfactual versions (versions of the same resume produced by a model) with several leading models: GPT-4o, GPT-4o-mini, GPT-4-turbo, LLaMA 3.3-70B, Mistral-7B, Qwen 2.5-72B, and DeepSeek-V3.
Then they had those same models evaluate the resumes. The result?
LLMs consistently prefer resumes generated by themselves over human-written ones.
This isn’t a tiny statistical nudge. In simulations of realistic hiring pipelines, candidates who used the same LLM as the evaluator were up to 60% more likely to pass the filter than equally qualified candidates with human-written resumes, with the largest disadvantages in fields like sales and accounting.
Translated: if your company screens resumes with GPT-4o and a candidate wrote theirs with GPT-4o, that candidate has an advantage. A candidate who wrote their own starts at a disadvantage, even with identical qualifications.
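It’s worth being precise about what “60% more likely” means, because it’s easy to misread as a 60-point gap. It’s a ratio of pass rates. The Python sketch below uses a hypothetical `PairedOutcome` record and made-up toy numbers, not data from the paper: if 10 of 100 human-written resumes pass the filter and 16 of the equally qualified LLM-written versions do, the LLM-written versions are 16/10 = 1.6 times as likely to pass, i.e. 60% more likely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedOutcome:
    """Filter results for one candidate submitted in two versions (hypothetical record)."""
    human_passed: bool  # the human-written resume passed the evaluator's filter
    llm_passed: bool    # the version written by the evaluator's own model passed


def pass_rate(flags: list[bool]) -> float:
    """Share of resumes that passed."""
    return sum(flags) / len(flags)


def relative_advantage(outcomes: list[PairedOutcome]) -> float:
    """How much more likely the LLM-written version is to pass, as a fraction.

    0.6 means "60% more likely": a pass rate 1.6x the human-written one,
    not 60 percentage points higher.
    """
    human = pass_rate([o.human_passed for o in outcomes])
    llm = pass_rate([o.llm_passed for o in outcomes])
    if human == 0:
        raise ValueError("human-written pass rate is zero; the ratio is undefined")
    return llm / human - 1


# Toy numbers for illustration only; not results from the study.
toy = (
    [PairedOutcome(human_passed=True, llm_passed=True)] * 10
    + [PairedOutcome(human_passed=False, llm_passed=True)] * 6
    + [PairedOutcome(human_passed=False, llm_passed=False)] * 84
)
print(f"{relative_advantage(toy):.0%}")  # 60%
```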
Why this happens
LLMs are not neutral evaluators. They learned what “looks good” from patterns in their training data, and when they generate text they reinforce those same patterns. When they later evaluate, they reward what matches their own aesthetic: structure, vocabulary, rhythm, formatting.
The risk is that AI amplifies bias if it systematically favors resumes that reflect its own generative style. In those cases, evaluation depends less on the actual quality of the credentials and more on superficial stylistic alignment with the evaluating LLM, giving unjustified advantages to those who use the same model to write.
This isn’t a bug in one product. It’s a side effect of how these models are trained.
This isn’t just about hiring
Here’s the part few people are connecting: the same problem appears anywhere an LLM evaluates content that was likely generated by another LLM.
Think about the processes you’re already automating or considering automating:
| Process | Who generates | Who evaluates | Risk |
|---|---|---|---|
| Resume screening | Candidate (with AI) | Your bot (AI) | Bias in favor of the same model |
| Lead scoring | Lead (sometimes with AI) | Your AI-powered CRM | AI-polished leads score higher |
| Vendor proposal review | Vendor (with AI) | Your evaluation AI | AI-written proposals score higher |
| Review moderation | Customer (sometimes with AI) | Your moderation AI | Genuine reviews flagged as suspicious |
| Email triage | Sender (with AI) | Your autopilot inbox | AI-template emails prioritized over human-written ones |
The pattern repeats. The findings point to an emerging and previously overlooked risk in AI-assisted decision-making, and call for expanded fairness frameworks that address biases arising from AI-to-AI interactions.
The good news: it can be mitigated
It’s not all bad news. The study shows that, in many cases, this bias can be reduced by more than 50% through simple interventions that target the LLMs’ self-recognition capabilities.
In other words: with the right system design, the problem drops dramatically. The key is how you build the automation, not whether you use AI at all.
What you should do if you automate with AI
If you’re automating decision-making processes — hiring, leads, vendors, prioritized support, whatever it is — these are the principles you should require from whoever builds the system (or from yourself if you build it):
1. Don’t let a single model be both generator and judge. If you use the same LLM to write and to evaluate, you’re recreating exactly the setup in which the study measured self-preference. Use different models at different steps in the pipeline.
2. Keep the final decision with humans for sensitive cases. AI can filter, rank, summarize, and suggest. But hiring someone, rejecting a vendor, or prioritizing a VIP customer should go through a person. Automation saves time on the repetitive stuff, not the critical stuff.
3. Evaluate with structured criteria, not “overall impression.” Instead of asking the AI, “Is this a good candidate?”, ask it to extract concrete data (years of experience, technologies, certifications) and leave the judgment to explicit rules. This reduces the surface area where stylistic bias can operate.
4. Audit results regularly. Take a sample of the automated decisions every month and review them by hand. If you notice weird patterns (“all the approved ones sound the same”), you’ve got a problem.
5. Diversify your signals. Don’t base a decision only on what the AI “reads” from a document. Cross-check with external data: LinkedIn, CRM, history, real behavior. The more non-text signal you have, the less weight style carries.
6. Document how your automation works. If you can’t explain to an employee or a customer how a decision is made, you have a legal and reputational problem waiting to happen.
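Principles 2 to 4 fit in a few lines of code. This is a minimal Python sketch, not a production design: it assumes an earlier step (ideally a different model from the one you suspect candidates use, per principle 1) has already extracted facts from each resume into a structured record. The `ExtractedProfile` schema, the thresholds, and the `screen` and `audit_sample` helpers are hypothetical placeholders, not a recommended hiring policy. The model supplies data, explicit rules make the call, borderline cases go to a person, and a reproducible random sample is pulled for the monthly manual review.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADVANCE = "advance"
    HUMAN_REVIEW = "human_review"
    REJECT = "reject"


@dataclass(frozen=True)
class ExtractedProfile:
    """Facts an extraction model pulled out of one resume (hypothetical schema)."""
    candidate_id: str
    years_experience: float
    skills: frozenset[str]


# Hypothetical job requirements: explicit, reviewable, and blind to writing style.
MIN_YEARS = 3.0
REQUIRED_SKILLS = frozenset({"sql", "excel"})


def screen(profile: ExtractedProfile) -> Decision:
    """Decide from extracted facts only; the resume text never reaches this rule."""
    missing = REQUIRED_SKILLS - profile.skills
    if profile.years_experience >= MIN_YEARS and not missing:
        return Decision.ADVANCE
    # Borderline: at most one year short and at most one required skill missing,
    # so a person makes the call (principle 2).
    if profile.years_experience >= MIN_YEARS - 1 and len(missing) <= 1:
        return Decision.HUMAN_REVIEW
    return Decision.REJECT


def audit_sample(decisions: dict[str, Decision], k: int, seed: int) -> list[str]:
    """Pick up to k candidate IDs at random for the monthly manual review (principle 4).

    Sorting first and using a fixed seed makes the sample reproducible,
    so anyone can re-draw the same audit set later.
    """
    ids = sorted(decisions)
    return random.Random(seed).sample(ids, min(k, len(ids)))


# Usage with three toy candidates.
profiles = [
    ExtractedProfile("c1", 5.0, frozenset({"sql", "excel", "python"})),
    ExtractedProfile("c2", 2.5, frozenset({"sql", "excel"})),
    ExtractedProfile("c3", 0.5, frozenset({"word"})),
]
decisions = {p.candidate_id: screen(p) for p in profiles}
print({cid: d.value for cid, d in decisions.items()})
# {'c1': 'advance', 'c2': 'human_review', 'c3': 'reject'}
print(audit_sample(decisions, k=2, seed=2024))
```

The design choice that matters is where style can leak in: `screen` never sees the resume text, so a polished AI-written resume and a plain human-written one with the same extracted facts get the same outcome. Style can still influence the extraction step, which is exactly what the monthly audit is for.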
The uncomfortable part for many AI vendors
The industry is full of companies selling “AI for hiring,” “AI for lead scoring,” “AI for everything” as if it were plug-and-play magic. This paper is an uncomfortable reminder: badly designed automation is not neutral, it is actively biased, and often the bias stays invisible until someone measures it.
That’s not a reason not to automate. It’s a reason to automate well.
How we approach this at Studio SmartWork
When we build AI automations for our clients — whether it’s lead scoring, email filtering, or bots that answer questions — we design the systems with exactly these kinds of problems in mind:
- Role separation: the model that extracts information is not the same one that makes the decision.
- Explicit rules for subjective criteria: AI provides data, business rules provide decisions.
- Human in the loop where it matters: automations run 24/7, but sensitive decisions always have a review point.
- Full transparency: the client sees what each step does, which model does it, and why.
- Self-hostable, source-available workflows with n8n: because when you have access to the full pipeline, you can audit it. Closed-vendor black boxes are the opposite of what you need to detect bias.
It’s not magic. It’s engineering with judgment.
The conclusion that matters
AI is an extraordinary tool for removing repetitive work. But when you put it in charge of decisions, it stops being just a tool and becomes another actor in your process, with its own biases and preferences. This paper shows it with data: LLMs are not neutral; they prefer their own kind.
The question every business owner should ask before automating a decision process is not “Does the AI work?” It’s: “Did I design the system to work well, or did I hand my judgment over to a black box that prefers itself?”
Those two answers lead to very different outcomes six months later.