Why Your AI Delivers Mediocre Results (and How to Fix It with Clear Specs)

There’s a conversation that comes up every week at Studio SmartWork. A business owner writes to us frustrated: “I tried building a bot with ChatGPT and the results are inconsistent. Sometimes it works, sometimes it makes things up, sometimes it just ignores what I ask.”

The usual reaction is to blame the model. “AI still isn’t ready.” “LLMs hallucinate too much.” “Maybe in a year.”

But that conclusion is almost always wrong. The problem is rarely the model. The problem is how we’re talking to it.

This week, an article titled “Specsmaxxing” has been making the rounds on Hacker News and sparking hundreds of comments among developers. Its thesis is simple but important: if you want AI to produce reliable results, you have to stop improvising prompts and start writing clear specifications. Let’s break down what that means, why it matters for your business, and how to apply it even if you’re not technical.

The real problem: “AI psychosis”

The author of the article gives a provocative name to a phenomenon anyone who has tried automating with AI will recognize: AI psychosis. It’s that feeling of arguing with an assistant that understands perfectly one day and seems to have forgotten the basic rules the next. You ask it to classify emails and it works. A week later, it classifies the exact same emails the other way around.

Why does this happen?

Because language models are probabilistic, not deterministic. They don’t execute rules the way a traditional program does; they produce a likely response to the instructions they’re given. If those instructions are ambiguous, each run can resolve the ambiguity a little differently. Multiply that by thousands of runs per month and you’ve got an unpredictable system.

The solution is not to hope the model “gets it.” It’s to eliminate ambiguity at the root.

What a spec is (and why it changes everything)

A spec (specification) is a document that describes precisely what a system has to do. Not how it does it — what it does.

The difference between a prompt and a spec is the difference between:

“Hey, classify these emails and reply to the important ones”

and

task: email_classification
categories:
  - urgent: requires a response in less than 2h
  - existing_customer: comes from a domain in CRM
  - prospect: first contact, no history
  - spam: detected by standard filters
  - newsletter: contains unsubscribe link
rules:
  - if urgent AND existing_customer: notify Slack #sales
  - if prospect: enrich with LinkedIn before notifying
  - if spam: archive without notifying
response_format:
  - original_subject
  - category
  - action_taken
  - confidence (0-100)

The first leaves the model free to decide what “important” means. The second spells out the categories, the actions, and the output format, so there is far less left for it to invent.
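Once the model has assigned categories, the rules part of a spec like this is mechanical enough to run as ordinary code. Here is a minimal sketch in Python. The function name `decide_action`, the action labels, and the order in which the rules are checked (spam first) are assumptions made for illustration; the spec above does not state a precedence, and it defines no action for some combinations, which the sketch sends to a hypothetical `needs_review` bucket.

```python
def decide_action(categories: set[str]) -> str:
    """Map the categories assigned to one email to an action from the spec's rules."""
    # Assumption: spam wins over every other category.
    if "spam" in categories:
        return "archive_without_notifying"
    if "urgent" in categories and "existing_customer" in categories:
        return "notify_slack_sales"
    if "prospect" in categories:
        return "enrich_with_linkedin_then_notify"
    # The spec has no rule for this combination (for example, a newsletter),
    # so we surface it instead of guessing.
    return "needs_review"


# Example calls with hypothetical category sets
action_for_customer = decide_action({"urgent", "existing_customer"})
action_for_newsletter = decide_action({"newsletter"})
```

Writing the rules this way also exposes gaps: any email that lands in `needs_review` points to a case the spec still has to cover.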

Why YAML matters (and why you should care even if you don’t code)

The article’s author argues for writing specs in YAML, a format that reads almost like an indented list. The reason is practical: YAML forces you to structure your thinking into explicit hierarchies and relationships. It’s hard to stay vague in YAML: every item has to be named and placed somewhere in the structure, and anything you don’t define simply isn’t there.

You don’t need to know YAML to benefit from this idea. What matters is the principle:

Vague prompt	Structured spec
“Respond professionally”	Tone: formal but friendly. Greeting: “Hi [name]”. Closing: “Best regards”. Max 4 lines.
“If it’s urgent, let me know”	Urgent = contains keywords [“today”, “urgent”, “meeting tomorrow”] OR sender in VIP_list
“Summarize the call”	Summary: max 3 bullets. Include: decision made, next step, deadline. Exclude: small talk.

This is the difference between an automation that works roughly 60% of the time and one that works roughly 98% of the time.
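The “urgent” row in the table is precise enough to be checked without any AI at all. The sketch below is one possible reading of it, in Python. The keyword list comes from the table; the contents of `VIP_LIST`, the function name `is_urgent`, and the choice of case-insensitive substring matching are assumptions for illustration.

```python
URGENT_KEYWORDS = ["today", "urgent", "meeting tomorrow"]  # from the table above
VIP_LIST = {"ceo@example.com"}  # hypothetical placeholder address


def is_urgent(subject: str, body: str, sender: str) -> bool:
    """Urgent = contains a keyword OR the sender is in the VIP list."""
    text = f"{subject} {body}".lower()
    # Assumption: plain substring match, so "today" also matches "today's".
    has_keyword = any(keyword in text for keyword in URGENT_KEYWORDS)
    return has_keyword or sender.lower() in VIP_LIST
```

Even this small example forces decisions the vague version hides, such as whether matching is case-sensitive and whether the subject line counts.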

How we apply this at Studio SmartWork

When a client hires us to automate a process, we don’t start with code. We start by writing the spec.

Our typical process is:

1. Discovery interview. We talk to the person doing the manual work today. We ask them to show us live, with no filters.
2. Case mapping. We document not only the “normal” case, but the edge cases too. What happens when the email is in another language? When the customer replies with a new question? When the lead has no website?
3. Readable spec. We write the full logic in a document the client can read and validate before we write a single line of code. If the client doesn’t understand it, it’s written badly.
4. Validation with real cases. We take 20–30 historical examples and verify that the spec would produce the correct result in each one.
5. Build. Only then do we start assembling the bot in n8n with the relevant AI APIs.

This upfront step is why we can deliver a working automation in under 7 days that doesn’t break the following month. The spec is the contract between what the client expects and what the machine does.
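The “validation with real cases” step can be pictured as a tiny test harness: run every historical example through the classifier and list the ones that disagree with the expected result. The following Python sketch is an illustration under stated assumptions, not our production tooling. `validate_spec`, `toy_classify`, and the three sample cases are all hypothetical; in practice the classifier would be the real bot and the cases would be the 20–30 historical examples.

```python
from typing import Callable


def validate_spec(classify: Callable[[dict], str], cases: list[dict]) -> list[dict]:
    """Return every case where the classifier disagrees with the expected label."""
    failures = []
    for case in cases:
        actual = classify(case["email"])
        if actual != case["expected"]:
            failures.append(
                {"email": case["email"], "expected": case["expected"], "actual": actual}
            )
    return failures


def toy_classify(email: dict) -> str:
    """Hypothetical stand-in for the real classifier."""
    return "newsletter" if "unsubscribe" in email["body"].lower() else "prospect"


# Hypothetical historical examples, each paired with the label a human gave it
historical_cases = [
    {"email": {"body": "Click here to unsubscribe"}, "expected": "newsletter"},
    {"email": {"body": "Hi, I'd like a quote"}, "expected": "prospect"},
    {"email": {"body": "URGENT: server down"}, "expected": "urgent"},
]

failures = validate_spec(toy_classify, historical_cases)
for failure in failures:
    print(f"expected {failure['expected']}, got {failure['actual']}: {failure['email']['body']}")
```

Here the toy classifier misses the urgent email, which is exactly the kind of gap this step is meant to catch before anything goes live.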

The three most common spec-writing mistakes

After hundreds of automated processes, these are the failures we see again and again:

1. Confusing exceptions with normal cases. People describe the happy path and forget the edges. “The bot receives the email, classifies it, and replies.” Okay — but what if 5 emails arrive from the same sender in 3 minutes? What if the email is a reply to an earlier conversation? Exceptions are not details — they’re where automations fail.

2. Defining tasks instead of criteria. “Summarize the meeting” is a task. “A summary is: max 3 bullets, each starting with a past-tense verb, max 15 words” is a criterion. Criteria can be evaluated. Tasks are open to interpretation.

3. Not defining what NOT to do. A good spec says both what the system should do and what it must never do under any circumstances. “Never promise discounts.” “Never give exact delivery dates.” “Never answer legal questions.” Boundaries are just as important as capabilities.
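Mistakes 2 and 3 both come down to writing rules that can be checked. As a rough illustration, the Python sketch below tests a summary against the criterion from point 2 and a reply against one boundary from point 3. All names and the regular expression are assumptions. The past-tense requirement is deliberately not checked, because it cannot be verified reliably with simple string rules. The discount pattern only flags the word itself; it cannot tell a promise from a refusal.

```python
import re

MAX_BULLETS = 3
MAX_WORDS_PER_BULLET = 15

# Hypothetical boundary: flag any reply that mentions a discount at all.
FORBIDDEN_PATTERNS = {
    "mentions a discount": re.compile(r"\bdiscounts?\b", re.IGNORECASE),
}


def summary_violations(summary: str) -> list[str]:
    """Check a summary (one bullet per line) against the bullet and word limits."""
    bullets = [line.strip() for line in summary.splitlines() if line.strip()]
    problems = []
    if len(bullets) > MAX_BULLETS:
        problems.append(f"too many bullets: {len(bullets)} > {MAX_BULLETS}")
    for index, bullet in enumerate(bullets, start=1):
        # Drop the leading bullet marker before counting words.
        words = bullet.lstrip("-*• ").split()
        if len(words) > MAX_WORDS_PER_BULLET:
            problems.append(f"bullet {index} has {len(words)} words")
    return problems


def boundary_violations(reply: str) -> list[str]:
    """Return the name of every forbidden pattern found in a reply."""
    return [name for name, pattern in FORBIDDEN_PATTERNS.items() if pattern.search(reply)]
```

A check like this can run on every output before it is sent: an empty list means the output passed, and anything else can be held back for a human to review.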

The mindset shift you need

If you’re thinking about automating something in your business — customer support, email management, lead qualification, whatever it is — the most important shift is not technical. It’s this:

AI is not magic. It’s a very fast executor of precise instructions. If the instructions are ambiguous, the result will be ambiguous. If they’re clear, the result will be reliable.

That’s actually good news. It means your automation doesn’t depend on models getting better. It depends on you (or whoever helps you) being able to translate your process into clear rules.

And here’s an uncomfortable test: if you can’t explain your process to a new hire in under an hour, in enough detail for them to do it well, no AI in the world will automate it for you. Automation starts with clarity, not technology.

Where to start

If you want to apply this to your business without hiring anyone, try this exercise this week:

1. Choose a repetitive task that you or your team does.
2. Write it as if you had to leave instructions for someone starting tomorrow who you won’t be able to call.
3. Include: what inputs it receives, what outputs it produces, what decisions it makes, and what it does in every edge case you can think of.
4. Read it out loud. Any sentence that uses “usually,” “sometimes,” or “depends” is a sign of ambiguity that needs to be resolved.

That document is 80% of the work of automating the task. The other 20% is the technical setup — and that’s where a team like ours comes in. But the spec is always built together, because no one knows the process better than the person living it.
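If the draft is long, the read-it-out-loud check for ambiguous words can be partly automated. This is a small Python sketch, assuming the draft is plain text. The function name `flag_ambiguity` and the sample draft are hypothetical, and the word list is just the three words from the exercise.

```python
import re

AMBIGUOUS_WORDS = ("usually", "sometimes", "depends")
AMBIGUITY_PATTERN = re.compile(
    r"\b(" + "|".join(AMBIGUOUS_WORDS) + r")\b", re.IGNORECASE
)


def flag_ambiguity(spec_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every line containing an ambiguous word."""
    flagged = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        if AMBIGUITY_PATTERN.search(line):
            flagged.append((number, line.strip()))
    return flagged


# Hypothetical draft of a process description
draft = (
    "Reply within 2 hours.\n"
    "We usually CC the account manager.\n"
    "It depends on the customer."
)

for number, line in flag_ambiguity(draft):
    print(f"line {number}: {line}")
```

Each flagged line is a question to answer, not just a word to delete: when exactly is the account manager copied, and which customer attribute decides the outcome?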

The lesson from the Hacker News article isn’t new, but it’s coming back in a big way for a reason: as AI gets embedded in more critical processes, the difference between a bot that works and one that’s embarrassing is exactly this. Clear specs. Explicit rules. Zero ambiguity.

The good news is that this is within reach for any business. The bad news is that you can’t skip the step.