6 min read 12 Mar 2026

What It Actually Takes to Run AI Workflows in Production

Everyone can build a prototype

The bar for getting an LLM to do something useful has never been lower. A few API calls, a prompt, and you have something that impresses in a demo.

Production is different.

Six things every team hits

1. Cost control

LLM calls are cheap until they’re not. One misconfigured loop, one unexpectedly long input, and you’ve burned your monthly budget in an afternoon. You need hard budget caps at the workflow level, not just rate limits at the API level.
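A workflow-level cap can be sketched in a few lines of Python. This is a minimal illustration, not how any particular runtime implements it: WorkflowBudget, BudgetExceeded, and estimate_cost_usd are hypothetical names, and the per-1k-token prices are placeholders you would supply yourself.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would push the run past its cap."""


def estimate_cost_usd(input_tokens: int, max_output_tokens: int,
                      usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Worst-case cost of one call; prices are caller-supplied placeholders."""
    return ((input_tokens / 1000) * usd_per_1k_input
            + (max_output_tokens / 1000) * usd_per_1k_output)


@dataclass
class WorkflowBudget:
    # Hypothetical helper: one instance is shared by every node in a run.
    max_usd: float
    spent_usd: float = 0.0

    def reserve(self, cost_usd: float) -> None:
        """Call before each LLM request; refuses it instead of overspending."""
        if self.spent_usd + cost_usd > self.max_usd:
            raise BudgetExceeded(
                f"would spend {self.spent_usd + cost_usd:.2f} USD, "
                f"cap is {self.max_usd:.2f} USD"
            )
        self.spent_usd += cost_usd
```

Reserving the worst-case cost before the call, rather than recording the actual cost afterwards, is what stops a runaway loop: the request that would break the cap is never sent.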

2. Observability

When something goes wrong (and it will), you need to know which node failed, what input it received, and what output it produced. Scattering console.log(response) calls through your code does not scale.
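One way to get there is to emit a single structured record per node execution. The sketch below uses Python's standard logging and json modules; run_node and the record fields are assumptions made for illustration, not part of any specific product.

```python
import json
import logging
import time
from typing import Callable

logger = logging.getLogger("workflow")


def run_node(run_id: str, node: str,
             fn: Callable[[str], str], node_input: str) -> str:
    """Run one node and emit one JSON record describing what happened."""
    record = {"run_id": run_id, "node": node, "input": node_input}
    started = time.monotonic()
    try:
        output = fn(node_input)
        record.update(status="ok", output=output)
        return output
    except Exception as exc:
        # Keep the failing input in the record, then let the error propagate.
        record.update(status="error", error=repr(exc))
        raise
    finally:
        record["duration_ms"] = round((time.monotonic() - started) * 1000, 1)
        logger.info(json.dumps(record))
```

Because every record carries the run ID and node name, a failed run can be reconstructed by filtering on run_id instead of grepping free-form output. In practice you would redact secrets and truncate large inputs before logging them.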

3. Retries and idempotency

LLM APIs fail. Network timeouts happen. Your workflow needs to handle transient failures gracefully without re-running nodes that already succeeded.
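The two halves of that requirement can be sketched separately: retry transient failures with backoff, and skip any node whose output is already recorded. Everything here is illustrative. TransientError stands in for whatever timeout or 5xx exceptions your client raises, and the completed dict stands in for a durable store.

```python
import random
import time
from typing import Callable, Dict


class TransientError(Exception):
    """Stand-in for timeouts, rate-limit responses, and 5xx errors."""


def call_with_retries(fn: Callable[[], str], max_attempts: int = 4,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry fn on TransientError using exponential backoff with full jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")


def run_workflow(nodes: Dict[str, Callable[[], str]],
                 completed: Dict[str, str],
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, str]:
    """Run nodes in order, skipping any whose output is already recorded."""
    for name, fn in nodes.items():
        if name in completed:
            continue  # succeeded in an earlier attempt; do not re-run
        completed[name] = call_with_retries(fn, sleep=sleep)
    return completed
```

If the process crashes halfway through, re-running with the same completed mapping resumes at the first unfinished node rather than paying for the earlier ones again.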

4. Isolation

If a workflow writes files, runs code, or modifies a repository, it needs to do so in isolation. Letting it operate directly on your main branch, in CI or anywhere else, is an accident waiting to happen.
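With Git, one common way to get that isolation is a throwaway worktree on its own branch. The sketch below wraps the git worktree add and git worktree remove commands in a context manager; the directory and branch naming scheme is an assumption made for the example.

```python
import subprocess
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, List


def worktree_add_cmd(path: Path, branch: str, base: str = "HEAD") -> List[str]:
    # git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "worktree", "add", "-b", branch, str(path), base]


def worktree_remove_cmd(path: Path) -> List[str]:
    # --force also discards uncommitted changes left in the worktree
    return ["git", "worktree", "remove", "--force", str(path)]


@contextmanager
def isolated_worktree(repo: Path, run_id: str) -> Iterator[Path]:
    """Check out a scratch copy next to the repo and clean it up afterwards."""
    path = repo.parent / f"{repo.name}-run-{run_id}"  # hypothetical naming
    subprocess.run(worktree_add_cmd(path, f"workflow/{run_id}"),
                   cwd=repo, check=True)
    try:
        yield path
    finally:
        subprocess.run(worktree_remove_cmd(path), cwd=repo, check=True)
```

The workflow then runs with its working directory set to the yielded path. The workflow/<run_id> branch is left behind after cleanup, so anything the run committed can still be reviewed or merged deliberately.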

5. Secrets management

Credentials should never be in prompts, in logs, or in workflow definitions checked into version control. They should be injected at runtime from a secrets store.
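A minimal sketch of runtime injection, assuming the secrets store exposes values to the process as environment variables (a common pattern, but an assumption here). The names load_secrets and redact are hypothetical.

```python
import os
from typing import Dict, Iterable, Mapping


def load_secrets(names: Iterable[str],
                 env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Fetch named secrets at runtime and fail fast if any are missing."""
    wanted = list(names)
    missing = [n for n in wanted if n not in env]
    if missing:
        raise KeyError(f"missing secrets: {', '.join(missing)}")
    return {n: env[n] for n in wanted}


def redact(text: str, secrets: Mapping[str, str]) -> str:
    """Replace any secret value that leaked into text before it is logged."""
    for name, value in secrets.items():
        if value:
            text = text.replace(value, f"[{name} redacted]")
    return text
```

The workflow definition only ever mentions the secret's name. The value exists in process memory at runtime, and anything headed for logs or a prompt goes through redact first.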

6. Human-in-the-loop

Not everything should be fully automated. Some workflows need a review step before they take an action. That review step needs to be a first-class primitive, not a bolted-on afterthought.
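Treating review as a primitive roughly means the action cannot run unless a decision object says so. The following is a sketch under that assumption; ReviewRequest, Decision, and gated are illustrative names, and ask_human would in practice post to a chat channel or ticket and wait for a response.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class ReviewRequest:
    node: str
    summary: str


class Rejected(RuntimeError):
    """Raised when a reviewer declines the action."""


def gated(request: ReviewRequest,
          ask_human: Callable[[ReviewRequest], Decision],
          action: Callable[[], str]) -> str:
    """Run action only after an explicit approval; anything else blocks it."""
    decision = ask_human(request)
    if decision is not Decision.APPROVED:
        raise Rejected(f"{request.node} was not approved: {decision}")
    return action()
```

The default is to block: a rejection, a timeout mapped to REJECTED, or an unexpected value all stop the action, so a bug in the review path fails closed rather than deploying.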

How Skylence addresses these

Harness Builder was designed with all six in mind. Budget limits and output styles live in the meta block. Observability is built into the runtime. Worktree isolation is a one-line config change. Secrets are injected at runtime, never interpolated into prompts or workflow definitions. Review and approval gates are per-node settings.

We’re not the only way to solve these problems. But we think the .sky format makes the right things easy and the dangerous things explicit.

safe-deploy.sky
meta
name = "safe-deploy"
description = "Run tests, get human approval, then deploy"
max_budget_usd = 2.0
⊕⊕

§test§
model = "sonnet"
isolation = "worktree"
§§

test
Run the test suite. If any tests fail, summarise which ones and why.
Output: pass | fail + summary
∆∆

§review§
model = "opus"
depends_on = ["test"]
human_review = true
§§

review
Test results: $test.output

Should we proceed with deployment? Summarise the risk.
∆∆

§deploy§
depends_on = ["review"]
requires_approval = true
§§

deploy
Deploy to production. Use the approved plan from review.
∆∆
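The depends_on fields above describe a small dependency graph: test, then review, then deploy. As an illustration of how such a graph can be turned into an execution order (not a description of how the Skylence runtime does it), Python's standard graphlib module is enough. The execution_order helper is a hypothetical name.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Return node names so that every node comes after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Dependencies copied from the safe-deploy example above.
order = execution_order({
    "test": [],
    "review": ["test"],
    "deploy": ["review"],
})
# order == ["test", "review", "deploy"]
```

Rejecting cycles up front matters for the same reason budget caps do: a workflow that depends on itself should fail at load time, not loop at run time.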