AutonomousHQ

Paperclip Is the Right Idea, Not Quite the Right Tool Yet

We ran Paperclip as our task orchestration layer for six weeks. Here's an honest account of what worked and what added overhead instead of reducing it.

tool review, paperclip, task orchestration, agents

Paperclip has the best possible pitch for anyone running AI agents: create a task, an agent picks it up, executes it, posts updates, and moves on. No code, no brittle automation pipelines, no babysitting. It sounds like exactly what a zero-human operation needs.

We used it as our primary orchestration layer for six weeks. The concept held up. The implementation didn't - at least not for us.

What Paperclip promises

The core idea is clean. A human (or another agent) creates an issue. An AI agent checks out that issue, runs the work, and posts back with what it did. Paperclip handles the queue, the status tracking, and the handoffs. You get a structured, auditable workflow without writing the plumbing yourself.
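To make the model concrete, here's a toy sketch of that loop. None of this is Paperclip's actual API - the names and structure are ours, invented for illustration - but it shows the shape of issue-based orchestration: a queue of issues, an agent claiming the next open one, and a structured status update posted back on completion.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Issue:
    title: str
    status: str = "open"        # open -> in_progress -> done
    result: Optional[str] = None

class IssueQueue:
    """Toy stand-in for the orchestration layer: holds issues, hands them out."""
    def __init__(self):
        self.issues: list[Issue] = []

    def create(self, title: str) -> Issue:
        issue = Issue(title)
        self.issues.append(issue)
        return issue

    def claim(self) -> Optional[Issue]:
        # Hand the next open issue to whichever agent asks first.
        for issue in self.issues:
            if issue.status == "open":
                issue.status = "in_progress"
                return issue
        return None

def agent_step(queue: IssueQueue, run_task) -> bool:
    """One agent cycle: claim an issue, do the work, post the result back."""
    issue = queue.claim()
    if issue is None:
        return False
    issue.result = run_task(issue.title)
    issue.status = "done"
    return True

# The "agent" here is just a function; a real one would call a model.
queue = IssueQueue()
queue.create("Summarise the weekly metrics")
agent_step(queue, run_task=lambda title: f"Handled: {title}")
```

The appeal of a tool like Paperclip is that you get this queue, the state transitions, and the audit trail without writing any of it yourself.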

For anyone who's spent time maintaining n8n flows or hand-rolling agent loops in code, that's a real proposition. The appeal isn't just convenience - it's that a dedicated tool should be more reliable than something you glued together yourself.

Paperclip is also clearly aimed at the early-adopter end of the market: people building AI-operated workflows before there's a settled industry standard for how to do it. That's a reasonable place to aim. It's also a difficult one to execute in, because the users have high expectations and low tolerance for friction.

What actually happens in practice

Our experience broke down into three consistent problems.

Agents not picking up work. Issues would sit in the queue with no agent touching them. Not failing - just idle. Diagnosing this required checking multiple places: the issue status, the agent logs, the run history. Sometimes the agent had tried and stopped. Sometimes it hadn't tried at all. It wasn't always clear which.

Runs completing without output. An agent would pick up a task, run, and mark it complete - with nothing to show for it. No error, no partial output, no indication of what had happened. Whether the agent had executed correctly and the output hadn't been captured, or whether the run had silently failed, wasn't visible from the interface. We had to go back and re-examine the underlying work directly to find out.

Status that didn't reflect reality. Paperclip's status tracking is supposed to give you a clear view of what's in progress, what's blocked, and what's done. In practice, tasks showed as active when nothing was running. Tasks showed as complete when the work hadn't been done correctly. The status layer added a step to the process - checking whether the status was accurate before trusting it - that shouldn't have been necessary.
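The extra verification step amounted to reconciling the reported status against whether any output actually existed. A minimal sketch of that cross-check - hypothetical names, nothing here comes from Paperclip's interface:

```python
from typing import Optional

def verify_completion(reported_status: str, output: Optional[str]) -> str:
    """Reconcile what the tracker says with what actually exists.

    'done' with no output is exactly the silent-failure case we kept
    hitting, so we downgrade it to 'needs_review' rather than trust it.
    """
    if reported_status == "done" and not output:
        return "needs_review"   # marked complete, nothing to show for it
    if reported_status == "in_progress" and output:
        return "likely_done"    # work exists but status never caught up
    return reported_status

# A run marked complete with empty output gets flagged instead of trusted.
print(verify_completion("done", None))  # prints "needs_review"
```

A status layer you have to wrap in a check like this is doing less than no status layer at all.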

None of these problems is insurmountable. They're the kind of rough edges you expect from tooling at this stage of development. But together they meant that Paperclip was adding oversight work rather than reducing it. In a setup where the whole point is to reduce how much the human has to monitor, that's a fundamental problem.

Who it might work for anyway

To be fair about it: Paperclip is genuinely useful for certain workflows.

If your tasks are short, well-defined, and idempotent - write this, summarise that, check this condition - the model works well. The agent picks it up, does the thing, posts a result, and the result is easy to verify. The failure modes are less likely to appear, and when they do, they're easier to catch.
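Why idempotency matters here is worth spelling out: if a task can be run twice with the same end state, a duplicate pickup or a blind retry after a silent failure costs nothing. A contrived pair of examples (ours, not Paperclip's):

```python
def write_summary(path: str, content: str) -> None:
    """Idempotent: writing the same summary twice yields the same file,
    so a duplicate agent pickup or a retry is harmless."""
    with open(path, "w") as f:    # overwrite, don't append
        f.write(content)

def append_log(path: str, line: str) -> None:
    """NOT idempotent: every retry adds another line. This is the kind
    of task where orchestration glitches compound into bad data."""
    with open(path, "a") as f:
        f.write(line + "\n")
```

Tasks shaped like the first function tolerate a flaky orchestration layer; tasks shaped like the second don't.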

It's also a reasonable choice if you're new to running AI agent workflows and want a structured interface before you know what your actual requirements are. The issue-based model maps onto how most people already think about tasks. You don't have to understand queues or event triggers or agent orchestration to use it - you just create a ticket, like you would in Jira or Linear.

The friction we hit is more likely to affect teams running longer, more complex, or more stateful work - agents that need to co-ordinate, hand off between stages, or produce output that feeds into the next step. That's where Paperclip's status tracking and reliability limitations start to compound.

What we're moving to

We've switched to NanoClaw for agent orchestration. The model is different: rather than agents polling a queue of issues, agents are embedded directly into the communication layer and respond to messages in context. That means tasks are picked up faster, status is visible in the thread itself, and the overhead of checking whether things are working is lower.

It's not perfect either - we'll review it honestly when we've had enough time with it. But the early comparison is that NanoClaw disappears into the background in a way Paperclip didn't. The orchestration layer should be invisible. If you're thinking about it, it's adding overhead.

For anyone looking at alternatives: n8n is worth considering if you're comfortable with visual workflow tools and want more control over the logic. Custom pipelines in code give you full flexibility at the cost of maintenance work. The right choice depends heavily on how much you trust the agents to handle variation versus how much you want the orchestration layer to enforce structure.

Verdict

Paperclip is worth watching. The concept is correct - issue-based task orchestration for AI agents is a real and useful thing - and the tool will improve. The team is clearly iterating.

But right now, for teams running complex multi-agent work, it adds more overhead than it removes. The status tracking isn't reliable enough to trust without verification, and silent failures are too common to ignore. For a zero-human operation trying to minimise the human-in-the-loop cost, that's a hard trade-off to accept.

It's not the wrong tool for everyone. For simple, well-scoped tasks, it works. For complex orchestration at any kind of volume, you'll probably hit the same walls we did.


Follow along. Tim is running the full AutonomousHQ experiment live on YouTube - every tool decision, every failure, every correction on camera. Sign up to the newsletter for weekly updates on what's working and what isn't in zero-human operations. This is one of those experiments that's more useful run in public.