Just Ralph It (a.k.a. Ralfealo) • Nico Pujia

Audience¶

Throughout this document I'll be calling you the user, and assuming you fall into one or more of the following buckets¹:

- Early-stage startup founder, whether technical or not so much
- Indie hacker
- Programmer who wants to
  - migrate legacy software
  - start a new project

Problem statement¶

We already know how powerful the Ralph Wiggum technique, and thus LLMs' software development capability, currently is. However, as you would realize if you tried, the power that AI can potentially give us is quite inaccessible. Another sign of that inaccessibility is that, with existing tools, no one beyond a niche, curious group of hackers achieves heavily automated software development.

First, asking an agent to write the specifications down is fast but, if you are not careful, very prone to not actually reflecting what's in your mind. Those specs will probably be bloated with unintended behaviors that surface once the software, if functional at all, is built, unless you thoroughly review and correct them in detail. On the other hand, writing them down yourself is not just slow at medium-to-large scale, but also very prone to behavioral ambiguities, meaning that your specs would have more than one plausible interpretation, and therefore the result would probably drift from what you had expected. In other words, putting your entire idea into text is harder than it seems.

Second, setting up the environment so that agents, especially long-running ones, actually work correctly requires a great deal of trial and error, i.e. technical learning/skill. Although this is relatively easy when you are in the loop, in a synchronous 1:1 interaction with the agent, it gets proportionally harder as you give the agent more room to work on its own and move yourself to being on the loop. In other words, once you've correctly specified what you want, turning that into working, polished software isn't as easy as just clicking a button.

To conclude, my pain comes from (a) finding it quite rough to translate my idea into text effectively and easily, and (b) the difficulty of setting up an effective environment for long-running agents. Or in just one phrase, nowadays it's hard to heavily automate software development.

Existing solutions¶

You may be thinking that this is a problem of the past, that it has already been solved, so I'll analyze whether existing tools actually resolve this problem or not.

Vibe-coding tools¶

Isn't that already solved by Lovable, v0, Replit, Bolt, etc.?

When you use those tools, you give them a prompt and, after at most ~5 questions, they start building. They assume either that you have already defined pretty well what you want to build or, more often, that you will get a result and then prompt again because it wasn't exactly what you had imagined. You may argue that this is a very good approach, given that you can visualize what you're building and then decide what you want based on that. To be honest, it works pretty well for building frontends.

Now imagine you are building any other kind of software, where the visual element is not the main one. You prompt the agent, it builds the thing, you test it, and it doesn't behave as expected. It may not even be broken; it just doesn't do exactly what you intended it to do. So now you have to prompt it again, and repeat that process until you reach the expected result. You prompt, you wait, you test, you re-prompt, and so on.

My question for you is: wouldn't it be a much faster, cheaper, and more straightforward approach to get interviewed until everything is clarified, then click a button, wait, and find it done exactly as you expected, with no re-prompting needed? And I don't mean removing iteration, because iteration is what leads to the best result; I only suggest iterating mostly through a prior interview rather than through generated code.

Moreover, at least today, these tools are optimized mostly for web applications, leaving behind, for example, scraping, ETL pipelines, native applications, browser and editor plugins, trading and messaging bots, complex backends, and so on.

General coding agents¶

Why not just ask Claude Code, Codex, Cursor, OpenCode, etc. to do it?

Because the problem stated above arises from the current inability of these agents to solve it without a capable engineer around them. Although I'm not claiming it's impossible to solve the problem oneself, I'd say it's quite cumbersome, and that it requires a great deal of hard-earned skill to accomplish such results.

It's worth the mental exercise, though, to imagine: "if the models were 10x, or even 100x, better, would the problem get automatically solved?" Partly, I'd say. We can test this right now if, instead of an LLM, we imagine talking to a very capable engineer. If we, as users, are ambiguous, what makes more sense: that the engineer starts right away, or that they first interview us until we're on the exact same page? Both the engineer and the user would still, no matter how smart, benefit from the latter approach.

The proper solution¶

So, I am here to propose a better alternative: one which attempts to package the skill required to automate software development into a system. This system, mirroring the problem, is split into two main components, plus a small setup/onboarding step. I call it Just Ralph It (JRI for short), honouring the Ralph technique, which is its execution engine.

0. Setup¶

Given that the agents need a machine to run on, and that they may get off the rails (or, as Geoffrey says, "it's not if it gets popped; it's when it gets popped. And what is the blast radius?"), it's a reasonable idea to use a virtual private server (VPS) to reduce the risks of chaos. For simplicity, you get one VPS for all your projects.

Now I want to make a claim aside from JRI. If you're used to as-a-service platforms (e.g. Supabase, Vercel, etc.), you'd be surprised by how much can be accomplished on a single computer, in a monolith, and how easy it is to manage if you let an agent do it. Monoliths are quite powerful.

Back to JRI, we mainly use monoliths. Besides, the VPS serves both as the running environment and for deployment and CI/CD of your projects. And in the cases where we do need to scale beyond the monolith, the VPS would still be useful because the agent would manage the other machines from there, so we don't need to think about it further: we just give the agent total access to one machine and, if needed, the keys to orchestrate more.

Then, we'll need lots and lots of tokens. There are cheap and powerful Chinese models, true, but I can't ignore the fact that subscriptions are heavily subsidized and provide even better models. Besides, if you are a prospective user of JRI, you are probably already paying for one of these subscriptions. Therefore, you can pay directly through JRI for API-based tokens, or alternatively connect your existing subscriptions (as far as the third parties allow us) for token consumption.

1. Intent extraction¶

Cool, we now have all we need to get started. Let's talk about what our interaction with JRI would look like.

It's an app (whether for the terminal, web, desktop, or mobile) with a chat interface where you start by sharing your initial idea, and the chatbot helps you walk through it by asking lots of questions, acting as an extractor of the idea in your mind. It should figure out the different decision branches, the questions to ask for each of them, and the edge cases, then ask questions from higher to lower level until all branches are covered, covering new ones as they appear during the interview.

You may even decide to discard your initial idea mid-interview, and I wouldn't call that a failure, but rather the opposite. It means you realized what you actually wanted, or at least what you didn't want, just by chatting; it'd mean you saved yourself all the effort of building it. But if you keep going, the interviewer will make sure to cover all behavioral/user-facing ambiguities. It can also help you on the product and business side, challenge you, and basically help you get the outcome you're actually looking for.

You'd notice that it asks a lot of questions. This is because the interviewer is designed so that the specs we get from it must pass the following heuristic:

If an engineer solved this literally, could there be more than one possible result that matched what is written in the specs?

In other words, there mustn't be room for more than one literal interpretation, except for implementation details (unless you also care about those details, of course).

Not passing this test means there are ambiguities, and no matter how smart the engineer is, they could interpret the specs either in the intended way or in another, which would be a failure.

At the same time, though, there's a limit to how many questions you can ask. Nobody will answer, let's say, 1000 questions. Therefore, the interviewer has to figure out which questions have the highest leverage, which in this case would be the number of possibilities you discard by answering a certain question, and ask those first.
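To make "leverage" concrete, here is a minimal sketch, not JRI's actual implementation. It assumes that the plausible interpretations of an idea can be listed as candidates and that each is equally likely to be the one you mean; under that assumption, a question's leverage is the expected number of candidates its answer rules out. All names (`expected_discarded`, `rank_questions`, the toy candidates) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List

# Hypothetical model: a candidate is one plausible interpretation of the idea,
# and a question maps a candidate to the answer the user would give if that
# candidate were what they actually meant.
Candidate = Dict[str, Hashable]
Question = Callable[[Candidate], Hashable]


def expected_discarded(question: Question, candidates: List[Candidate]) -> float:
    """Expected number of candidates ruled out by one answer,
    assuming every candidate is equally likely to be the intended one."""
    total = len(candidates)
    if total == 0:
        return 0.0
    group_sizes = Counter(question(c) for c in candidates).values()
    # A group of size n is hit with probability n/total and leaves n candidates.
    expected_remaining = sum(n * n for n in group_sizes) / total
    return total - expected_remaining


def rank_questions(questions: Dict[str, Question],
                   candidates: List[Candidate]) -> List[str]:
    """Question texts ordered from highest to lowest leverage."""
    return sorted(questions,
                  key=lambda text: expected_discarded(questions[text], candidates),
                  reverse=True)


candidates = [
    {"platform": "web", "login": True},
    {"platform": "web", "login": False},
    {"platform": "cli", "login": False},
    {"platform": "mobile", "login": False},
]
questions = {
    "Does it need a login?": lambda c: c["login"],
    "Which platform?": lambda c: c["platform"],
}
print(rank_questions(questions, candidates))
# ['Which platform?', 'Does it need a login?']
```

In this toy case the platform question is asked first because it discards 2.5 candidates on average, against 1.5 for the login question. A real interviewer would not have an explicit candidate list, so this only illustrates the ordering criterion.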

2. Execution¶

So, once the project's v1 is properly defined, you can just Ralph it. More specifically, JRI would start transforming those specs into working, trustworthy, polished software.

By this point, Ralph would be working, and you'd observe its progress through the UI. However, as is to be expected, your project will very probably need updates. For that, you would be able to simply keep chatting with the interviewer, which would append the new requirements accordingly, and Ralph would eventually pick them up.
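One simple way to picture "append, then eventually pick up" is an append-only log of requirements with a cursor for the builder. This is only a sketch under that assumption; `SpecLog` and its methods are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecLog:
    """Append-only list of requirements shared by the chat side and the builder."""
    entries: List[str] = field(default_factory=list)
    cursor: int = 0  # index of the first requirement the builder has not seen

    def append(self, requirement: str) -> None:
        """Called by the chat side whenever a new requirement is agreed on."""
        self.entries.append(requirement)

    def pending(self) -> List[str]:
        """Called by the builder: return unseen requirements and mark them seen."""
        new_entries = self.entries[self.cursor:]
        self.cursor = len(self.entries)
        return new_entries


log = SpecLog()
log.append("v1: export reports as CSV")
first_batch = log.pending()   # the builder starts on v1
log.append("update: also export reports as JSON")
second_batch = log.pending()  # picked up on a later pass
print(first_batch, second_batch)
```

Because earlier entries are never rewritten, the builder can always tell which requirements arrived after it started working.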

Brownfield projects¶

Running Ralph directly on existing software to improve it further probably wouldn't work if that software wasn't built by Ralph, because in the process of building it, Ralph adds backpressure (i.e. signals of what is wrong and what is not) that corrects the behavior of its future instances.

For migrating legacy codebases with JRI, you'd do what's called reverse-Ralphing, which is basically producing specs based on the legacy code plus the interview and then building from those specs, rather than starting from the interview alone.

How it works¶

Talk is cheap. Show me the code.

I'm still figuring it out. It'd definitely include an agent in a loop, thoughtful context management, and a lot of backpressure. Backpressure means having a mechanism for the agent to know when it's producing unacceptable output, so it can then self-correct.

It's worth mentioning, though, that Geoffrey himself pointed out that productizing the Ralph technique would not work. He argues that it's highly supervised: for Ralph to work you need to babysit it, with an engineer on the loop, tuning it like a guitar. Even Anthropic has already tried it with a plugin and it didn't work as expected. Nevertheless, JRI is my attempt to do it better, to craft the guitar that comes tuned out of the box.

Limitations and their solution¶

JRI alone wouldn't be able to build anything that requires:

- Physical presence or intuitions²
- Locked services (no API, no CLI, no reachable website, etc.)
- Human identity

For those cases, there is a notification system, so you can unblock Ralph when it absolutely needs you.

Philosophy¶

Thesis¶

If the definition is shallow, the output is shallow; if the definition is rigorous, the output is rigorous. Given rigorous specs, LLMs are already capable of producing rigorous software if we have the proper system built around them.

Collaboration¶

To increase efficiency, it is better to spend the back-and-forth discussion time with co-workers or customers during the specification process rather than after the software has already been built.

Why build it¶

Although I've been calling you the user, I consider myself one too. JRI is something I want and would use myself. In fact, it'd be great to have JRI build itself, which leads me to the last point in this document.

Success scenario¶

JRI works as expected if I can define JRI itself with little friction, click "Just Ralph It," and see a working clone emerge that behaves as expected.

¹ I don't include non-technical people other than founders because using JRI is not what most people would call fun. In fact, half joking, half serious, I'd say JRI is SDD: Slow-Dopamine Development. But I haven't even explained what it is, so let's continue.
² When I say "physical intuitions," I mean software that requires having a hard-wired mental model of how physics works in the real world. Many video games, for example, require the developer to have them. As humans we have those intuitions naturally, but LLMs don't, and so they can't really know if what they are building is correct or not. This video explains how if LLMs were capable of correctly building games with those characteristics, then AI would be capable of doing basically anything we can, as humans, do (i.e. AGI).