Vibe Coding Higher Quality Code

Ever since the term vibe coding was coined by Andrej Karpathy, it has been on top of many developers' minds. In a nutshell, vibe coding is using AI agents to generate code for you with little to no human intervention.

On the plus side, there's something magical about watching an agent crank out a new feature or even build a new app completely from scratch. But on the minus side, the code generated by agents is often low quality in a number of ways: it may stray from your original intent, or it may simply be buggy. Because of this, many people are hesitant to use agents in "serious" projects, and for good reason -- the code may work at first, but projects can quickly start to buckle under the weight of accumulated tech debt.

In developing the OpenHands AI software development agent, we use agents all the time to generate code to great effect -- OpenHands is the most active contributor to its own code base, and we use it in a number of other projects as well. So how can we use agents to write most of our code on major projects while still maintaining high quality? In this blog I'd like to share a number of rules of thumb that we've learned that have helped us tremendously:

  • Use Static Analysis Tools
  • Practice Test-driven Development
  • Use CI/CD
  • Enable Repository Customization
  • Perform Two-tier Code Review

Let's dig into each of these in turn.

Use Static Analysis Tools

AI agents have the ability to write code and run commands, which means that, in theory, they should be able to debug and fix issues similarly to how human developers do. However, even for human developers, debugging runtime errors with multiple levels of dependencies can be a nightmare. One powerful tool that we have to help us is static analysis, which makes it possible to catch many issues in a more transparent way, before actually running the code, and in my opinion this is a must-have in any vibe coding workflow.

If you are working in a dynamically typed language like Python or JavaScript, the most important static analysis tool is a type checker. Type checking verifies that the types of variables match the types expected by the functions that operate on them. For Python, we enforce this using mypy, and for JavaScript we use TypeScript. It is also useful to run a linter to catch style issues and enforce a style guide; we use ruff for this. To ensure that agents consistently use these tools, we use pre-commit, which runs the checks on every commit, making sure that the agent doesn't "forget" to run them before sending code for human review.
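As a minimal illustration (the function and its bug are hypothetical), this is the kind of error a type checker catches before the code is ever run:

```python
# Hypothetical example of a bug a type checker catches before runtime:
# the signature promises an int, but one branch returns None.
def parse_port(value: str) -> int:
    if value.isdigit():
        return int(value)
    return None  # mypy flags this: incompatible return value type


# A caller that trusts the annotation would only crash at runtime:
# parse_port("not-a-port") + 1  would raise a TypeError
```

Running mypy on this file reports the incompatible return type immediately, so the agent (or a pre-commit hook) sees the problem up front instead of hitting a TypeError deep inside a call stack.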

You can see our example configuration files for mypy, ruff, and pre-commit if you want a starting point.
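As a sketch of what such a setup might look like (the `rev` values here are placeholders -- pin them to current releases), a `.pre-commit-config.yaml` wiring up ruff and mypy could be:

```yaml
# .pre-commit-config.yaml -- example only; pin `rev` to real releases
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.0          # placeholder revision
    hooks:
      - id: ruff         # linting
      - id: ruff-format  # formatting
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.10.0         # placeholder revision
    hooks:
      - id: mypy         # type checking
```

With this in place, running `pre-commit install` adds a git hook so the checks run on every commit the agent makes.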

Practice Test-driven Development

One other characteristic of AI agents is that they will often jump to conclusions, assuming that they know the right way to fix a bug or implement a new feature without fully understanding the code base. One way to combat this is to practice test-driven development, where you ask the agent to write tests for your code before it writes the code itself.

An example of a prompt that I use to do this is:

Read this github issue: {github issue link}. Once you have understood the issue, first read the existing testing code and find a good place to add one or more new tests that fail, demonstrating that the issue exists. Do NOT write any code to fix the issue until you have written a test reproducing the issue and confirmed that it fails when running. If you are not able to reproduce the issue, ask me for help.

This way the agent is forced to carefully understand the issue and reproduce it in a test, and once there is a clearly failing test it is much easier to fix the issue as well.

Use CI/CD

Another advantage of test-driven development is that as you code more, you accumulate a large test suite, which helps prevent regressions. To ensure that the test suite is run every time we open a new pull request, we use GitHub Actions. Workflow runs take a while to complete, so I often append a prompt like this to the end of my requests to the agent:

Once you have sent a pull request, wait 30 seconds and then check the result of github actions with the github API. If the CI doesn't pass, run the corresponding github workflow locally to reproduce the issue, fix it, then update the PR and monitor actions again.

This allows the agent to iterate until CI finally passes.
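As a sketch of the corresponding workflow (the file path and the `pytest` invocation are assumptions about your project layout), a minimal `.github/workflows/ci.yml` might look like:

```yaml
# .github/workflows/ci.yml -- example only
name: CI
on:
  pull_request:
  push:
    branches: [main]

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pre-commit
      - run: pre-commit run --all-files    # mypy + ruff, same as local hooks
      - run: pip install pytest && pytest  # run the test suite
```

Because the workflow runs the same pre-commit checks as the local hooks, a failure in CI is usually straightforward for the agent to reproduce locally.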

Enable Repository Customization

Many of the best practices above can be enforced by prompts or scripts, but it's a bit onerous to type in extensive prompts every time you want to use the agent. Fortunately, most agents, including OpenHands, allow you to customize settings for each project to your liking.

In the case of OpenHands, we support repository-level customization: you can add instruction files to your repository (such as a repository microagent) that are automatically loaded into the agent's context, so conventions like the ones above only need to be written down once per project rather than repeated in every prompt.

Perform Two-tier Code Review

In multi-person development projects, code review is essential, and we have found that a two-tier process works well:

Agent Invoker Review: First, the PR is opened as a draft and assigned to the person who invoked the agent. This developer reads the agent's code and either leaves comments on the pull request or directly prompts the agent to make changes. If comments are left on the pull request, we can then use the OpenHands github integration to leave a high-level comment like "@openhands fix these comments", and the agent will make the changes.

Second Review: Once the person who invoked the agent is satisfied with the changes, they will convert the PR from a "draft" PR into a normal PR. At this point, the pull request is assigned to another person for a second review.

We have found that this "Agent Invoker Review" is critical, as it makes sure that the person who has full context of the PR is satisfied before it is sent out for a second review.

A Final Note

If you've read all the way here, you might be thinking, "Well, that's great, but you just took all the fun out of vibe coding! Do you mean that I need to do all this setup work just to use agents effectively?"

The good news is, all of this setup work can be vibe coded as well! For instance, you can ask the agent to:

Set up a new repo with a Python backend and TypeScript frontend. Use best practices, including mypy for type checking, ruff for linting, pre-commit hooks, and github actions for CI/CD.

If you do this, you'll have a repo that is set up with all of these practices from the start, and you can jump right into having the agent write relatively solid code for you.

If you're interested in doing some serious vibe coding, it's easy to get started on the OpenHands Cloud, or download and run OpenHands yourself. And if you want to discuss best practices, learn more about agents, or just chat about AI in software development, come join us on Slack or Discord.

Neubig, Vibe Coding Higher Quality Code (2025)

