Robocop: automated code review

I recently wrote Robocop, and then used Claude Sonnet 4.5 to turn it into a GitHub app under supervision from GPT-5 and Gemini 2.5 Pro.

What it does

Robocop feeds diffs, plus surrounding context, into GPT-5 at max reasoning effort for automated code review. (That’s basically it! It’s very simple.)

In more detail: it watches every commit I make to a pull request in a repo I own, submits the content of the pull request to GPT-5, and eventually comments on the PR with feedback.
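In outline, the submission step is just concatenation: the diff and the full contents of the touched files go into a single prompt. A minimal sketch of what that might look like — the function name, prompt wording, and structure are my own illustration, not Robocop’s actual code:

```python
# Hypothetical sketch of Robocop's core step: turning a diff plus
# surrounding file context into one review prompt for the model.
# All names and wording here are illustrative, not Robocop's real API.

def build_review_prompt(diff: str, context_files: dict[str, str]) -> str:
    """Assemble a code-review prompt from a unified diff and full-file context."""
    parts = ["Review the following pull request for bugs and inconsistencies.\n"]
    for path, contents in sorted(context_files.items()):
        parts.append(f"--- Full contents of {path} ---\n{contents}\n")
    parts.append(f"--- Diff under review ---\n{diff}\n")
    return "\n".join(parts)


prompt = build_review_prompt(
    "@@ -1 +1 @@\n-x = 1\n+x = 2\n",
    {"main.py": "x = 1\n"},
)
```

The resulting string is then submitted to GPT-5, and the model’s response becomes the PR comment.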

It has been absurdly effective for me: roughly 90% of its comments are true positives rather than false positives. Sometimes it finds really insidious or subtle logic bugs, and I’ve basically stopped worrying about basic LeetCode-style algorithmic correctness when reading Robocop-approved PRs, because it finds those bugs so reliably. Other times it finds places where documentation has drifted or some global consistency no longer holds.

What it doesn’t do

  • It doesn’t work for anyone else. For security reasons, I have it responding only to my commits. (Fork it, of course, if you want to use it.)
  • I don’t trust it to perform architectural reviews, and it doesn’t really try to identify when a PR has created a solution to a problem that only exists because the architecture is bad.

How it works

It has several components:

  • A dashboard, run entirely locally, which scrapes your submissions to the OpenAI API and displays the ones with Robocop metadata attached.
  • A standalone Rust binary, which you can invoke locally on a code checkout to get an ad-hoc review.
  • A GitHub app, which responds automatically to all my PRs.
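What ties the three components together is metadata on the API requests: the GitHub app and the CLI tag each submission, and the local dashboard filters on those tags. A sketch of what that tagging might look like, assuming the OpenAI API’s `metadata` request parameter — the specific key names are hypothetical, not Robocop’s actual schema:

```python
# Hypothetical sketch: tag each OpenAI API request with metadata so a
# local dashboard can later pick out Robocop submissions.
# The key names under "metadata" are illustrative only.

def robocop_request_params(model: str, prompt: str, repo: str, pr_number: int) -> dict:
    """Build request parameters identifying this submission as a Robocop review."""
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": "high"},  # "max reasoning", per the post
        "metadata": {
            "app": "robocop",
            "repo": repo,
            "pr": str(pr_number),  # metadata values are strings
        },
    }


params = robocop_request_params("gpt-5", "Review this diff...", "me/myrepo", 42)
```

The dashboard then only needs to look for requests whose metadata carries the Robocop tag, rather than parsing prompts.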

Why write it

Other tools exist to do the same thing; GitHub Copilot has some of these features. But I had some bad experiences with automated PR review tooling in its earlier days, and GPT-5 is the first model that seems genuinely competent at the kinds of reviews I ask of it. At least with Robocop, I know that any problems are my own fault!