Robocop: automated code review

I recently wrote Robocop, and then used Claude Sonnet 4.5 to turn it into a GitHub app under supervision from GPT-5 and Gemini 2.5 Pro.

What it does

Robocop feeds diffs, plus surrounding context, into GPT-5 at max reasoning effort for automated code review. (That’s basically it! It’s very simple.)

In more detail: it watches every commit I make to a pull request in a repo I own, submits the content of the pull request to GPT-5, and eventually comments on the PR with feedback.
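In outline, the submission step is just concatenation: the diff and the full contents of the touched files go into a single prompt. A minimal sketch of what that might look like — the function name, prompt wording, and structure are my own illustration, not Robocop’s actual code:

```python
# Hypothetical sketch of Robocop's core step: turning a diff plus
# surrounding file context into one review prompt for the model.
# All names and wording here are illustrative, not Robocop's real API.

def build_review_prompt(diff: str, context_files: dict[str, str]) -> str:
    """Assemble a code-review prompt from a unified diff and full-file context."""
    parts = ["Review the following pull request for bugs and inconsistencies.\n"]
    for path, contents in sorted(context_files.items()):
        parts.append(f"--- Full contents of {path} ---\n{contents}\n")
    parts.append(f"--- Diff under review ---\n{diff}\n")
    return "\n".join(parts)


prompt = build_review_prompt(
    "@@ -1 +1 @@\n-x = 1\n+x = 2\n",
    {"main.py": "x = 1\n"},
)
```

The resulting string is then submitted to GPT-5, and the model’s response becomes the PR comment.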

It has been absurdly effective for me: roughly 90% of its comments are true positives rather than false positives. Sometimes it finds really insidious or subtle logic bugs, and I’ve basically stopped worrying about basic LeetCode-style algorithmic correctness when reading Robocop-approved PRs, because it finds those bugs so reliably. Other times it finds places where documentation has drifted or some global consistency no longer holds.

What it doesn’t do

  • It doesn’t work for anyone else. For security reasons, I have it responding only to my commits. (Fork it, of course, if you want to use it.)
  • I don’t trust it to perform architectural reviews, and it doesn’t really try to identify when a PR has created a solution to a problem that only exists because the architecture is bad.

How it works

It has several components:

  • A dashboard, run entirely locally, which scrapes your submissions to the OpenAI API and displays the ones with Robocop metadata attached.
  • A standalone Rust binary, which you can invoke locally on a code checkout to get an ad-hoc review.
  • A GitHub app, which responds automatically to all my PRs.
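What ties the three components together is metadata on the API requests: the GitHub app and the CLI tag each submission, and the local dashboard filters on those tags. A sketch of what that tagging might look like, assuming the OpenAI API’s `metadata` request parameter — the specific key names are hypothetical, not Robocop’s actual schema:

```python
# Hypothetical sketch: tag each OpenAI API request with metadata so a
# local dashboard can later pick out Robocop submissions.
# The key names under "metadata" are illustrative only.

def robocop_request_params(model: str, prompt: str, repo: str, pr_number: int) -> dict:
    """Build request parameters identifying this submission as a Robocop review."""
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": "high"},  # "max reasoning", per the post
        "metadata": {
            "app": "robocop",
            "repo": repo,
            "pr": str(pr_number),  # metadata values are strings
        },
    }


params = robocop_request_params("gpt-5", "Review this diff...", "me/myrepo", 42)
```

The dashboard then only needs to look for requests whose metadata carries the Robocop tag, rather than parsing prompts.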

Why write it

Other tools exist to do the same thing; GitHub Copilot has some of these features. But I had some bad experiences with automated PR review tooling in its earlier days, and GPT-5 is the first model that seems genuinely competent at the kinds of reviews I ask of it. At least with Robocop, I know that any problems are my own fault!