
@JM1
Last active August 1, 2025 19:03
A Cursor review

At Red Hat, we are currently evaluating Cursor. To gain hands-on experience, I began developing a Kubernetes device plugin that creates OVS VDUSE ports and exposes their vhost-vDPA char devices to pods (development Jira task FDP-1513). The code is available in vduse-device-plugin. This plugin is part of a proof of concept evaluating OpenShift and OVN-Kubernetes in which the primary network operates entirely in userspace (PoC story FDP-1286).

For this experiment, I am using Cursor-1.2.2-x86_64.AppImage with the Claude Sonnet 4 model and Cursor's Max Mode enabled, running in a fresh Fedora virtual machine. I took a careful and deliberate approach to "vibe coding": guiding Cursor through device plugin development by thoroughly assessing its ideas and approaches, without writing any code myself and reviewing the generated output only superficially.

The Kubernetes Device Plugins API has been stable for years, with numerous long-standing reference implementations available. Most device plugins are written in Go -- a widely used language known for its simplicity. Given this, many device plugins and Go projects are likely included in LLM training datasets, making this an ideal use case for Cursor. It turned out to be a surprising and instructive experience.

Our initial design draft in FDP-1309 described a Go application with two main components:

  • It would implement the Device Plugin API to create VDUSE ports in OVS, following Maxime's instructions in TESTING_VM2VM.md, and expose vhost-vDPA char devices for pods.
  • It would watch the Kubernetes API in a separate thread to delete VDUSE ports and their associated vhost-vDPA chardevs upon pod deletion. The ListAndWatch() function of the Device Plugin API would expose a distinct resource pool for each OVS bridge found on the host (see the sketch after this list). For each pool, it would report a set of placeholder vhost-vDPA chardevs -- initially non-existent -- to Kubernetes. This is necessary because Kubernetes only allocates devices that have been reported in advance by a device plugin. The list of placeholders would grow dynamically as previous ones are allocated via the Allocate() function.
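
To make the placeholder mechanism concrete, here is a minimal sketch of what such a ListAndWatch() implementation could look like against the v1beta1 Device Plugin API. It is not the actual vduse-device-plugin code: the bridgePlugin type and its update channel are assumptions for illustration.

```go
package main

import (
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// bridgePlugin is a hypothetical type serving one resource pool per
// OVS bridge found on the host.
type bridgePlugin struct {
	bridge  string
	devices []*pluginapi.Device // placeholder vhost-vDPA chardevs
	update  chan struct{}       // signaled whenever the pool changes
}

// ListAndWatch streams the current placeholder pool to the kubelet and
// re-sends it whenever Allocate() consumes a placeholder and a new one
// is appended, so Kubernetes always has unallocated devices to hand out.
func (p *bridgePlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	for {
		if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: p.devices}); err != nil {
			return err
		}
		<-p.update // block until the placeholder list changes
	}
}
```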

Starting with an empty directory on a fresh Fedora virtual machine, I worked with Cursor to incrementally build the device plugin from scratch. Major (paraphrased) prompts issued to Cursor in Agent mode included:

  • Create a Go application implementing the Kubernetes Device Plugin API to allocate VDUSE devices.
  • Add a separate thread to watch the Kubernetes API for pod deletion events and log them as a stub action (a sketch of such a watcher follows this list).
  • Read TESTING_VM2VM.md (provided as context) and implement device creation and deletion logic accordingly.
  • Repeatedly implement and refine tests for each Go file.
  • Repeatedly identify bugs and suggest improvements.
  • Add steps to set up a KinD cluster on Fedora 42 and deploy the device plugin.
  • Add deployment configurations and instructions to push the plugin image to an internal KinD registry.
  • Follow the generated README.md to set up the KinD Cluster and deploy the plugin to the internal registry using the documented steps.
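
The pod-deletion watcher mentioned above boils down to a standard client-go informer. The following is a minimal sketch assuming in-cluster configuration; the function name and the log-only delete handler are placeholders, not the plugin's actual code.

```go
package main

import (
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

// watchPodDeletions logs pod deletions as a stub action; a real
// implementation would delete the pod's VDUSE port and vhost-vDPA
// chardev instead.
func watchPodDeletions(stop <-chan struct{}) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	factory := informers.NewSharedInformerFactory(client, 0)
	factory.Core().V1().Pods().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			if pod, ok := obj.(*corev1.Pod); ok {
				log.Printf("pod deleted: %s/%s", pod.Namespace, pod.Name)
			}
		},
	})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
	return nil
}
```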

Each prompt started a new thread in which Cursor and I iteratively implemented the task. Cursor did not ask for approval or discussion before generating code, which required follow-up prompts to steer its implementation. For example, I had to explicitly reference the SR-IOV Network Device Plugin early on to help it draft the initial version from scratch. Reassuringly, Cursor always requested permission before executing commands, such as installing tools or adding Go dependencies. It also asked whether I accepted or rejected generated code and never overwrote existing files or code without confirmation. Additionally, Cursor automatically created checkpoints for each prompt thread, allowing me to revert to previous project states when needed.

Within a few hours, we had implemented all features from the initial design draft, along with a set of tests for each Go file. Cursor also generated additional assets on its own, including a README.md, a Makefile, and Kubernetes manifests for deploying the device plugin. However, this was followed by a tedious and repetitive debugging session that lasted half a day. In each round, I would start a new prompt asking Cursor to find bugs in its own code. Cursor typically identified around ten issues per round, categorizing them by severity: critical, high, medium, and low. It then attempted to fix them in descending order of priority. In doing so, Cursor often introduced new code -- especially when addressing logic or concurrency bugs -- that broke existing tests or introduced new issues, which would be caught in subsequent rounds.

On the first day, I repeated the fix cycle until Cursor had generated a total of 8000 lines of code. To contain code bloat, I had to refine my prompts significantly over time -- encouraging Cursor to propose concise, targeted fixes and to rerun and repair tests afterward.

Regarding tests: they often contained more code than the functionality they verified. Despite this verbosity, the coverage reports Cursor generated consistently contradicted its own assessment -- showing that the tests covered only a fraction of the core logic, with per-file coverage ranging from 100% down to 0%. Additionally, when unable to resolve test failures, Cursor frequently attempted to bypass them by inserting code to skip the tests.

On two separate occasions, Cursor's (aka VS Code's) built-in linter flagged unused variables that were artifacts of its own stub implementations rather than of complete code. These stubs were clearly marked with comments starting with "For a real implementation...". However, when prompted to implement all stub functions, Cursor did not recognize them as such. I had to explicitly instruct it that the phrase "For a real implementation" indicates a stub.
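
For illustration, here is a reconstructed example of the stub pattern in question; the function body and comment are paraphrased, not Cursor's verbatim output:

```go
// deleteVDUSEPort is a stub of the kind Cursor generated (paraphrased).
func deleteVDUSEPort(portName string) error {
	// For a real implementation, remove the OVS port and release the
	// associated vhost-vDPA chardev here.
	return nil // stub: no cleanup performed yet
}
```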

An interesting and unexpected aspect of Cursor's tightly integrated AI workflow is how the AI can sometimes work around limitations of the tool itself. For example, Cursor can gain root access using sudo -s, but since this command opens a persistent root shell and does not return, the AI fails to detect its success -- causing the prompt to hang indefinitely. To proceed, the user must manually "skip" the command and inform Cursor that it succeeded but did not return. Subsequent commands executed within that root shell will also hang, requiring manual skipping. However, in these cases, Cursor typically infers from context that the hanging is expected behavior due to the persistent shell, not an error.

After two weeks of work with Cursor, the device plugin reached a sufficiently functional state, and I uploaded its source code to vduse-device-plugin. The KinD cluster; the pods running ovsdb-server, ovs-vswitchd, and userspace (netdev) bridges; the device plugin; and the example pods all deploy successfully. The device plugin registers devices with Kubernetes, creates VDUSE ports in OVS bridges, binds them to the vhost-vDPA kernel module, and exposes the resulting vhost-vDPA char devices to pods. For a full list of features, see the README.md. Ultimately, I suspended development of the device plugin in favor of an alternative design; the rationale and a detailed comparison of both designs are available here.
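
For context, the allocation path just described can be sketched as follows, reusing the bridgePlugin type from the sketch above. This is a simplified illustration, not the plugin's actual code: the ovs-vsctl arguments are assumptions modeled on TESTING_VM2VM.md, and createVDUSEPortAndWaitForChardev is a hypothetical helper.

```go
package main

import (
	"context"
	"os/exec"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// addVDUSEPort asks OVS to create a VDUSE-backed port. The arguments
// are assumptions modeled on TESTING_VM2VM.md, where a vhost-user
// client port whose vhost-server-path lies below /dev/vduse causes
// OVS/DPDK to create a VDUSE device instead of a unix socket.
func addVDUSEPort(bridge, port string) error {
	return exec.Command("ovs-vsctl", "add-port", bridge, port, "--",
		"set", "Interface", port, "type=dpdkvhostuserclient",
		"options:vhost-server-path=/dev/vduse/"+port).Run()
}

// createVDUSEPortAndWaitForChardev is a hypothetical helper: it would
// create the port and poll until the matching vhost-vDPA chardev
// appears (discovery of the device number is elided for brevity).
func (p *bridgePlugin) createVDUSEPortAndWaitForChardev(port string) (string, error) {
	if err := addVDUSEPort(p.bridge, port); err != nil {
		return "", err
	}
	return "/dev/vhost-vdpa-0", nil // placeholder: real code discovers N
}

// Allocate maps each requested placeholder ID to a freshly created
// vhost-vDPA chardev and passes it through to the container.
func (p *bridgePlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		car := &pluginapi.ContainerAllocateResponse{}
		for _, id := range cr.DevicesIDs {
			devPath, err := p.createVDUSEPortAndWaitForChardev(id)
			if err != nil {
				return nil, err
			}
			car.Devices = append(car.Devices, &pluginapi.DeviceSpec{
				HostPath:      devPath,
				ContainerPath: devPath,
				Permissions:   "rw",
			})
		}
		resp.ContainerResponses = append(resp.ContainerResponses, car)
	}
	return resp, nil
}
```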

Developing with Cursor resembles mentoring a junior developer, but with much faster feedback cycles. Within six hours, I reached the limit of 500 included requests, after which Cursor automatically switched to usage-based spending. After two days of working exclusively on the device plugin, the Cursor Dashboard reported approximately $50 in charges.

My preliminary evaluation is as follows:

  • Developing with Cursor offers a surprisingly different and refreshing experience when adopting a careful, deliberate form of vibe coding. It shifts the developer's role from hands-on design and implementation to primarily code review, which can limit deep, end-to-end understanding of the problem and solution. Within a few hours, Cursor generates a code structure that closely aligns with the ideas outlined in the prompts. The code compiles but is initially verbose and minimally functional due to numerous stub implementations -- sometimes hidden -- and various errors such as logical bugs, concurrency problems, race conditions, and deadlocks. Tests tend to be verbose, though coverage is often insufficient. Similar to a sculptor refining raw material, the developer iteratively improves the generated code -- again with Cursor's assistance -- until it reaches acceptable quality. With Cursor, this refinement cycle of testing, debugging, and fixing becomes the most time-consuming part of development. Critics argue that this process diminishes the joy of software development, while proponents claim it allows developers to focus more on solving customer problems.
  • Cursor automates the process where an engineer carefully considers design choices and tradeoffs, iteratively working towards an optimal solution for a given problem. The resulting code -- whether generated or handwritten -- is the product of this process. When AI generates code, an engineer must independently develop the solution to effectively review the output and critically evaluate the design decisions made. Therefore, I believe AI-generated code is less suitable for critical product functionality where code quality is essential. However, it can provide significant value for narrow-scope problems, proof of concepts, non-core features, or less critical code, where design rigor and code quality are less demanding.
  • Cursor (and LLMs in general) excels at generating boilerplate code, such as Python unit tests. However, boilerplate code still requires maintenance and can become a liability over time. Whenever boilerplate code seems necessary, alternative approaches should be considered -- such as relying more on integration tests, adopting stronger typing or advanced type systems, or switching to a more expressive and powerful programming language.
  • Cursor can help reduce procrastination on tedious, repetitive, or uninteresting tasks. Knowing that these can be handled (semi-)automatically can boost morale and lower the barrier to getting started.
  • Reviewing, testing, debugging, and fixing Cursor's generated code requires significant time and effort. There is a risk that users apply changes without thorough review, especially given the extensive amount and complexity of generated output.
  • Cursor can quickly and reliably perform refactorings, especially smaller ones. For example, it renamed Makefile targets to improve naming consistency and symmetry, and updated multiple Dockerfiles by switching base images from Alpine to Red Hat UBI to Fedora Rawhide -- adapting commands and package managers from apk to dnf to microdnf.
  • Cursor (like other leading LLMs) can assist with exploring unfamiliar domains, brainstorming ideas, understanding existing code, and searching the web.
  • Cursor does not integrate well with my existing development environment. It is only available as an AppImage, with no Flatpak support -- limiting integration and sandboxing options. Additionally, Cursor is proprietary. I would welcome efforts to standardize AI-assisted development features across editors (similar to the Language Server Protocol), enabling open-source tools like Kate to support them as well.
  • Running Cursor inside a VM increases the maintenance burden. It no longer integrates seamlessly with the host filesystem -- where my code resides -- or with host-based containers that encapsulate the development tools in a reproducible, disposable environment. Additionally, the VM introduces a second system that requires separate updates and management.
  • Cursor currently exhibits usability and performance issues; for example, when running inside a VM, the UI often becomes sluggish and can cause 100% CPU utilization across all cores during rendering, despite using a virtio GPU.
  • Cursor also experiences stability issues, with the AI activity occasionally hanging unexpectedly. Typically, stopping the activity and prompting it to continue resolves the issue.

For regular day-to-day development, I will maintain my current highly customized setup, though I would welcome similar AI-assisted features integrated into my existing tools. If new insights arise, I will update this evaluation accordingly.
