limitation with local models for code review.md

Yes, this feature is absolutely buildable — the spec, the API surface, and the existing Omni architecture all line up cleanly. Here's my assessment across both the feature itself and AFM compatibility.

Feature Feasibility

The core mechanism is straightforward: a single POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews that atomically submits a summary, a verdict (APPROVE / REQUEST_CHANGES / COMMENT), and an array of inline comments each pinned to a path, line, and side . Omni already has the GitHub OAuth identity and write capabilities in githubWrite.ts, so the API plumbing is an extension of what exists .
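As a sketch, building that call could look like the following TypeScript. The endpoint is the documented one; the helper name and client shape are illustrative, not Omni's actual githubWrite.ts code:

```typescript
// Illustrative sketch: building the request for
// POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews.
// buildReviewRequest is a hypothetical helper, not Omni's real API.

interface ReviewComment {
  path: string;            // file path relative to the repo root
  line: number;            // line number in the diff
  side: "LEFT" | "RIGHT";  // which side of the split diff
  body: string;            // comment text
}

interface ReviewPayload {
  body: string;                                      // overall summary
  event: "APPROVE" | "REQUEST_CHANGES" | "COMMENT";  // the verdict
  comments: ReviewComment[];
}

function buildReviewRequest(
  owner: string,
  repo: string,
  pullNumber: number,
  token: string,
  review: ReviewPayload
): { url: string; method: string; headers: Record<string, string>; body: string } {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/pulls/${pullNumber}/reviews`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`, // the OAuth token Omni already holds
    },
    body: JSON.stringify(review),
  };
}
```

One atomic call creates the summary, verdict, and every inline comment together, which is what makes a draft-then-submit UX possible.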

The UX spec in the issue comments maps directly onto the existing PRView component tree — adding a ReviewList in the sidebar, a ReviewDetailView in the DetailsPanel, and inline annotations on the DiffViewer. The changes needed are well-scoped:

| Component | Change |
|---|---|
| `Sidebar/index.tsx` | Add `<ReviewList />` below `<FileList />` |
| `DetailsPanel` | Add `ReviewDetailView` as a new view option |
| `FileDetailView.tsx` | Pass review comments to `DiffViewer`, render annotations |
| `BottomBar.tsx` | Add "Submit Review" action when a draft exists |
| `usePRStore` / `useUIStore` | Track `selectedReviewId` and `draftReview` state |

New files: ReviewList.tsx, ReviewDetailView.tsx, reviewStore.ts, useReview.ts.

The AI Generation Layer

The blog post describes Omni's 6-source enrichment pipeline (diff, project files, linked issues/PRs, web references, agent notes, style preferences) feeding a model that outputs structured inline comments + summary + verdict. The model needs to produce structured JSON like:

```json
{
  "body": "Overall summary...",
  "event": "REQUEST_CHANGES",
  "comments": [
    { "path": "src/auth.ts", "line": 42, "side": "RIGHT", "body": "..." }
  ]
}
```
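Because a small local model can emit near-valid JSON that still violates this schema, a runtime guard before submission is cheap insurance. A minimal hand-rolled sketch (no validation library assumed):

```typescript
// Minimal structural check for the model's review JSON before it is sent
// to GitHub. Hand-rolled on purpose; a schema library would also work.

function isValidReview(value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const eventOk =
    v.event === "APPROVE" || v.event === "REQUEST_CHANGES" || v.event === "COMMENT";
  if (typeof v.body !== "string" || !eventOk || !Array.isArray(v.comments)) return false;
  // Every inline comment must carry a valid path / line / side / body.
  return (v.comments as unknown[]).every((c) => {
    if (typeof c !== "object" || c === null) return false;
    const k = c as Record<string, unknown>;
    return (
      typeof k.path === "string" &&
      Number.isInteger(k.line) &&
      (k.side === "LEFT" || k.side === "RIGHT") &&
      typeof k.body === "string"
    );
  });
}
```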

This is the part where AFM compatibility matters.

AFM Support Assessment

Apple Foundation Models (the on-device models available via the FoundationModels framework on macOS with Apple Intelligence) can technically handle this, but with important caveats:

What works well:

  • AFM supports structured generation / tool calling, which maps directly to outputting the review JSON schema (path, line, side, body per comment)
  • Running on-device means no round trips to a cloud API and no token costs — great for the "one button, full review" UX
  • In a Tauri app, you can bridge to AFM via a Swift native plugin (Tauri's plugin system → Rust FFI → Swift FoundationModels framework)

What's challenging:

  • Context window size — AFM's on-device models are significantly smaller than cloud models (Claude, GPT-4, etc.). Fitting all 6 context sources (full diff + project files + issues + web refs + agent notes + style prefs) into a single prompt may exceed what AFM can handle for large PRs
  • Reasoning quality — generating accurate line numbers and correctly correlating diff hunks with project-wide context requires strong reasoning. Smaller on-device models may produce less precise line-pinned comments than a cloud model
  • Availability — AFM requires Apple Silicon + macOS with Apple Intelligence enabled, which limits your user base to recent Macs

Recommended approach:

  • Make the model layer swappable — define a ReviewGenerator interface that takes the enriched context and returns the structured review JSON
  • Implement AFM as one backend (via Tauri native Swift plugin), alongside cloud model backends (OpenAI, Anthropic, etc.)
  • For AFM specifically, use a chunked strategy: summarize each of the 6 sources individually first, then feed the compressed context into the final review generation call to stay within the on-device context window
  • Fall back to cloud models for large PRs where the context exceeds AFM's capacity
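A sketch of that swappable layer follows; all names (ReviewGenerator, EnrichedContext, LOCAL_CONTEXT_BUDGET) are illustrative, and the token budget is an assumption since Apple doesn't publish a hard figure for AFM:

```typescript
// Pluggable model layer: one interface, multiple backends, size-based routing.

interface EnrichedContext {
  diff: string;
  projectFiles: string;
  linkedIssues: string;
  webRefs: string;
  agentNotes: string;
  stylePrefs: string;
}

interface GeneratedReview {
  body: string;
  event: "APPROVE" | "REQUEST_CHANGES" | "COMMENT";
  comments: { path: string; line: number; side: "LEFT" | "RIGHT"; body: string }[];
}

interface ReviewGenerator {
  readonly name: string;
  generate(ctx: EnrichedContext): Promise<GeneratedReview>;
}

// Crude token estimate (~4 chars/token is a common heuristic; real tokenizers vary).
function estimateTokens(ctx: EnrichedContext): number {
  return Math.ceil(Object.values(ctx).join("\n").length / 4);
}

const LOCAL_CONTEXT_BUDGET = 4096; // assumed practical budget for the on-device model

// Route: small contexts stay local, large ones fall back to the cloud backend.
function pickBackend(
  ctx: EnrichedContext,
  local: ReviewGenerator,
  cloud: ReviewGenerator
): ReviewGenerator {
  return estimateTokens(ctx) <= LOCAL_CONTEXT_BUDGET ? local : cloud;
}
```

The same interface later absorbs an Ollama backend with no UI changes, which is the payoff of keeping the routing decision out of the generators themselves.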

Verdict

The feature is very doable — the spec is tight, the API is atomic, and it layers cleanly onto the existing PR view. AFM can work as one of several model backends, but it shouldn't be the only backend due to context window and reasoning limitations on large diffs. The right architecture is a pluggable model interface where AFM handles quick reviews on smaller PRs locally, and cloud models handle the heavy lifting when needed.


but can the AFM model reason about code review? say if I make a PR on OAuth and I forgot to add a comment to one code block, would AFM be able to catch something like that?

Honestly, no — AFM almost certainly would not reliably catch something like a missing comment on an OAuth code block. Here's why.

The Model Is Too Small for Code Reasoning

AFM on-device is a ~3B parameter model quantized down to 2 bits, making its effective capacity even smaller. It was designed and fine-tuned for consumer text tasks — summarization, writing tools, content tagging, smart replies — not code understanding or software engineering reasoning. Apple's own documentation describes it as a "text-based on-device model" that "identifies patterns for generating new text". [1][2][3][4][5]

For context on what model sizes can actually do with code:

| Task | What It Takes |
|---|---|
| "Is this comment missing?" | Understand code semantics + project conventions — needs a ~7B+ code-specialized model at minimum |
| "This OAuth flow doesn't handle token expiry" | Reason about security patterns across files — needs a frontier-class model (Claude, GPT-4) |
| Reliable line-pinned inline review comments | Correlate diff hunks with project context — needs strong reasoning + a large context window |

Even purpose-built code models at 7B parameters (CodeLlama-7B, DeepSeek-Coder-6.7B) struggle with nuanced review tasks. A general-purpose 3B model at 2-bit quant is not in the same league. [6]

What AFM Is Actually Good At

The framework excels at structured generation (guided output that conforms to a Swift schema), tool calling, and short-form text tasks. So AFM could reliably format the review JSON structure — it's the reasoning content inside that structure that would be shallow or wrong. [7][8]

A realistic failure mode: AFM might generate a review that looks correct (proper path, line, side fields) but the body comments would be surface-level parroting like "consider adding documentation" rather than actually understanding that your specific OAuth code block implements PKCE and the missing comment should explain the code verifier derivation.

Guardrails Are Also a Problem

Developers have reported that AFM has aggressively restrictive guardrails that can refuse or error on unexpected content categories. Code that touches auth, security, or cryptographic patterns could potentially trigger refusals depending on how the content filters interpret the prompt — which would be a showstopper for reviewing OAuth PRs. [9]

What Would Actually Work

For Omni's review feature to deliver the quality described in your blog post — where the AI "sees how that line connects to the rest of the codebase, whether it contradicts a linked issue" — you need a model that can:

  1. Hold the full context — diff + project files + issues + style prefs often exceeds a 3B model's practical context window
  2. Reason about code semantics — not just pattern-match, but understand what the code does
  3. Generate precise line-pinned feedback — map insights back to exact diff locations

That's firmly in frontier model territory (Claude Sonnet/Opus, GPT-4o, Gemini Pro). AFM could serve as a fast fallback for trivial tasks — like generating a PR summary from the diff — but for the actual code review intelligence, you'd want to route through a cloud model. The pluggable ReviewGenerator interface approach still makes sense, but AFM would be the "lite mode," not the real thing. [10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27]


what ollama model would work with this review feature?

For Omni's review feature specifically, Qwen 2.5 Coder 32B is your best bet on Ollama — it's the sweet spot between running locally and having enough reasoning capability to generate meaningful code review comments. [28][29]

Recommended Models by Hardware

| Model | VRAM Needed | Aider Score | Code Review Quality |
|---|---|---|---|
| Qwen 2.5 Coder 32B | ~20GB (Q4) | 72.9% | Good — matches GPT-4o level, can catch missing comments, auth pattern issues |
| Qwen 2.5 Coder 14B | ~9GB (Q4) | 69.2% | Decent — catches obvious issues, may miss nuanced context |
| DeepSeek R1 Distill 32B | ~20GB (Q4) | — | Strong reasoning but slow — better for complex logic bugs |
| Qwen3-Coder | Varies | — | Newer option with a longer context window, good for large diffs |
| Qwen 2.5 Coder 7B | ~5GB (Q4) | 57.9% | Too shallow for real code review — fine for autocomplete only |

The 32B Qwen Coder scores 72.9% on the Aider benchmark, which is on par with GPT-4o. That's the threshold where a model can realistically look at your OAuth diff, notice a code block lacks documentation, and write a meaningful inline comment like "this PKCE verifier derivation should have a comment explaining the flow." [30]

Why 32B Is the Minimum for Reviews

Code review is fundamentally harder than code generation — the model needs to simultaneously hold the diff, understand the surrounding codebase context, reason about what's missing (not just what's wrong), and output structured JSON with precise line numbers. The 7B and even 14B models struggle with this multi-concern reasoning because they hold fewer patterns simultaneously. The gap becomes obvious when you're asking "what did the developer forget?" versus "write me a function." [30]

Practical Setup for Omni

Since you're on macOS (likely M-series), you're in a good position — Apple Silicon shares unified memory between CPU and GPU [30]:

  • M2/M3 Max (32GB+) → Qwen 2.5 Coder 32B runs comfortably at Q4
  • M2/M3 Ultra (64GB+) → Could even run 70B models, though overkill for this feature
  • M2/M3 Pro (18GB) → Limited to 14B, which is borderline for review quality
```sh
ollama pull qwen2.5-coder:32b
```

From your Tauri app, you'd hit Ollama's local API (http://localhost:11434/api/chat) with the enriched context from your 6-source pipeline and a structured output prompt that produces the review JSON. [29]
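A sketch of that request from the frontend — the system prompt wording is illustrative, and `format: "json"` is Ollama's chat-API option for constraining the reply to valid JSON:

```typescript
// Builds the /api/chat request for Ollama. The system prompt is illustrative;
// `format: "json"` asks Ollama to constrain the reply to valid JSON.

function buildOllamaRequest(model: string, enrichedContext: string) {
  return {
    url: "http://localhost:11434/api/chat",
    payload: {
      model,
      stream: false,
      format: "json",
      messages: [
        {
          role: "system",
          content:
            "You are a code reviewer. Reply ONLY with JSON of the shape " +
            '{"body": string, "event": "APPROVE" | "REQUEST_CHANGES" | "COMMENT", ' +
            '"comments": [{"path": string, "line": number, "side": "LEFT" | "RIGHT", "body": string}]}.',
        },
        { role: "user", content: enrichedContext },
      ],
    },
  };
}

// Usage sketch:
//   const { url, payload } = buildOllamaRequest("qwen2.5-coder:32b", ctx);
//   const res = await fetch(url, { method: "POST", body: JSON.stringify(payload) });
//   const review = JSON.parse((await res.json()).message.content);
```

Even with `format: "json"`, smaller models occasionally drift from the schema, so validating the parsed object before submitting to GitHub is still worthwhile.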

The Honest Trade-off

Even the 32B model will produce shallower reviews than Claude Sonnet or GPT-4o for complex PRs — it may catch the missing comment but miss that your OAuth flow has a subtle CSRF vulnerability in the state parameter handling. The hybrid approach makes the most sense: Ollama 32B as the default local engine for fast, private, cost-free reviews, with a cloud model fallback for when the user wants a deeper analysis. Your ReviewGenerator interface can expose this as a simple toggle — "Quick Review (local)" vs "Deep Review (cloud)." [30][31][32][33][34][35][36][37][38][39][40][41][42]


what about for MacBook m1 16gb

That's a tight squeeze. On an M1 with 16GB unified memory, macOS itself eats ~4–6GB, leaving you roughly 10–12GB for the model + context — which seriously limits your options. [43][44]

What Actually Fits

| Model | Size (Q4) | Fits in 10–12GB? | Code Review Quality |
|---|---|---|---|
| Qwen 2.5 Coder 7B | ~4.5GB | ✅ Comfortable, room for context | Surface-level — catches obvious stuff |
| DeepSeek Coder V2 Lite (16B) | ~9GB | ⚠️ Tight, minimal context window | Better reasoning, but may swap |
| Qwen 2.5 Coder 14B | ~9GB | ⚠️ Tight, reported laggy on M1 16GB | Decent, but real-world performance suffers |
| Gemma 3 12B | ~7.5GB | ✅ Fits with headroom | General-purpose, not code-specialized |
| Qwen 2.5 Coder 32B | ~20GB | ❌ Won't fit | — |

The fundamental problem is that even if a 14B model loads, you'll be memory-starved for context. Multiple users on 16GB Macs report that anything above ~12B leads to heavy swap usage, which on M1 means token generation drops to single-digit tokens/sec — painfully slow for generating a multi-comment review. [45][44]

The Realistic Pick: Qwen 2.5 Coder 7B

```sh
ollama pull qwen2.5-coder:7b
```

This runs at roughly 19–35 tokens/sec on M1 16GB with room to spare for your enrichment context. The catch is that it scores ~57.9% on the Aider benchmark — which means for your OAuth PR scenario [46][43]:

  • ✅ Would probably catch a completely missing docstring/comment on a function
  • ✅ Can identify obvious issues like unused variables, missing error handling
  • ⚠️ Might miss why a specific code block needs a comment (e.g., "this implements PKCE, explain the verifier")
  • ❌ Unlikely to catch subtle auth logic flaws or cross-file concerns

A More Honest Take for Omni

For an M1 16GB user, the local-only code review would produce noticeably weaker results than the cloud path. The practical approach for your ReviewGenerator interface would be:

  • Local (7B) → marketed as "Quick Review" — fast, private, catches low-hanging fruit, good enough for formatting/docs/obvious issues
  • Cloud (Claude/GPT-4o) → marketed as "Deep Review" — the full 6-source enrichment pipeline with real code reasoning

You could also do a hybrid: run the 7B locally to generate a draft review skeleton (identify which files/lines are interesting), then send just those targeted chunks to a cloud model for the actual comment text. This keeps the cloud token cost low while producing quality output even on constrained hardware.
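The slicing step of that hybrid can be very small. A sketch, where the helper name, the `Target` shape, and the context radius are all illustrative:

```typescript
// Hybrid two-pass helper: the local model only emits (path, line) targets;
// this slices a small window around each target so the cloud model sees
// just the relevant code instead of the whole diff.

interface Target {
  path: string;
  line: number; // 1-based line number flagged by the local model
}

function sliceTargets(
  fileLines: Map<string, string[]>, // path -> file contents split into lines
  targets: Target[],
  radius = 5 // lines of context on each side of the target
): { path: string; line: number; snippet: string }[] {
  return targets.flatMap((t) => {
    const lines = fileLines.get(t.path);
    if (!lines) return []; // local model hallucinated a path; drop it
    const start = Math.max(0, t.line - 1 - radius);
    const end = Math.min(lines.length, t.line + radius);
    return [{ path: t.path, line: t.line, snippet: lines.slice(start, end).join("\n") }];
  });
}
```

Dropping targets whose paths don't exist also doubles as a cheap sanity filter on the weaker local model's output.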

If Users Upgrade Later

It's worth noting the M1 16GB is now a 5-year-old machine. For users on M3/M4 Pro with 36GB+, the 32B model becomes viable and the local experience gets dramatically better. Building the pluggable architecture now means those users automatically get the upgrade path without any feature changes on your side. [45][47][48][49][50][51][52][53][54][55][56][57]

Footnotes

  1. https://www.apple.com/newsroom/2025/09/apples-foundation-models-framework-unlocks-new-intelligent-app-experiences/

  2. https://machinelearning.apple.com/research/apple-foundation-models-2025-updates

  3. https://developer.apple.com/documentation/FoundationModels

  4. https://datawizz.ai/blog/apple-foundations-models-framework-10-best-practices-for-developing-ai-apps

  5. https://www.reddit.com/r/LocalLLaMA/comments/1l7l39m/apples_on_device_foundation_models_llm_is_3b/

  6. https://checkthat.ai/answers/best-llm-for-coding

  7. https://developer.apple.com/videos/play/wwdc2025/286/

  8. https://developer.apple.com/videos/play/wwdc2025/301/

  9. https://www.reddit.com/r/LocalLLaMA/comments/1leq843/the_bizarre_limitations_of_apples_foundation/

  10. https://machinelearning.apple.com/research/apple-foundation-models-tech-report-2025

  11. https://www.infoq.com/news/2025/07/apple-foundation-models-ios26/

  12. https://www.infoq.com/news/2024/08/apple-foundation-models/

  13. https://developer.apple.com/documentation/FoundationModels/generating-content-and-performing-tasks-with-foundation-models

  14. https://developer.apple.com/videos/play/wwdc2025/259/

  15. https://machinelearning.apple.com/research/introducing-apple-foundation-models

  16. https://www.createwithswift.com/exploring-the-foundation-models-framework/

  17. https://www.ibm.com/think/news/wwdc-2025-live

  18. https://learnprompting.org/docs/models/apple_foundation_models

  19. https://www.reddit.com/r/LocalLLaMA/comments/1lak9yb/findings_from_apples_new_foundationmodel_api_and/

  20. https://blog.trailofbits.com/2024/06/14/understanding-apples-on-device-and-server-foundations-model-release/

  21. https://arxiv.org/html/2407.21075

  22. https://lobehub.com/blog/claude-3-vs-gpt-4

  23. https://arxiv.org/pdf/2507.13575.pdf

  24. https://venturebeat.com/ai/ios-gets-an-ai-upgrade-inside-apples-new-intelligence-system

  25. https://www.kolosal.ai/blog-detail/top-5-best-llm-models-to-run-locally-in-cpu-2025-edition

  26. https://justaithings.org/apple-introduces-foundation-models-framework-to-let-you-run-a-3b-model-on-your-phone-89194d00cc18

  27. https://www.deeplearning.ai/the-batch/issue-306/

  28. https://whatllm.org/blog/best-open-source-models-january-2026

  29. https://www.codegpt.co/blog/best-ollama-model-for-coding

  30. https://failingfast.io/local-coding-ai-models/

  31. https://huggingface.co/blog/daya-shankar/open-source-llms

  32. https://deepgram.com/learn/best-local-coding-llm

  33. https://www.reddit.com/r/ollama/comments/1hwb5us/which_coding_model_do_you_prefer_using_with/

  34. https://www.reddit.com/r/LocalLLaMA/comments/1j2usb0/is_qwen_25_coder_still_the_best/

  35. https://localaimaster.com/models/best-ai-coding-models

  36. https://launchdarkly.com/docs/tutorials/ollama-javascript

  37. https://www.reddit.com/r/LocalLLaMA/comments/1q82ae8/start_of_2026_whats_the_best_open_coding_model/

  38. https://dev.to/lightningdev123/top-5-local-llm-tools-and-models-in-2026-1ch5

  39. https://www.youtube.com/watch?v=y-P85ww2RHU

  40. https://composio.dev/blog/qwen-3-vs-deepseek-r1-complete-comparison

  41. https://www.youtube.com/watch?v=e5bSX1iNvH0

  42. https://pinggy.io/blog/top_5_local_llm_tools_and_models/

  43. https://localaimaster.com/blog/mac-local-ai-setup

  44. https://www.reddit.com/r/ollama/comments/1mjo9ki/best_models_under_16gb/

  45. https://www.zdnet.com/article/i-tried-local-ai-on-m1-mac-brutal-experience/

  46. https://www.youtube.com/watch?v=A8bOjO-boDU

  47. https://localaimaster.com/blog/run-llama3-on-mac

  48. https://www.linkedin.com/pulse/benchmarking-local-ollama-llms-apple-m4-pro-vs-rtx-3060-dmitry-markov-6vlce

  49. https://ollama.com/library/qwen2.5-coder:14b

  50. https://www.youtube.com/watch?v=ribc9QXV_-w

  51. https://www.youtube.com/watch?v=hrcUVlqimTo

  52. https://dev.to/atsushiambo/running-qwen-nearly-as-powerful-as-deepseek-on-a-macbook-pro-367k

  53. https://www.reddit.com/r/LocalLLM/comments/1qjjwtu/as_of_january_2026_what_the_best_coding_model/

  54. https://apxml.com/posts/best-local-llms-apple-silicon-mac

  55. https://www.codegpt.co/blog/choosing-best-ollama-model

  56. https://dev.to/ocodista/deepseek-r1-7bs-performance-on-a-developers-macbook-3mg2

  57. https://www.reddit.com/r/LocalLLaMA/comments/1lj69o2/qwen25_14b_on_an_m1_pro_with_16gb_ram/
