The commit splitting logic uses a fixed batch size of 3 files when autoSplit is enabled. In src/stores/commit/commitStore.ts, line ~340:
```ts
const batchSize = autoSplit ? 3 : Math.max(1, selected.length);
const totalCommits = Math.max(1, Math.ceil(selected.length / batchSize));
```

The actual grouping happens in the Rust backend via three Tauri commands:
- `commit_generate_plan` (Apple AFM)
- `ollama_commit_generate_plan` (Ollama local models)
- `remote_commit_generate_plan` (BYO API keys)
- Files are batched in groups of 3 (hardcoded)
- Each file is analyzed using the Diffsense diff algorithm to distribute context fairly
- No semantic understanding of which files belong together logically
@eonist's suggestion in the follow-up comment outlines a smarter algorithm:
- Dossier creation - Generate lightweight metadata for each file change optimized for grouping decisions
- AI-powered grouping - Use heuristics + AI to determine logical commit boundaries based on:
  - Related functionality (same feature/module)
  - Dependency relationships
  - Change type (refactor vs feature vs fix)
- User intent respect - Honor custom prompt instructions for grouping preferences
- Final review pass - AI validates the groupings make semantic sense
| Aspect | Current | Proposed |
|---|---|---|
| Batch size | Fixed (3 files) | Dynamic, context-aware |
| Grouping logic | File count heuristic | Semantic relationship analysis |
| Scalability | O(n/3) commits | Recursive algorithm for 10-1000 files |
| Model support | Same across all | Works on AFM, improves on larger models |
The bottleneck is inference time, not capability. Even with Apple's AFM (smallest local model), the algorithm could be recursive - taking 6-10 seconds for large refactors is acceptable when quality improves. Users wanting faster results could use larger models via BYO API keys or faster hardware.
- Create a `FileDossier` struct in Rust with: path, change type, affected symbols/imports, diff summary
- Add a grouping phase before message generation that uses AI to cluster related dossiers
- Make the batch size configurable (remove the hardcoded `3`)
- Add unit/integration tests - as both @eonist and @dernDren161 mentioned, regressions are becoming hard to catch
This is an interesting problem because it sits at the intersection of code understanding, user intent, and practical performance constraints. Here are some ideas beyond what's been discussed:
Before involving AI at all, build a dependency graph from static analysis:
- Parse imports/requires to find file relationships
- Detect shared symbols (functions, classes, types referenced across files)
- Use community detection algorithms (like Louvain) to find natural clusters
This is fast, deterministic, and gives AI a head start. The AI then only needs to validate or refine clusters rather than discover them from scratch.
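A rough sketch of what that pre-AI clustering could look like - the regex import extraction and the union-find grouping here stand in for a real AST parser and a proper community-detection pass:

```ts
// Sketch: cluster changed files by shared import edges - no AI involved.
// `changedFiles` maps each changed path to its full file contents.
function extractImports(source: string): string[] {
  // Rough regex for ES-style imports; a real implementation would use an AST parser.
  const re = /import\s+(?:[^'"]*\s+from\s+)?['"]([^'"]+)['"]/g;
  return [...source.matchAll(re)].map((m) => m[1]);
}

function clusterByImports(changedFiles: Map<string, string>): string[][] {
  const paths = [...changedFiles.keys()];
  const parent = new Map(paths.map((p) => [p, p] as [string, string]));
  const find = (p: string): string => {
    while (parent.get(p) !== p) p = parent.get(p)!;
    return p;
  };
  const union = (a: string, b: string) => parent.set(find(a), find(b));

  for (const [path, source] of changedFiles) {
    for (const spec of extractImports(source)) {
      const bare = spec.replace(/^\.\//, '').replace(/^@\//, '');
      // Naive resolution: link to any other changed file whose path contains the specifier.
      const target = paths.find((p) => p !== path && p.includes(bare));
      if (target) union(path, target);
    }
  }

  const clusters = new Map<string, string[]>();
  for (const p of paths) {
    const root = find(p);
    clusters.set(root, [...(clusters.get(root) ?? []), p]);
  }
  return [...clusters.values()];
}
```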
Instead of full LLM inference for grouping:
- Generate lightweight embeddings for each diff (could use AFM's embedding mode or a small local model)
- Cluster diffs by vector similarity
- Use the LLM only for naming/describing the clusters
This could be 10x faster than having the LLM reason about all files simultaneously.
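A hedged sketch of that idea, assuming some `embedDiff` function backed by a local embedding model (not something the app has today) and a greedy similarity threshold instead of a full clustering algorithm:

```ts
// Sketch: group diffs by embedding similarity. `embedDiff` is an assumed
// embedding backend; the threshold would need tuning in practice.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function clusterBySimilarity(
  diffs: { path: string; diff: string }[],
  embedDiff: (diff: string) => Promise<number[]>, // assumed embedding backend
  threshold = 0.8
): Promise<string[][]> {
  const vectors = await Promise.all(diffs.map((d) => embedDiff(d.diff)));
  const clusters: { centroid: number[]; paths: string[] }[] = [];

  diffs.forEach((d, i) => {
    // Greedy assignment: join the first cluster whose centroid is similar enough.
    const hit = clusters.find((c) => cosine(c.centroid, vectors[i]) >= threshold);
    if (hit) hit.paths.push(d.path);
    else clusters.push({ centroid: vectors[i], paths: [d.path] });
  });

  return clusters.map((c) => c.paths);
}
```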
For large changesets (50+ files), use a divide-and-conquer strategy:
```
100 files → group by directory → 8 directory clusters
Each cluster → AI refines into logical commits
Result: ~15-20 well-organized commits
```
This keeps context windows manageable and scales predictably.
Before semantic grouping, categorize changes by type:
| Category | Signals |
|---|---|
| Refactor | Renames, moves, no new exports |
| Feature | New files, new public APIs |
| Fix | Small modifications, test additions |
| Chore | Config files, dependencies, docs |
Then group within categories. A "refactor" commit shouldn't mix with a "feature" commit even if they touch related files.
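As a sketch, the signals in the table above could be turned into a cheap pre-classifier; the field names here are illustrative, not the app's actual types:

```ts
// Sketch: rough change-type heuristic from the signal table above.
type ChangeCategory = 'refactor' | 'feature' | 'fix' | 'chore';

interface ChangeSignals {
  path: string;
  isNewFile: boolean;
  isRename: boolean;
  addedExports: number;
  linesChanged: number;
}

function categorize(c: ChangeSignals): ChangeCategory {
  // Config, docs, and dependency files → chore
  if (/\.(json|ya?ml|toml|lock|md)$/.test(c.path)) return 'chore';
  // New files or new public APIs → feature
  if (c.isNewFile || c.addedExports > 0) return 'feature';
  // Renames/moves without new exports → refactor
  if (c.isRename) return 'refactor';
  // Small modifications default to fix; ambiguous cases go to the AI pass
  return 'fix';
}
```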
A simple but high-value heuristic: always group test files with their implementation. If UserService.ts changed, UserService.test.ts belongs in the same commit. This is purely path-based and catches a common mistake.
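A minimal sketch of that pairing rule, assuming the usual `*.test.ts` / `*.spec.ts` naming conventions:

```ts
// Sketch: purely path-based test pairing, e.g. UserService.test.ts → UserService.ts.
function implementationFor(testPath: string): string | null {
  const m = testPath.match(/^(.*)\.(test|spec)\.(ts|tsx|js|jsx)$/);
  return m ? `${m[1]}.${m[3]}` : null;
}

function pairTestsWithImplementations(changedPaths: string[]): Map<string, string> {
  const pairs = new Map<string, string>(); // test path → implementation path
  const changed = new Set(changedPaths);
  for (const p of changedPaths) {
    const impl = implementationFor(p);
    if (impl && changed.has(impl)) pairs.set(p, impl);
  }
  return pairs;
}
```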
```
Input: 47 file changes

Phase 1: Static Analysis (instant, no AI)
├── Directory clustering
├── Import graph analysis
├── Test file pairing
└── Output: 8 candidate clusters

Phase 2: AI Refinement (fast, small prompts)
├── For each cluster: "Should this be 1 or 2 commits?"
├── Cross-cluster: "Do any files belong elsewhere?"
└── Output: 12 refined commits

Phase 3: Message Generation (current approach)
└── Generate title/body per commit
```
The key insight is that grouping and describing are separate problems. The current implementation conflates them. Splitting these phases means:
- Phase 1 handles 80% of cases with zero AI cost
- Phase 2 uses small, focused prompts (faster, works on AFM)
- Phase 3 stays as-is
- Drag-and-drop regrouping in the UI, then AI regenerates only affected commit messages
- Grouping presets: "by directory", "by feature", "atomic (1 file = 1 commit)"
- Learn from corrections: if user moves a file between commits, remember that pattern
If I were implementing this:
- Short term: Remove the hardcoded `3`, make it `ceil(files.length / target_commits)` where `target_commits` is user-configurable (see the sketch after this list)
- Medium term: Add test-file pairing heuristic + directory-based pre-clustering
- Long term: Embedding-based similarity with AI refinement
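For the short-term item, a sketch of the change in commitStore.ts, assuming a user setting tentatively named `targetCommits`:

```ts
// Sketch: derive batch size from a user-configurable target commit count
// instead of the hardcoded 3. `settings.targetCommits` is an assumed field.
const targetCommits = settings.targetCommits ?? 5; // hypothetical default
const batchSize = autoSplit
  ? Math.max(1, Math.ceil(selected.length / targetCommits))
  : Math.max(1, selected.length);
const totalCommits = Math.max(1, Math.ceil(selected.length / batchSize));
```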
The recursive/dossier approach @eonist suggested is good for quality, but the hybrid approach might give 90% of the benefit at 20% of the inference cost. Worth prototyping both and comparing.
The file changes give two pieces of info: the diff that actually changed, and of course access to the entire file. How do you factor that in? Or doesn't it change anything?
That changes things significantly. The full file context is much richer than the diff alone.
| Signal | From Diff | From Full File |
|---|---|---|
| What changed | ✓ Direct | ✗ |
| Change magnitude | ✓ Lines added/removed | ✗ |
| Function/class modified | ~ Partial (if name in diff) | ✓ Full structure |
| File's role in codebase | ✗ | ✓ Imports, exports, patterns |
| Related files | ✗ | ✓ Import statements |
| Domain/feature area | ~ From path/variable names | ✓ Full context |
You're essentially doing "blind" grouping - you see what changed but not why it matters or how files relate. Two files might both add 10 lines, but without full context you can't tell one is a React component and the other is its API hook.
You can extract:
```ts
// From UserProfile.tsx full file:
import { useUserData } from './hooks/useUserData';
import { Avatar } from '@/components/Avatar';
import type { User } from '@/types/user';
// Now you KNOW this file relates to useUserData.ts, Avatar.tsx, user.ts
```

This import graph is deterministic and free - no AI needed.
For each changed file, parse the full file to get:
```ts
interface FileDossier {
  path: string;
  diff: string;            // What changed
  imports: string[];       // What it depends on
  exports: string[];       // What it exposes
  fileType: 'component' | 'hook' | 'util' | 'test' | 'config' | 'type';
  detectedDomain: string;  // From path: "user", "auth", "checkout"
}
```

From the dossiers, build a relationship graph:

```
UserProfile.tsx ──imports──→ useUserData.ts
UserProfile.tsx ──imports──→ Avatar.tsx
UserProfile.test.tsx ──tests──→ UserProfile.tsx
```
If UserProfile.tsx and useUserData.ts both changed → strong signal they belong together.
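A small sketch of how those edges could be derived from the dossiers; `resolveImport` is an assumed helper that maps an import specifier back to a repo path:

```ts
// Sketch: flag "both changed + import relationship" pairs from the dossiers above.
function relatedPairs(
  dossiers: FileDossier[],
  resolveImport: (from: string, spec: string) => string | null // assumed helper
): Array<[string, string]> {
  const changed = new Set(dossiers.map((d) => d.path));
  const pairs: Array<[string, string]> = [];
  for (const d of dossiers) {
    for (const spec of d.imports) {
      const target = resolveImport(d.path, spec);
      // Strong grouping signal: the importer and the imported file both changed.
      if (target && changed.has(target)) pairs.push([d.path, target]);
    }
  }
  return pairs;
}
```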
Use AI for ambiguous cases:
- Two files both changed, no import relationship, but both touch "user" domain
- Large file with multiple unrelated changes (could be split into 2 commits)
- Determining if a change is "refactor", "feature", or "fix"
Diff = what changed. Full file = why it matters.
For grouping, "why it matters" is often more valuable. A one-line change to an API endpoint file might be the cause of 15 other file changes (updating consumers). The diff alone makes it look minor; the full file reveals it's the central piece.
You can't send 47 full files to the LLM - that'll blow context limits. But you can:
- Send full files through a local parser (AST parsing, regex for imports) - instant, unlimited
- Build the relationship graph from that
- Send only diffs + dossier summaries to the LLM
```
File: src/hooks/useUserData.ts
Imports: react, @/api/userApi, @/types/user
Exports: useUserData (hook)
Domain: user
Diff: +12 -3 (added caching logic)

File: src/components/UserProfile.tsx
Imports: react, ./hooks/useUserData, @/components/Avatar
Exports: UserProfile (component)
Domain: user
Diff: +5 -2 (added loading state)

Relationship: UserProfile imports useUserData (BOTH CHANGED)
```
This compressed representation lets the AI make smart grouping decisions without seeing 2000 lines of code.
Our smallest model, AFM, only has a 12KB token window. Is all this possible even with tiny context windows? We can always call AFM many times with fresh context windows, so the strategy must be to create temporary summaries that feed into the next call, etc. - divide and conquer.
Exactly. 12KB is roughly 3000 tokens - that's tight. But the constraint actually forces a cleaner architecture.
Never ask the LLM to see everything at once. Ask it many small questions and aggregate.
With 12KB you can fit roughly:
| Content | Tokens |
|---|---|
| System prompt | ~300 |
| 1 file dossier (compressed) | ~100-150 |
| 1 diff (typical) | ~200-500 |
| Response | ~200-300 |
So realistically: 10-15 files per call maximum, and that's without full diffs.
Input (tiny):

```
File: src/hooks/useUserData.ts
Imports: react, @/api/userApi
Exports: useUserData
Diff: [50 lines]

Classify: refactor | feature | fix | chore
Domain: [one word]
Summary: [one sentence]
```

Output:

```json
{ "type": "feature", "domain": "user", "summary": "Added caching to user data hook" }
```

This runs in parallel - 47 files = 47 calls, but they're fast and concurrent.
Now you have 47 tiny summaries (~50 tokens each). Feed batches to the LLM:
```
These files changed. Which belong together?

1. [feature][user] Added caching to user data hook
2. [feature][user] Added loading state to profile component
3. [fix][auth] Fixed token refresh race condition
4. [feature][user] Added cache invalidation to user API
5. [chore][config] Updated eslint rules
...

Group by logical commit. Output: [[1,2,4], [3], [5], ...]
```
With compressed summaries, you can fit 40-50 files in one call.
For each group:
```
Files in this commit:
- useUserData.ts: Added caching to user data hook
- UserProfile.tsx: Added loading state to profile component
- userApi.ts: Added cache invalidation

Write commit title and body.
```
Small input, focused output.
```
47 files
   │
   ▼ (parallel, 47 calls)
┌─────────────────────┐
│ Pass 1: Classify    │  "What is this change?"
│ 1 file per call     │
└─────────────────────┘
   │
   ▼ (47 summaries, ~2KB total)
┌─────────────────────┐
│ Pass 2: Cluster     │  "What belongs together?"
│ All summaries       │
└─────────────────────┘
   │
   ▼ (say 8 groups)
┌─────────────────────┐
│ Pass 3: Describe    │  "Write the commit message"
│ 1 group per call    │
└─────────────────────┘
   │
   ▼
8 commits with messages
```
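Sketched as code, the whole pipeline is just three awaited stages; the three callbacks stand in for however each backend (AFM/Ollama/remote) exposes its calls - the names here are assumptions, not the existing API:

```ts
// Sketch of the three-pass pipeline built on top of classifyAll() above.
async function smartSplit(
  files: FileDossier[],
  classify: (f: FileDossier) => Promise<{ type: string; domain: string; summary: string }>,
  cluster: (summaries: string[]) => Promise<number[][]>,
  describe: (summaries: string[]) => Promise<{ title: string; body: string }>
) {
  // Pass 1: one tiny call per file, run concurrently
  const classified = await classifyAll(files, classify);

  // Pass 2: one call over all compressed one-line summaries → groups of file indices
  const groups = await cluster(classified.map((c) => `[${c.type}][${c.domain}] ${c.summary}`));

  // Pass 3: one small call per group → commit title + body
  return Promise.all(
    groups.map(async (idxs) => ({
      files: idxs.map((i) => files[i].path),
      ...(await describe(idxs.map((i) => classified[i].summary))),
    }))
  );
}
```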
When summaries don't fit in one Pass 2 call:
```
100 files
   │
   ▼
100 summaries
   │
   ├─► Batch 1 (50 summaries) → 5 proto-groups
   └─► Batch 2 (50 summaries) → 5 proto-groups
          │
          ▼
     10 proto-groups
          │
          ▼ (merge pass)
   "Should any of these merge?"
          │
          ▼
     7 final groups
```
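A sketch of that recursive merge, where `clusterBatch` stands in for a single Pass 2 LLM call over at most one context window's worth of summaries:

```ts
// Sketch: recursive Pass 2 when the summaries exceed one context window.
// `clusterBatch` is an assumed single LLM call that groups the strings it is given.
async function clusterRecursive(
  summaries: string[],
  clusterBatch: (items: string[]) => Promise<string[][]>,
  maxPerCall = 50
): Promise<string[][]> {
  if (summaries.length <= maxPerCall) return clusterBatch(summaries);

  // Split into batches and cluster each batch into proto-groups.
  const batches: string[][] = [];
  for (let i = 0; i < summaries.length; i += maxPerCall) {
    batches.push(summaries.slice(i, i + maxPerCall));
  }
  const protoGroups = (await Promise.all(batches.map(clusterBatch))).flat();

  // Merge pass: represent each proto-group as one line and cluster those lines.
  const groupSummaries = protoGroups.map((g) => g.join('; '));
  const merged = await clusterBatch(groupSummaries);

  // Expand merged groups back into the original summaries they contain.
  return merged.map((g) => g.flatMap((s) => protoGroups[groupSummaries.indexOf(s)]));
}
```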
Don't waste tokens on things you can compute:
| Task | Method |
|---|---|
| Import extraction | AST parse or regex |
| Test file pairing | Path matching (*.test.ts → *.ts) |
| Directory grouping | String operations |
| File type detection | Extension + path patterns |
| Diff stats | Already have from git |
The LLM only answers questions that require judgment:
- Is this a feature or a fix?
- Do these two unrelated-looking files actually belong together?
- What's a good commit message?
| Phase | 47 files | 100 files |
|---|---|---|
| Local parsing | <100ms | <200ms |
| Pass 1 (parallel) | ~2s | ~3s |
| Pass 2 | ~1s | ~2s (2 batches) |
| Pass 3 | ~2s (8 commits) | ~3s (12 commits) |
| Total | ~5s | ~8s |
Assuming AFM runs at ~500 tokens/sec locally. Totally acceptable for a "smart split" operation.
12KB isn't a limitation - it's a feature. It forces you to:
- Decompose the problem properly
- Keep each step focused and debuggable
- Parallelize naturally
- Cache intermediate results (summaries can be reused if user tweaks grouping)
Bigger models let you be lazy. Small models force good architecture.
4K tokens is only a couple of thousand words. That's genuinely tight. Let me rethink.
| Content | Tokens |
|---|---|
| System prompt | ~150 (must be minimal) |
| Input | ~250-300 |
| Response | ~100-150 |
| Usable per call | ~300 tokens of actual content |
That's roughly one file with diff per call, or 20-25 compressed summaries without diffs.
```
src/hooks/useUserData.ts
+import cache from './cache'
+const cached = cache.get(userId)
+if (cached) return cached
-return fetchUser(userId)
+const data = fetchUser(userId)
+cache.set(userId, data)
+return data

Type? Domain? 10-word summary.
```

Response: `feature | user | Added caching layer to user data fetching`
Still works at 4K. One file per call.
25 summaries max per call. For 47 files:
```
Call 1: files 1-25  → proto-groups A
Call 2: files 26-47 → proto-groups B
Call 3: merge A + B → final groups
```
```
Commit: useUserData.ts, UserProfile.tsx, userApi.ts
- Added caching layer to user data fetching
- Added loading state for cache delays
- Added cache invalidation endpoint

Title + body.
```
~150 tokens in, ~100 out. Works.
| Approach | 12K | 4K |
|---|---|---|
| Files per classify call | 3-5 | 1 |
| Summaries per cluster call | 50 | 25 |
| Merge passes needed (100 files) | 1 | 2-3 |
| System prompt | Detailed | Bare minimum |
At 4K, every token counts. Use structured formats:
Instead of:

```
Please analyze this file change and tell me what type of change it is...
```

Use:

```
[src/hooks/useUserData.ts]
+cache.get +cache.set -fetchUser direct
?type ?domain ?sum
```

Response:

```
feat user cache-user-fetch
```
Brutal, but you can fit 3-4 files per call this way.
| Phase | 47 files | 100 files |
|---|---|---|
| Pass 1 | ~3s (parallel) | ~5s |
| Pass 2 | ~2s (2 cluster + 1 merge) | ~4s (4 cluster + 2 merge) |
| Pass 3 | ~2s | ~3s |
| Total | ~7s | ~12s |
More calls, but still fast because each call is tiny and AFM responds quickly to short prompts.
If even this is too slow, make Pass 1 optional:
- Default: Use only local heuristics (imports, paths, test pairing) for grouping
- "Smart mode": Enable AI classification when user wants better results
Most commits are <10 files anyway. The 4K limit only really hurts on large refactors, and users will tolerate 10-15 seconds for those.
And this must also work on bigger models. We offer Ollama support as well, and BYO keys with Gemini 3, Anthropic Opus 4.5, GPT 5.2, etc.
That's the key constraint then. The architecture must scale up gracefully, not just survive at the bottom.
| Tier | Models | Context | Strategy |
|---|---|---|---|
| Tiny | AFM | 4K | Many micro-calls |
| Medium | Ollama (Llama, Mistral) | 8K-32K | Batched calls |
| Large | Gemini 3, Opus 4.5, GPT 5.2 | 128K-1M+ | Few or single call |
```ts
interface ModelCapabilities {
  contextWindow: number;
  tokensPerSecond: number;
  costPerToken?: number; // undefined for local models
}

function planStrategy(files: FileDossier[], model: ModelCapabilities) {
  const totalTokensNeeded = estimateTokens(files);
  if (totalTokensNeeded < model.contextWindow * 0.7) {
    return 'single-pass';       // Send everything at once
  } else if (model.contextWindow >= 32000) {
    return 'chunked-parallel';  // Few large batches
  } else {
    return 'micro-pipeline';    // Many tiny calls (AFM path)
  }
}
```

When you have 128K+ tokens, just send everything:
```
Here are 47 file changes with their diffs and full import context.
Group them into logical commits. For each commit, provide:
- Files included
- Commit title
- Commit body

Consider: related functionality, test/implementation pairing,
change type (feature/fix/refactor), user instructions: "{customPrompt}"

[Full dossiers + diffs for all 47 files]
```
One call. Done in 3-5 seconds. Higher quality because the model sees all relationships simultaneously.
For 16-32K context:
Pass 1: Classify + Cluster combined
- Send 15-20 files per call
- Ask for both classification AND preliminary grouping
- 3 calls for 47 files
Pass 2: Merge + Describe
- Reconcile groups across chunks
- Generate messages
- 2-3 calls
The aggressive chunking strategy from before. Many calls, tiny payloads.
```ts
interface SplitPlan {
  commits: Array<{
    files: string[];
    title: string;
    body: string;
  }>;
}

// Single interface, multiple implementations
async function generateSplitPlan(
  files: FileChange[],
  model: ModelBackend,
  customPrompt?: string
): Promise<SplitPlan> {
  const capabilities = await model.getCapabilities();
  const strategy = planStrategy(files, capabilities);

  switch (strategy) {
    case 'single-pass':
      return singlePassSplit(files, model, customPrompt);
    case 'chunked-parallel':
      return chunkedSplit(files, model, customPrompt);
    case 'micro-pipeline':
      return microPipelineSplit(files, model, customPrompt);
  }
}
```

The UI doesn't care which path runs. Same input, same output, different execution.
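`ModelBackend` isn't spelled out above; a hedged guess at the minimal surface it would need for this dispatch to work:

```ts
// Hypothetical ModelBackend shape implied by generateSplitPlan above;
// the real interface in the codebase may differ.
interface ModelBackend {
  getCapabilities(): Promise<ModelCapabilities>;
  // One prompt in, one completion out; each strategy builds its own prompts on top.
  complete(prompt: string, opts?: { maxTokens?: number }): Promise<string>;
}

// Usage sketch: same call site whether the backend is AFM, Ollama, or a BYO-key cloud model.
const plan = await generateSplitPlan(selectedFiles, activeModel, customPrompt);
```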
| Model | Time (47 files) | Quality | Cost |
|---|---|---|---|
| AFM | ~7s | Good | Free |
| Ollama 7B | ~5s | Good | Free |
| Ollama 70B | ~8s | Better | Free |
| GPT 5.2 | ~3s | Best | ~$0.02 |
| Opus 4.5 | ~4s | Best | ~$0.05 |
Users self-select. Local-first folks use AFM/Ollama. Power users with API keys get faster + better results.
Same logical prompt, different verbosity:
const PROMPTS = {
classify: {
tiny: '[{path}]\n{diff_summary}\n?type ?domain ?sum',
medium: 'File: {path}\nChanges: {diff}\nClassify type, domain, summary.',
large: 'Analyze this file change:\nPath: {path}\nFull diff:\n{diff}\n\nProvide classification...'
},
cluster: {
tiny: 'Group:\n{summaries}\nOutput: [[ids],...]',
medium: 'Group these changes into logical commits:\n{summaries}',
large: 'You are organizing code changes into atomic commits. Consider semantic relationships, shared functionality, test coverage...\n{full_context}'
}
};
function getPrompt(task: string, tier: 'tiny' | 'medium' | 'large') {
return PROMPTS[task][tier];
}Build for AFM first. That forces clean decomposition. Then the larger models just... work better automatically because:
- They can see more context per call → better relationship detection
- Fewer calls → lower latency
- More reasoning tokens → better edge case handling
The micro-pipeline isn't a compromise - it's the foundation. Large models just collapse multiple passes into one.