Claude Lint sepcification

GATE v1.1 Rule Catalog (44 Rules)

Category 1: Structure (R01-R05, R28, R30, R39) -- 30 points

R01 | Structure | MUST | "Identity section MUST appear first (max 5 sentences)"

Weight: 8 points
Automatable: YES
Failure mode: Claude begins work without project context, applies generic patterns rather than project-specific ones. Commands in section 2 lack semantic grounding.
Check: Parse H2 sections in order; assert first section matches identity patterns (contains project name, stack, purpose; no imperative verbs).

R02 | Structure | MUST | "Commands section MUST appear second"

Weight: 7 points
Automatable: YES
Failure mode: Commands buried after extensive context are less reliably parsed due to position bias. The instruction budget means early positioning matters mechanistically, not merely aesthetically.
Check: Assert second H2 section contains at least one fenced code block with shell commands.

R03 | Structure | SHOULD | "Architecture section SHOULD be present"

Weight: 5 points
Automatable: YES (presence check)
Failure mode: Claude makes structural assumptions based on convention rather than actual layout; edits wrong files or creates files in wrong locations.
Check: Assert presence of H2 section matching architecture patterns (contains directory tree or component descriptions). Conditional: required for files > 2,000 chars; optional for micro files.

R04 | Structure | SHOULD | "Conventions section SHOULD be present"

Weight: 3 points (reduced from 5 in v1.0; empirical rate only 18%)
Automatable: YES (presence check)
Failure mode: Claude applies training-data defaults (e.g., camelCase when project uses snake_case), requiring repeated correction.
Check: Assert presence of H2 section matching conventions patterns (naming, style, patterns).

R05 | Structure | SHOULD | "Constraints section SHOULD be present"

Weight: 2 points (new explicit scoring in v1.1; 14% prevalence but high safety value)
Automatable: YES (presence check + NEVER/MUST NOT pattern detection)
Failure mode: Claude performs actions that are locally reasonable but violate project invariants (deleting database volumes, skipping CI validation, adding AI attribution to commits).
Check: Assert presence of H2 section with MUST NOT or NEVER rules.

R28 | Structure | MUST | "Commands section MUST include build + test + lint"

Weight: 5 points
Automatable: YES
Failure mode: Claude spends tool calls discovering basic operations; misreads build system; executes wrong variants.
Check: Assert commands section contains recognizable build, test, and lint commands.

R30 | Structure | SHOULD | "Single responsibility per section"

Weight: 3 points
Automatable: PARTIALLY (content classification)
Failure mode: A "Development" section that mixes commands, architecture, conventions, and constraints reduces scannability and makes updates error-prone.
Check: Semantic classification of content per section; flag sections with > 2 content types.

R39 | Structure | SHOULD | "Section count SHOULD be 7 or fewer"

Weight: 3 points (new in v1.1 from corpus validation)
Automatable: YES
Failure mode: Files with 10+ sections average 65 GATE points vs 72 for files with 5-7 sections. Over-structuring signals content curation failure.
Check: Count H2 sections; full points for <= 7; partial (1 point) for 8-12; zero for > 12.

Category 1 total: 36 raw points (normalized to 30 in rubric)

Category 2: Precision (R06-R08, R26, R29, R31, R35) -- 35 points

R06 | Precision | SHOULD | "Total length SHOULD be <= 5,000 chars"

Weight: 10 points (graduated in v1.1)
Automatable: YES
Failure mode: Rules past approximately line 120 face systematically lower compliance. Linear decay means every additional instruction reduces adherence to all instructions proportionally.
Check: Character count with graduated scoring:
- <= 5,000 chars: 10/10
- 5,001-8,000 chars: 7/10
- 8,001-10,000 chars: 4/10
- 10,001-12,000 chars: 2/10
- 12,000 chars: 0/10

R07 | Precision | MUST | "Total length MUST NOT exceed 12,000 chars"

Weight: 5 points (plus -15 deduction if violated)
Automatable: YES
Failure mode: Beyond ~300 lines, the advisory framing actively filters content. Only 5% of the corpus exceeds this limit, and those files show clear anti-pattern characteristics.
Check: Count characters; fail if > 12,000.

R08 | Precision | MUST | "Every instruction MUST pass the removal test"

Weight: 20 points (operationalized across multiple checks)
Automatable: PARTIALLY (generic phrase detection automatable; quality judgment requires human review)
Failure mode: Generic instructions ("write clean code"), linter-duplicate rules, and standard conventions consume instruction budget while adding zero behavioral signal.
Check: (a) Generic phrase detection (7 pts) -- patterns: /write clean code|best practices|be concise|follow standards|maintainable|readable|properly|as needed|appropriate/. (b) No linter redundancy (5 pts via R29). (c) Appropriate length (10+5 pts via R06/R07). (d) Appropriate instruction count (5 pts via R35).

R26 | Precision | MUST | "No secrets or credentials"

Weight: -50 points (deduction, not addition -- critical failure)
Automatable: YES
Failure mode: Credentials committed to git and exposed publicly. Confirmed case: abretonc7s/metamask-mobile-automation contained live JWT token.
Check: Regex detection for API key patterns (/sk-[A-Za-z0-9]{48}/), JWT patterns, private key headers (/-----BEGIN (RSA |EC )?PRIVATE KEY-----/), bearer tokens, connection strings with passwords.

R29 | Precision | SHOULD | "No rules that linters already enforce"

Weight: 5 points
Automatable: YES (config file parsing)
Failure mode: "Use 2-space indentation" when prettier.config.js already sets tabWidth: 2 adds noise and consumes budget for zero behavioral gain.
Check: Cross-reference conventions against detected linter configs (.eslintrc, .prettierrc, ruff.toml, .golangci.yml); flag conventions that duplicate linter settings.

R31 | Precision | SHOULD | "No file-by-file descriptions"

Weight: 3 points
Automatable: YES
Failure mode: 50-line directory listing that describes every file individually, when ls src/ would provide the same information.
Check: Count individual file path entries in architecture section; warn if > 20 entries.

R35 | Precision | SHOULD | "Instruction count <= 80 (soft), <= 130 (hard)"

Weight: 5 points
Automatable: YES (bullet point counting)
Failure mode: Exceeding the instruction budget causes uniform degradation across all instructions, not selective non-compliance.
Check: Count lines starting with - or * (bullet points). Warn if > 80; fail if > 130. Thresholds are IFScale-derived (Claude 4.0 keyword-inclusion basis), conservative for Claude Sonnet 4.6.

Category 2 total: 48 raw points + -50 deduction (normalized to 35 in rubric)

Category 3: Style (R09-R14, R27, R32, R34) -- 20 points

R09 | Style | MUST | "Commands MUST be in fenced code blocks"

Weight: 5 points
Automatable: YES
Failure mode: Inline command references get reformatted, quoted incorrectly, or combined with surrounding text. Claude produces a command that looks right but fails at runtime.
Check: Assert every command-like string (contains make, npm, cargo, go, python, ./) appears inside ``` markers. 92% corpus compliance.

R10 | Style | SHOULD | "Instructions SHOULD use imperative voice"

Weight: 5 points
Automatable: YES (NLP verb detection)
Failure mode: Declarative descriptions are interpreted as context rather than rules. Claude reads "the project uses camelCase" as background information rather than a behavioral constraint.
Check: For each instruction-like sentence in non-identity sections, check for imperative verb at start ("Use", "Run", "Never", "Always", "Avoid", "Prefer"). > 70% imperative = full points.

R11 | Style | SHOULD | "Positive framing for convention rules"

Weight: 3 points
Automatable: YES
Failure mode: Negative conventions require Claude to infer the alternative, which may be wrong. Cognitive load increases when every rule is what not to do.
Check: Count negation-only convention rules; ratio should be < 30% of conventions.

R12 | Style | SHOULD | "Negative framing for safety constraints"

Weight: 3 points
Automatable: PARTIALLY
Failure mode: "Consider running ci/validate" vs "NEVER skip ci/validate" -- the former leaves room for interpretation.
Check: Assert constraints section uses NEVER/MUST NOT/Do not for safety rules.

R13 | Style | SHOULD | "Emphasis markers used sparingly"

Weight: 2 points
Automatable: YES
Failure mode: Emphasis inflation -- when MUST and CRITICAL appear on every rule, none of them stand out. Claude's attention is evenly distributed rather than spiking on truly critical rules. Note: 94% of corpus files use zero emphasis markers. For Claude 4.6, heavy emphasis can cause overtriggering.
Check: Count occurrences of MUST, NEVER, CRITICAL, ALWAYS, IMPORTANT; warn if > 5 per 5K chars, fail if > 10.

R14 | Style | SHOULD | "H2/H3 headers only (no H4 or deeper)"

Weight: 2 points
Automatable: YES
Failure mode: H4-H6 headers signal the file is trying to do too much. Content requiring deeper nesting belongs in a separate @-imported file.
Check: Assert no H4 (####) or deeper headers exist; count H2 sections (should be 3-7).

R27 | Style | MUST | "MUST NOT rules MUST include alternative"

Weight: 5 points
Automatable: YES
Failure mode: "Never use --foo-bar flag" -- Claude encounters a situation where it believes this flag is required. Without an alternative, it either ignores the rule or fails the task.
Check: Assert every Constraint of kind MUST_NOT has the alternative field populated.

R32 | Style | MAY | "Standard opener sentence recommended"

Weight: 2 points
Automatable: YES
Failure mode: Minor -- file works without it, but the opener is a low-cost grounding signal. Appears in 15% of corpus.
Check: First paragraph contains a variant of "This file provides guidance to Claude Code when working with code in this repository."

R34 | Style | MAY | "XML tags for critical multi-rule blocks"

Weight: 2 points
Automatable: PARTIALLY
Failure mode: Critical multi-rule blocks formatted as plain bullet lists are parsed with the same weight as non-critical content. Only 3% of corpus uses XML tags.
Check: Suggest wrapping for blocks with >= 3 critical instructions.

Category 3 total: 29 raw points (normalized to 20 in rubric)

Category 4: Organization (R17-R20, R33, R36) -- 10 points

R17 | Organization | SHOULD | "Use @-imports for references, not inline embedding"

Weight: 3 points
Automatable: YES
Failure mode: Embedding large documents inline consumes tokens for content needed only 10% of the time.
Check: Detect inline embedding of large content blocks (> 1,000 chars in a non-commands section) that could be @-referenced.

R18 | Organization | SHOULD | "Root CLAUDE.md: project-wide content only"

Weight: 3 points
Automatable: PARTIALLY
Failure mode: Module-specific rules in root CLAUDE.md create noise for unrelated tasks, consuming instruction budget unnecessarily.
Check: Flag rules that reference specific subdirectory paths; suggest moving to subdirectory CLAUDE.md or .claude/rules/.

R19 | Organization | SHOULD | "Module-specific content in .claude/rules/ or subdirectory CLAUDE.md"

Weight: 3 points
Automatable: YES (length-triggered)
Failure mode: Monolithic CLAUDE.md grows without bound as project complexity grows. The correct response is decomposition.
Check: If root CLAUDE.md > 8,000 chars, flag and suggest decomposition.

R20 | Organization | MUST | "Personal preferences in ~/.claude/CLAUDE.md, not project root"

Weight: 2 points
Automatable: YES
Failure mode: Personal rules committed to the project impose one developer's workflow on all contributors.
Check: Detect first-person language ("I prefer", "my workflow", "for me"), editor-specific rules without project basis.

R33 | Organization | MAY | "Anti-attribution rule for team projects"

Weight: 1 point
Automatable: PARTIALLY
Failure mode: Claude adds "Co-Authored-By: Claude" markers to commits, which is increasingly unwanted in professional settings.
Check: Flag absence of anti-attribution rule for projects with > 5 contributors.

R36 | Organization | MAY | "Subdirectory CLAUDE.md for large monorepos"

Weight: 3 points
Automatable: YES (filesystem detection)
Failure mode: Trying to document a 50-module monorepo in one file; the root file becomes a context tax on every task.
Check: If project has > 5 top-level modules with significant source trees, suggest subdirectory CLAUDE.md files.

Category 4 total: 15 raw points (normalized to 10 in rubric)

Category 5: Maintenance (R25, R37-R44) -- 5 points + defense-in-depth guidance

R25 | Maintenance | MUST | "Version controlled; CLAUDE.local.md gitignored"

Weight: 5 points
Automatable: YES
Failure mode: CLAUDE.md modified without review introduces regressions; without git history, debugging "why did Claude start behaving differently?" is impossible.
Check: CLAUDE.md exists in git history (not in .gitignore); CLAUDE.local.md is in .gitignore.

R37 | Maintenance | SHOULD | "Critical constraints SHOULD have hook implementations"

Weight: 3 points
Automatable: YES (config file cross-reference)
Failure mode: "Always run make lint before committing" as a CLAUDE.md instruction is unreliable; a PreToolUse hook is a guarantee.
Check: Check if constraints section quality gates have corresponding hook definitions in .claude/settings.json.

R38 | Maintenance | MAY | "Session learning protocol in place"

Weight: 2 points
Automatable: PARTIALLY
Failure mode: CLAUDE.md remains static while the project evolves; rules that no longer apply accumulate; new patterns go undocumented.
Check: Check if CLAUDE.md has a last-modified date or version comment.

R40 | Maintenance | MAY | "No emoji in headers"

Weight: 1 point
Automatable: YES
Failure mode: Emoji in H2 headers makes section names harder to grep and signals decorative rather than functional use.
Check: Detect emoji codepoints in H1/H2/H3 header text.

R41 | Maintenance | MAY | "Non-English length adjustment"

Weight: 0 points (rubric note, not scored)
Automatable: YES
Failure mode: CJK languages require more characters per concept. Same information penalized by length thresholds designed for English.
Check: For CJK primary language, apply 1.3x character count multiplier to soft-limit and hard-limit calculations.

R42 | Maintenance | SHOULD | "Use hooks for safety-critical constraints, not CLAUDE.md alone"

Weight: 0 points (advisory; scored via R37)
Automatable: N/A
Failure mode: FM-01 (advisory framing) and FM-04 (extended thinking) make CLAUDE.md unreliable for safety-critical rules.
Check: Architectural guidance -- safety-critical rules belong in PreToolUse hooks with decision: "block".

R43 | Maintenance | SHOULD | "Use --append-system-prompt for rules that must survive plan mode"

Weight: 0 points (advisory)
Automatable: N/A
Failure mode: FM-02 -- plan mode system reminders contain explicit "supercedes" language. CLAUDE.md rules that configure plan mode behavior will be overridden.
Check: Architectural guidance -- plan mode constraints belong in --append-system-prompt or settings.json keys (e.g., plansDirectory).

R44 | Maintenance | SHOULD | "Add a Compact Instructions section"

Weight: 0 points (advisory; becomes scored if template includes it)
Automatable: YES
Failure mode: FM-03 -- context compaction drops CLAUDE.md rules silently. Compact Instructions section guides the summary generator to preserve critical context.
Check: Assert presence of ## Compact Instructions section listing what to preserve during compaction.

Category 5 total: 11 raw points (normalized to 5 in rubric)

Laiff/claude-linter.md

Select an option

No results found