
Conversation

SecondSkoll (Contributor) commented on Nov 5, 2025:

Moves to a manifest-generated test framework, implements some low-level fixes, and adjusts some rules.

SecondSkoll requested a review from Copilot on November 5, 2025 at 10:27.

Copilot AI left a comment


Pull Request Overview

This PR introduces a data-driven test framework for Vale rules using pytest, replacing manual test files with automated testing infrastructure. The framework validates Vale linting rules against expected outcomes defined in a manifest.

Key changes:

  • Implements pytest-based test infrastructure with fixtures and parametrized tests
  • Adds initial test coverage for two rules (000-US-spellcheck and 500-Repeated-words)
  • Updates Vale configuration to enable repeated words rule and add RST directive ignores
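
To illustrate the manifest-driven approach, a minimal end-to-end check might look like the sketch below. This is not the PR's actual code: the check name, the sample content, and the helper are assumptions, and the real framework reads these values from tests/data/manifest.yml rather than inlining them.

# Hypothetical sketch of one manifest-style Vale check (not the PR's code).
# Assumes Vale is installed and the repository's vale.ini is in effect.
import json
import subprocess


def vale_alerts(path: str) -> list[dict]:
    """Run Vale on a single file and flatten its JSON output into a list of alerts."""
    proc = subprocess.run(
        ["vale", "--output=JSON", path], capture_output=True, text=True
    )
    results = json.loads(proc.stdout or "{}")
    return [alert for alerts in results.values() for alert in alerts]


def test_repeated_words_triggers(tmp_path):
    # One case inlined for illustration: a repeated word should raise an alert.
    sample = tmp_path / "case.md"
    sample.write_text("This sentence repeats the the word.\n", encoding="utf-8")
    checks = {alert["Check"] for alert in vale_alerts(str(sample))}
    assert "Canonical.500-Repeated-words" in checks  # assumed check name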

Reviewed Changes

Copilot reviewed 41 out of 42 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
| ---- | ----------- |
| tests/conftest.py | Core pytest configuration providing fixtures for manifest loading, Vale execution, and result validation |
| tests/test_rules.py | Main test functions implementing data-driven rule validation |
| tests/data/manifest.yml | Test manifest defining expected behaviors for two initial rules |
| tests/requirements.txt | Python dependencies including pytest, pyyaml, and vale 3.13.0.0 |
| .github/workflows/pytest.yml | GitHub Actions workflow to run tests on Python 3.10 and 3.12 |
| vale.ini | Configuration updates: enables repeated words rule, adds RST directive token ignores |
| test/*.md, test/*.rst, test/*.html | Removal of legacy manual test files |
| styles/Canonical/*.yml | Rule refinements including scope changes and pattern updates |
| styles/config/vocabularies/Canonical/accept.txt | Added "LTS" to accepted vocabulary |
| getting-started.md | Documentation addition about conditionally ignoring rules |


severity: error
004-Canonical-product-names: # Not currently working for multi-word terms
cases:
- id: valid

Contributor:

the name seems misleading

# Vale returns 0 (no issues) or 1 (issues found). Other codes indicate errors.
if proc.returncode not in (0, 1):
    # Gracefully skip known RST parser runtime errors if detected.
    try:

Contributor:

I'd probably try to flatten or simplify the exception handling; otherwise we risk drowning in these unreadable try/except/raise rollercoasters. Sometimes it's OK to let an exception fall through.

Some context:

https://x.com/karpathy/status/1976077806443569355
https://www.webpronews.com/karpathy-critiques-llms-fear-of-code-exceptions-in-rlhf-training/
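
For instance, the returncode handling quoted above could stay flat by skipping directly on a recognised failure and raising on anything else. This is only a sketch: the helper name and the strings used to detect the known RST parser failure are assumptions, not the PR's code.

import subprocess

import pytest


def check_vale_exit(proc: subprocess.CompletedProcess) -> None:
    """Skip on the known RST parser failure; fail loudly on anything else."""
    # Vale returns 0 (no issues) or 1 (issues found); other codes indicate errors.
    if proc.returncode in (0, 1):
        return
    if "docutils" in proc.stderr or "rst2html" in proc.stderr:  # assumed markers
        pytest.skip(f"Known RST parser failure: {proc.stderr.strip()[:200]}")
    raise RuntimeError(f"Vale exited with {proc.returncode}: {proc.stderr.strip()}")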


def _load_manifest() -> Dict[str, Any]:
    with open(MANIFEST_PATH, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f) or {}

Contributor:

I'd love to see a formal (and documented) JSON schema for the YAML manifest used for validation somewhere around here.
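
For example, something along these lines could sit next to _load_manifest. A sketch only: the field names are guesses at the manifest's shape, and jsonschema would be a new entry in tests/requirements.txt.

# Hypothetical JSON Schema for tests/data/manifest.yml (field names assumed).
import yaml
from jsonschema import validate

MANIFEST_SCHEMA = {
    "type": "object",
    "required": ["rules"],
    "properties": {
        "rules": {
            "type": "object",
            # Each key is a rule name such as "500-Repeated-words".
            "additionalProperties": {
                "type": "object",
                "required": ["cases"],
                "properties": {
                    "cases": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "required": ["id", "content"],
                            "properties": {
                                "id": {"type": "string"},
                                "content": {"type": "string"},
                                "filetypes": {"type": "array", "items": {"type": "string"}},
                                "expect": {"type": "object"},
                            },
                        },
                    }
                },
            },
        }
    },
}

with open("tests/data/manifest.yml", encoding="utf-8") as f:
    validate(instance=yaml.safe_load(f), schema=MANIFEST_SCHEMA)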

)


@dataclass

Contributor:

I suggest using Pydantic for this and the manifest, too, e.g.:

# Imports this sketch would need:
from typing import List

import yaml
from pydantic import BaseModel, Field, ValidationError


class ExpectedResult(BaseModel):
    triggers: List[str] = Field(default_factory=list)
    severity: str | None = None
    message_regex: str | None = None


class TestCase(BaseModel):
    id: str
    filetypes: List[str]
    content: str
    expect: ExpectedResult


class RuleDefinition(BaseModel):
    name: str
    cases: List[TestCase]


class Manifest(BaseModel):
    rules: List[RuleDefinition]
    
    @classmethod
    def from_yaml_dict(cls, data: dict) -> "Manifest":
        """Create Manifest from YAML structure where rules is a dict."""
        rules_dict = data.get("rules", {})
        rules = [
            {"name": rule_name, "cases": rule_data.get("cases", [])}
            for rule_name, rule_data in rules_dict.items()
        ]
        return cls(rules=rules)
    
    def iter_cases(self):
        """Iterate over all test cases with their rule names."""
        for rule in self.rules:
            for case in rule.cases:
                yield rule.name, case
    
    def get_rule_names(self) -> List[str]:
        """Return list of all rule names in the manifest."""
        return [rule.name for rule in self.rules]


def _load_manifest() -> Manifest:
    with open(MANIFEST_PATH, "r", encoding="utf-8") as f:
        data = yaml.safe_load(f) or {}
    
    try:
        return Manifest.from_yaml_dict(data)
    except ValidationError as e:
        raise ValueError(f"Manifest validation failed: {e}") from e

And so on.
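
A test module could then parametrize straight off the model, for example (again only a sketch: run_vale is an assumed fixture, and the assertion simplifies how expect.triggers would really be checked):

# Sketch: driving pytest parametrization from the Manifest model above.
import pytest

MANIFEST = _load_manifest()
CASES = list(MANIFEST.iter_cases())


@pytest.mark.parametrize(
    "rule_name,case", CASES, ids=[f"{name}:{case.id}" for name, case in CASES]
)
def test_rule(rule_name, case, run_vale):
    # run_vale is an assumed fixture that lints the content and returns Vale alerts.
    alerts = run_vale(case.content, filetypes=case.filetypes)
    triggered = any(alert["Check"].endswith(rule_name) for alert in alerts)
    assert triggered == bool(case.expect.triggers)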
