

@martonvago martonvago commented Oct 13, 2025

Description

This PR handles the special case of requiring a field to be non-null by subclassing CustomCheck to create RequiredCheck.

Usage:

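# Assumes these names are exported at the package top level.
from check_datapackage import Config, CustomCheck, RequiredCheck, check

# `descriptor` is the Data Package descriptor (a dict) to check.
version_required = RequiredCheck(
    jsonpath="$.version",
    message="Version is required.",
)
lowercase_check = CustomCheck(
    jsonpath="$.name",
    message="Name must be lowercase.",
    check=lambda name: name.islower(),
    type="lowercase",
)
config = Config(custom_checks=[lowercase_check, version_required])
issues = check(descriptor, config=config)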

I thought this was a nice way of splitting out the logic for required rules and that required rules were special enough to warrant their own class.

I also thought it made sense to limit which JSON paths the required rule can be applied to. Marking a dict property (e.g., $.resources[*].title) as required makes sense because it is obvious what it means for that field to be missing.

  • But marking an array item itself (e.g. $.resources[*] or $.resources[2]) as required makes less sense to me. "I want the second resource to exist in particular" is a weird constraint. If someone cares about the number of resources, they should be checking the length of the array.
  • Same for "vague" JSON paths like $..title: selecting all title fields under the root node and then checking if they exist feels circular and unnecessary.

➡️ So I restricted RequiredCheck to sensible paths only, which also made the code simpler.

Closes #120

Needs an in-depth review.

Checklist

  • Formatted Markdown
  • Ran just run-all

@martonvago martonvago self-assigned this Oct 13, 2025
@martonvago martonvago moved this from Todo to In Progress in Iteration planning Oct 13, 2025
Comment on lines +71 to +74
def _get_direct_jsonpaths(jsonpath: str, descriptor: dict[str, Any]) -> list[str]:
    """Returns all direct JSON paths that match a direct or indirect JSON path."""
    fields = _get_fields_at_jsonpath(jsonpath, descriptor)
    return _map(fields, lambda field: field.jsonpath)
Contributor Author

`Exclude` can use this too; that's why I put it here.

"""Checks the descriptor against the rule and creates issues for fields that fail.
def __init__(self, jsonpath: str, message: str):
    """Initializes the `RequiredRule`."""
    field_name_match = re.search(r"(?<!\.)(\.\w+)$", jsonpath)
Contributor Author

Finds a trailing field name whose dot is not preceded by another dot (i.e., not `..name`). So, e.g., `$.resources[*].name` >> `.name`.
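To illustrate which paths the lookbehind pattern accepts, the sketch below wraps the same regex in a small helper and applies it to the path shapes discussed in the PR description. `trailing_field_name` is a hypothetical name used only for this illustration; it is not part of the PR.

```python
import re
from typing import Optional

# Same pattern as in `RequiredRule.__init__`: a trailing `.name` segment whose
# dot is not itself preceded by another dot, so `$..title` does not match.
FIELD_NAME_PATTERN = re.compile(r"(?<!\.)(\.\w+)$")


def trailing_field_name(jsonpath: str) -> Optional[str]:
    """Hypothetical helper: return the trailing field name, or None if there is none."""
    match = FIELD_NAME_PATTERN.search(jsonpath)
    return match.group(1) if match else None


for path in ["$.title", "$.resources[*].name", "$..title", "$.resources[*]", "$.resources[2]"]:
    print(path, "->", trailing_field_name(path))
```

Only `$.title` and `$.resources[*].name` yield a field name; the recursive-descent path and the two array-item paths return `None`, which is presumably the case in which the error message quoted in the PR is raised.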

f"Cannot define `RequiredRule` for JSON path `{jsonpath}`."
" A `RequiredRule` must target a concrete object field (e.g.,"
" `$.title`) or set of fields (e.g., `$.resources[*].title`)."
" Ambiguous paths (e.g., `$..title`) or paths pointing to array items"
Contributor Author

Better alternatives for "ambiguous"?

Comment on lines +119 to +125
matching_paths = _get_direct_jsonpaths(self.jsonpath, descriptor)
indirect_parent_path = self.jsonpath.removesuffix(self._field_name)
direct_parent_paths = _get_direct_jsonpaths(indirect_parent_path, descriptor)
missing_paths = _filter(
    direct_parent_paths,
    lambda path: f"{path}{self._field_name}" not in matching_paths,
)
Contributor Author

Conceptually:

  1. Get all existing paths >> ["$.resources[0].name", "$.resources[1].name"]
  2. Get all parent paths >> ["$.resources[0]", "$.resources[1]", "$.resources[2]"]
  3. Get all missing paths >> ["$.resources[2].name"]
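The three conceptual steps can be reproduced end to end without a JSON path library. This is a minimal sketch under stated assumptions: `expand_paths` is a hypothetical stand-in for `_get_direct_jsonpaths` that only understands `$`, `.key`, and `[*]` segments, and `find_missing_paths` is likewise a hypothetical name. It only checks whether a key is present, so a key with a `null` value counts as existing here.

```python
import re
from typing import Any


def expand_paths(jsonpath: str, descriptor: dict[str, Any]) -> list[str]:
    """Hypothetical stand-in for `_get_direct_jsonpaths` (supports `.key` and `[*]` only)."""
    segments = re.findall(r"\.\w+|\[\*\]", jsonpath.removeprefix("$"))
    current: list[tuple[str, Any]] = [("$", descriptor)]
    for segment in segments:
        next_level: list[tuple[str, Any]] = []
        for path, value in current:
            if segment == "[*]" and isinstance(value, list):
                # Expand a wildcard into one concrete path per array item.
                next_level.extend((f"{path}[{i}]", item) for i, item in enumerate(value))
            elif segment != "[*]" and isinstance(value, dict) and segment[1:] in value:
                # Follow an object key only if it exists.
                next_level.append((f"{path}{segment}", value[segment[1:]]))
        current = next_level
    return [path for path, _ in current]


def find_missing_paths(jsonpath: str, descriptor: dict[str, Any]) -> list[str]:
    # Assumes the path already passed the "concrete field" regex check.
    field_name = re.search(r"(?<!\.)(\.\w+)$", jsonpath).group(1)
    # 1. All existing paths, e.g. ["$.resources[0].name", "$.resources[1].name"]
    matching_paths = set(expand_paths(jsonpath, descriptor))
    # 2. All parent paths, e.g. ["$.resources[0]", "$.resources[1]", "$.resources[2]"]
    parent_paths = expand_paths(jsonpath.removesuffix(field_name), descriptor)
    # 3. Parents whose child path does not exist are the missing ones.
    return [
        f"{parent}{field_name}"
        for parent in parent_paths
        if f"{parent}{field_name}" not in matching_paths
    ]


descriptor = {"resources": [{"name": "a"}, {"name": "b"}, {"path": "c.csv"}]}
print(find_missing_paths("$.resources[*].name", descriptor))  # ['$.resources[2].name']
```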

@@ -43,38 +45,105 @@ class Rule:
    check: Callable[[Any], bool]
    type: str = "custom"

    def apply(self, descriptor: dict[str, Any]) -> list[Issue]:
Contributor Author

Made `apply` a method on the class, so that `Rule` and `RequiredRule` can each define their own logic.
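A minimal sketch of why a per-class `apply` helps: the caller loops over a mixed list of rules and never branches on `type`. `KeyRule`, `RequiredKeyRule`, and this `Issue` are hypothetical, simplified classes that target a single top-level key instead of a JSON path; they illustrate the pattern and are not the PR's classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Issue:
    jsonpath: str
    type: str
    message: str


@dataclass(frozen=True)
class KeyRule:
    """Hypothetical, simplified rule that targets one top-level key."""

    key: str
    message: str
    check: Callable[[Any], bool]
    type: str = "custom"

    def apply(self, descriptor: dict[str, Any]) -> list[Issue]:
        # Only run the check when the key is present.
        if self.key in descriptor and not self.check(descriptor[self.key]):
            return [Issue(f"$.{self.key}", self.type, self.message)]
        return []


@dataclass(frozen=True)
class RequiredKeyRule(KeyRule):
    """Overrides `apply`: a missing key, not a failing value, creates an issue."""

    check: Callable[[Any], bool] = lambda _value: True
    type: str = "required"

    def apply(self, descriptor: dict[str, Any]) -> list[Issue]:
        if self.key not in descriptor:
            return [Issue(f"$.{self.key}", self.type, self.message)]
        return []


rules: list[KeyRule] = [
    KeyRule("name", "Name must be lowercase.", lambda name: name.islower(), "lowercase"),
    RequiredKeyRule("version", "Version is required."),
]
descriptor = {"name": "MyPackage"}
issues = [issue for rule in rules for issue in rule.apply(descriptor)]
print([issue.type for issue in issues])  # ['lowercase', 'required']
```

Because the subclass is still a `KeyRule`, it can go into the same list as ordinary rules, mirroring how a `RequiredCheck` can be passed in `custom_checks` alongside a `CustomCheck`.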

    _get_fields_at_jsonpath,
    _map,
)
from check_datapackage.issue import Issue


@dataclass
@dataclass(frozen=True)
class Rule:
Contributor Author

A weakness here is that nothing stops you from writing Rule(type="required", ...), which will silently do nothing, just like before. There are some ways of disallowing this, but maybe it's better not to complicate it further. The guide can say "if you want to check that a property exists, use the RequiredRule".

What do you think?
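One of the possible ways of disallowing `Rule(type="required", ...)`, sketched under assumptions: a `__post_init__` guard on the frozen dataclass. The field layout is simplified and the `RequiredRule` defaults shown here are hypothetical, not the PR's code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    jsonpath: str
    message: str
    check: Callable[[Any], bool]
    type: str = "custom"

    def __post_init__(self) -> None:
        # Reject type="required" unless the instance is the dedicated subclass.
        if self.type == "required" and not isinstance(self, RequiredRule):
            raise ValueError(
                "Use `RequiredRule` to check that a field exists, not `Rule(type='required')`."
            )


@dataclass(frozen=True)
class RequiredRule(Rule):
    # Hypothetical defaults: the user never supplies a check for a required rule.
    check: Callable[[Any], bool] = lambda _value: True
    type: str = "required"


RequiredRule("$.version", "Version is required.")  # accepted
try:
    Rule("$.version", "Version is required.", lambda _value: True, type="required")
except ValueError as error:
    print(error)
```

The trade-off is that the base class now has to know about its subclass, which is part of the extra complexity the comment is wary of.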

@martonvago martonvago moved this from In Progress to In Review in Iteration planning Oct 14, 2025
@martonvago martonvago marked this pull request as ready for review October 14, 2025 12:05
@martonvago martonvago requested a review from a team as a code owner October 14, 2025 12:05
Member

@lwjohnst86 lwjohnst86 left a comment

Some initial thoughts before I do a deeper review ☺️ Very neat though!

"Exclude",
"Issue",
"CustomCheck",
"RequiredCheck",
Member

You're exposing it, but how is it integrated into Config?

Contributor Author

There's an example in the description of the PR. No need to do anything special because a RequiredCheck is a CustomCheck ☺️

)


class RequiredCheck(CustomCheck):
Member

Not sure I like that it's a whole other exposed class that we have to describe and document. Plus the naming now doesn't match the style of CustomCheck. Do you think we could fold in having a required check into the existing exposed CustomCheck?

Contributor Author

I think I can fold all the code into CustomCheck. I could have some if type == "required" conditionals and hope it doesn't become too spaghetti?

Contributor Author

@martonvago martonvago Oct 17, 2025

But then how would you want to handle the creation of a CustomCheck with type="required"? Should it be possible for the user to pass their own check function? What if they define something other than the non-null condition?
We could make the required check the default custom check maybe?

@github-project-automation github-project-automation bot moved this from In Review to In Progress in Iteration planning Oct 17, 2025
@signekb signekb changed the title feat: ✨ add RequiredRule feat: ✨ add RequiredRule Oct 17, 2025
@martonvago martonvago changed the title feat: ✨ add RequiredRule feat: ✨ add RequiredCheck Oct 17, 2025
@martonvago
Contributor Author

I opened an alternative (PR #146), but I'll leave this here so you can compare ☺️

@martonvago martonvago moved this from In Progress to In Review in Iteration planning Oct 20, 2025

Labels

None yet

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Rules with type="required"

2 participants