-
Notifications
You must be signed in to change notification settings - Fork 0
feat: ✨ add RequiredCheck
#122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
def _get_direct_jsonpaths(jsonpath: str, descriptor: dict[str, Any]) -> list[str]: | ||
"""Returns all direct JSON paths that match a direct or indirect JSON path.""" | ||
fields = _get_fields_at_jsonpath(jsonpath, descriptor) | ||
return _map(fields, lambda field: field.jsonpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exclude can use this too, that's why I put it here
"""Checks the descriptor against the rule and creates issues for fields that fail. | ||
def __init__(self, jsonpath: str, message: str): | ||
"""Initializes the `RequiredRule`.""" | ||
field_name_match = re.search(r"(?<!\.)(\.\w+)$", jsonpath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Find field names not preceded by .
. So, e.g., $.resources[*].name
>> .name
f"Cannot define `RequiredRule` for JSON path `{jsonpath}`." | ||
" A `RequiredRule` must target a concrete object field (e.g.," | ||
" `$.title`) or set of fields (e.g., `$.resources[*].title`)." | ||
" Ambiguous paths (e.g., `$..title`) or paths pointing to array items" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better alternatives for "ambiguous"?
matching_paths = _get_direct_jsonpaths(self.jsonpath, descriptor) | ||
indirect_parent_path = self.jsonpath.removesuffix(self._field_name) | ||
direct_parent_paths = _get_direct_jsonpaths(indirect_parent_path, descriptor) | ||
missing_paths = _filter( | ||
direct_parent_paths, | ||
lambda path: f"{path}{self._field_name}" not in matching_paths, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually:
- Get all existing paths >>
["$.resources[0].name", "$.resources[1].name"]
- Get all parent paths >>
["$.resources[0]", "$.resources[1]", "$.resources[2]"]
- Get all missing paths >>
["$.resources[2].name"]
@@ -43,38 +45,105 @@ class Rule: | |||
check: Callable[[Any], bool] | |||
type: str = "custom" | |||
|
|||
def apply(self, descriptor: dict[str, Any]) -> list[Issue]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made apply
a method on the class. This way, Rule
s and RequiredRule
s can each define their own logic.
src/check_datapackage/rule.py
Outdated
_get_fields_at_jsonpath, | ||
_map, | ||
) | ||
from check_datapackage.issue import Issue | ||
|
||
|
||
@dataclass | ||
@dataclass(frozen=True) | ||
class Rule: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A weakness here is that there is nothing stopping you from writing Rule(type="required", ...)
and this will not do anything, just like before. There are some ways of disallowing this, but maybe it's better not to complicate it further. The guide can say "if you want to check that a property exists, use the RequiredRule
".
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial thoughts before I do a deeper review
"Exclude", | ||
"Issue", | ||
"CustomCheck", | ||
"RequiredCheck", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're exposing it, but how it is integrated into Config
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an example in the description of the PR. No need to do anything special because a RequiredCheck
is a CustomCheck
) | ||
|
||
|
||
class RequiredCheck(CustomCheck): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I like that it's a whole other exposed class that we have to describe and document. Plus the naming now doesn't match the style of CustomCheck
. Do you think we could fold in having a required check into the existing exposed CustomCheck
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I can fold all the code into CustomCheck
. I could have some if type == "required"
conditionals and hope it doesn't become too spaghetti?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then how would you want to handle the creation of a CustomCheck
with type="required"
? Should it be possible for the user to pass their own check
function? What if they define something other than the non-null condition?
We could make the required check the default custom check maybe?
I opened an alternative (PR #146), but I'll leave this here so you can compare |
Description
This PR handles the special case of requiring a field to be non-null by subclassing
CustomCheck
to createRequiredCheck
.Usage:
I thought this was a nice way of splitting out the logic for required rules and that required rules were special enough to warrant their own class.
I also thought it made sense to limit what kind of JSON paths the required rule can be applied to. It makes sense to mark a dict property (e.g.
$.resources[*].title
) as required because it is very obvious what it means for this field to be missing.$.resources[*]
or$.resources[2]
) as required makes less sense to me. "I want the second resource to exist in particular" is a weird constraint. If someone cares about the number of resources, they should be checking the length of the array.$..title
: selecting alltitle
fields under the root node and then checking if they exist feels circular and unnecessary.➡️ So I restricted
RequiredCheck
to sensible paths only, which also made the code simpler.Closes #120
Needs an in-depth review.
Checklist
just run-all