-
-
Notifications
You must be signed in to change notification settings - Fork 19.3k
CI: Add test case for unwanted patterns #30467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: Add test case for unwanted patterns #30467
Conversation
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, added some comments about the structure of the script, and the style, but looks good.
|
This is really nice, thanks @MomIsBestFriend. I was not looking forward to figuring out the @datapythonista is there someone on one of the flake/lint projects we might want to talk to about upstreaming this? |
1 similar comment
|
This is really nice, thanks @MomIsBestFriend. I was not looking forward to figuring out the @datapythonista is there someone on one of the flake/lint projects we might want to talk to about upstreaming this? |
3ae2f2b to
6f2738f
Compare
|
@datapythonista It's looking like the running version of python in github's action is |
Where does it say this out of interest? It must be at least 3.6 as we run |
Line 52 in the github actions under (Screenshot because I squashed the original commit) |
a34b8ba to
e7c1a7f
Compare
|
We never use the system python, we install conda in every build, and create an environment, you'll have to activate it. I'd yield the values as a namedtuple. |
4d6d179 to
d5afed9
Compare
The problem is, the activation of conda is happening after it's running the unwanted patterns check. Do you have any suggestions? |
|
Better move the string thing in the linting, which is a good idea anyway. |
d5afed9 to
ea3f0e7
Compare
ea3f0e7 to
a90ca90
Compare
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, added few comments with minor things, but good job. Thanks!
a90ca90 to
fc0def6
Compare
| argparser.add_argument( | ||
| "--format", | ||
| "-f", | ||
| default="default", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| default="default", | |
| default="{source_path}:{line_number}:{start}:{end}:{msg}", |
So by default we use this format, and if the caller ones a different format, can call the script like script -f "##vso[task.logissue type=error; {source_path}]{msg}" path. So you don't need to know the constants with the formats beforehand, just the variables.
I'm unsure whether start and end provide value, are they the columns of the strings to concatenate? I'd say the line number should make clear enough where the problem is, if that's the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure whether
startandendprovide value.
Short summary:
start and end are used to pinpoint the unconcatenated string.
Long summary:
start and end are the two unconcatenated strings that needs to be concatenated.
lets say we have a file called script.py and it's located in path/to/script.py and on line 1337 there is a line of:
print("foo" "bar")When running scripts/validate_string_concatenation.py
The default the output is looking like this:
./path/to/script.py:1337 BETWEEN "foo" AND "bar"
and the yielded dictionary is mapped like this:
source_path: "./path/to/script.py"
line_number: 1337
start: "foo"
end: "bar"ecd2e93 to
b0732c5
Compare
b0732c5 to
72f3dc3
Compare
| ): | ||
| for values in strings_to_concatenate(os.path.join(subdir, file_name)): | ||
| is_failed = True | ||
| print(output_format.format(**values)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output of this will be something like:
pandas/__init__.py:124: BETWEEN first part of the string AND with a second part in another string
I don't think people will understand what's going on when seeing this error in the CI. I think we should display something like:
pandas/__init__.py:124:String unnecessarily split in two by black. Please merge them manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this, will the message be too long?
and where do I put the source_path, line_number, start, end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message being long is not a problem. Neither for the CI or for the code.
From my example, the format would be:
{source_path}:{line_number}:String unnecessarily split in two by black. Please merge them manually.
As I mentioned eariler, I don't find start and end particularly useful for the reader of the error. Feel free to leave them if you disagree, but better rename them, their name is misleading. string1 and string2 are not great names, but they're better. At least they don't create the wrong impression those are the positions of the strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can now see what you mean, Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, looks much better.
Sorry I said it wrong, but the format should actually be {source_path}:{line_number}:{msg}. We'll decide the error here. Imagine that this function can detect more than one error, we can't say in the format which is the error we want to receive.
So this could become:
| print(output_format.format(**values)) | |
| msg = "String unnecessarily split in two by black. Please merge them manually." | |
| print(output_format.format(source_path=source_path, line_number=line_number, msg=msg)) |
And since now the strings_to_concatenate function is just returning two values, I think it makes more sense to return a tuple with them, instead of a dict, so the for above would be:
for source_path, line_number in strings_to_concatenate(os.path.join(subdir, file_name)):
72f3dc3 to
280d103
Compare
7bdcd14 to
10df407
Compare
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, looks great now. Just couple of minor things, and I think we can get it merged. Thanks!
| Yields | ||
| ------ | ||
| dict of {str: str} | ||
| Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just list the two returns, instead of saying it's a tuple.
83e9246 to
5e272bc
Compare
|
Can be merged after merging #30579 |
|
Merged #30579, you can update your branch, so the CI passes |
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just on small typo, but lgtm
afd9e0b to
0530d02
Compare
datapythonista
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @MomIsBestFriend
|
Thank you for the very close guidance @datapythonista |
|
thanks @MomIsBestFriend |

black pandasgit diff upstream/master -u -- "*.py" | flake8 --diffCan be merged, after merging #30464