Skip to content

Conversation

@rekhajoshm
Copy link
Contributor

What changes were proposed in this pull request?

Using pydocstyle for python document style checker
PyCQA/pycodestyle#723

How was this patch tested?

./dev/run-tests

@SparkQA
Copy link

SparkQA commented Jan 24, 2018

Test build #86563 has finished for PR 20378 at commit e69827e.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea itself seems fine. but how many instances to fix are there? I think we should decide which rule to exclude and include. Also, I took a quick look and seems this scans all the files including non-python files?

@rekhajoshm
Copy link
Contributor Author

rekhajoshm commented Jan 28, 2018

Thanks @HyukjinKwon for review.
This checks only for python files, as $PATHS_TO_CHECK is passed, and it takes only python files(find . -name "*.py").Also the single file version on pydocstyle, pep257.py does not allow include/exclude from what I have tried. Sample results attached here.thanks.
pydocstyle_sample_results.txt

@HyukjinKwon
Copy link
Member

Please give me few days .. Let me try to take a close look for this.

@HyukjinKwon
Copy link
Member

HyukjinKwon commented Jan 30, 2018

So, seems we got:

First line should end with a period. 293
Multiline docstring should end with 1 blank line. 279
Blank line missing after one-line summary. 265
Return value type should be mentioned. 141
All modules should have docstrings. 109
One-liner docstrings should fit on one line with quotes. 91
First line should be in imperative mood ('Do', not 'Does'). 87
Exported definitions should have docstrings. 61
Class docstring should have 1 blank line around them. 35
Use r\"\"\" if any backslashes in your docstrings. 19
The entire docstring should be indented same as code. 6
Use \"\"\"triple double quotes\"\"\". 3
Exported classes should have docstrings. 1
No blank line before docstring in definitions. 1

I think we can take in:

First line should end with a period. 293
Multiline docstring should end with 1 blank line. 279
Blank line missing after one-line summary. 265
The entire docstring should be indented same as code. 6
Use \"\"\"triple double quotes\"\"\". 3   # this seems only in heapq3.py where we ignore pep8.
No blank line before docstring in definitions. 1

Not sure on:

Exported definitions should have docstrings. 61
Exported classes should have docstrings. 1

and take out (for now)

Return value type should be mentioned. 141
All modules should have docstrings. 109
One-liner docstrings should fit on one line with quotes. 91
First line should be in imperative mood ('Do', not 'Does'). 87
Class docstring should have 1 blank line around them. 35
Use r\"\"\" if any backslashes in your docstrings. 19

Also, I think we can take out cloudpickle.py, heapq3.py, shared.py, python/docs/conf.py, work//.py, and ./python/.eggs/* as we do in pep8.

@HyukjinKwon
Copy link
Member

Hey @holdenk, @ueshin, @viirya, @icexelloss, @felixcheung, @BryanCutler and @MrBago. What do you guys think about checking docstring and the list above? I think this could prevent nitpicking and idea itself seems good.

One vague concern is that it might make backporting super hard.

@rekhajoshm
Copy link
Contributor Author

rekhajoshm commented Jan 30, 2018

Thanks @HyukjinKwon for your update.

@HyukjinKwon @holdenk @ueshin @viirya @icexelloss @felixcheung @BryanCutler and @MrBago - While you are thinking on it, below is my analysis.

As I understand, there are two things that jira "seems" to be calling out.Please validate.

  1. doctest strings must be correctly formatted.
  2. doctest must NOT be included in docs?

Working on it, I found docstring style itself was not enforced at all, and that includes doctest style.
Another aspect seems to be exclude doctest from documentation (_build/html once generated.)
I am not certain on the reasoning behind this exclusion or whether it is indeed what is additionally intended in SPARK-11222.The jira subject and description say two different things, so maybe validate that understanding?

Meanwhile I had a look into/tested different configurations on epytext/sphinx extensions to see if we can achieve surpassing doctests in docs via them.
Played with RULES set in epytext.py. As per epytext manual http://epydoc.sourceforge.net/manual-epytext.html it ensures only correctly formatted doctests are rendered.Which means only if they were incorrectly formatted will they not to appear in docs.This did not seem right. I do not want incorrectly formatted doctest to avoid showing them in docs.

So I played around with sphinx extensions in conf.py, tested with different configs, eg-

'sphinx.ext.doctest',
'sphinx.ext.napoleon'

# Napoleon settings
napoleon_google_docstring = True
# Doctest settings
doctest_test_doctest_blocks=''
trim_doctest_flags=True

None of those options tried, get me to surpass doctest in docs(_build/html) once the build is done.

Thanks for thinking this over.

@ueshin
Copy link
Member

ueshin commented Jan 30, 2018

I like this idea, too, but seems like there are too many violating files so we can't enable this for now.
I'm wondering how we can encourage contributors to follow the style, hopefully automatically. Should we make a blacklist for the currently violating files and remove from it when the style is fixed, or something?

if not changed_files or any(f.endswith("lint-python")
or f.endswith("tox.ini")
or f.endswith(".py")
for f in changed_files):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you resolve the conflict? Looks like this change is already merged from #20338.

@viirya
Copy link
Member

viirya commented Jan 30, 2018

One question I have is, do the current violations cause significant document error?

Overall this is a good idea. However, is it worth enforcedly applying this if we consider the effort of fixing the violations, backporting difficulty in the future?

@HyukjinKwon
Copy link
Member

One question I have is, do the current violations cause significant document error?

I think this is a good point. Maybe, we could enable ones fixing actual significant problems, at least.

# Using pep257.py which is the single file version of pydocstyle.
PYDOCSTYLE_VERSION="0.2.1"
PYDOCSTYLE_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pydocstyle-$PYDOCSTYLE_VERSION.py"
PYDOCSTYLE_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/PyCQA/pydocstyle/$PYDOCSTYLE_VERSION/pep257.py"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if this is an official channel to get this script from? I see pep8 download has a note above to use PyPI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BryanCutler for your input. pep8 is replaced by pycodestyle from official PyPi channel in PR #20338 .This is for doc style alone, which is not maintained within pycodestyle as per maintainers of the project - PyCQA/pycodestyle#723
On the other hand, there is value and we could try to somehow setup latest pydocstyle from git(it would need setup on fly) rather than using single file version.

@BryanCutler
Copy link
Member

I looked at the example output to see see what the errors were. Specifically looking at broadcast.__init__

def __init__(self, sc=None, value=None, pickle_registry=None, path=None):
        """
        Should not be called directly by users -- use L{SparkContext.broadcast()}
        instead.
        """
        if sc is not None:

it gave

./python/pyspark/broadcast.py:68:8: First line should end with a period.
./python/pyspark/broadcast.py:68:8: Blank line missing after one-line summary.
./python/pyspark/broadcast.py:68:8: Multiline docstring should end with 1 blank line.

I think a lot of docstrings are purposely formatted this way. What is the format that it is looking for here?

@HyukjinKwon
Copy link
Member

pydocstyle seems claiming PEP 257 - https://www.python.org/dev/peps/pep-0257.

One option given #20378 (comment) and #20378 (comment) might be to note that we follow PEP 257 in http://spark.apache.org/contributing.html and then enable only ones causing actual problems.

@icexelloss
Copy link
Contributor

I share the same concern of backporting. If we decide to do large amounts of format changes. Should we consider backporting the format changes in one batch so future backporting can be easier?

@HyukjinKwon
Copy link
Member

@rekhajoshm, would you mind if I ask to take a quick look for ones causing actual problems and identify them? It might be kind of a grunt job tho.

@HyukjinKwon
Copy link
Member

If you happen to be busy on working this one, I can take this when I have some time. That's fine. Either way works.

@rekhajoshm
Copy link
Contributor Author

@HyukjinKwon I will check it over weekend.thanks


# Get PYDOCSTYLE at runtime so that we don't rely on it being installed on the build server.
# Using pep257.py which is the single file version of pydocstyle.
PYDOCSTYLE_VERSION="0.2.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, seems like the latest version of pydocstyle is 2.1.1. Should we use it instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As called out earlier, this was single file python doc style checker, the latest does not have single file checker that can be included.

@rekhajoshm
Copy link
Contributor Author

@HyukjinKwon Identifying docstyle failures does not help much as it is not straightforward to exclude in this version.

@rekhajoshm
Copy link
Contributor Author

rekhajoshm commented Feb 9, 2018

@HyukjinKwon @holdenk @ueshin @viirya @icexelloss @felixcheung @BryanCutler and @MrBago - This was one of the possible approach that I was running by you. I have proposed another approach at #20556 with features as below -

  • Use sphinx like check, run only if pydocstyle installed on machine/jenkins
  • use pydocstyle rather than single file version pep257.py, which is much older and had some known issues
  • verify pydocstyle latest 2.1.1 is in use, to ensure latest doc checks are getting executed
  • ignore (inclusion/exclusion error codes) features and support via tox.ini
  • Be non-breaking change and allow updating docstyle to standard at easy pace
    Closing this.Thanks!

@rekhajoshm rekhajoshm closed this Feb 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants