
Commit fe75ff8

HyukjinKwon authored and mengxr committed
[SPARK-28206][PYTHON] Remove the legacy Epydoc in PySpark API documentation
## What changes were proposed in this pull request?

It seems we used to generate the PySpark API documentation with Epydoc almost from the very beginning (see 85b8f2c). This fixes an actual issue:

Before:

![Screen Shot 2019-07-05 at 8 20 01 PM](https://user-images.githubusercontent.com/6477701/60720491-e9879180-9f65-11e9-9562-100830a456cd.png)

After:

![Screen Shot 2019-07-05 at 8 20 05 PM](https://user-images.githubusercontent.com/6477701/60720495-ec828200-9f65-11e9-8277-8f689e292cb0.png)

The problem appears to be a bug in the `epytext` plugin during the conversion between `param` and `:param` syntax. See also the [Epydoc syntax](http://epydoc.sourceforge.net/manual-epytext.html). IIRC, Epydoc syntax also violates [PEP-257](https://www.python.org/dev/peps/pep-0257/) and blocks us from enabling some rules in the doctest linter. We should remove this legacy syntax, and Spark 3 is a good time to do it.

## How was this patch tested?

Manually built the docs and checked each page. I had to find the remaining Epydoc syntax manually, for instance with `git grep -r "{L"`.

Closes #25060 from HyukjinKwon/SPARK-28206.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Xiangrui Meng <[email protected]>
1 parent d493a1f commit fe75ff8
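In docstring terms, the conversion this commit applies looks like the following before/after, distilled from the diffs below (the method is illustrative, not an actual PySpark API):

# Before: Epydoc (epytext) markup inside a docstring
def value(self):
    """Access the value through C{value}; see L{AccumulatorParam} and U{http://tinyurl.com/lsdw6p}."""

# After: plain reST / Sphinx roles
def value(self):
    """Access the value through `value`; see :class:`AccumulatorParam` and `NB <http://tinyurl.com/lsdw6p>`_."""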

File tree: 23 files changed (+185, -217 lines)


python/docs/conf.py

Lines changed: 0 additions & 1 deletion
@@ -31,7 +31,6 @@
 extensions = [
     'sphinx.ext.autodoc',
     'sphinx.ext.viewcode',
-    'epytext',
     'sphinx.ext.mathjax',
 ]

python/docs/epytext.py

Lines changed: 0 additions & 30 deletions
This file was deleted.
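The deleted epytext.py was the Sphinx shim that translated Epydoc markup on the fly. Its contents are not reproduced in this diff, so the following is only an illustrative sketch of how such a plugin typically works (a hypothetical reconstruction, not the removed file):

# Hypothetical sketch of an Epydoc-to-reST docstring shim for Sphinx.
import re

RULES = [
    (re.compile(r"L\{([\w.()<>]+)\}"), r":class:`\1`"),  # L{Foo}  -> :class:`Foo`
    (re.compile(r"C\{([^}]+)\}"), r"``\1``"),            # C{code} -> ``code``
    (re.compile(r"U\{([^}]+)\}"), r"`\1`_"),             # U{url}  -> `url`_
]


def _convert_docstring(app, what, name, obj, options, lines):
    # autodoc passes the docstring as a mutable list of lines; edit it in place.
    for i, line in enumerate(lines):
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        lines[i] = line


def setup(app):
    app.connect("autodoc-process-docstring", _convert_docstring)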

python/pyspark/accumulators.py

Lines changed: 7 additions & 7 deletions
@@ -123,13 +123,13 @@ class Accumulator(object):
 
     """
     A shared variable that can be accumulated, i.e., has a commutative and associative "add"
-    operation. Worker tasks on a Spark cluster can add values to an Accumulator with the C{+=}
-    operator, but only the driver program is allowed to access its value, using C{value}.
+    operation. Worker tasks on a Spark cluster can add values to an Accumulator with the `+=`
+    operator, but only the driver program is allowed to access its value, using `value`.
     Updates from the workers get propagated automatically to the driver program.
 
-    While C{SparkContext} supports accumulators for primitive data types like C{int} and
-    C{float}, users can also define accumulators for custom types by providing a custom
-    L{AccumulatorParam} object. Refer to the doctest of this module for an example.
+    While :class:`SparkContext` supports accumulators for primitive data types like :class:`int` and
+    :class:`float`, users can also define accumulators for custom types by providing a custom
+    :class:`AccumulatorParam` object. Refer to the doctest of this module for an example.
     """
 
     def __init__(self, aid, value, accum_param):

@@ -185,14 +185,14 @@ class AccumulatorParam(object):
     def zero(self, value):
         """
         Provide a "zero value" for the type, compatible in dimensions with the
-        provided C{value} (e.g., a zero vector)
+        provided `value` (e.g., a zero vector)
         """
         raise NotImplementedError
 
     def addInPlace(self, value1, value2):
         """
         Add two values of the accumulator's data type, returning a new value;
-        for efficiency, can also update C{value1} in place and return it.
+        for efficiency, can also update `value1` in place and return it.
         """
         raise NotImplementedError
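As a usage illustration of the custom-accumulator pattern that docstring describes -- a minimal sketch assuming an active SparkContext named sc; VectorAccumulatorParam is an ad-hoc example class, not something shipped with PySpark:

from pyspark.accumulators import AccumulatorParam

class VectorAccumulatorParam(AccumulatorParam):
    # Accumulate fixed-length lists of floats element-wise.
    def zero(self, value):
        return [0.0] * len(value)

    def addInPlace(self, value1, value2):
        for i in range(len(value1)):
            value1[i] += value2[i]
        return value1

va = sc.accumulator([0.0, 0.0, 0.0], VectorAccumulatorParam())
sc.parallelize([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]).foreach(lambda v: va.add(v))
print(va.value)  # only the driver may read it: [5.0, 7.0, 9.0]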

python/pyspark/broadcast.py

Lines changed: 3 additions & 3 deletions
@@ -49,8 +49,8 @@ def _from_id(bid):
 class Broadcast(object):
 
     """
-    A broadcast variable created with L{SparkContext.broadcast()}.
-    Access its value through C{.value}.
+    A broadcast variable created with :meth:`SparkContext.broadcast`.
+    Access its value through :attr:`value`.
 
     Examples:
 

@@ -69,7 +69,7 @@ class Broadcast(object):
     def __init__(self, sc=None, value=None, pickle_registry=None, path=None,
                  sock_file=None):
         """
-        Should not be called directly by users -- use L{SparkContext.broadcast()}
+        Should not be called directly by users -- use :meth:`SparkContext.broadcast`
         instead.
         """
         if sc is not None:
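A brief sketch of the broadcast workflow described above (assumes an active SparkContext named sc; the data is arbitrary):

fruits = sc.broadcast(["apple", "banana", "cherry"])   # created via SparkContext.broadcast
rdd = sc.parallelize([0, 1, 2])
print(rdd.map(lambda i: fruits.value[i]).collect())    # read through .value on the executors
# ['apple', 'banana', 'cherry']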

python/pyspark/conf.py

Lines changed: 4 additions & 4 deletions
@@ -79,16 +79,16 @@ class SparkConf(object):
     parameters as key-value pairs.
 
     Most of the time, you would create a SparkConf object with
-    C{SparkConf()}, which will load values from C{spark.*} Java system
+    ``SparkConf()``, which will load values from `spark.*` Java system
     properties as well. In this case, any parameters you set directly on
-    the C{SparkConf} object take priority over system properties.
+    the :class:`SparkConf` object take priority over system properties.
 
-    For unit tests, you can also call C{SparkConf(false)} to skip
+    For unit tests, you can also call ``SparkConf(false)`` to skip
     loading external settings and get the same configuration no matter
     what the system properties are.
 
     All setter methods in this class support chaining. For example,
-    you can write C{conf.setMaster("local").setAppName("My app")}.
+    you can write ``conf.setMaster("local").setAppName("My app")``.
 
     .. note:: Once a SparkConf object is passed to Spark, it is cloned
         and can no longer be modified by the user.
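The setter chaining mentioned in that docstring, as a small sketch (master, app name, and property values are arbitrary):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("My app")
        .set("spark.executor.memory", "1g"))   # every setter returns the conf itself
sc = SparkContext(conf=conf)
print(sc.getConf().get("spark.app.name"))      # 'My app'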

python/pyspark/context.py

Lines changed: 29 additions & 27 deletions
@@ -61,7 +61,7 @@ class SparkContext(object):
 
     """
     Main entry point for Spark functionality. A SparkContext represents the
-    connection to a Spark cluster, and can be used to create L{RDD} and
+    connection to a Spark cluster, and can be used to create :class:`RDD` and
     broadcast variables on that cluster.
 
     .. note:: Only one :class:`SparkContext` should be active per JVM. You must `stop()`

@@ -86,7 +86,7 @@ def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
                  gateway=None, jsc=None, profiler_cls=BasicProfiler):
         """
         Create a new SparkContext. At least the master and app name should be set,
-        either through the named parameters here or through C{conf}.
+        either through the named parameters here or through `conf`.
 
         :param master: Cluster URL to connect to
             (e.g. mesos://host:port, spark://host:port, local[4]).

@@ -102,7 +102,7 @@ def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
             the batch size based on object sizes, or -1 to use an unlimited
             batch size
         :param serializer: The serializer for RDDs.
-        :param conf: A L{SparkConf} object setting Spark properties.
+        :param conf: A :class:`SparkConf` object setting Spark properties.
         :param gateway: Use an existing gateway and JVM, otherwise a new JVM
             will be instantiated.
         :param jsc: The JavaSparkContext instance (optional).

@@ -576,7 +576,7 @@ def _serialize_to_jvm(self, data, serializer, reader_func, createRDDServer):
 
     def pickleFile(self, name, minPartitions=None):
         """
-        Load an RDD previously saved using L{RDD.saveAsPickleFile} method.
+        Load an RDD previously saved using :meth:`RDD.saveAsPickleFile` method.
 
         >>> tmpFile = NamedTemporaryFile(delete=True)
         >>> tmpFile.close()

@@ -624,20 +624,24 @@ def wholeTextFiles(self, path, minPartitions=None, use_unicode=True):
         as `utf-8`), which is faster and smaller than unicode. (Added in
         Spark 1.2)
 
-        For example, if you have the following files::
+        For example, if you have the following files:
 
-            hdfs://a-hdfs-path/part-00000
-            hdfs://a-hdfs-path/part-00001
-            ...
-            hdfs://a-hdfs-path/part-nnnnn
+        .. code-block:: text
 
-        Do C{rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")},
-        then C{rdd} contains::
+            hdfs://a-hdfs-path/part-00000
+            hdfs://a-hdfs-path/part-00001
+            ...
+            hdfs://a-hdfs-path/part-nnnnn
+
+        Do ``rdd = sparkContext.wholeTextFiles("hdfs://a-hdfs-path")``,
+        then ``rdd`` contains:
 
-            (a-hdfs-path/part-00000, its content)
-            (a-hdfs-path/part-00001, its content)
-            ...
-            (a-hdfs-path/part-nnnnn, its content)
+        .. code-block:: text
+
+            (a-hdfs-path/part-00000, its content)
+            (a-hdfs-path/part-00001, its content)
+            ...
+            (a-hdfs-path/part-nnnnn, its content)
 
         .. note:: Small files are preferred, as each file will be loaded
             fully in memory.

@@ -705,7 +709,7 @@ def sequenceFile(self, path, keyClass=None, valueClass=None, keyConverter=None,
                and value Writable classes
             2. Serialization is attempted via Pyrolite pickling
             3. If this fails, the fallback is to call 'toString' on each key and value
-            4. C{PickleSerializer} is used to deserialize pickled objects on the Python side
+            4. :class:`PickleSerializer` is used to deserialize pickled objects on the Python side
 
         :param path: path to sequncefile
         :param keyClass: fully qualified classname of key Writable class

@@ -872,17 +876,16 @@ def union(self, rdds):
 
     def broadcast(self, value):
         """
-        Broadcast a read-only variable to the cluster, returning a
-        L{Broadcast<pyspark.broadcast.Broadcast>}
+        Broadcast a read-only variable to the cluster, returning a :class:`Broadcast`
         object for reading it in distributed functions. The variable will
         be sent to each cluster only once.
         """
         return Broadcast(self, value, self._pickled_broadcast_vars)
 
     def accumulator(self, value, accum_param=None):
         """
-        Create an L{Accumulator} with the given initial value, using a given
-        L{AccumulatorParam} helper object to define how to add values of the
+        Create an :class:`Accumulator` with the given initial value, using a given
+        :class:`AccumulatorParam` helper object to define how to add values of the
         data type if provided. Default AccumulatorParams are used for integers
         and floating-point numbers if you do not provide one. For other types,
         a custom AccumulatorParam can be used.

@@ -902,12 +905,11 @@ def accumulator(self, value, accum_param=None):
     def addFile(self, path, recursive=False):
         """
         Add a file to be downloaded with this Spark job on every node.
-        The C{path} passed can be either a local file, a file in HDFS
+        The `path` passed can be either a local file, a file in HDFS
         (or other Hadoop-supported filesystems), or an HTTP, HTTPS or
         FTP URI.
 
-        To access the file in Spark jobs, use
-        L{SparkFiles.get(fileName)<pyspark.files.SparkFiles.get>} with the
+        To access the file in Spark jobs, use :meth:`SparkFiles.get` with the
         filename to find its download location.
 
         A directory can be given if the recursive option is set to True.

@@ -932,7 +934,7 @@ def addFile(self, path, recursive=False):
     def addPyFile(self, path):
         """
         Add a .py or .zip dependency for all tasks to be executed on this
-        SparkContext in the future. The C{path} passed can be either a local
+        SparkContext in the future. The `path` passed can be either a local
         file, a file in HDFS (or other Hadoop-supported filesystems), or an
         HTTP, HTTPS or FTP URI.
 

@@ -978,7 +980,7 @@ def setJobGroup(self, groupId, description, interruptOnCancel=False):
         Application programmers can use this method to group all those jobs together and give a
         group description. Once set, the Spark web UI will associate such jobs with this group.
 
-        The application can use L{SparkContext.cancelJobGroup} to cancel all
+        The application can use :meth:`SparkContext.cancelJobGroup` to cancel all
         running jobs in this group.
 
         >>> import threading

@@ -1023,7 +1025,7 @@ def setLocalProperty(self, key, value):
     def getLocalProperty(self, key):
         """
         Get a local property set in this thread, or null if it is missing. See
-        L{setLocalProperty}
+        :meth:`setLocalProperty`.
         """
         return self._jsc.getLocalProperty(key)
 

@@ -1041,7 +1043,7 @@ def sparkUser(self):
 
     def cancelJobGroup(self, groupId):
        """
-        Cancel active jobs for the specified group. See L{SparkContext.setJobGroup}
+        Cancel active jobs for the specified group. See :meth:`SparkContext.setJobGroup`.
         for more information.
         """
         self._jsc.sc().cancelJobGroup(groupId)
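For reference, a tiny sketch of the wholeTextFiles behavior documented in the hunk above (the path is the docstring's illustrative HDFS directory -- substitute a real one; assumes an active SparkContext named sc):

# Each element is a (path, content) pair -- one pair per file, not per line.
pairs = sc.wholeTextFiles("hdfs://a-hdfs-path")
print(pairs.keys().collect())    # e.g. ['hdfs://a-hdfs-path/part-00000', ...]
print(pairs.values().count())    # number of files read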

python/pyspark/files.py

Lines changed: 3 additions & 4 deletions
@@ -24,8 +24,7 @@
 class SparkFiles(object):
 
     """
-    Resolves paths to files added through
-    L{SparkContext.addFile()<pyspark.context.SparkContext.addFile>}.
+    Resolves paths to files added through :meth:`SparkContext.addFile`.
 
     SparkFiles contains only classmethods; users should not create SparkFiles
     instances.

@@ -41,7 +40,7 @@ def __init__(self):
     @classmethod
     def get(cls, filename):
         """
-        Get the absolute path of a file added through C{SparkContext.addFile()}.
+        Get the absolute path of a file added through :meth:`SparkContext.addFile`.
         """
         path = os.path.join(SparkFiles.getRootDirectory(), filename)
         return os.path.abspath(path)

@@ -50,7 +49,7 @@ def get(cls, filename):
     def getRootDirectory(cls):
         """
         Get the root directory that contains files added through
-        C{SparkContext.addFile()}.
+        :meth:`SparkContext.addFile`.
         """
         if cls._is_running_on_worker:
             return cls._root_directory
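A short sketch of the addFile / SparkFiles.get pairing these docstrings describe (the file name is arbitrary; assumes an active SparkContext named sc):

from pyspark import SparkFiles

with open("lookup.txt", "w") as f:
    f.write("100")
sc.addFile("lookup.txt")                     # ship the file to every node

def add_lookup(x):
    # On an executor, resolve the shipped file's local path.
    with open(SparkFiles.get("lookup.txt")) as f:
        return x + int(f.read())

print(sc.parallelize([1, 2, 3]).map(add_lookup).collect())   # [101, 102, 103]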

python/pyspark/ml/feature.py

Lines changed: 1 addition & 1 deletion
@@ -2560,7 +2560,7 @@ class IndexToString(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadable,
     corresponding string values.
     The index-string mapping is either from the ML attributes of the input column,
     or from user-supplied labels (which take precedence over ML attributes).
-    See L{StringIndexer} for converting strings into indices.
+    See :class:`StringIndexer` for converting strings into indices.
 
     .. versionadded:: 1.6.0
     """

python/pyspark/ml/linalg/__init__.py

Lines changed: 4 additions & 4 deletions
@@ -17,9 +17,9 @@
 
 """
 MLlib utilities for linear algebra. For dense vectors, MLlib
-uses the NumPy C{array} type, so you can simply pass NumPy arrays
-around. For sparse vectors, users can construct a L{SparseVector}
-object from MLlib or pass SciPy C{scipy.sparse} column vectors if
+uses the NumPy `array` type, so you can simply pass NumPy arrays
+around. For sparse vectors, users can construct a :class:`SparseVector`
+object from MLlib or pass SciPy `scipy.sparse` column vectors if
 SciPy is available in their environment.
 """
 

@@ -758,7 +758,7 @@ class Vectors(object):
     .. note:: Dense vectors are simply represented as NumPy array objects,
         so there is no need to covert them for use in MLlib. For sparse vectors,
         the factory methods in this class create an MLlib-compatible type, or users
-        can pass in SciPy's C{scipy.sparse} column vectors.
+        can pass in SciPy's `scipy.sparse` column vectors.
     """
 
     @staticmethod
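A small sketch of the dense/sparse vector usage that module docstring describes (values are arbitrary):

import numpy as np
from pyspark.ml.linalg import Vectors

dense = np.array([1.0, 0.0, 3.0])               # a plain NumPy array acts as a dense vector
sparse = Vectors.sparse(3, {0: 1.0, 2: 3.0})    # size 3, non-zeros at indices 0 and 2
print(sparse.toArray())                         # [1. 0. 3.]
print(float(sparse.dot(dense)))                 # 10.0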

python/pyspark/mllib/classification.py

Lines changed: 2 additions & 2 deletions
@@ -659,11 +659,11 @@ def train(cls, data, lambda_=1.0):
         Train a Naive Bayes model given an RDD of (label, features)
         vectors.
 
-        This is the Multinomial NB (U{http://tinyurl.com/lsdw6p}) which
+        This is the `Multinomial NB <http://tinyurl.com/lsdw6p>`_ which
         can handle all kinds of discrete data. For example, by
         converting documents into TF-IDF vectors, it can be used for
         document classification. By making every vector a 0-1 vector,
-        it can also be used as Bernoulli NB (U{http://tinyurl.com/p7c96j6}).
+        it can also be used as `Bernoulli NB <http://tinyurl.com/p7c96j6>`_.
         The input feature values must be nonnegative.
 
         :param data:
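A minimal training sketch for the NaiveBayes.train API documented there (toy data; assumes an active SparkContext named sc):

from pyspark.mllib.classification import NaiveBayes
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint

data = sc.parallelize([
    LabeledPoint(0.0, Vectors.dense([1.0, 0.0])),
    LabeledPoint(1.0, Vectors.dense([0.0, 1.0])),
])
model = NaiveBayes.train(data, lambda_=1.0)        # lambda_ is the additive smoothing parameter
print(model.predict(Vectors.dense([0.0, 1.0])))    # 1.0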
