run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
and interpret the hypothesis tests.

{% highlight scala %}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

val data: RDD[Double] = ... // an RDD of sample data

// run a KS test for the sample versus a standard normal distribution
val testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0, 1)
println(testResult) // summary of the test including the p-value, test statistic,
                    // and null hypothesis
                    // if our p-value indicates significance, we can reject the null hypothesis

// perform a KS test using a cumulative distribution function of our making
val myCDF: Double => Double = ...
val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
{% endhighlight %}
</div>

<div data-lang="java" markdown="1">
[`Statistics`](api/java/org/apache/spark/mllib/stat/Statistics.html) provides methods to
run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
and interpret the hypothesis tests.

{% highlight java %}
import java.util.Arrays;

import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaSparkContext;

import org.apache.spark.mllib.stat.Statistics;
import org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult;

JavaSparkContext jsc = ...
JavaDoubleRDD data = jsc.parallelizeDoubles(Arrays.asList(0.2, 1.0, ...));
KolmogorovSmirnovTestResult testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0.0, 1.0);
// summary of the test including the p-value, test statistic,
// and null hypothesis
// if our p-value indicates significance, we can reject the null hypothesis
System.out.println(testResult);
{% endhighlight %}
</div>

<div data-lang="python" markdown="1">
[`Statistics`](api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics) provides methods to
run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run
and interpret the hypothesis tests.

{% highlight python %}
from pyspark.mllib.stat import Statistics

parallelData = sc.parallelize([1.0, 2.0, ...])

# run a KS test for the sample versus a standard normal distribution
testResult = Statistics.kolmogorovSmirnovTest(parallelData, "norm", 0, 1)
print(testResult) # summary of the test including the p-value, test statistic,
                  # and null hypothesis
                  # if our p-value indicates significance, we can reject the null hypothesis
# Note: unlike the Scala API, the Python API does not support passing a custom
# function (such as a lambda) that computes the CDF to Statistics.kolmogorovSmirnovTest
{% endhighlight %}
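
Because the Python API does not accept a custom CDF, one possible workaround when the sample is small enough to fit on the driver is to `collect()` the RDD and compute the statistic locally. The sketch below assumes exactly that. It is plain Python, not part of PySpark. The helper names `ks_statistic`, `standard_normal_cdf` and `uniform_cdf` are hypothetical. It computes only the KS statistic (the largest absolute gap between the sample's empirical CDF and the hypothesized CDF), not a p-value.

{% highlight python %}
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Two-sided, one-sample KS statistic: the largest absolute gap between
    the empirical CDF of `sample` and the hypothesized CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF is a step function that rises to i / n at x
        # (from (i - 1) / n just before it), so the supremum of the gap is
        # reached on one side of a step; taking the max also handles ties.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def standard_normal_cdf(x: float) -> float:
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def uniform_cdf(x: float) -> float:
    """Hypothetical user-defined CDF: Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical local sample; for an RDD this would come from collect().
    sample = [0.1, 0.15, 0.2, 0.3, 0.25]
    print(ks_statistic(sample, standard_normal_cdf))
    print(ks_statistic(sample, uniform_cdf))
{% endhighlight %}

Any other Python function of one float can take the place of `uniform_cdf`, playing the role of `myCDF` in the Scala example. This is a single-machine computation and does not scale the way the distributed Spark test does.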
</div>
</div>
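
A p-value follows from the KS statistic `D` and the sample size `n` through the distribution of `sqrt(n) * D` under the null hypothesis. The following plain-Python sketch uses the limiting (Kolmogorov) distribution, which is an assumption of this sketch. Spark may compute its p-value differently, for example with an exact small-sample distribution, so the numbers here only approximate what the test summary prints. The function name `ks_asymptotic_p_value` is hypothetical. Switching between the two series at 1.18 is a common numerical choice, not something taken from Spark.

{% highlight python %}
import math


def ks_asymptotic_p_value(d: float, n: int, terms: int = 100) -> float:
    """Approximate P(D_n >= d) for the two-sided, one-sample KS test.

    Uses the limiting Kolmogorov distribution of sqrt(n) * D_n, so the
    result is only an approximation for small n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    t = math.sqrt(n) * d
    if t < 0.1:
        # The Kolmogorov CDF K(t) is below 1e-50 here, so the p-value rounds to 1.
        return 1.0
    if t < 1.18:
        # K(t) = sqrt(2*pi)/t * sum_k exp(-(2k-1)^2 * pi^2 / (8 t^2)),
        # a series that converges quickly for small t.
        s = sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * t * t))
            for k in range(1, terms + 1)
        )
        p = 1.0 - math.sqrt(2.0 * math.pi) / t * s
    else:
        # 1 - K(t) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 t^2),
        # an alternating series that converges quickly for large t.
        p = 2.0 * sum(
            (-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1)
        )
    # Guard against tiny rounding excursions outside [0, 1].
    return min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    n = 100
    for d in (0.05, 0.1366, 0.2):
        p = ks_asymptotic_p_value(d, n)
        print(d, p, "reject at 0.05" if p < 0.05 else "do not reject at 0.05")
{% endhighlight %}

At a significance level of 0.05, the null hypothesis is rejected when the returned value is below 0.05. For large `n`, that threshold corresponds to `sqrt(n) * D` of about 1.36.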