[MLLIB] SPARK-4362: Added classProbabilities method for Naive Bayes #3626

actgardner · 2014-12-05T21:45:04Z

Added methods which accept an RDD or array and return a map of (label -> posterior prob.) for each input set rather than only returning the key with the maximum value.

srowen · 2014-12-05T21:46:15Z

mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala

I'd probably import scala.collection.mutable and write mutable.Map. Not sure what others' preference is.

AmplabJenkins · 2014-12-05T21:47:12Z

Can one of the admins verify this patch?

srowen · 2014-12-05T21:48:11Z

mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala

.. but why are you returning mutable Map anyway?

actgardner · 2014-12-05

Scala newbie. I couldn't find a better pattern to build the map than mutating it in the foreach. Should I just build a map then make it immutable for returning?

srowen · 2014-12-05

That's fine, but you need not promise a mutable Map in the return type. You can return it as a scala.collection.Map.

On Fri, Dec 5, 2014 at 3:51 PM, alanctgardner [email protected] wrote:

In mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala:

@@ -65,6 +65,24 @@ class NaiveBayesModel private[mllib] (
   override def predict(testData: Vector): Double = {
     labels(brzArgmax(brzPi + brzTheta * testData.toBreeze))
   }
+
+  def classProbabilities(testData: RDD[Vector]):
+    RDD[scala.collection.mutable.Map[Double, Double]] = {
+    val bcModel = testData.context.broadcast(this)
+    testData.mapPartitions { iter =>
+      val model = bcModel.value
+      iter.map(model.classProbabilities)
+    }
+  }
+
+  def classProbabilities(testData: Vector):
+    scala.collection.mutable.Map[Double, Double] = {

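
The question in this review thread (how to build the label-to-probability map without mutating it in a foreach, and which type to promise in the signature) can be illustrated with a small sketch. This is not the code from the PR: `ClassScores` and `toScoreMap` are hypothetical names, and the sketch assumes the labels and their scores are already available as two parallel arrays.

```scala
// Hypothetical helper for illustration only; not part of NaiveBayesModel.
object ClassScores {
  // The return type is the general scala.collection.Map interface, so callers
  // are not promised a mutable map.
  def toScoreMap(
      labels: Array[Double],
      scores: Array[Double]): scala.collection.Map[Double, Double] = {
    require(labels.length == scores.length, "labels and scores must have the same length")
    // zip pairs element i of labels with element i of scores;
    // toMap then builds an immutable Map in one step, with no foreach mutation.
    labels.zip(scores).toMap
  }
}
```

For example, `ClassScores.toScoreMap(Array(0.0, 1.0), Array(0.25, 0.75))` yields a two-entry map from label to score. Building the map locally with a mutable builder and returning it under the `scala.collection.Map` type would satisfy the same signature.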

srowen · 2015-02-23T15:28:04Z

@alanctgardner have you had a look at @jkbradley's feedback? I'm wondering whether this is still live. It needs a rebase if so.

jkbradley · 2015-02-23T19:02:11Z

@alanctgardner It would be great if you changed it to predictProbabilities; thanks. I agree with what @jatinpreet was saying about the correctness, and with @srowen's comment on how to fix it: the value of brzPi + brzTheta * testData.toBreeze is a vector of log probabilities, which need to be exponentiated before you normalize them here: [https://github.com//pull/3626/files?diff=split#diff-6d8eff78be2fb624d4a076db334208a4R84]

Could you please rebase off of master and make these couple of updates? After that, I can make a final pass. Thanks!
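
The correction described above (exponentiate the log probabilities, then normalize) can be sketched in plain Scala. This is only an illustration under stated assumptions, not the patch itself: `PosteriorSketch` and `normalizeLogProbs` are hypothetical names, the input is assumed to be the per-class values of brzPi + brzTheta * testData.toBreeze copied into an array, and subtracting the maximum before exponentiating is an extra numerical-stability step that the thread does not mention.

```scala
// Hypothetical helper for illustration only; not part of NaiveBayesModel.
object PosteriorSketch {
  /** Turns unnormalized per-class log posteriors into probabilities that sum to 1. */
  def normalizeLogProbs(logProbs: Array[Double]): Array[Double] = {
    require(logProbs.nonEmpty, "need at least one class")
    // Assumption beyond the thread: shift by the max so that exp() does not
    // underflow to zero for every class when the log values are very negative.
    val maxLog = logProbs.max
    // Exponentiate first: the inputs are log probabilities, not probabilities.
    val unnormalized = logProbs.map(lp => math.exp(lp - maxLog))
    // Then normalize so the class probabilities sum to 1.
    val total = unnormalized.sum
    unnormalized.map(_ / total)
  }
}
```

Calling `PosteriorSketch.normalizeLogProbs(Array(math.log(1.0), math.log(3.0)))` gives probabilities of roughly 0.25 and 0.75. The shift by the maximum cancels out in the division, so it changes only the numerical behavior, not the result.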

srowen · 2015-03-12T16:39:18Z

Mind closing this PR if it's not going to be updated?

acidghost · 2015-06-10T07:26:36Z

Hello there! I'm interested in reopening this PR and contributing a patch for [SPARK-4362] based on it, with the needed changes. I read the wiki; should I simply follow those steps and create a new PR?

acidghost · 2015-06-11T13:17:38Z

@jkbradley @srowen Do you think that the return type of predictProbabilities should be scala.collection.Map[Double, Double] or Vector[Double]?

srowen · 2015-06-11T14:33:30Z

I think the original Map was OK, IMHO.

acidghost · 2015-06-11T14:51:21Z

@srowen I just created PR #6761

srowen reviewed Dec 5, 2014

Alan Gardner added 3 commits March 2, 2015 14:28

Added classProbabilities method

da414d8

Import mutable to be less verbose

a97d0f8

Normalize posteriors, change signature to Map interface

7d6b5b4

asfgit closed this in 0cc8fcb Apr 12, 2015
