Skip to content

Conversation

@actgardner
Copy link

Added methods which accept an RDD or array and return a map of (label -> posterior prob.) for each input set rather than only returning the key with the maximum value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably import scala.collection.mutable and write mutable.Map. Not sure what others' preference is.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. but why are you returning mutable Map anyway?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scala newbie. I couldn't find a better pattern to build the map than mutating it in the foreach. Should I just build a map then make it immutable for returning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, but you need not promise a mutable Map in the return
type. You can return it as a scala.collection.Map

On Fri, Dec 5, 2014 at 3:51 PM, alanctgardner [email protected] wrote:

In
mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala:

@@ -65,6 +65,24 @@ class NaiveBayesModel private[mllib] (
override def predict(testData: Vector): Double = {
labels(brzArgmax(brzPi + brzTheta * testData.toBreeze))
}
+

  • def classProbabilities(testData: RDD[Vector]):
  • RDD[scala.collection.mutable.Map[Double, Double]] = {
  • val bcModel = testData.context.broadcast(this)
  • testData.mapPartitions { iter =>
  •  val model = bcModel.value
    
  •  iter.map(model.classProbabilities)
    
  • }
  • }
  • def classProbabilities(testData: Vector):
    scala.collection.mutable.Map[Double, Double] = {

Scala newbie. I couldn't find a better pattern to build the map than
mutating it in the foreach. Should I just build a map then make it immutable
for returning?


Reply to this email directly or view it on GitHub.

@srowen
Copy link
Member

srowen commented Feb 23, 2015

@alanctgardner have you had a look at @jkbradley 's feedback? I'm wondering this is still live. It needs a rebase if so.

@jkbradley
Copy link
Member

@alanctgardner That will be great if you change it to predictProbabilities; thanks. I agree with what @jatinpreet was saying about the correctness, and with @srowen 's comment on how to fix it: The value of brzPi + brzTheta * testData.toBreeze is a log probability, which needs to be exponentiated before you normalize it here: [https://github.com//pull/3626/files?diff=split#diff-6d8eff78be2fb624d4a076db334208a4R84]

Could you please rebase off of master and make these couple of updates? After that, I can make a final pass. Thanks!

@srowen
Copy link
Member

srowen commented Mar 12, 2015

Mind closing this PR? if it's not going to be updated.

@asfgit asfgit closed this in 0cc8fcb Apr 12, 2015
@acidghost
Copy link

Hello there! I'm interested in reopening this PR and contributing with a patch to [SPARK-4362] based on this PR with the needed changes. I read the wiki, should I simply follow those steps and create a new PR?

@acidghost
Copy link

@jkbradley @srowen Do you think that the return type of predictProbabilities should be scala.collection.Map[Double, Double] or Vector[Double]?

@srowen
Copy link
Member

srowen commented Jun 11, 2015

I think the original Map was OK, IMHO.

@acidghost
Copy link

@srowen I just created PR #6761

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants