[MLLIB] SPARK-4362: Added classProbabilities method for Naive Bayes #3626
I'd probably import scala.collection.mutable and write mutable.Map. Not sure what others' preference is.
Can one of the admins verify this patch?
...but why are you returning a mutable Map anyway?
Scala newbie. I couldn't find a better pattern to build the map than mutating it in the foreach. Should I just build a map then make it immutable for returning?
That's fine, but you need not promise a mutable Map in the return type. You can return it as a scala.collection.Map.
On Fri, Dec 5, 2014 at 3:51 PM, alanctgardner [email protected] wrote:
In mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala:
@@ -65,6 +65,24 @@ class NaiveBayesModel private[mllib] (
   override def predict(testData: Vector): Double = {
     labels(brzArgmax(brzPi + brzTheta * testData.toBreeze))
   }
+
+  def classProbabilities(testData: RDD[Vector]):
+      RDD[scala.collection.mutable.Map[Double, Double]] = {
+    val bcModel = testData.context.broadcast(this)
+    testData.mapPartitions { iter =>
+      val model = bcModel.value
+      iter.map(model.classProbabilities)
+    }
+  }
+  def classProbabilities(testData: Vector):
+      scala.collection.mutable.Map[Double, Double] = {

Scala newbie. I couldn't find a better pattern to build the map than mutating it in the foreach. Should I just build a map then make it immutable for returning?
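The suggestion in this exchange (build the map however is convenient, but declare only `scala.collection.Map` in the signature) can be sketched roughly as below. The object name `ClassProbabilitiesSketch`, both helper names, and the array-based parameters are hypothetical and are not taken from the patch; this is a minimal illustration rather than the PR's code.

```scala
import scala.collection.mutable

// Hypothetical helper object, for illustration only (not part of the PR).
object ClassProbabilitiesSketch {

  // Option 1: mutate a local map, but expose only the read-only supertype,
  // so callers are never promised mutability.
  def viaMutation(
      labels: Array[Double],
      probs: Array[Double]): scala.collection.Map[Double, Double] = {
    val result = mutable.Map.empty[Double, Double]
    labels.indices.foreach { i => result(labels(i)) = probs(i) }
    result
  }

  // Option 2: avoid mutation entirely by zipping labels with their
  // probabilities and converting the pairs to an immutable Map.
  def viaZip(labels: Array[Double], probs: Array[Double]): Map[Double, Double] =
    labels.zip(probs).toMap
}
```

Either variant keeps the mutable map an implementation detail; the second also removes the need for the `foreach` that prompted the question.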
@alanctgardner have you had a look at @jkbradley's feedback? I'm wondering whether this is still live. It needs a rebase if so.
@alanctgardner It would be great if you changed it to predictProbabilities; thanks. I agree with what @jatinpreet was saying about the correctness, and with @srowen's comment on how to fix it: The value of Could you please rebase off of master and make these couple of updates? After that, I can make a final pass. Thanks!
Mind closing this PR if it's not going to be updated?
Hello there! I'm interested in reopening this PR and contributing a patch for [SPARK-4362] based on this PR, with the needed changes. I read the wiki; should I simply follow those steps and create a new PR?
@jkbradley @srowen Do you think that the return type of
I think the original Map was OK, IMHO.
Added methods which accept an RDD or array and return a map of (label -> posterior prob.) for each input set rather than only returning the key with the maximum value.
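As a rough illustration of returning a (label -> posterior prob.) map instead of only the arg-max label, the sketch below normalizes per-class log scores with the log-sum-exp trick. It assumes the scores are unnormalized log-posteriors (log prior plus log likelihood), as the `brzPi + brzTheta * testData.toBreeze` expression in `predict` suggests; the object name `PosteriorSketch`, the method name `posteriors`, and the plain-array inputs are hypothetical and are not the PR's implementation.

```scala
// Hypothetical sketch, not the PR's code: plain arrays stand in for the
// Breeze vectors used inside NaiveBayesModel.
object PosteriorSketch {

  // Turns unnormalized log-posteriors into probabilities that sum to 1.
  // Subtracting the maximum before exponentiating avoids underflow when
  // the log scores are large negative numbers.
  def posteriors(labels: Array[Double], logScores: Array[Double]): Map[Double, Double] = {
    require(labels.nonEmpty && labels.length == logScores.length,
      "labels and logScores must be non-empty and of equal length")
    val maxLog = logScores.max
    val shifted = logScores.map(s => math.exp(s - maxLog))
    val total = shifted.sum
    labels.zip(shifted.map(_ / total)).toMap
  }
}
```

The label with the largest probability in the returned map is the same one `predict` would choose, since the normalization preserves the ordering of the scores.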