Skip to content

Conversation

@thunterdb
Copy link
Contributor

Replaces a number of occurences of MLlib in the documentation that were meant to refer to the spark.mllib package instead. It should clarify for new users the difference between spark.mllib (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

@jkbradley
Copy link
Member

Early comment: Maybe we should reinstate all existing .md files but replace the contents with redirection links to the relevant new parts of the guide. Websites might have links which will be broken by these changes. I guess that should include files removed in [https://github.com//pull/10207] too.

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47461 has finished for PR 10234 at commit 42bf337.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@jkbradley
Copy link
Member

Related to my earlier comment, it might be good to rename the ml-intro file to ml-guide; there isn't a need to rename it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"DataFrame" (F capitalized) (elsewhere too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected everywhere in the documentation

@jkbradley
Copy link
Member

Done with review.
One final comment: I'd prefer adding spark.ml or spark.mllib to the end of both the title and displayTitle. I like having it in displayTitle since that provides the clearest signal about which page the user is viewing.

@thunterdb
Copy link
Contributor Author

@jkbradley I agree with you. Note that when you use the doc, the current section appears in bold in the side menu (and the submenu gets expandend). This is why I did not include it again, but I do not have a strong opinion on this point:

screen shot 2015-12-09 at 5 14 18 pm

@jkbradley
Copy link
Member

OK then I'll still vote for including it : )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark.mllib right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this file; the "ml-" prefix ones are for spark.ml. (It's true the functionality is almost the same currently, but it's a bit different and will diverge more.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the purpose now. It was the old MLlib text, but a lot of it still applies. The distinction is removed.

@thunterdb
Copy link
Contributor Author

@jkbradley done with changes, let me know what you think.

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47530 has finished for PR 10234 at commit 6d66645.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@jkbradley
Copy link
Member

In addition to moving ml-intro back to ml-guide, it'd be nice if the sidebar had links back to the main spark.ml and spark.mllib pages. That could be done in a separate JIRA/PR, if you prefer.

@thunterdb
Copy link
Contributor Author

@jkbradley done:
screen shot 2015-12-10 at 11 19 30 am

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47535 has finished for PR 10234 at commit c75a5ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47537 has finished for PR 10234 at commit 8432ac9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doc should now be a redirect to the ml-classification-regression.html#survival-regression section.

Also, it looks like some of the math renders incorrectly, but let's fix that in a follow-up.

@jkbradley
Copy link
Member

That's the only remaining issue I found. I checked against the Spark 1.5 doc links as well.

@jkbradley
Copy link
Member

LGTM pending tests. Thanks! (not sure what the deal is with survival regression math; I probably am missing some library on my computer.)

@SparkQA
Copy link

SparkQA commented Dec 10, 2015

Test build #47540 has finished for PR 10234 at commit 68f7e53.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * * more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.\n

@jkbradley
Copy link
Member

Merging with master and branch-1.6
Thank you!

asfgit pushed a commit that referenced this pull request Dec 10, 2015
…rk.mllib and mllib in the documentation.

Replaces a number of occurences of `MLlib` in the documentation that were meant to refer to the `spark.mllib` package instead. It should clarify for new users the difference between `spark.mllib` (the package) and MLlib (the umbrella project for ML in spark).

It also removes some files that I forgot to delete with #10207

Author: Timothy Hunter <[email protected]>

Closes #10234 from thunterdb/12212.

(cherry picked from commit 2ecbe02)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in 2ecbe02 Dec 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants