tdas commented Mar 9, 2015

Updates to the documentation are as follows:

  • Added information on Kafka Direct API and Kafka Python API
  • Added joins to the main streaming guide
  • Improved details on the fault-tolerance semantics

Generated docs are located here:
http://people.apache.org/~tdas/spark-1.3.0-temp-docs/streaming-programming-guide.html#fault-tolerance-semantics

More things to add:

  • Configuration for Kafka receive rate
  • Maybe add concurrentJobs

tdas changed the title from "[SPARK-6128][Streaming][Documentation] Streaming guide update 1.3" to "[SPARK-6128][Streaming][Documentation] Updates to Spark Streaming Programming Guide" on Mar 9, 2015

tdas commented Mar 9, 2015

@JoshRosen

Contributor

"loose" -> "lose".

"zero-data" probably shouldn't be hyphenated. There's an extra space before the period at the end of this sentence, too.

Typo: "Ssee".

Contributor Author

Fixed.

SparkQA commented Mar 10, 2015

Test build #28411 has finished for PR 4956 at commit 86c4c2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 10, 2015

Test build #28417 has finished for PR 4956 at commit 04167a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 10, 2015

Test build #28418 has finished for PR 4956 at commit 380cf8d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 11, 2015

Test build #28477 has finished for PR 4956 at commit debe484.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Row(word: String)
    • class JavaSQLContextSingleton
    • public class JavaRow implements java.io.Serializable
    • You can also easily use machine learning algorithms provided by [MLlib](mllib-guide.html). First of all, there are streaming machine learning algorithms (e.g. (Streaming Linear Regression](mllib-linear-methods.html#streaming-linear-regression), [Streaming KMeans](file:///Users/tdas/Projects/Spark/spark/docs/_site/mllib-clustering.html#streaming-k-means), etc.) which can simultaneously learn from the streaming data as well as apply the model on the streaming data. Beyond these, for a much larger class of machine learning algorithms, you can learn a learning model offline (i.e. using historical data) and then apply the model online on streaming data. See the [MLlib](mllib-guide.html) guide for more details.

Contributor

"and then queried it using" -> drop the 'it'

Contributor

The file:// link here should be updated. Also, it looks like the link to Streaming Linear Regression starts with an opening paren rather than a square bracket, causing it to be misformatted in Markdown.

tdas commented Mar 12, 2015

Thank you so much, @JoshRosen. I am merging this to unblock the release!

asfgit closed this in cd3b68d on Mar 12, 2015
asfgit pushed a commit that referenced this pull request on Mar 12, 2015
…gramming Guide

Updates to the documentation are as follows:

- Added information on Kafka Direct API and Kafka Python API
- Added joins to the main streaming guide
- Improved details on the fault-tolerance semantics

Generated docs located here
http://people.apache.org/~tdas/spark-1.3.0-temp-docs/streaming-programming-guide.html#fault-tolerance-semantics

More things to add:
- Configuration for Kafka receive rate
- Maybe add concurrentJobs

Author: Tathagata Das

Closes #4956 from tdas/streaming-guide-update-1.3 and squashes the following commits:

819408c [Tathagata Das] Minor fixes.
debe484 [Tathagata Das] Added DataFrames and MLlib
380cf8d [Tathagata Das] Fix link
04167a6 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-guide-update-1.3
0b77486 [Tathagata Das] Updates based on Josh's comments.
86c4c2a [Tathagata Das] Updated streaming guides
82de92a [Tathagata Das] Add Kafka to Python api docs

(cherry picked from commit cd3b68d)
Signed-off-by: Tathagata Das
SparkQA commented Mar 12, 2015

Test build #28490 has finished for PR 4956 at commit 819408c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Row(word: String)
    • class JavaSQLContextSingleton
    • public class JavaRow implements java.io.Serializable
    • You can also easily use machine learning algorithms provided by [MLlib](mllib-guide.html). First of all, there are streaming machine learning algorithms (e.g. (Streaming Linear Regression](mllib-linear-methods.html#streaming-linear-regression), [Streaming KMeans](mllib-clustering.html#streaming-k-means), etc.) which can simultaneously learn from the streaming data as well as apply the model on the streaming data. Beyond these, for a much larger class of machine learning algorithms, you can learn a learning model offline (i.e. using historical data) and then apply the model online on streaming data. See the [MLlib](mllib-guide.html) guide for more details.
