@tdas tdas commented Jan 4, 2017

What changes were proposed in this pull request?

Updates

  • Updated Late Data Handling section by adding a figure for Update Mode. It's more intuitive to explain late data handling with Update Mode, so I added the new figure before the Append Mode figure.
  • Updated Output Modes section with Update mode
  • Added options for all the sources and sinks


[Six images attached: four inline images and two screenshots dated 2017-01-03]

SparkQA commented Jan 4, 2017

Test build #70852 has finished for PR 16468 at commit 3285a2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jan 4, 2017

Test build #70853 has finished for PR 16468 at commit fbacbf4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing zsxwing left a comment

For structured-streaming-watermark-update-mode.png, in the third table (under 12:20), 12:00 - 12:10 cat 1 should be gray.

Do you want to document that without aggregation or window operators, update mode is same as append mode? Never mind. Just found that update mode requires aggregation.
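
The late-data behaviour that the Update Mode figure is meant to illustrate can be sketched in plain Python. This is a simplified model, not Spark code: it assumes tumbling 10-minute windows, a hypothetical 10-minute late threshold, times given as minutes since midnight, and that a window is dropped once its end is at or before the watermark. The names `run_update_mode`, `WINDOW` and `LATE_THRESHOLD` are made up for this illustration.

```python
from collections import defaultdict

WINDOW = 10          # assumed tumbling window length, in minutes
LATE_THRESHOLD = 10  # assumed watermark delay, in minutes


def window_start(event_time):
    """Start of the tumbling window that contains event_time."""
    return event_time - event_time % WINDOW


def run_update_mode(triggers):
    """Model windowed counts with a watermark and Update-mode output.

    `triggers` is a list of batches; each batch is a list of
    (event_time, word) pairs. Returns one dict per trigger that holds
    only the (window_start, word) rows changed in that trigger.
    """
    counts = defaultdict(int)  # (window_start, word) -> running count
    watermark = 0              # nothing is "too late" before any data is seen
    max_event_time = 0
    outputs = []
    for batch in triggers:
        changed = {}
        for event_time, word in batch:
            start = window_start(event_time)
            if start + WINDOW <= watermark:
                continue  # window ended at or before the watermark: drop the row
            counts[(start, word)] += 1
            changed[(start, word)] = counts[(start, word)]
            max_event_time = max(max_event_time, event_time)
        outputs.append(changed)
        # The watermark moves only after the trigger and applies to the next one.
        watermark = max(watermark, max_event_time - LATE_THRESHOLD)
        # Discard state for windows that can no longer be updated.
        for key in [k for k in counts if k[0] + WINDOW <= watermark]:
            del counts[key]
    return outputs


# 12:00 is minute 720. The second trigger contains a late row (12:08).
triggers = [
    [(727, "dog"), (731, "owl")],
    [(734, "dog"), (728, "dog")],
    [(745, "cat")],
    [(724, "dog"), (746, "cat")],
]
outputs = run_update_mode(triggers)
```

In this run the late 12:08 row in the second trigger still updates the 12:00 window, while the 12:04 row in the fourth trigger is dropped because the watermark has already passed 12:10.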

<td>Append</td>
<td>Append, Update</td>
<td>
Complete mode note supported as it is infeasible to keep all data in the Result Table.

@zsxwing zsxwing Jan 4, 2017

nit: Complete mode not supported

<tr>
<td colspan="2" valign="middle"><br/>Queries without aggregation</td>
<td>Append</td>
<td>Append, Update</td>

tdas commented Jan 4, 2017

Good catch about non-aggregation queries. We should support update mode, which is the same as append mode for such queries. I will fix that in a follow-up PR.
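
A small plain-Python sketch of why the two modes coincide when there is no aggregation. This is an illustrative model under stated assumptions, not Spark's implementation: the Result Table is modelled as a list, each input row produces exactly one result row, and `run_query` is a hypothetical helper.

```python
def run_query(triggers, transform):
    """Model a query without aggregation and return (append, update) outputs.

    Each input row maps to exactly one result row, and existing result
    rows are never rewritten by later triggers.
    """
    result_table = []
    append_out, update_out = [], []
    for batch in triggers:
        before = list(result_table)
        result_table.extend(transform(row) for row in batch)
        # Append mode: rows added to the Result Table since the last trigger.
        append_out.append(result_table[len(before):])
        # Update mode: rows that are new or differ from the previous table.
        update_out.append([
            row for i, row in enumerate(result_table)
            if i >= len(before) or before[i] != row
        ])
    return append_out, update_out


append_out, update_out = run_query([["a", "b"], ["c"]], str.upper)
```

Because no earlier row ever changes, the "new or changed" set in each trigger is exactly the set of appended rows, so both lists come out identical.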

tdas changed the title from "[SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode" to "[SPARK-19074][SS][DOCS] Updated Structured Streaming Programming Guide for update mode and source/sink options" on Jan 5, 2017
(<a href="api/scala/index.html#org.apache.spark.sql.streaming.DataStreamReader">Scala</a>/<a href="api/java/org/apache/spark/sql/streaming/DataStreamReader.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader">Python</a>).
E.g. for "parquet" format options see <code>DataStreamReader.parquet()</code></td>
<td>Yes</td>
<td>Supports regular expressions, but does not support multiple comma-separated paths/expressions.</td>

nit: regular expressions -> glob paths
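
To illustrate why the distinction in this nit matters, here is a sketch using Python's standard `fnmatch` and `re` modules. It is only an analogy: the file source resolves paths with Hadoop's glob rules, which differ in detail from `fnmatch` (for example in how `*` treats `/`), and the paths below are made up.

```python
import fnmatch
import re

# Hypothetical input paths, for illustration only.
paths = [
    "data/2017-01-03/part-0.json",
    "data/2017-01-04/part-1.json",
    "data/readme.txt",
]

pattern = "data/*/part-*.json"

# Read as a glob, "*" stands for any run of characters.
glob_matches = [p for p in paths if fnmatch.fnmatchcase(p, pattern)]

# Read as a regular expression, "/*" means "zero or more slashes" and
# "-*" means "zero or more hyphens", so the same string matches nothing here.
regex_matches = [p for p in paths if re.fullmatch(pattern, p)]
```

The same pattern string selects the two JSON files as a glob and no files as a regular expression, which is why the guide should say "glob paths".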

zsxwing commented Jan 5, 2017

LGTM

SparkQA commented Jan 5, 2017

Test build #70895 has finished for PR 16468 at commit 8f01f56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

david-weiluo-ren commented Jan 5, 2017

@tdas
It says “However, note that all of the operations applicable on static DataFrames/Datasets are not supported in streaming DataFrames/Datasets yet” in https://spark.apache.org/docs/2.1.0/structured-streaming-programming-guide.html#unsupported-operations

I think it should be “not all of the operations … are supported in … yet” instead of “all of the operations … are not supported in … yet”. You might want to fix this minor issue in this PR.

tdas commented Jan 5, 2017

@david-weiluo-ren Yeah, the wording can be better. Maybe "all of the operations ... are not yet supported".


- **Memory sink (for debugging)** - The output is stored in memory as an in-memory table.
Both, Append and Complete output modes, are supported. This should be used for debugging purposes
on low data volumes as the entire output is collected and stored in the driver's memory after

@thomasdesr thomasdesr Jan 5, 2017

This is slightly repetitive: it says "[...] the entire output is collected and stored in the driver's memory [...]" and then says "Note that the current implementations saves all the data in the driver memory".

If we want to say this twice to make sure people read it, maybe we can move the "note" reminder into the Notes column in the table a few lines down? :D

@tdas tdas Jan 5, 2017

I kind of agree it's repetitive, but I don't want people to miss this. I will rewrite this.

Aah sorry, I misunderstood. I thought the note above the table and the Notes in the table were the repetition. But that's not the case. My bad.

<tr>
<th>Source</th>
<th>Options</th>
<th>Fault-tolerant</th>

Can we link back to #fault-tolerance-semantics here?

I don't want to make the table heading a link, but I will do something.

<th>Supported Output Modes</th>
<th style="width:30%">Usage</th>
<th>Options</th>
<th>Fault-tolerant</th>

Ditto

tdas commented Jan 6, 2017

Thank you very much @thomaso-mirodin @david-weiluo-ren @zsxwing
I have addressed your comments. I haven't updated the screenshots though. Please look into the diff.

SparkQA commented Jan 6, 2017

Test build #70952 has finished for PR 16468 at commit d29ee29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in b59cdda Jan 6, 2017
asfgit pushed a commit that referenced this pull request Jan 6, 2017
…e for update mode and source/sink options

## What changes were proposed in this pull request?

Updates
- Updated Late Data Handling section by adding a figure for Update Mode. Its more intuitive to explain late data handling with Update Mode, so I added the new figure before the Append Mode figure.
- Updated Output Modes section with Update mode
- Added options for all the sources and sinks

---------------------------
---------------------------

![image](https://cloud.githubusercontent.com/assets/663212/21665176/f150b224-d29f-11e6-8372-14d32da21db9.png)

---------------------------
---------------------------
<img width="931" alt="screen shot 2017-01-03 at 6 09 11 pm" src="https://cloud.githubusercontent.com/assets/663212/21629740/d21c9bb8-d1df-11e6-915b-488a59589fa6.png">
<img width="933" alt="screen shot 2017-01-03 at 6 10 00 pm" src="https://cloud.githubusercontent.com/assets/663212/21629749/e22bdabe-d1df-11e6-86d3-7e51d2f28dbc.png">

---------------------------
---------------------------
![image](https://cloud.githubusercontent.com/assets/663212/21665200/108e18fc-d2a0-11e6-8640-af598cab090b.png)
![image](https://cloud.githubusercontent.com/assets/663212/21665148/cfe414fa-d29f-11e6-9baa-4124ccbab093.png)
![image](https://cloud.githubusercontent.com/assets/663212/21665226/2e8f39e4-d2a0-11e6-85b1-7657e2df5491.png)

Author: Tathagata Das <[email protected]>

Closes #16468 from tdas/SPARK-19074.

(cherry picked from commit b59cdda)
Signed-off-by: Tathagata Das <[email protected]>
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Jan 9, 2017

@spoddutur

Hi TD,

As part of the 2.1.0 release, Kafka was added as a source
(SPARK-17346: Kafka 0.10 support in Structured Streaming).
I am wondering whether Kinesis support will be added in the future. If so, when can we expect it?

The reason for asking is that we currently use Kinesis with Spark Streaming on Spark 1.6 and are planning to upgrade to Spark 2 Structured Streaming, so we are eager to know when Kinesis support will arrive in Structured Streaming.

Thanks in advance,
Sruthi

uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017