@xuanyuanking
Member

What changes were proposed in this pull request?

As discussed in #20675, we need to add a new interface, ContinuousDataReaderFactory, to support setting the start offset in Continuous Processing.
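
For readers outside the thread, the idea can be sketched as a small, self-contained Java example. Only the names ContinuousDataReaderFactory, DataReader, PartitionOffset and createDataReaderWithOffset come from this conversation. The stand-in interfaces below are heavily simplified, the DataReaderFactory parent is an assumption, and CounterPartitionOffset, CounterReaderFactory and OffsetDemo are hypothetical names invented for illustration, not Spark classes. The source is bounded here only so the demo terminates.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the Spark types named in this PR (assumptions).
interface PartitionOffset {}

interface DataReader<T> {
    boolean next();
    T get();
}

interface DataReaderFactory<T> {
    DataReader<T> createDataReader();
}

// Same name as the interface proposed here; the body is a sketch.
interface ContinuousDataReaderFactory<T> extends DataReaderFactory<T> {
    DataReader<T> createDataReaderWithOffset(PartitionOffset offset);
}

// Hypothetical offset type: a position inside one partition.
final class CounterPartitionOffset implements PartitionOffset {
    final int partition;
    final long value;

    CounterPartitionOffset(int partition, long value) {
        this.partition = partition;
        this.value = value;
    }
}

// Hypothetical factory that emits consecutive longs for one partition.
final class CounterReaderFactory implements ContinuousDataReaderFactory<Long> {
    private final int partition;
    private final long initialValue;
    private final long endExclusive;

    CounterReaderFactory(int partition, long initialValue, long endExclusive) {
        this.partition = partition;
        this.initialValue = initialValue;
        this.endExclusive = endExclusive;
    }

    @Override
    public DataReader<Long> createDataReader() {
        return readerFrom(initialValue);
    }

    @Override
    public DataReader<Long> createDataReaderWithOffset(PartitionOffset offset) {
        CounterPartitionOffset o = (CounterPartitionOffset) offset;
        if (o.partition != partition) {
            throw new IllegalArgumentException(
                "expected: " + partition + " actual: " + o.partition);
        }
        // Start from the supplied offset instead of the initial value.
        return readerFrom(o.value);
    }

    private DataReader<Long> readerFrom(long start) {
        return new DataReader<Long>() {
            private long current = start - 1;

            @Override
            public boolean next() {
                if (current + 1 >= endExclusive) {
                    return false;
                }
                current++;
                return true;
            }

            @Override
            public Long get() {
                return current;
            }
        };
    }
}

final class OffsetDemo {
    // Collects everything a reader produces.
    static List<Long> drain(DataReader<Long> reader) {
        List<Long> out = new ArrayList<>();
        while (reader.next()) {
            out.add(reader.get());
        }
        return out;
    }

    public static void main(String[] args) {
        CounterReaderFactory factory = new CounterReaderFactory(0, 0L, 5L);
        System.out.println(drain(factory.createDataReader()));
        System.out.println(drain(
            factory.createDataReaderWithOffset(new CounterPartitionOffset(0, 3L))));
    }
}
```

Running OffsetDemo prints `[0, 1, 2, 3, 4]` for the fresh reader and `[3, 4]` for the reader created at offset 3.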

How was this patch tested?

Existing UT.

@SparkQA

SparkQA commented Feb 28, 2018

Test build #87758 has finished for PR 20689 at commit 59cef98.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member Author

retest this please

@SparkQA

SparkQA commented Feb 28, 2018

Test build #87767 has finished for PR 20689 at commit 59cef98.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* @param offset the offset to use as the DataReader's start offset.
*/
default DataReader<T> createDataReaderWithOffset(PartitionOffset offset) {
throw new IllegalStateException(
Contributor

I don't know if we want a default here. It seems like subclasses should always be able to provide an implementation, so we should always require them to.

Member

+1 to making this one abstract.
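
The trade-off raised in this thread can be illustrated with a small Java sketch. All type names here (Reader, FactoryWithDefault, FactoryAbstract, ForgetfulFactory, CompleteFactory, DefaultVsAbstractDemo) are hypothetical, and the exception message is invented because the original one is cut off in the diff excerpt. With a throwing default, an implementation that forgets the method still compiles and only fails when the method is called. With an abstract method, the same omission is a compile error.

```java
// Minimal stand-in for a reader; hypothetical, not a Spark type.
interface Reader {
    String describe();
}

// Variant A: a default method that throws, as in the reviewed diff.
interface FactoryWithDefault {
    Reader createReader();

    default Reader createReaderWithOffset(long offset) {
        // Hypothetical message; the original text is truncated in the diff.
        throw new IllegalStateException("createReaderWithOffset not supported");
    }
}

// Variant B: the method is abstract, as the reviewers suggest.
interface FactoryAbstract {
    Reader createReader();

    Reader createReaderWithOffset(long offset);
}

// Compiles even though it never implements the offset variant.
final class ForgetfulFactory implements FactoryWithDefault {
    @Override
    public Reader createReader() {
        return () -> "from beginning";
    }
}

// Would not compile if createReaderWithOffset were left out.
final class CompleteFactory implements FactoryAbstract {
    @Override
    public Reader createReader() {
        return () -> "from beginning";
    }

    @Override
    public Reader createReaderWithOffset(long offset) {
        return () -> "from offset " + offset;
    }
}

final class DefaultVsAbstractDemo {
    // Shows that the omission in variant A surfaces only at call time.
    static String tryResume(FactoryWithDefault factory, long offset) {
        try {
            return factory.createReaderWithOffset(offset).describe();
        } catch (IllegalStateException e) {
            return "failed at runtime: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryResume(new ForgetfulFactory(), 42L));
        System.out.println(new CompleteFactory().createReaderWithOffset(42L).describe());
    }
}
```

The first line printed is `failed at runtime: createReaderWithOffset not supported`, the second is `from offset 42`.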

@jose-torres
Contributor

@tdas

@rdblue - this is a streaming DataSourceV2 API change specific to continuous processing (SPIP SPARK-20928). We're still iterating towards a solution that makes continuous processing compatible with all the existing Spark operations, so we don't have a full formal description of the API surface yet.

@SparkQA

SparkQA commented Mar 1, 2018

Test build #87811 has finished for PR 20689 at commit 4bf17a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@zsxwing zsxwing left a comment

Left some comments. Otherwise LGTM


override def createDataReaderWithOffset(offset: PartitionOffset): DataReader[UnsafeRow] = {
val kafkaOffset = offset.asInstanceOf[KafkaSourcePartitionOffset]
assert(kafkaOffset.topicPartition == topicPartition)
Member

This can actually happen, so I'd prefer require over assert, like this:

require(kafkaOffset.topicPartition == topicPartition, s"expected: $topicPartition actual: ${kafkaOffset.topicPartition}")

Member Author

Got it.
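
As a rough Java analogue of the suggestion above: Scala's `require` throws an `IllegalArgumentException` whose message is prefixed with `requirement failed: `, whereas assertions signal an `AssertionError` and can be switched off (Java's `assert` statement is disabled unless the JVM runs with `-ea`). The sketch below hand-rolls an always-on check. `Checks`, `TopicPartitionReader` and `openAt` are hypothetical names for illustration only, and topic partitions are reduced to plain strings.

```java
// Hypothetical helper mimicking the behaviour of Scala's require.
final class Checks {
    private Checks() {}

    static void require(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException("requirement failed: " + message);
        }
    }
}

// Hypothetical reader bound to one topic partition.
final class TopicPartitionReader {
    private final String topicPartition;

    TopicPartitionReader(String topicPartition) {
        this.topicPartition = topicPartition;
    }

    // Rejects offsets that belong to a different partition, with a
    // message that reports both the expected and the actual value.
    String openAt(String offsetTopicPartition, long offset) {
        Checks.require(
            offsetTopicPartition.equals(topicPartition),
            "expected: " + topicPartition + " actual: " + offsetTopicPartition);
        return topicPartition + "@" + offset;
    }
}
```

Calling `new TopicPartitionReader("events-0").openAt("events-1", 7L)` fails with `requirement failed: expected: events-0 actual: events-1` regardless of JVM flags, which is the property the reviewer is after.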


override def createDataReaderWithOffset(offset: PartitionOffset): DataReader[Row] = {
val rateStreamOffset = offset.asInstanceOf[RateStreamPartitionOffset]
assert(rateStreamOffset.partition == partitionIndex)
Member

ditto

@jose-torres
Contributor

LGTM

@SparkQA

SparkQA commented Mar 15, 2018

Test build #88253 has finished for PR 20689 at commit 992e2c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Mar 15, 2018

Thanks! Merging to master.

@asfgit asfgit closed this in 7c3e899 Mar 15, 2018
mstewart141 pushed a commit to mstewart141/spark that referenced this pull request Mar 24, 2018
…rtOffset

## What changes were proposed in this pull request?

As discussed in apache#20675, we need to add a new interface, `ContinuousDataReaderFactory`, to support setting the start offset in Continuous Processing.

## How was this patch tested?

Existing UT.

Author: Yuanjian Li <[email protected]>

Closes apache#20689 from xuanyuanking/SPARK-23533.
otterc pushed a commit to linkedin/spark that referenced this pull request Mar 22, 2023
…rtOffset

## What changes were proposed in this pull request?

As discussed in apache#20675, we need to add a new interface, `ContinuousDataReaderFactory`, to support setting the start offset in Continuous Processing.

## How was this patch tested?

Existing UT.

Author: Yuanjian Li <[email protected]>

Closes apache#20689 from xuanyuanking/SPARK-23533.

RB=1844647
G=superfriends-reviewers
R=mshen,fli,zolin,yezhou,latang
A=