-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-18776][SS] Make Offset for FileStreamSource corrected formatted in json #16205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| * Offset for the [[FileStreamSource]]. | ||
| * @param logOffset Position in the [[FileStreamSourceLog]] | ||
| */ | ||
| case class FileStreamSourceOffset(logOffset: Long) extends Offset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named it logoffset to make a more clear that this is a offset in the log of FileStreamSource
|
Test build #69841 has finished for PR 16205 at commit
|
|
|
|
Oh, please also support to read the old format in case RC2 is passed. |
|
LGTM pending tests |
|
Test build #69888 has finished for PR 16205 at commit
|
|
Merging to master and 2.1 |
…d in json ## What changes were proposed in this pull request? - Changed FileStreamSource to use new FileStreamSourceOffset rather than LongOffset. The field is named as `logOffset` to make it more clear that this is a offset in the file stream log. - Fixed bug in FileStreamSourceLog, the field endId in the FileStreamSourceLog.get(startId, endId) was not being used at all. No test caught it earlier. Only my updated tests caught it. Other minor changes - Dont use batchId in the FileStreamSource, as calling it batch id is extremely miss leading. With multiple sources, it may happen that a new batch has no new data from a file source. So offset of FileStreamSource != batchId after that batch. ## How was this patch tested? Updated unit test. Author: Tathagata Das <[email protected]> Closes #16205 from tdas/SPARK-18776. (cherry picked from commit 458fa33) Signed-off-by: Tathagata Das <[email protected]>
…d in json ## What changes were proposed in this pull request? - Changed FileStreamSource to use new FileStreamSourceOffset rather than LongOffset. The field is named as `logOffset` to make it more clear that this is a offset in the file stream log. - Fixed bug in FileStreamSourceLog, the field endId in the FileStreamSourceLog.get(startId, endId) was not being used at all. No test caught it earlier. Only my updated tests caught it. Other minor changes - Dont use batchId in the FileStreamSource, as calling it batch id is extremely miss leading. With multiple sources, it may happen that a new batch has no new data from a file source. So offset of FileStreamSource != batchId after that batch. ## How was this patch tested? Updated unit test. Author: Tathagata Das <[email protected]> Closes apache#16205 from tdas/SPARK-18776.
…d in json ## What changes were proposed in this pull request? - Changed FileStreamSource to use new FileStreamSourceOffset rather than LongOffset. The field is named as `logOffset` to make it more clear that this is a offset in the file stream log. - Fixed bug in FileStreamSourceLog, the field endId in the FileStreamSourceLog.get(startId, endId) was not being used at all. No test caught it earlier. Only my updated tests caught it. Other minor changes - Dont use batchId in the FileStreamSource, as calling it batch id is extremely miss leading. With multiple sources, it may happen that a new batch has no new data from a file source. So offset of FileStreamSource != batchId after that batch. ## How was this patch tested? Updated unit test. Author: Tathagata Das <[email protected]> Closes apache#16205 from tdas/SPARK-18776.
What changes were proposed in this pull request?
logOffsetto make it more clear that this is a offset in the file stream log.Other minor changes
How was this patch tested?
Updated unit test.