Skip to content

Conversation

@mvogiatzis
Copy link
Contributor

Runs for all existing keys and returning "None" will remove the key-value pair.

Runs for *all* existing keys and returning "None" will remove the key-value pair.
@mvogiatzis mvogiatzis changed the title Added important updateStateByKey details [DOCS] Added important updateStateByKey details Jul 5, 2015
@srowen
Copy link
Member

srowen commented Jul 6, 2015

It's kind of implied by the docs above, since it must run on the current state of every key. This doesn't look like the right place; the doc of this method is above.

@mvogiatzis
Copy link
Contributor Author

The documentation above states: "This can be used to maintain arbitrary state data for each key" .

I would expect that each key means each incoming key in the batch, as the absence of new values for a key would mean no change of the existing state (e.g. for better performance). I found out the hard way (stackoverflow, mailing list, local testing).

I can move the description in the method doc above (although feels a big lengthy) or better in code, but I feel this information is necessary.

@srowen
Copy link
Member

srowen commented Jul 6, 2015

No I mean farther up in "UpdateStateByKey Operation" where the operation is explained, rather than putting this as a coda at the end after examples. You can weave a comment into the examples too I suppose, to clarify what keys are getting updated. I personally assumed the current behavior since there's otherwise no way key's values change unless a new value arrives, and that can't cover all use cases. Clarification never hurt though.

Moved the update description up, before the example. Maybe it's worth mentioning in the example that "Existing words with no new values will also be called by updateStateByKey until they return None"
@mvogiatzis
Copy link
Contributor Author

I agree, this is better. Maybe it's worth mentioning in the example that "Existing words with no new values will also be called by updateStateByKey until they return None"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You ended up with a stray space here, but if you don't have a moment to zap that today, it's no big deal, I'll merge. The text looks OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, removed.

@SparkQA
Copy link

SparkQA commented Jul 8, 2015

Test build #1011 has finished for PR 7229 at commit c2656f9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to go around one more time on this, but I think the text should be sharpened slightly. "operation" might be better as "update function", and it's a question of what the update function returns, not updateStateByKey. (None can be code font too.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Please update.

@andrewor14
Copy link
Contributor

@tdas

Replaced operation with update function and None with code-font None
@mvogiatzis
Copy link
Contributor Author

Is this ok now or should I take off the updateStateByKey update function ?

@srowen
Copy link
Member

srowen commented Jul 9, 2015

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole section is about updateStateByKey so saying it again here is superfluous. Just "run the update function". Also I would clarify further: "In every batch, Spark will apply the update function for all... ". It wasnt clear that whether it was for every batch or overall.

asfgit pushed a commit that referenced this pull request Jul 10, 2015
Runs for *all* existing keys and returning "None" will remove the key-value pair.

Author: Michael Vogiatzis <[email protected]>

Closes #7229 from mvogiatzis/patch-1 and squashes the following commits:

e7a2946 [Michael Vogiatzis] Updated updateStateByKey text
00283ed [Michael Vogiatzis] Removed space
c2656f9 [Michael Vogiatzis] Moved description farther up
0a42551 [Michael Vogiatzis] Added important updateStateByKey details

(cherry picked from commit d538919)
Signed-off-by: Tathagata Das <[email protected]>
@tdas
Copy link
Contributor

tdas commented Jul 10, 2015

No worries, I have the made the change while merging it. Thanks @mvogiatzis, merged it to master and 1.4

@asfgit asfgit closed this in d538919 Jul 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants