@HeartSaVioR
Contributor

What changes were proposed in this pull request?

This patch measures and logs the elapsed time of each operation that communicates with the file system (mostly remote HDFS in production) in HDFSBackedStateStoreProvider, to help investigate latency issues.
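
As a rough illustration of the approach described above, here is a minimal sketch of a timing helper. It assumes `timeTakenMs` takes a by-name block and returns the block's result paired with the elapsed milliseconds, which is what the `val (result, elapsedMs) = Utils.timeTakenMs {` call site in this diff suggests. `TimingSketch` and its `main` method are hypothetical names used only for illustration, not code from the patch.

```scala
import java.util.concurrent.TimeUnit

object TimingSketch {
  // Hypothetical stand-in for the helper used in the patch: evaluates `body`
  // once and returns its result together with the elapsed time in milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val startNs = System.nanoTime()
    val result = body
    val elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNs)
    (result, math.max(0L, elapsedMs))
  }

  def main(args: Array[String]): Unit = {
    // Wrap an existing block of code; the block's value is still available.
    val (sum, elapsedMs) = timeTakenMs {
      (1 to 1000).sum
    }
    println(s"Computing $sum took $elapsedMs ms.")
  }
}
```

Because the block is passed by name, existing code can be moved inside the braces unchanged, and only the logging line needs to be added after it.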

How was this patch tested?

Manually tested.

@HeartSaVioR
Contributor Author

There are plenty of other debug messages that might bury the log messages added by this patch. Do we want to log them at INFO instead of DEBUG?

@SparkQA

SparkQA commented Jun 7, 2018

Test build #91525 has finished for PR 21506 at commit d84f98f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

@HyukjinKwon
Member

add to whitelist

@HyukjinKwon
Member

ok to test

@SparkQA

SparkQA commented Jun 10, 2018

Test build #91626 has finished for PR 21506 at commit d84f98f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 10, 2018

Test build #91627 has finished for PR 21506 at commit d84f98f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@jose-torres left a comment

In general I agree that these logs are worth the extra code.

I would argue that debug is the right logging level - these are pretty detailed metrics. But I don't feel super strongly about it.

lastAvailableMap =
synchronized { loadedMaps.get(lastAvailableVersion) }
.orElse(readSnapshotFile(lastAvailableVersion))
val (result, elapsedMs) = Utils.timeTakenMs {
Contributor

@jose-torres jose-torres Jun 10, 2018

GitHub has an... interesting idea of how to display this diff. The only change was moving the existing code inside timeTakenMs and adding the logWarning statements, correct?

Contributor Author

Yup, right. Most of the code change is just wrapping the existing code in timeTakenMs.


synchronized { loadedMaps.put(version, resultMap) }
resultMap
logWarning(s"Loading state for $version takes $elapsedMs ms.")
Contributor

I'm not sure this should be a warning.

Contributor Author

I just thought about pairing this with the warning message above, but since we are guiding end users to turn on the DEBUG level to see information about additional latencies, turning this into DEBUG would also be OK.

Contributor Author

Changed log level to DEBUG.

@SparkQA

SparkQA commented Jun 11, 2018

Test build #91649 has finished for PR 21506 at commit 3d0e23f.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 11, 2018

Test build #91651 has finished for PR 21506 at commit 3d0e23f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR
Contributor Author

Kind ping again to @tdas.

@jose-torres
Contributor

lgtm

@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 4c388bc Jun 13, 2018
@HeartSaVioR HeartSaVioR deleted the SPARK-24485 branch January 25, 2019 22:22
var lastAvailableMap: Option[MapType] = None
while (lastAvailableMap.isEmpty) {
lastAvailableVersion -= 1
logWarning(s"The state for version $version doesn't exist in loadedMaps. " +
Contributor

@HeartSaVioR this is normal operation, and I don't yet understand why logWarning is used here. Can we lower this to debug, or was there a reason to use warning? It's kind of overkill in unit tests...

Contributor Author

This is not normal unless the state has just been restored from a checkpoint. If someone encounters the warning message while batches are running, it should be taken seriously, because the full state is being loaded from HDFS even though we expect the state cache to contain it.
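
To make the distinction above concrete, here is a simplified sketch of a cache lookup that falls back to snapshots on a miss. It is only an assumption-laden model, not the provider's actual code: `CacheMissSketch`, `StateMap`, `snapshots`, and `warnings` are hypothetical names, the "remote" store is an in-memory map, and replaying delta files after a snapshot is found is omitted.

```scala
import scala.collection.mutable

object CacheMissSketch {
  type StateMap = Map[String, Int]

  // Hypothetical in-memory cache, stand-in snapshot store, and captured warnings.
  val loadedMaps = mutable.Map[Long, StateMap]()
  val snapshots: Map[Long, StateMap] = Map(1L -> Map("a" -> 1))
  val warnings = mutable.Buffer[String]()

  def loadMap(version: Long): StateMap =
    // Expected path while batches are running: the cache already has the version.
    loadedMaps.getOrElse(version, {
      // Cache miss: record a warning, since this implies a slow load from storage.
      warnings += s"The state for version $version doesn't exist in loadedMaps."
      var v = version
      var found: Option[StateMap] = None
      // Walk back to the latest version available in the cache or as a snapshot.
      while (found.isEmpty && v >= 0) {
        found = loadedMaps.get(v).orElse(snapshots.get(v))
        if (found.isEmpty) v -= 1
      }
      val result = found.getOrElse(Map.empty[String, Int])
      loadedMaps.put(version, result)
      result
    })
}
```

In this model the warning fires only on the first load of a version. A later call for the same version hits the cache and stays silent, which matches the expectation that warnings during steady-state batches indicate a problem.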
