[SPARK-24485][SS] Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider #21506

HeartSaVioR · 2018-06-07T14:56:39Z

What changes were proposed in this pull request?

This patch measures and logs the elapsed time of each operation that communicates with the file system (mostly remote HDFS in production) in HDFSBackedStateStoreProvider, to help investigate latency issues.
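
A minimal sketch of the kind of timing helper this describes. The diff later in this thread calls `Utils.timeTakenMs`; the standalone version below only assumes its general shape (a by-name block that returns the result paired with elapsed milliseconds) and is not a copy of Spark's code. `TimingSketch` and `readFromStore` are hypothetical names used for illustration.

```scala
import java.util.concurrent.TimeUnit.NANOSECONDS

object TimingSketch {
  // Runs `body` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timeTakenMs[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedNs = System.nanoTime() - start
    (result, math.max(NANOSECONDS.toMillis(elapsedNs), 0L))
  }

  // Hypothetical stand-in for a filesystem read; not part of Spark.
  def readFromStore(version: Long): Map[String, String] =
    Map("version" -> version.toString)

  def main(args: Array[String]): Unit = {
    // Wrap the existing call and destructure the (result, elapsed) pair.
    val (state, elapsedMs) = timeTakenMs { readFromStore(3L) }
    println(s"Loading state for 3 takes $elapsedMs ms; entries = ${state.size}")
  }
}
```

Returning a tuple keeps the wrapped code unchanged: the caller moves the existing expression inside the block and logs the second element.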

How was this patch tested?

Manually tested.

HeartSaVioR · 2018-06-07T14:58:28Z

There are plenty of other debug messages that might bury the log messages added by this patch. Should we log them at INFO instead of DEBUG?

SparkQA · 2018-06-07T19:35:07Z

Test build #91525 has finished for PR 21506 at commit d84f98f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HeartSaVioR · 2018-06-10T00:24:58Z

cc. @tdas @jose-torres @jerryshao @arunmahadevan @HyukjinKwon

HyukjinKwon · 2018-06-10T02:12:29Z

add to whitelist

HyukjinKwon · 2018-06-10T02:13:01Z

ok to test

SparkQA · 2018-06-10T05:53:14Z

Test build #91626 has finished for PR 21506 at commit d84f98f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-06-10T06:52:18Z

Test build #91627 has finished for PR 21506 at commit d84f98f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jose-torres

In general I agree that these logs are worth the extra code.

I would argue that debug is the right logging level - these are pretty detailed metrics. But I don't feel super strongly about it.

jose-torres · 2018-06-10T18:32:09Z

...main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala

-        lastAvailableMap =
-          synchronized { loadedMaps.get(lastAvailableVersion) }
-            .orElse(readSnapshotFile(lastAvailableVersion))
+    val (result, elapsedMs) = Utils.timeTakenMs {


GitHub has an... interesting idea of how to display this diff. The only change was moving the existing code inside timeTakenMs and adding the logWarning statements, correct?

HeartSaVioR · 2018-06-11

Yup, right. Most of the change is just wrapping existing code in timeTakenMs.

jose-torres · 2018-06-10T18:33:43Z

...main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala


-    synchronized { loadedMaps.put(version, resultMap) }
-    resultMap
+    logWarning(s"Loading state for $version takes $elapsedMs ms.")


I'm not sure this should be a warning.

HeartSaVioR · 2018-06-11

I was thinking of pairing this with the warning message above, but since we are guiding end users to turn on the DEBUG level to see information about additional latencies, changing this to DEBUG would also be OK.

HeartSaVioR · 2018-06-11

Changed log level to DEBUG.
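
One reason DEBUG is cheap for detailed metrics like these can be sketched as follows, assuming a logger whose message parameter is by-name so the string is only built when the level is enabled. `SketchLogger` and `loadAndLog` are hypothetical and do not use Spark's own logging classes.

```scala
import scala.collection.mutable.ArrayBuffer

object LogLevelSketch {
  // Hypothetical minimal logger that records lines in memory.
  final class SketchLogger(debugEnabled: Boolean) {
    val lines: ArrayBuffer[String] = ArrayBuffer.empty[String]

    // `msg` is by-name: the interpolated string is only evaluated
    // when DEBUG is switched on.
    def logDebug(msg: => String): Unit =
      if (debugEnabled) { lines += s"DEBUG $msg"; () }

    // Warnings are always recorded in this sketch.
    def logWarning(msg: => String): Unit = {
      lines += s"WARN $msg"; ()
    }
  }

  // Logs the elapsed time at DEBUG, as settled on in the review.
  def loadAndLog(log: SketchLogger, version: Long, elapsedMs: Long): Unit =
    log.logDebug(s"Loading state for $version takes $elapsedMs ms.")
}
```

With DEBUG disabled the call records nothing; with it enabled the timing line appears once per load.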

SparkQA · 2018-06-11T07:05:01Z

Test build #91649 has finished for PR 21506 at commit 3d0e23f.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HeartSaVioR · 2018-06-11T07:22:26Z

retest this please

SparkQA · 2018-06-11T11:52:09Z

Test build #91651 has finished for PR 21506 at commit 3d0e23f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

HeartSaVioR · 2018-06-12T00:40:13Z

Kind ping again, @tdas

jose-torres · 2018-06-13T04:29:29Z

lgtm

HyukjinKwon · 2018-06-13T04:35:51Z

Merged to master.

gaborgsomogyi · 2019-11-12T15:24:42Z

...main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala

-    var lastAvailableMap: Option[MapType] = None
-    while (lastAvailableMap.isEmpty) {
-      lastAvailableVersion -= 1
+    logWarning(s"The state for version $version doesn't exist in loadedMaps. " +


@HeartSaVioR this is normal operation, so I don't yet understand why logWarning is used here. Can we lower this to debug, or was there a reason to use warning? It's kind of overkill in unit tests...

HeartSaVioR · 2019-11-15

This is not normal unless the state has just been restored from a checkpoint. If someone encounters the warning message while batches are running, it should be taken seriously, because the full state is being loaded from HDFS even though we expect the state cache to contain it.
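
The situation described here can be sketched as a cache lookup that falls back to the file system and warns on a miss. This is an illustration under stated assumptions, not the provider's actual code: `Provider`, `readFromFs`, the `warnings` buffer, and the warning text are all hypothetical; only the `loadedMaps` name comes from the diff above.

```scala
import scala.collection.mutable

object CacheMissSketch {
  type MapType = Map[String, String]

  // `readFromFs` is a hypothetical stand-in for the expensive remote read.
  final class Provider(readFromFs: Long => Option[MapType]) {
    private val loadedMaps = mutable.TreeMap.empty[Long, MapType]
    // Warnings are collected in memory so the sketch stays self-contained.
    val warnings: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]

    def loadMap(version: Long): MapType = {
      val cached = synchronized { loadedMaps.get(version) }
      cached.getOrElse {
        // A miss is expected right after restoring from a checkpoint,
        // but while batches are running it means the full state is
        // being read from the file system again.
        warnings += s"State for version $version is not cached; reading from the file system."
        val loaded = readFromFs(version).getOrElse(
          throw new IllegalStateException(s"No state found for version $version"))
        synchronized { loadedMaps.put(version, loaded) }
        loaded
      }
    }
  }
}
```

In this sketch a second `loadMap` call for the same version is served from the cache, so no further warning is recorded and the file system is not read again.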

[SPARK-24485][SS] Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider

d84f98f

HeartSaVioR mentioned this pull request Jun 7, 2018

Scalable Memory option for HDFSBackedStateStore #21500

Closed

jose-torres reviewed Jun 10, 2018

Address review comment: adjust log level for log message

3d0e23f

asfgit closed this in 4c388bc Jun 13, 2018

HeartSaVioR deleted the SPARK-24485 branch January 25, 2019 22:22

gaborgsomogyi reviewed Nov 12, 2019
