[SPARK-29111][CORE] Support snapshot/restore on KVStore #25811
Conversation
@Override
public int hashCode() {
-  return key.hashCode();
+  return Arrays.hashCode(key);
This fixes a possible existing bug - without this fix, comparisons of Sets/Maps which contain ArrayKeyIndexType as a key would fail.
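For illustration, here is a minimal reproduction of the pattern being fixed (hypothetical: `ArrayKey` stands in for the real ArrayKeyIndexType). With `key.hashCode()` the array is hashed by object identity, so two keys wrapping equal arrays land in different hash buckets and `HashSet.contains` fails even though `equals` matches; `Arrays.hashCode(key)` hashes the elements instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for ArrayKeyIndexType, showing the fixed hashCode.
class ArrayKey {
    final int[] key;

    ArrayKey(int[] key) { this.key = key; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ArrayKey && Arrays.equals(key, ((ArrayKey) o).key);
    }

    @Override
    public int hashCode() {
        // Arrays.hashCode reflects the elements; key.hashCode() would not.
        return Arrays.hashCode(key);
    }
}

public class ArrayKeyDemo {
    public static void main(String[] args) {
        Set<ArrayKey> set = new HashSet<>();
        set.add(new ArrayKey(new int[] {1, 2}));
        // With the fix, a structurally equal key is found.
        System.out.println(set.contains(new ArrayKey(new int[] {1, 2}))); // prints "true"
    }
}
```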
Just created a separate PR for this: #26709
}

@Test
public void testMultipleTypesWriteReadDelete() throws Exception {
This test was only available in LevelDBSuite, so I copied it here to test the new API addition.
Test build #110727 has finished for PR 25811 at commit
Both InMemoryStoreSuite and LevelDBSuite are heavily duplicated, but LevelDBSuite also has logic which checks against LevelDB directly, so I've left them as they are. Please let me know if it would be worth deduplicating them.
import java.util.List;

public class IntKeyType {
This was moved out of LevelDBSuite to share it between test suites.
import static org.junit.Assert.*;

-public abstract class DBIteratorSuite {
+public abstract class KVStoreIteratorSuite {
I've just renamed this, as it confused me - I imagined DB meant LevelDB, but there's a separate suite for LevelDB. KVStore sounds better to me.
Test build #110728 has finished for PR 25811 at commit

Test build #110731 has finished for PR 25811 at commit
import java.util.HashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.*;
nit: Can we add the single line import java.util.Set; instead?
My IDE did it automatically. I've just unrolled it. Thanks!
Test build #110735 has finished for PR 25811 at commit

Test build #110738 has finished for PR 25811 at commit

retest this, please

Test build #110759 has finished for PR 25811 at commit

retest this, please

Test build #110788 has started for PR 25811 at commit

test this please

Test build #110802 has finished for PR 25811 at commit

retest this, please

Test build #110869 has finished for PR 25811 at commit

retest this, please

Test build #110875 has finished for PR 25811 at commit

Known flaky test: SPARK-23197. Not relevant to this patch.

retest this, please

Test build #110912 has finished for PR 25811 at commit
Let me cc the same people as for #25670 - @felixcheung (as shepherd of SPARK-28594), @vanzin, @squito, @gengliangwang, @dongjoon-hyun. Also cc @Ngone51, who might be interested in this.
Generally, looks good!
  this.serializer = serializer;
}

public void dump(KVStore store, File snapshotFile) throws Exception {
Just want to let you know that SPARK-28867 may also want to dump the KVStore to an HDFS-supported filesystem.
Nice catch! Thanks for pointing that out.
Maybe we need to take an InputStream/OutputStream (or more specific types) as a parameter instead of File, so that it can work with any filesystem. The kvstore module doesn't have a Hadoop dependency, so it would be ideal to avoid depending on Hadoop directly.
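To make the idea concrete, here is a minimal sketch of what a stream-based dump could look like (hypothetical: the class name, the `Iterable<byte[]>` input, and the length-prefixed framing are illustrative, not the actual Spark API):

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of a stream-based dump: the caller supplies the
// OutputStream, so a local file, an HDFS stream, or an in-memory buffer
// all work, and the kvstore module needs no Hadoop dependency.
public class KVStoreSnapshotter {

    // Writes each pre-serialized object as a length-prefixed record.
    // DataOutputStream writes the int length in big-endian order.
    public void dump(Iterable<byte[]> serializedObjects, OutputStream snapshot)
            throws IOException {
        DataOutputStream out = new DataOutputStream(snapshot);
        for (byte[] obj : serializedObjects) {
            out.writeInt(obj.length);
            out.write(obj);
        }
        out.flush();
    }
}
```

The same signature works unchanged whether the caller passes a local file stream or a Hadoop `FSDataOutputStream`, which is the extensibility argument for stream-based parameters.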
Well, InputStream/OutputStream could work for this case. But I'm afraid it makes usage troublesome for the caller, as they need to prepare their own streams (local or HDFS, in or out). And I believe KVStore is currently designed only to persist to filesystems, so the underlying streams should always be file streams. So exposing InputStream/OutputStream seems like overkill here.
I'd prefer to use the HDFS API to support dumping to both local and HDFS-supported filesystems, so the caller only needs to pass in a Path indicating where they want to dump (just as File does currently). And since we're introducing a dump feature here, I think it would be OK to depend on Hadoop directly now. If we don't depend on Hadoop here, we'll still depend on it elsewhere.
WDYT ? @HeartSaVioR
Well, InputStream/OutputStream could work for this case. But I'm afraid it makes usage troublesome for the caller, as they need to prepare their own streams (local or HDFS, in or out). And I believe KVStore is currently designed only to persist to filesystems, so the underlying streams should always be file streams. So exposing InputStream/OutputStream seems like overkill here.
Sorry, I totally disagree that this is a workaround - the parameter in question is pure Java API, which any Java developer knows and is familiar with handling.
spark/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala, lines 52 to 59 at c7c6b64:

def replay(
    logData: InputStream,
    sourceName: String,
    maybeTruncated: Boolean = false,
    eventsFilter: ReplayEventsFilter = SELECT_ALL_FILTER): Unit = {
  val lines = Source.fromInputStream(logData)(Codec.UTF8).getLines()
  replay(lines, sourceName, maybeTruncated, eventsFilter)
}
Just look at ReplayListenerBus. What exactly does it require the caller to prepare? An InputStream. Does it use anything other than a file, except in some unit tests? No. That's an API-level consideration to keep the door open for extension.
I'll leave the decision to the committers.
Even if we end up agreeing to depend on Hadoop directly, I would just move this to the core module and rewrite it in Scala. Writing this in Java was an intentional effort to put it alongside the other classes in common/kvstore, and I don't think the modules in common depend on Hadoop - except network-yarn, which even defines its Hadoop dependency as provided. The common modules try to avoid depending on Hadoop.
That's fine. If we want to follow ReplayListenerBus's way, I think it would be better if we could provide read/write helper methods later which just return an InputStream/OutputStream, similar to what EventLoggingListener.openEventLog() does.
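Such a helper could be as thin as the following sketch (hypothetical: the class and method names are made up for illustration; a Hadoop-backed variant could return FileSystem streams from the same signatures):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helpers that hide stream creation from the caller,
// analogous in spirit to EventLoggingListener.openEventLog(): the
// snapshot code works purely against streams, and only these helpers
// know where the bytes actually live.
public final class SnapshotStreams {
    private SnapshotStreams() {}

    public static OutputStream openForWrite(Path snapshotFile) throws IOException {
        return Files.newOutputStream(snapshotFile);
    }

    public static InputStream openForRead(Path snapshotFile) throws IOException {
        return Files.newInputStream(snapshotFile);
    }
}
```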
If we want to follow ReplayListenerBus's way, I think it would be better if we could provide read/write helper methods later which just return an InputStream/OutputStream, similar to what EventLoggingListener.openEventLog() does.
That's a good idea. As this PR intends to touch only the common/kvstore module, we'd better deal with this in a next PR (a FOLLOWUP, or included in the next piece of work).
for (Class<?> clazz : types) {
  writeClassName(clazz, output);

  KVStoreView<?> view = store.view(clazz);
I'm wondering whether there can be a type with no objects in the KVStore. Normally, it seems impossible.
I'm not sure any implementation allows a type with no objects.
I see what you're saying - some implementation might allow the case where types() returns a Class but view(A.class) contains nothing, and we don't provide a way to add only a type to the KVStore. Thinking theoretically, we may want to make clear in the interface javadoc that "types with no objects are ignored while recovering, so implementations should not rely on this", but I'm afraid I might be over-thinking.
Test build #111102 has finished for PR 25811 at commit
I'll add the Spark version on top of the snapshot file soon, to give a hint for determining whether the snapshot file is compatible with the reader side.
…yIndexType correctly

### What changes were proposed in this pull request?

This patch fixes the bug in ArrayKeyIndexType.hashCode(): it simply calls Array.hashCode(), which in turn calls Object.hashCode(). It should be Arrays.hashCode() to reflect the elements in the array.

### Why are the changes needed?

I encountered the bug while adding test code for #25811, and I've split the fix into an individual PR to speed up reviewing. Without this patch, ArrayKeyIndexType would cause various issues when used as a key type in collections.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

I've skipped adding a UT, as ArrayKeyIndexType is test-only code and the patch is a simple one-liner.

Closes #26709 from HeartSaVioR/SPARK-30075.

Authored-by: Jungtaek Lim (HeartSaVioR) <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
Force-pushed from b6e65f8 to 922707e.
Test build #114987 has finished for PR 25811 at commit

Test build #114991 has finished for PR 25811 at commit

Test build #116277 has finished for PR 25811 at commit
Given we're going with the different approach you proposed, should this PR be closed?

Thanks for the reminder! I think this PR is still useful for implementing incremental replaying on the SHS side, but I can close it for now, as I may not work on incremental replaying right now. If other contributors want to implement incremental replaying (I've actually seen a couple of PRs for this, though all of them are restricted to the in-memory KVStore), this PR can be used again. It would be simple to reopen.
What changes were proposed in this pull request?
This patch proposes adding a new feature to KVStore: snapshot (dump) & restore. The structure of the snapshot file is also described in the design doc; it resembles the structure of the delta/snapshot files in HDFSBackedStateStoreProvider.
Here's the format of the snapshot file. The file is in binary format, and follows the way DataOutputStream writes the types "int" and "byte[]", hence "int" values are written in big-endian byte order.
The file starts with metadata information, as a KVStore stores at most one metadata object.
If there's no metadata stored in the KVStore, the length of the metadata class name is written as -2, which denotes that the following metadata fields are not present.
After that, the file contains the normal objects for each type, in the format below:
The above format is repeated for each type, and when no objects are left to dump, we simply put -1 in the place where the length of the class name is expected, to mark the end of the file.
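As an illustration of the sentinels described above, here is a hedged sketch (the -2 no-metadata marker, the -1 end-of-file marker, and DataOutputStream's big-endian ints come from the description; the exact length-prefixed layout of class names and object payloads in writeEntry is an assumption for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch of the snapshot layout. DataOutputStream writes ints big-endian:
// -2 in place of the metadata class-name length means "no metadata";
// -1 in place of a type class-name length marks end-of-file.
public class SnapshotFormatDemo {

    // Assumed record shape: class-name length, class-name bytes,
    // payload length, payload bytes.
    public static void writeEntry(DataOutputStream out, String className,
                                  byte[] payload) throws IOException {
        byte[] name = className.getBytes(StandardCharsets.UTF_8);
        out.writeInt(name.length);
        out.write(name);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // The smallest valid snapshot: no metadata, no types.
    public static byte[] emptySnapshot() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(-2);  // no metadata object stored
            out.writeInt(-1);  // no more types: end of file
        }
        return bytes.toByteArray();
    }
}
```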
Why are the changes needed?
The new feature will be used as a building block for SPARK-28870. The patch is kept separate from that issue, as I would like to keep each PR smaller.
Does this PR introduce any user-facing change?
No, as the KVStore interface is defined as @Private.

How was this patch tested?
Added new UTs.