[ETCM-185] Closing db #757

biandratti · 2020-10-23T21:13:34Z

Description

We were using a volatile variable called "isClosed". When the node shuts down, it closes the database and sets this variable to "true". From then on, any new request checks the value of "isClosed" and throws the exception "This RocksDbDataSource has been closed".
I reproduced this error message while running regular sync on the "mordor" network, when RocksDbDataSource received a "close" request at the same time as a "get" (or "update") request.

Until now, we validated the volatile variable "isClosed" before acquiring the ReentrantReadWriteLock "dbLock".
This allows a race condition. For example, when the functions "update" and "close" are called at the same time, both validate "isClosed" and both pass the check. If "close" then takes dbLock first, it closes the database and releases the lock. The "update" function then resumes, but the database is already closed.
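A minimal sketch of the check-after-lock ordering, under stated assumptions: `GuardedStore` and `ClosedException` are hypothetical names, and an in-memory `Map` stands in for the real RocksDB handle. It is not the code from this PR; it only illustrates why validating the flag while holding the lock removes the window described above.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical exception, standing in for the one this PR adds.
final class ClosedException(message: String) extends IllegalStateException(message)

// Hypothetical stand-in for the data source; a Map replaces the RocksDB handle.
final class GuardedStore {
  private val dbLock = new ReentrantReadWriteLock()
  private var isClosed = false // only read or written while dbLock is held
  private var data = Map.empty[String, String]

  private def assureNotClosed(): Unit =
    if (isClosed) throw new ClosedException("This store has been closed")

  def get(key: String): Option[String] = {
    dbLock.readLock().lock()
    try {
      assureNotClosed() // checked after locking, so close() cannot run in between
      data.get(key)
    } finally dbLock.readLock().unlock()
  }

  def update(key: String, value: String): Unit = {
    dbLock.writeLock().lock()
    try {
      assureNotClosed()
      data = data.updated(key, value)
    } finally dbLock.writeLock().unlock()
  }

  def close(): Unit = {
    dbLock.writeLock().lock()
    try {
      assureNotClosed()
      isClosed = true
      data = Map.empty
    } finally dbLock.writeLock().unlock()
  }
}
```

With this ordering, a caller that loses the race to `close()` gets `ClosedException` instead of touching a closed handle. In this sketch `update` must take the write lock because the `Map` is not thread-safe; whether the real data source needs it is a separate question.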

Open question

We decided to change how the "update" function uses the ReentrantReadWriteLock "dbLock": it now takes the writeLock instead of the readLock. If we later detect a performance loss caused by this change, we recommend reverting it and using the readLock again.

Proposed Solution

I changed the order of the checks to avoid race conditions.
When the system shuts down, it terminates the actor system and then closes the database. Usually, while the actors are being terminated, some of the requests they started are still pending. That is when we can see the exception "RocksDbDataSourceClosedException".
I tried different strategies with Akka's coordinated shutdown and got the same result. So, as a workaround, I decided to change the order of shutdown and close the database at the very end. With this change the exception no longer appeared in my tests.
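The "close the database last" ordering can be sketched as follows. Everything here is a hypothetical stand-in: a thread pool plays the role of the actor system and an `AtomicBoolean` plays the role of the open database, so this is only an illustration of the ordering, not of Akka's coordinated shutdown.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical model of the shutdown sequence: workers first, database last.
object ShutdownOrder {
  /** Returns true if any pending request observed a closed database. */
  def run(): Boolean = {
    val workers = Executors.newFixedThreadPool(2) // stands in for the actor system
    val dbOpen = new AtomicBoolean(true)          // stands in for the database
    val sawClosedDb = new AtomicBoolean(false)

    // Pending requests that still need the database.
    (1 to 10).foreach { _ =>
      workers.execute(() => if (!dbOpen.get()) sawClosedDb.set(true))
    }

    // 1. Stop accepting work and wait for the pending requests to finish.
    workers.shutdown()
    val finished = workers.awaitTermination(10, TimeUnit.SECONDS)

    // 2. Only then close the database.
    dbOpen.set(false)

    !finished || sawClosedDb.get()
  }
}
```

Because the database flag is flipped only after the pool has drained, no queued request can observe it closed.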

In the future we could work on a strategy based on supervisors and on a coordinated shutdown of the actor system.

mirkoAlic · 2020-10-28T14:39:25Z

src/main/scala/io/iohk/ethereum/domain/Blockchain.scala


  override def getBestBlock(): Block =
-    getBlockByNumber(getBestBlockNumber()).get
+    getBlockByNumber(getBestBlockNumber()).getOrElse(throw new RuntimeException("block not found"))


change reverted

lemastero · 2020-10-28T15:41:36Z

src/main/scala/io/iohk/ethereum/db/dataSource/RocksDbDataSource.scala

      case NonFatal(e) =>
        logger.error(s"Not found associated value to a namespace: $namespace and a key: $key, cause: {}", e.getMessage)
-        throw new RuntimeException(e)
+        throw e


mirkoAlic · 2020-10-28T15:42:40Z

Open question
Does the update function need dbLock.readLock() or dbLock.writeLock()?

IMHO, dbLock.writeLock is the "proper" way, but tbh, I'm not sure whether it will introduce some kind of bottleneck. I suggest applying the change, but also adding a good comment about the reasoning behind it. If we eventually detect an issue, we can roll back the change. WDYT?

mirkoAlic · 2020-10-28T15:43:14Z

src/main/scala/io/iohk/ethereum/db/dataSource/RocksDbDataSource.scala

    */
  override def close(): Unit = {
-    logger.debug(s"About to close DataSource in path: ${rocksDbConfig.path}")
+    logger.info(s"About to close DataSource in path: ${rocksDbConfig.path}")


mirkoAlic · 2020-10-28T15:46:06Z

src/main/scala/io/iohk/ethereum/db/dataSource/RocksDbDataSource.scala

+      logger.info(s"DataSource closed successfully in the path: ${rocksDbConfig.path}")
    } catch {
      case NonFatal(e) =>
        logger.error("Not closed the DataSource properly, cause: {}", e)


At least as a temporary solution, I think it is preferable to have some kind of domain modelling for DataSource runtime exceptions, instead of just logging and throwing an exception. In that case we could throw our own custom exceptions, which would provide better context about the actual issue (and would avoid the need to log them).
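One possible shape for such domain modelling, as a sketch only: the names below are hypothetical (this PR itself adds just `RocksDbDataSourceClosedException`), and namespace and key are simplified to `String` for illustration.

```scala
// Hypothetical hierarchy of data source failures; none of these names come from the PR.
sealed abstract class DataSourceException(message: String, cause: Throwable = null)
    extends RuntimeException(message, cause)

final case class DataSourceClosed(path: String)
    extends DataSourceException(s"DataSource at $path has been closed")

final case class DataSourceReadFailed(namespace: String, key: String, cause: Throwable)
    extends DataSourceException(s"Could not read key $key in namespace $namespace", cause)

object DataSourceException {
  // Callers can match on the failure instead of parsing log output.
  def isRetryable(e: DataSourceException): Boolean = e match {
    case DataSourceClosed(_)           => false // the node is shutting down
    case DataSourceReadFailed(_, _, _) => true
  }
}
```

Since the context travels in the exception itself, the throwing site would not need its own `logger.error` call; the handler can decide what to log.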

lemastero · 2020-10-28T15:47:13Z

src/main/scala/io/iohk/ethereum/db/dataSource/RocksDbDataSource.scala

-  private val logger = LoggerFactory.getLogger("rocks-db")
-
  @volatile
  private var isClosed = false


WDYT about changing it to AtomicReference? https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicReference.html

I have had good experiences with atomic refs. I remember using @volatile a very long time ago in Java, and it was not reliable 🤔

Or even https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicBoolean.html :)

TBH I think that volatile is enough here, since updates of this var don't involve a read (and that's the basic problem the classes in java.util.concurrent.atomic solve: making reads and writes that depend on each other atomic).

As we now write to this variable only while the write lock is held, this could probably even be a simple var without the @volatile annotation.

I think that we need this variable to be @volatile.
When only one thread writes a volatile variable and the other threads only read it, the reading threads are guaranteed to see the latest value written to it.
I agree with @kapke that volatile is enough here, since updates of this var don't involve a read. Are we OK with this? Or do you have another approach in mind?

But we only modify this variable inside the write lock and only read it inside the read lock, which gives strictly stronger guarantees than a volatile variable. So the variable could be a simple var and we would still have more synchronization than with a volatile variable.

But to be honest we can leave it volatile; it is not a big deal here. (I am against atomics here, as having CAS operations inside locks would really be overkill.)

kapke · 2020-10-29T10:23:34Z

src/main/scala/io/iohk/ethereum/db/dataSource/RocksDbDataSource.scala


+  class RocksDbDataSourceClosedException(val message: String) extends IllegalStateException(message)
+
+  private val logger = LoggerFactory.getLogger("rocks-db")


Why not extends Logger then?

KonradStaniec · 2020-10-29T14:45:29Z

src/main/scala/io/iohk/ethereum/db/dataSource/RocksDbDataSource.scala

-  private val logger = LoggerFactory.getLogger("rocks-db")
-
  @volatile
  private var isClosed = false


As we now write to this variable only while the write lock is held, this could probably even be a simple var without the @volatile annotation.

KonradStaniec · 2020-10-29T14:54:55Z

src/main/scala/io/iohk/ethereum/db/dataSource/RocksDbDataSource.scala

  override def close(): Unit = {
-    logger.debug(s"About to close DataSource in path: ${rocksDbConfig.path}")
+    log.info(s"About to close DataSource in path: ${rocksDbConfig.path}")
    assureNotClosed()


I wonder if we should check it after locking instead?

In this case the result is the same. I changed it anyway, so it now behaves the same way as the other functions.

mirkoAlic

LGTM!

biandratti added 2 commits October 23, 2020 17:23

change order of validation, lock and variable volatile

4e97439

Merge branch 'develop' of github.com:input-output-hk/mantis into ETCM…

bd5b29d

…-185-db-closed

biandratti added the bug label Oct 23, 2020

add a new exception from database closed

681fb46

biandratti changed the title ~~[ETCM-185] Closing twice rocks db~~ [ETCM-185] Closing db Oct 25, 2020

add log when datasource is closed

5657444

biandratti requested review from KonradStaniec, lemastero and ntallar October 26, 2020 11:45

biandratti added 2 commits October 27, 2020 09:29

order strategy of shutdown

dd5641e

merge with develop

6c4b228

mirkoAlic reviewed Oct 28, 2020

lemastero reviewed Oct 28, 2020

lemastero reviewed Oct 28, 2020

mirkoAlic reviewed Oct 28, 2020

lemastero reviewed Oct 28, 2020

kapke reviewed Oct 29, 2020

add new exception from RocksDbDataSource

8e27755

KonradStaniec reviewed Oct 29, 2020

biandratti added 2 commits October 29, 2020 12:48

update same behavior with close db validation

f97b779

scalafmt...

5d9f839

mirkoAlic approved these changes Oct 30, 2020

KonradStaniec approved these changes Nov 2, 2020

Merge branch 'develop' of github.com:input-output-hk/mantis into ETCM…

2782dc7

…-185-db-closed

biandratti merged commit e279a5d into develop Nov 2, 2020

mirkoAlic deleted the ETCM-185-db-closed branch November 2, 2020 18:45

