[ETCM-185] Closing db #757
Conversation
```diff
  override def getBestBlock(): Block =
-   getBlockByNumber(getBestBlockNumber()).get
+   getBlockByNumber(getBestBlockNumber()).getOrElse(throw new RuntimeException("block not found"))
```
hmm...
change reverted
```diff
  case NonFatal(e) =>
    logger.error(s"Not found associated value to a namespace: $namespace and a key: $key, cause: {}", e.getMessage)
-   throw new RuntimeException(e)
+   throw e
```
👍
IMHO,
```diff
    */
  override def close(): Unit = {
-   logger.debug(s"About to close DataSource in path: ${rocksDbConfig.path}")
+   logger.info(s"About to close DataSource in path: ${rocksDbConfig.path}")
```
👍
| logger.info(s"DataSource closed successfully in the path: ${rocksDbConfig.path}") | ||
| } catch { | ||
| case NonFatal(e) => | ||
| logger.error("Not closed the DataSource properly, cause: {}", e) |
At least as a temporary solution, I think it is preferable to have some kind of domain modelling for DataSource runtime exceptions, instead of just logging and throwing a generic exception. That way we could throw our own custom exceptions, which would provide better context about the actual issue (and avoid the need to log it here).
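For illustration, a minimal sketch of what such domain-modelled exceptions could look like (all names below are hypothetical, not code from this PR):

```scala
// Hypothetical sketch of a small error hierarchy for the data source layer;
// none of these names come from the PR, they only illustrate the idea.
sealed abstract class DataSourceError(message: String, cause: Throwable = null)
    extends RuntimeException(message, cause)

final class DataSourceClosedError(path: String)
    extends DataSourceError(s"DataSource at $path has been closed")

final class NamespaceKeyReadError(namespace: String, key: String, cause: Throwable)
    extends DataSourceError(s"No value associated with namespace $namespace and key $key", cause)
```

Callers could then pattern-match on DataSourceError instead of relying on log messages for context.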
```scala
private val logger = LoggerFactory.getLogger("rocks-db")

@volatile
private var isClosed = false
```
WDYT about changing it to https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicReference.html?
I have good experience with atomic refs. I remember using @volatile a very long time ago in Java and it was not reliable 🤔
Or even https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicBoolean.html :)
TBH I think that volatile is enough here, since updates of this var don't involve a read (and that's the basic problem the classes in java.util.concurrent.atomic solve: making reads and writes that depend on each other atomic).
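To illustrate that distinction, here is a minimal sketch (assumed names, not code from the PR): a check-then-act needs compareAndSet, while a one-way flag flip whose new value does not depend on the old one only needs visibility.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object FlagSketch {
  // Check-then-act: the read and the conditional write must be one atomic step,
  // which is exactly what AtomicBoolean.compareAndSet provides.
  private val closing = new AtomicBoolean(false)
  def closeOnce(doClose: () => Unit): Unit =
    if (closing.compareAndSet(false, true)) doClose()

  // One-way flip that never reads the old value before writing:
  // @volatile is enough to make the write visible to reader threads.
  @volatile private var isClosed: Boolean = false
  def markClosed(): Unit = isClosed = true
  def closed: Boolean = isClosed
}
```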
As we are now writing to this variable only after the write lock has been acquired, this could probably even be a simple var without the @volatile annotation.
I think that we need this variable as @volatile.
If only one thread reads and writes the value of a volatile variable, and other threads only read it, then the reading threads are guaranteed to see the latest value written to it.
I agree with @kapke that volatile is enough here, since updates of this var don't involve a read. Are we ok with this, or do you have another approach in mind?
But we only modify this variable from inside the write lock and read it only from inside the read lock, which gives strictly stronger guarantees than a volatile variable. So the variable could be a simple var and we would still have more synchronization than with a volatile one.
To be honest, though, we can leave it volatile, it's not a big deal here. (I am against atomics here: having CAS operations inside locks would really be overkill.)
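A simplified sketch of that argument (not the actual RocksDbDataSource code): when every write to the flag happens under the write lock and every read under the read lock, the lock acquisition already establishes the happens-before ordering that @volatile would otherwise provide.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

class GuardedFlagSketch {
  private val dbLock = new ReentrantReadWriteLock()
  // No @volatile needed: the flag is only touched while holding dbLock.
  private var isClosed = false

  def close(doClose: () => Unit): Unit = {
    dbLock.writeLock().lock()
    try {
      if (!isClosed) {
        doClose()
        isClosed = true
      }
    } finally dbLock.writeLock().unlock()
  }

  def get[A](doRead: () => A): A = {
    dbLock.readLock().lock()
    try {
      if (isClosed) throw new IllegalStateException("This RocksDbDataSource has been closed")
      doRead()
    } finally dbLock.readLock().unlock()
  }
}
```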
```scala
class RocksDbDataSourceClosedException(val message: String) extends IllegalStateException(message)

private val logger = LoggerFactory.getLogger("rocks-db")
```
Why not extends Logger then?
Done
```diff
  override def close(): Unit = {
-   logger.debug(s"About to close DataSource in path: ${rocksDbConfig.path}")
+   log.info(s"About to close DataSource in path: ${rocksDbConfig.path}")
+   assureNotClosed()
```
I wonder if we should not check it after locking?
For this case it is the same. I changed it anyway and used the same behavior as the other functions.
mirkoAlic
left a comment
LGTM!
Description
We were using a volatile variable called "isClosed". When the node is shut down, it closes the database and sets this variable to "true". From then on, any new request validates the value of "isClosed" and throws the exception "This RocksDbDataSource has been closed".
I reproduced this error message while running regular sync against the "mordor" network, when RocksDbDataSource receives a "close" request and a "get" (or "update") request at the same time.
Currently, we validate the volatile variable "isClosed" before taking the ReentrantReadWriteLock.
So we could have a race condition. For example, when "update" and "close" are called at the same time, both first validate the variable "isClosed" and pass the check. Then the "close" function may take the dbLock first, close the database and release the lock. When the "update" function resumes, the database is already closed.
Open question
We decided to change how the "update" function uses the ReentrantReadWriteLock "dbLock": it now takes the writeLock instead of the readLock. If we detect a performance loss from this change, we recommend reverting it and using the readLock again.
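As a hedged sketch of the adjusted "update" path (simplified, not the exact Mantis code), the closed check now happens only after the write lock is held, so "close" can no longer slip in between the check and the write:

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

class UpdateSketch(doWrite: () => Unit) {
  private val dbLock = new ReentrantReadWriteLock()
  private var isClosed = false

  def update(): Unit = {
    dbLock.writeLock().lock() // writeLock instead of readLock, as described above
    try {
      // Validating the flag only after the lock is acquired removes the race with close().
      if (isClosed) throw new IllegalStateException("This RocksDbDataSource has been closed")
      doWrite()
    } finally dbLock.writeLock().unlock()
  }
}
```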
Proposed Solution
I tried different strategies with Akka's coordinated shutdown and got the same result, so as a workaround I decided to change the shutdown order and close the database last. With this change the exception no longer appeared in my tests.
In the future we could work on a strategy based on supervisors and a coordinated-shutdown strategy for the actor system.
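As a rough illustration of that direction (an assumption about the wiring, not code from this PR), Akka's CoordinatedShutdown lets us register the database close as a task in a late shutdown phase, so it only runs once the actors still using the data source have stopped:

```scala
import akka.Done
import akka.actor.{ActorSystem, CoordinatedShutdown}
import scala.concurrent.Future

object ShutdownWiring {
  // closeDb stands in for RocksDbDataSource#close; the phase choice is an assumption.
  def registerDbClose(system: ActorSystem, closeDb: () => Unit): Unit =
    CoordinatedShutdown(system).addTask(
      CoordinatedShutdown.PhaseBeforeActorSystemTerminate,
      "close-rocksdb"
    ) { () =>
      closeDb()
      Future.successful(Done)
    }
}
```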