- 
                Notifications
    You must be signed in to change notification settings 
- Fork 9.1k
HDFS-16535. SlotReleaser should reuse the domain socket based on socket paths #4158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS-16535. SlotReleaser should reuse the domain socket based on socket paths #4158
Conversation
| 💔 -1 overall 
 
 This message was automatically generated. | 
| @leosunli is this something you'd be interested in reviewing? | 
| @jojochuang thank for your reminder, I will review this PR. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @stiga-huang for catching it. I add two simple comments.
| } else { | ||
| shm.getEndpointShmManager().shutdown(shm); | ||
| IOUtilsClient.cleanupWithLogger(LOG, domainSocket, out); | ||
| domainSocket = null; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you remove the line 243? If remove it, the line 228 looks redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you remove the line 243?
domainSocket is now a local variable (defined at line 192). We don't need to update it at the end of this method. Note that this finally clause belongs to the try-clause starts at line 195.
If remove it, the line 228 looks redundant.
For line 228, it's needed in the next retry of the while-loop. The check at line 199 will catch it and trigger a re-connect.
| } | ||
| } | ||
|  | ||
| // Regression test for HDFS-16473 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the connection between this UT and HDFS-16473.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I gave a wrong JIRA. Thanks for catching this! It should be HDFS-16535.
| Thank @leosunli! I updated the comments. | 
| 🎊 +1 overall 
 
 This message was automatically generated. | 
| LGFM +1. | 
…et paths (#4158) Reviewed-by: Lisheng Sun <[email protected]> (cherry picked from commit 35d4c02)
…et paths (apache#4158) Reviewed-by: Lisheng Sun <[email protected]>
…ased on socket paths (apache#4158) Reviewed-by: Lisheng Sun <[email protected]> (cherry picked from commit 35d4c02) Change-Id: Ie4609bf36ada0156af684ace80222f699dad41d1
Description of PR
HDFS-13639 (#1885) improves the performance of short-circuit shm slot releasing by reusing the domain socket that the client previously used to send release request to the DataNode.
This is good when there are only one DataNode locates with the client (truth in most of the production environment). However, if we launch multiple DataNodes on a machine (usually for testing, e.g. Impala's end-to-end tests), the request could be sent to the wrong DataNode. See an example in IMPALA-11234.
We should only reuse the domain socket when it corresponds to the same socket path. This change introduces a map in ShortCircuitCache to track the mapping from socket path to the domain socket. SlotReleaser will pick the correct domain socket (if exists) based on the socket path.
How was this patch tested?
I deploy the fix in my Impala minicluster with multiple DataNodes launched on my machine. Verified that the error logs of slot release failures disappeared.
Also add a regression test: TestShortCircuitCache#testDomainSocketClosedByMultipleDNs().
For code changes: