-
Notifications
You must be signed in to change notification settings - Fork 28.9k
SPARK-1518: FileLogger: Fix compile against Hadoop trunk #898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15241/ |
|
Colin it appears this method does not exist in older version of Hadoop. I wonder if we need to put this into a shim ... |
|
@cmccabe - ah I thought you said this was added in 0.21... Our default build compiles against Hadoop 1.0.4... isn't 1.0.4 newer? |
|
hflush was in hadoop 0.21. You can download http://archive.apache.org/dist/hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz and check for yourself in common/src/java/org/apache/hadoop/fs/FSDataOutputStream.java. I also verified that hadoop 1.0.4 does not have hflush (although, amusingly enough, it does have references to hflush in the code and documentation... from patches that were cherry-picked from other branches, presumably.) Instead, it has an implementation of hflush (I think?) inside the sync function. Looking at the "Hadoop genealogy" reveals how this could have happened: http://2.bp.blogspot.com/-GO6HF0OAFHw/UOfNEH-4sEI/AAAAAAAAAD0/dEWFFYTRgYw/s1600/output-file.png It looks like what happened was that the hadoop 0.20 line kind of diverged from the hadoop 0.21 line. The 1.0.4 release somehow came out of the 0.20 line, while the 0.21 line mutated into hadoop 2.x at some point. This was all before my time... even CDH3 had hflush, which is the oldest version of Hadoop I ever worked on. Sounds like we're back to reflection tricks, then. |
|
Yeah so I'm guessing @andrewor14 didn't use flush because it wasn't there (which is consistent with the docs). If you are feeling adventurous, I think we could write a Scala macro to do this reflection at compile time. Regular reflection should work as well. I think you'd just want to check if hflush is present and it not call sync. |
|
By the way, your chart has me thinking, we need to document the Spark version genealogy: :P |
|
Ha! On an actually-useful note, it'd be nice to have somewhere that lists On Wed, May 28, 2014 at 12:06 AM, Patrick Wendell
|
|
They do exists on github: https://github.com/apache/spark/releases |
|
But definitely a good idea to make them more visible. |
|
Merged build triggered. |
|
Merged build started. |
|
Wait nevermind, they're listed here: https://spark.apache.org/downloads.html On Wed, May 28, 2014 at 12:12 AM, Reynold Xin [email protected]:
|
|
The chart was made by Konstantin Boudnik, I just linked to it. I like the Spark version genealogy more-- it's a little easier to understand. :) Here's a version that uses regular reflection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that [getMethod()](http://docs.oracle.com/javase/7/docs/api/java/lang/Class.html#getMethod%28java.lang.String, java.lang.Class...%29) throws NoSuchMethodException rather than returning null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, the "Scala" way to do this may just be
Try(cls.getMethod("hflush")).getOrElse(cls.getMethod("sync"))There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15248/ |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15260/ |
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15262/ |
|
Merged build triggered. |
|
Merged build started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind adding See SPARK-1518 here? This might be a little hard to grok for someone not familiar with the nuances of Hadoop API's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
|
Merged build finished. |
|
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15263/ |
In Hadoop trunk (currently Hadoop 3.0.0), the deprecated FSDataOutputStream#sync() method has been removed. Instead, the FSDataOutputStream#hflush method fills the same role. We should call hflush if it is available. This patch uses reflection to maintain support for old versions of Hadoop that do not have hflush, but which do have the deprecated sync method.
|
Merged build triggered. |
|
Merged build started. |
|
Merged build finished. All automated tests passed. |
|
All automated tests passed. |
|
LGTM - thanks for this colin! |
In Hadoop trunk (currently Hadoop 3.0.0), the deprecated FSDataOutputStream#sync() method has been removed. Instead, we should call FSDataOutputStream#hflush, which does the same thing as the deprecated method used to do. Author: Colin McCabe <[email protected]> Closes #898 from cmccabe/SPARK-1518 and squashes the following commits: 752b9d7 [Colin McCabe] FileLogger: Fix compile against Hadoop trunk (cherry picked from commit 1765c8d) Signed-off-by: Patrick Wendell <[email protected]>
In Hadoop trunk (currently Hadoop 3.0.0), the deprecated FSDataOutputStream#sync() method has been removed. Instead, we should call FSDataOutputStream#hflush, which does the same thing as the deprecated method used to do. Author: Colin McCabe <[email protected]> Closes apache#898 from cmccabe/SPARK-1518 and squashes the following commits: 752b9d7 [Colin McCabe] FileLogger: Fix compile against Hadoop trunk
In Hadoop trunk (currently Hadoop 3.0.0), the deprecated FSDataOutputStream#sync() method has been removed. Instead, we should call FSDataOutputStream#hflush, which does the same thing as the deprecated method used to do. Author: Colin McCabe <[email protected]> Closes apache#898 from cmccabe/SPARK-1518 and squashes the following commits: 752b9d7 [Colin McCabe] FileLogger: Fix compile against Hadoop trunk
In Hadoop trunk (currently Hadoop 3.0.0), the deprecated
FSDataOutputStream#sync() method has been removed. Instead, we should
call FSDataOutputStream#hflush, which does the same thing as the
deprecated method used to do.