SPARK-1518: FileLogger: Fix compile against Hadoop trunk #898

cmccabe · 2014-05-28T00:29:44Z

In Hadoop trunk (currently Hadoop 3.0.0), the deprecated
FSDataOutputStream#sync() method has been removed. Instead, we should
call FSDataOutputStream#hflush, which does the same thing as the
deprecated method used to do.

AmplabJenkins · 2014-05-28T00:32:58Z

Merged build triggered.

AmplabJenkins · 2014-05-28T00:33:03Z

Merged build started.

AmplabJenkins · 2014-05-28T00:35:25Z

Merged build finished.

AmplabJenkins · 2014-05-28T00:35:25Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15241/

rxin · 2014-05-28T04:42:50Z

Colin, it appears this method does not exist in older versions of Hadoop. I wonder if we need to put this into a shim ...

pwendell · 2014-05-28T05:04:20Z

@cmccabe - ah I thought you said this was added in 0.21... Our default build compiles against Hadoop 1.0.4... isn't 1.0.4 newer?

cmccabe · 2014-05-28T05:51:21Z

hflush was in hadoop 0.21. You can download http://archive.apache.org/dist/hadoop/core/hadoop-0.21.0/hadoop-0.21.0.tar.gz and check for yourself in common/src/java/org/apache/hadoop/fs/FSDataOutputStream.java.

I also verified that hadoop 1.0.4 does not have hflush (although, amusingly enough, it does have references to hflush in the code and documentation... from patches that were cherry-picked from other branches, presumably.) Instead, it has an implementation of hflush (I think?) inside the sync function.

Looking at the "Hadoop genealogy" reveals how this could have happened: http://2.bp.blogspot.com/-GO6HF0OAFHw/UOfNEH-4sEI/AAAAAAAAAD0/dEWFFYTRgYw/s1600/output-file.png

It looks like what happened was that the hadoop 0.20 line kind of diverged from the hadoop 0.21 line. The 1.0.4 release somehow came out of the 0.20 line, while the 0.21 line mutated into hadoop 2.x at some point. This was all before my time... even CDH3 had hflush, which is the oldest version of Hadoop I ever worked on.

Sounds like we're back to reflection tricks, then.

pwendell · 2014-05-28T07:02:47Z

Yeah, so I'm guessing @andrewor14 didn't use hflush because it wasn't there (which is consistent with the docs). If you are feeling adventurous, I think we could write a Scala macro to do this reflection at compile time. Regular reflection should work as well. I think you'd just want to check if hflush is present and, if not, call sync.

pwendell · 2014-05-28T07:06:29Z

By the way, your chart has me thinking, we need to document the Spark version genealogy:

0.2 -> 0.3 -> 0.4 -> 0.5 -> 0.6 -> 0.7 -> 0.8 -> 0.9 -> 1.0

:P

ash211 · 2014-05-28T07:11:43Z

Ha! On an actually-useful note, it'd be nice to have somewhere that lists Spark versions and the dates they were released. Such information doesn't exist on spark.apache.org, does it?

rxin · 2014-05-28T07:12:37Z

They do exist on GitHub: https://github.com/apache/spark/releases

rxin · 2014-05-28T07:12:50Z

But definitely a good idea to make them more visible.

AmplabJenkins · 2014-05-28T07:22:58Z

Merged build triggered.

AmplabJenkins · 2014-05-28T07:23:05Z

Merged build started.

ash211 · 2014-05-28T07:23:06Z

Wait, never mind, they're listed here: https://spark.apache.org/downloads.html

cmccabe · 2014-05-28T07:26:25Z

The chart was made by Konstantin Boudnik, I just linked to it. I like the Spark version genealogy more-- it's a little easier to understand. :)

Here's a version that uses regular reflection.

aarondav · 2014-05-28T07:38:48Z

core/src/main/scala/org/apache/spark/util/FileLogger.scala

It appears that [getMethod()](http://docs.oracle.com/javase/7/docs/api/java/lang/Class.html#getMethod%28java.lang.String,%20java.lang.Class...%29) throws NoSuchMethodException rather than returning null.

By the way, the "Scala" way to do this may just be

Try(cls.getMethod("hflush")).getOrElse(cls.getMethod("sync"))
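
To make both points in this comment concrete, here is a minimal, self-contained sketch. `OldStyleStream` and `NewStyleStream` are hypothetical stand-ins (they are not Hadoop classes) that mimic a stream exposing only `sync` and one exposing only `hflush`; the real code would presumably pass `classOf[FSDataOutputStream]` instead.

```scala
import java.lang.reflect.Method
import scala.util.Try

object FlushLookup {
  // Hypothetical stand-ins for the two API shapes discussed in this thread.
  class OldStyleStream { def sync(): Unit = () }   // sync only, like Hadoop 1.0.4
  class NewStyleStream { def hflush(): Unit = () } // hflush only, like Hadoop trunk

  // getMethod throws NoSuchMethodException when the method is missing,
  // so a null check would never fire. Try turns the exception into a
  // Failure, and getOrElse then falls back to looking up sync.
  def flushMethod(cls: Class[_]): Method =
    Try(cls.getMethod("hflush")).getOrElse(cls.getMethod("sync"))
}
```

Note that if neither method exists, the fallback `getMethod("sync")` call itself throws, which seems like reasonable behavior for a class that supports neither API.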

AmplabJenkins · 2014-05-28T08:00:28Z

Merged build finished.

AmplabJenkins · 2014-05-28T08:00:28Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15248/

AmplabJenkins · 2014-05-28T19:07:58Z

Merged build triggered.

AmplabJenkins · 2014-05-28T19:08:05Z

Merged build started.

AmplabJenkins · 2014-05-28T19:10:23Z

Merged build finished.

AmplabJenkins · 2014-05-28T19:10:23Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15260/

AmplabJenkins · 2014-05-28T19:32:58Z

Merged build triggered.

AmplabJenkins · 2014-05-28T19:33:07Z

Merged build started.

AmplabJenkins · 2014-05-28T19:35:25Z

Merged build finished.

AmplabJenkins · 2014-05-28T19:35:25Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15262/

AmplabJenkins · 2014-05-28T20:22:58Z

Merged build triggered.

AmplabJenkins · 2014-05-28T20:23:08Z

Merged build started.

pwendell · 2014-05-28T20:29:43Z

core/src/main/scala/org/apache/spark/util/FileLogger.scala

Mind adding "See SPARK-1518" here? This might be a little hard to grok for someone not familiar with the nuances of Hadoop APIs.

AmplabJenkins · 2014-05-28T21:54:02Z

Merged build finished.

AmplabJenkins · 2014-05-28T21:54:02Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15263/

In Hadoop trunk (currently Hadoop 3.0.0), the deprecated FSDataOutputStream#sync() method has been removed. Instead, the FSDataOutputStream#hflush method fills the same role. We should call hflush if it is available. This patch uses reflection to maintain support for old versions of Hadoop that do not have hflush, but which do have the deprecated sync method.
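
A rough sketch of the approach described above, assuming the flush method is resolved once and then invoked reflectively on every flush. `LegacyStream`, `TrunkStream`, and `Flusher` are hypothetical names invented for illustration; they are not part of Spark or Hadoop, and the actual patch may be structured differently.

```scala
import java.lang.reflect.Method
import scala.util.Try

object CachedFlush {
  // Hypothetical stand-ins: each counts how often its flush method ran.
  class LegacyStream { var flushed = 0; def sync(): Unit = flushed += 1 }
  class TrunkStream { var flushed = 0; def hflush(): Unit = flushed += 1 }

  // Resolves the flush method once per stream class (hflush if present,
  // otherwise sync) and reuses the Method object for every later call.
  final class Flusher(cls: Class[_]) {
    private val method: Method =
      Try(cls.getMethod("hflush")).getOrElse(cls.getMethod("sync"))

    // Reflectively calls whichever method was found on the given stream.
    def flush(stream: AnyRef): Unit = {
      method.invoke(stream)
      ()
    }
  }
}
```

Doing the lookup once keeps the cost of reflection out of the per-flush path, and the same compiled code can run against either API shape because neither method is referenced statically.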

AmplabJenkins · 2014-05-28T23:17:58Z

Merged build triggered.

AmplabJenkins · 2014-05-28T23:18:04Z

Merged build started.

AmplabJenkins · 2014-05-28T23:53:54Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-28T23:53:54Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15268/

pwendell · 2014-06-04T22:34:37Z

LGTM - thanks for this, Colin!

In Hadoop trunk (currently Hadoop 3.0.0), the deprecated FSDataOutputStream#sync() method has been removed. Instead, we should call FSDataOutputStream#hflush, which does the same thing as the deprecated method used to do.

Author: Colin McCabe <[email protected]>

Closes #898 from cmccabe/SPARK-1518 and squashes the following commits:

752b9d7 [Colin McCabe] FileLogger: Fix compile against Hadoop trunk

(cherry picked from commit 1765c8d)
Signed-off-by: Patrick Wendell <[email protected]>

aarondav reviewed May 28, 2014

cmccabe changed the title ~~FileLogger: Fix compile against Hadoop trunk~~ SPARK-1518: FileLogger: Fix compile against Hadoop trunk May 28, 2014

pwendell reviewed May 28, 2014

asfgit closed this in 1765c8d Jun 4, 2014

wangyum added a commit that referenced this pull request May 26, 2023

[CARMEL-5902] Take dynamicPruningMaxInsetNum of rows to avoid Driver OOM (#898)

0825e22

* Fix Driver OOM
* Fix
* Fix
* Fix (#899)
* Update DynamicDataPruningSuite.scala
* Update DynamicDataPruningSuite.scala
