
@felixcheung
Member

@felixcheung felixcheung commented Nov 6, 2016

What changes were proposed in this pull request?

Changes to DESCRIPTION to build vignettes.
Changes the metadata for vignettes to generate the recommended format (which is under roughly 10% of the previous size). Unfortunately, it does not look as nice
(before - left, after - right)

image

image

Also add information on how to run build/release to CRAN later.

How was this patch tested?

manually, unit tests

@shivaram

We need this for branch-2.1

@SparkQA

SparkQA commented Nov 6, 2016

Test build #68249 has finished for PR 15790 at commit d51b04b.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Nov 7, 2016

This is great @felixcheung - Taking a closer look now

@felixcheung
Member Author

felixcheung commented Nov 7, 2016

interesting. jenkins failed because it is building vignettes again right after they were built correctly, and the second time it didn't have the Spark jar because SPARK_HOME isn't set (so it tries to download Spark, as install.spark does)

I'll try to track down where it is doing that.

  Use Petal_Width instead of Petal.Width  as column name
/home/jenkins/workspace/SparkPullRequestBuilder/R
* checking for file '/home/jenkins/workspace/SparkPullRequestBuilder/R/pkg/DESCRIPTION' ... OK
* preparing 'SparkR':
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... ERROR

Attaching package: 'SparkR'

The following objects are masked from 'package:stats':

    cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from 'package:base':

    as.data.frame, colnames, colnames<-, drop, intersect, rank,
    rbind, sample, subset, summary, transform, union

Spark not found in SPARK_HOME: 
Spark not found in the cache directory. Installation will start.
MirrorUrl not provided.
Looking for preferred site from apache website...
Preferred mirror site found: http://mirror.nexcess.net/apache/spark
Downloading spark-2.1.0 for Hadoop 2.7 from:
- http://mirror.nexcess.net/apache/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
trying URL 'http://mirror.nexcess.net/apache/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz'
Fetch failed from http://mirror.nexcess.net/apache/spark
To use backup site...
Downloading spark-2.1.0 for Hadoop 2.7 from:
- http://www-us.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
trying URL 'http://www-us.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz'
Fetch failed from http://www-us.apache.org/dist/spark
Quitting from lines 31-32 (sparkr-vignettes.Rmd) 
Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics:
Unable to download Spark spark-2.1.0 for Hadoop 2.7. Please check network connection, Hadoop version, or provide other mirror sites.
Execution halted

@felixcheung
Member Author

felixcheung commented Nov 7, 2016

Basically it is building vignettes multiple times - once in create-docs.sh, once with R CMD build.

I'll skip the one in create-docs.sh to get it to create vignettes together with the source package, which I think is the more natural way. (R CMD build also installs the package to build vignettes - so in addition to the one time in create-docs.sh we are doing this twice)

Also, I'll pass --no-vignettes with --no-manual to R CMD check (in check-cran.sh while running from run-tests.sh), otherwise it will once again try to build vignettes (this time from the source package) - so it could end up doing that 3 times in total.

And it seems we have a new dependency on qpdf - as a result there is a new WARNING

* checking for unstated dependencies in examples ... OK
 WARNING
‘qpdf’ is needed for checks on size reduction of PDFs
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK

Once Jenkins boxes have this we could remove this warning check.
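
A minimal sketch of the flag handling described above, written as a dry run that only assembles and prints the command instead of invoking R. The variable names `NO_MANUAL`, `CHECK_OPTIONS`, and `CHECK_CMD` are hypothetical and are not taken from check-cran.sh; the two flags and the tarball name are the ones quoted in this thread.

```shell
# Dry run: build the R CMD check command line without executing it.
# NO_MANUAL, CHECK_OPTIONS and CHECK_CMD are illustrative names only.
NO_MANUAL=1
CHECK_OPTIONS=""

if [ -n "$NO_MANUAL" ]; then
  # Skip the PDF manual and skip vignettes during the check, since
  # R CMD build has already built them into the source package.
  CHECK_OPTIONS="--no-manual --no-vignettes"
fi

CHECK_CMD="R CMD check $CHECK_OPTIONS SparkR_2.1.0.tar.gz"
echo "$CHECK_CMD"
```

Printing the command rather than running it keeps the sketch usable on a machine without R installed.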

@SparkQA

SparkQA commented Nov 8, 2016

Test build #68307 has finished for PR 15790 at commit 4d6c919.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member Author

felixcheung commented Nov 8, 2016

I looked, the qpdf warning is from a fairly strict check - I don't see a way around it other than getting qpdf on the system (it's a tool, not an R package).

@shivaram
Contributor

shivaram commented Nov 8, 2016

Can you open a Spark JIRA for installing qpdf and cc @shaneknapp on it ? We can install it on the Jenkins machines.

@felixcheung
Member Author

felixcheung commented Nov 8, 2016

@felixcheung felixcheung changed the title from "[SPARK-18264][SPARKR] update vignettes for CRAN release build and add info on release" to "[SPARK-18264][SPARKR] build vignettes with package, update vignettes for CRAN release build and add info on release" Nov 8, 2016
@shaneknapp
Contributor

test this please

@shaneknapp
Contributor

ok, qpdf is installed per the jira. no environment variables set up yet though.

@SparkQA

SparkQA commented Nov 8, 2016

Test build #68348 has finished for PR 15790 at commit 4d6c919.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

R/run-tests.sh Outdated
  # We have one more NOTE in Jenkins due to "No repository set"
- if [[ $NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 3 ]]; then
+ # We have one warning on ‘qpdf’ is needed for checks on size reduction of PDFs
+ if [[ $NUM_CRAN_WARNING != 1 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 3 ]]; then
Contributor

I think we can remove this now. This was the cause for the latest test failure ?

Member Author

Yap.
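
A minimal sketch of why a check that expects exactly one warning starts failing once qpdf is installed. It assumes the counters come from counting log lines that end in WARNING, ERROR, or NOTE; that counting method, the temporary log file, and the `check_status` helper are assumptions for illustration and not the contents of run-tests.sh.

```shell
# Write the check output quoted earlier in the thread to a temporary log.
LOG_FILE="$(mktemp)"
cat > "$LOG_FILE" <<'EOF'
* checking for unstated dependencies in examples ... OK
 WARNING
‘qpdf’ is needed for checks on size reduction of PDFs
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
EOF

# Assumed counting method: lines ending in the status word.
# grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
NUM_CRAN_WARNING="$(grep -c 'WARNING$' "$LOG_FILE" || true)"
NUM_CRAN_ERROR="$(grep -c 'ERROR$' "$LOG_FILE" || true)"
NUM_CRAN_NOTES="$(grep -c 'NOTE$' "$LOG_FILE" || true)"
rm -f "$LOG_FILE"

# Hypothetical helper: $1 = expected warnings, $2 = warnings, $3 = errors, $4 = notes
check_status() {
  if [ "$2" -ne "$1" ] || [ "$3" -ne 0 ] || [ "$4" -gt 3 ]; then
    echo "fail"
  else
    echo "pass"
  fi
}

# Without qpdf: one warning in the log, one warning expected.
WITHOUT_QPDF="$(check_status 1 "$NUM_CRAN_WARNING" "$NUM_CRAN_ERROR" "$NUM_CRAN_NOTES")"
# With qpdf installed the warning disappears, but one is still expected.
WITH_QPDF="$(check_status 1 0 0 0)"
# Expecting zero warnings again restores a passing check.
WITH_QPDF_FIXED="$(check_status 0 0 0 0)"

echo "without qpdf: $WITHOUT_QPDF, with qpdf: $WITH_QPDF, after reverting: $WITH_QPDF_FIXED"
```

Comparing against an exact expected count is strict in both directions, which is consistent with the build failing right after qpdf was installed on Jenkins.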

@SparkQA

SparkQA commented Nov 8, 2016

Test build #68352 has finished for PR 15790 at commit 9f24e3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member Author

this is good to go in?

@felixcheung
Member Author

ping @shivaram

@shivaram
Contributor

shivaram commented Nov 9, 2016

Sorry I got caught up with some other stuff - Will take a look at this today.

Contributor

@shivaram shivaram left a comment

Thanks @felixcheung - Change looks pretty good to me. I just had a couple of minor inline comments. I also want to try out the scripts on my machine - but I can do that after the merge as well.


### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale file not under source control.
Contributor

nit - stale files

toc_depth: 4
toc_float: true
highlight: textmate
vignette: >
Contributor

So I think the theme was the one giving us the nice looking HTML. Is that not supported with rmarkdown::html_vignette?

Member Author

right, it seems to have a theme already and is complaining about theme already defined

Contributor

I see - I think that's fine then.

R/check-cran.sh Outdated
# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
# Build source package with vignettes
SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
Contributor

So I ran into a problem while trying to run this locally. It seems to be getting SPARK_HOME wrong for some reason

./R/check-cran.sh: line 45: /Users/shivaram/spark-1/R/bin/load-spark-env.sh: No such file or directory

Member Author

that's odd, it works in my environment.
do you happen to have check-cran.sh under a subdirectory of R?

this is the code:

SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
. "${SPARK_HOME}"/bin/load-spark-env.sh

so for /opt/spark/R/check-cran.sh, it should look for /opt/spark/bin/load-spark-env.sh
and SPARK_HOME should be /opt/spark

Contributor

So I think this needs to be at the top of the file - The current working directory is probably changed in the middle by create-docs.sh or the pushd $FWDIR ?

Member Author

what that command is doing is

  1. get the dirname of the currently running file, check-cran.sh (dirname "$0")
  2. cd to one level up (cd x/..)
  3. print the current directory (pwd)
  4. set that to SPARK_HOME

so it shouldn't matter what the current directory is, only where check-cran.sh is found?

I'm running this on Ubuntu, it's possible if you are running on Mac the behavior is different somehow?

Contributor

Yeah I think the problem might be related to how you invoke it. If you invoke the script with an absolute path say /Users/shivaram/spark-1/R/check-cran.sh then the existing code works fine. If I however use a relative path -- for example ./R/check-cran.sh while I am in /Users/shivaram/spark-1 then it doesn't work.

I think moving it to the top of the file is probably a good idea because the $0 is relative to where the script started from and when we do pushd $FWDIR the working directory changes.
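
A minimal sketch that reproduces the symptom described above, assuming the cause is a working directory change between script start and the point where SPARK_HOME is computed. The temporary directory layout and the names `SCRIPT_PATH`, `EARLY_HOME`, and `LATE_HOME` are hypothetical stand-ins for a real checkout and for `$0`.

```shell
# Hypothetical stand-in for a Spark checkout: <root>/spark/R/check-cran.sh
ROOT="$(cd "$(mktemp -d)" && pwd)"
mkdir -p "$ROOT/spark/R"
SCRIPT_PATH="./R/check-cran.sh"   # value of $0 when invoked as ./R/check-cran.sh

cd "$ROOT/spark"

# Resolved at the top of the script: the relative path is still valid.
EARLY_HOME="$(cd "$(dirname "$SCRIPT_PATH")"/.. 2>/dev/null; pwd)"

# Resolved after a directory change (as `pushd $FWDIR` would cause):
# ./R no longer exists relative to the new working directory, so pwd
# ends up reporting the current directory, <root>/spark/R.
cd "$ROOT/spark/R"
LATE_HOME="$(cd "$(dirname "$SCRIPT_PATH")"/.. 2>/dev/null; pwd)"

echo "early: $EARLY_HOME"
echo "late:  $LATE_HOME"
```

The late result ends in `/R`, which matches the `/Users/shivaram/spark-1/R/bin/load-spark-env.sh` path in the reported error. With an absolute `$0` both results would agree, which fits the observation that invoking the script by absolute path works.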

Member Author

@felixcheung felixcheung Nov 10, 2016

interesting. I changed this up a bit, not sure if it helps but both work on my setup

@shivaram
Contributor

@felixcheung I noticed one more thing - We are somehow not registering the vignette correctly with the R package. So for example if I launch ./bin/sparkR and then run vignette(package="dplyr")
I see a list of vignettes that I can then launch with vignette("introduction", package="dplyr"). However this doesn't seem to work with our vignette - I'm not sure what we need to do to get this to work though.

@felixcheung
Member Author

felixcheung commented Nov 10, 2016

Tested that more, I think the vignette works only with an installed package

R CMD INSTALL SparkR_2.1.0.tar.gz

and then

library(SparkR)
vignette("sparkr-vignettes", package="SparkR")

Contributor

@shivaram shivaram left a comment

Thanks for the update and especially for figuring out the install step. I just had a few more minor comments.

R/run-tests.sh Outdated
else
# We have 2 existing NOTEs for new maintainer, attach()
# We have one more NOTE in Jenkins due to "No repository set"
# We have one warning on ‘qpdf’ is needed for checks on size reduction of PDFs
Contributor

Nit: This comment can be removed now ?


R/create-docs.sh Outdated
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'
# Find Spark jars.
Contributor

This code is now duplicated in two files. Do you think we should just create a build-vignette.sh and call it in two places ?

Member Author

right, I thought about that. I think there's value in knitting html and vignettes in create-docs.sh, it is a bit duplicated to have vignettes in 2 places but

in create-docs.sh

...
render("pkg/vignettes/sparkr-vignettes.Rmd");

in check-cran.sh

  SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

in the latter it is building the full package along with vignettes, so the actual command and behavior isn't exactly the same.

Perhaps we should just take the vignettes build out of create-docs.sh?
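
A minimal sketch of what the `SPARK_HOME="${SPARK_HOME}"` prefix in the check-cran.sh line above can achieve, assuming SPARK_HOME is a plain (non-exported) shell variable at that point. The child `sh -c` process stands in for `R CMD build`, and `WITHOUT_PREFIX` and `WITH_PREFIX` are hypothetical names; the `/opt/spark` value is the example path used earlier in this thread.

```shell
# Start from a clean state, then set SPARK_HOME without exporting it.
unset SPARK_HOME
SPARK_HOME="/opt/spark"

# A child process does not inherit a non-exported shell variable.
WITHOUT_PREFIX="$(sh -c 'printf "%s" "${SPARK_HOME:-unset}"')"

# Prefixing the command with the assignment places the variable in the
# environment of that one command only.
WITH_PREFIX="$(SPARK_HOME="${SPARK_HOME}" sh -c 'printf "%s" "${SPARK_HOME:-unset}"')"

echo "without prefix: $WITHOUT_PREFIX"
echo "with prefix:    $WITH_PREFIX"
```

If the variable has already been exported elsewhere in the script, the prefix is redundant but harmless.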

Contributor

Yeah that seems fine to me. create-docs.sh is used to generate docs for the Spark website etc., so it's probably ok not to build the vignette as part of that.

@SparkQA

SparkQA commented Nov 10, 2016

Test build #68497 has finished for PR 15790 at commit 323609e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 11, 2016

Test build #68498 has finished for PR 15790 at commit 4d34bbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 11, 2016

Test build #68500 has finished for PR 15790 at commit 1681005.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member Author

felixcheung commented Nov 11, 2016

I think install-dev might not build the vignettes and so they won't go into the release tgz

@shivaram
Contributor

So one proposal I was thinking of is to just check a built version of the vignette into the source tree. That way the release packaging wouldn't need to change. The only thing to keep in mind is that whenever we update the vignette we will need to rebuild it. Thoughts ?

@felixcheung
Member Author

Problem is the required and generated vignette.rds RDS file is a binary file?
I'm not sure about checking in binaries in git, that would show up in a source-only release?
Maybe create-distribution.sh should run the equivalent of R CMD INSTALL SparkR.tar.gz to generate the necessary binaries, which would only go into a binary release?

How about we merge this PR first - I can test out the release mechanism more next week.

@shivaram
Contributor

Sure - Sounds good. LGTM. Merging this to master and branch-2.1

asfgit pushed a commit that referenced this pull request Nov 11, 2016
…for CRAN release build and add info on release

## What changes were proposed in this pull request?

Changes to DESCRIPTION to build vignettes.
Changes the metadata for vignettes to generate the recommended format (which is about <10% of size before). Unfortunately it does not look as nice
(before - left, after - right)

![image](https://cloud.githubusercontent.com/assets/8969467/20040492/b75883e6-a40d-11e6-9534-25cdd5d59a8b.png)

![image](https://cloud.githubusercontent.com/assets/8969467/20040490/a40f4d42-a40d-11e6-8c91-af00ddcbdad9.png)

Also add information on how to run build/release to CRAN later.

## How was this patch tested?

manually, unit tests

shivaram

We need this for branch-2.1

Author: Felix Cheung <[email protected]>

Closes #15790 from felixcheung/rpkgvignettes.

(cherry picked from commit ba23f76)
Signed-off-by: Shivaram Venkataraman <[email protected]>
@asfgit asfgit closed this in ba23f76 Nov 11, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017