Skip to content

Conversation

@MichaelChirico
Copy link
Contributor

What changes were proposed in this pull request?

Repeated sapply avoided in internal checkSchemaInArrow

Why are the changes needed?

Current implementation is doubly inefficient:

  1. Repeatedly doing the same (95%) sapply loop
  2. Doing scalar == on a vector (== should be done over the whole vector for efficiency)

Does this PR introduce any user-facing change?

No

How was this patch tested?

By my trusty friend the CI bots

@MichaelChirico MichaelChirico changed the title [R] vectorize schema validation for arrow in types.R [SPARK-31578][R] vectorize schema validation for arrow in types.R Apr 27, 2020
@SparkQA
Copy link

SparkQA commented Apr 27, 2020

Test build #121913 has finished for PR 28372 at commit 0fc8513.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 27, 2020

Test build #121915 has finished for PR 28372 at commit 8004d22.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 27, 2020

Test build #121918 has finished for PR 28372 at commit 03b7898.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-31578][R] vectorize schema validation for arrow in types.R [SPARK-31578][R] Vectorize schema validation for arrow in types.R Apr 28, 2020
@HyukjinKwon
Copy link
Member

I am going to merge. The tests are already passed, and I don't believe removing the return causes any test or build to break.

@MichaelChirico
Copy link
Contributor Author

linter failure looks transient to me:

E: The repository 'https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ Release' does not have a Release file.

HyukjinKwon pushed a commit that referenced this pull request Apr 28, 2020
### What changes were proposed in this pull request?

Repeated `sapply` avoided in internal `checkSchemaInArrow`

### Why are the changes needed?

Current implementation is doubly inefficient:

 1. Repeatedly doing the same (95%) `sapply` loop
 2. Doing scalar `==` on a vector (`==` should be done over the whole vector for efficiency)

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

By my trusty friend the CI bots

Closes #28372 from MichaelChirico/vectorize-types.

Authored-by: Michael Chirico <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
(cherry picked from commit 410fa91)
Signed-off-by: HyukjinKwon <[email protected]>
@SparkQA
Copy link

SparkQA commented Apr 28, 2020

Test build #121937 has finished for PR 28372 at commit 0c2718b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants