Commit 410fa91

Michael Chirico authored and HyukjinKwon committed
[SPARK-31578][R] Vectorize schema validation for arrow in types.R
### What changes were proposed in this pull request?

Repeated `sapply` avoided in internal `checkSchemaInArrow`

### Why are the changes needed?

Current implementation is doubly inefficient:

1. Repeatedly doing the same (95%) `sapply` loop
2. Doing scalar `==` on a vector (`==` should be done over the whole vector for efficiency)

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

By my trusty friend the CI bots

Closes #28372 from MichaelChirico/vectorize-types.

Authored-by: Michael Chirico <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
1 parent a68d98c commit 410fa91

1 file changed, 6 additions and 11 deletions

R/pkg/R/types.R
@@ -94,27 +94,22 @@ checkSchemaInArrow <- function(schema) {
   }
 
   # Both cases below produce a corrupt value for unknown reason. It needs to be investigated.
-  if (any(sapply(schema$fields(), function(x) x$dataType.toString() == "FloatType"))) {
+  field_strings <- sapply(schema$fields(), function(x) x$dataType.toString())
+  if (any(field_strings == "FloatType")) {
     stop("Arrow optimization in R does not support float type yet.")
   }
-  if (any(sapply(schema$fields(), function(x) x$dataType.toString() == "BinaryType"))) {
+  if (any(field_strings == "BinaryType")) {
     stop("Arrow optimization in R does not support binary type yet.")
   }
-  if (any(sapply(schema$fields(),
-                 function(x) startsWith(x$dataType.toString(),
-                                        "ArrayType")))) {
+  if (any(startsWith(field_strings, "ArrayType"))) {
     stop("Arrow optimization in R does not support array type yet.")
   }
 
   # Arrow optimization in Spark does not yet support both cases below.
-  if (any(sapply(schema$fields(),
-                 function(x) startsWith(x$dataType.toString(),
-                                        "StructType")))) {
+  if (any(startsWith(field_strings, "StructType"))) {
     stop("Arrow optimization in R does not support nested struct type yet.")
   }
-  if (any(sapply(schema$fields(),
-                 function(x) startsWith(x$dataType.toString(),
-                                        "MapType")))) {
+  if (any(startsWith(field_strings, "MapType"))) {
     stop("Arrow optimization in R does not support map type yet.")
   }
 }
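
The refactoring in this diff can be tried outside of Spark with a small stand-in for the schema object. The sketch below is illustrative only: `mock_field` and `mock_schema` are hypothetical helpers that mimic just the `fields()` and `dataType.toString()` calls visible in the diff, and the type strings are made-up sample values, not output from SparkR. It compares the old per-check `sapply` pattern with the new "extract once, compare vectorized" pattern and confirms they agree.

```r
# Hypothetical stand-in for a SparkR field: exposes only dataType.toString().
mock_field <- function(type_string) {
  list(dataType.toString = function() type_string)
}

# Hypothetical stand-in for a SparkR schema: exposes only fields().
mock_schema <- function(type_strings) {
  fields <- lapply(type_strings, mock_field)
  list(fields = function() fields)
}

# Sample type strings chosen for illustration.
schema <- mock_schema(c("IntegerType", "StringType", "ArrayType(IntegerType,true)"))

# Old pattern: a fresh sapply pass per check, with a scalar test inside the loop.
old_has_array <- any(sapply(schema$fields(),
                            function(x) startsWith(x$dataType.toString(), "ArrayType")))
old_has_float <- any(sapply(schema$fields(),
                            function(x) x$dataType.toString() == "FloatType"))

# New pattern: collect the type strings once, then apply vectorized tests.
field_strings <- sapply(schema$fields(), function(x) x$dataType.toString())
new_has_array <- any(startsWith(field_strings, "ArrayType"))
new_has_float <- any(field_strings == "FloatType")

# Both patterns should give the same answers for this schema.
stopifnot(identical(old_has_array, new_has_array))
stopifnot(identical(old_has_float, new_has_float))
```

One point that may be worth checking in review: `sapply` returns an empty list rather than a character vector when given zero-length input, so `vapply(schema$fields(), function(x) x$dataType.toString(), character(1))` could be a more type-stable choice if a schema with no fields is possible.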
