forked from apache/spark
Branch 2.2 merge #201
Merged
Conversation
…nnections ## What changes were proposed in this pull request? This patch changes the order in which _acceptConnections_ starts the client thread and schedules the client timeout action, ensuring that the latter has been scheduled before the former gets a chance to cancel it. ## How was this patch tested? Due to the non-deterministic nature of the issue, I wasn't able to add a new test for it. Author: Andrea zito <[email protected]> Closes apache#19217 from nivox/SPARK-21991. (cherry picked from commit 6ea8a56) Signed-off-by: Marcelo Vanzin <[email protected]>
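To illustrate the ordering fix, here is a minimal hypothetical sketch (the names and the `Timer`-based timeout are assumptions, not the actual `LauncherServer` code): the timeout task is scheduled before the client thread starts, so the thread can never try to cancel a task that has not been registered yet.

```scala
import java.util.{Timer, TimerTask}

// Hypothetical sketch of the reordering, not the real LauncherServer code.
def acceptConnection(timer: Timer, timeoutMs: Long)(handshake: () => Unit): Unit = {
  val timeout = new TimerTask {
    override def run(): Unit = println("client timed out")
  }
  // 1. Schedule the timeout action first...
  timer.schedule(timeout, timeoutMs)
  // 2. ...then start the client thread; by the time it can call cancel(),
  //    the task is guaranteed to have been scheduled.
  new Thread(() => {
    handshake()
    timeout.cancel()
  }).start()
}
```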
…files Prior to this commit, `getAllBlocks` implicitly assumed that the directories managed by the `DiskBlockManager` contain only files corresponding to valid block IDs. In reality, this assumption was violated during shuffle, which produces temporary files in the same directory as the resulting blocks. As a result, calls to `getAllBlocks` during shuffle were unreliable. The fix could be made more efficient, but this is probably good enough. Tested via `DiskBlockManagerSuite`. Author: Sergei Lebedev <[email protected]> Closes apache#19458 from superbobry/block-id-option. (cherry picked from commit b377ef1) Signed-off-by: Wenchen Fan <[email protected]>
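A rough sketch of the idea behind the fix (the `temp_` prefix check and names below are simplifications, not Spark's actual `BlockId` parser): file names that do not parse as block IDs are skipped instead of being assumed valid.

```scala
import java.io.File

final case class BlockId(name: String)

// Simplified stand-in for Spark's block-ID parsing: temporary shuffle files
// (which start with "temp_") are not valid block IDs and yield None.
def parseBlockId(name: String): Option[BlockId] =
  if (name.startsWith("temp_")) None else Some(BlockId(name))

def getAllBlocks(dirs: Seq[File]): Seq[BlockId] =
  dirs
    .flatMap(d => Option(d.listFiles).toSeq.flatten) // listFiles may return null
    .flatMap(f => parseBlockId(f.getName))           // drop non-block files
```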
…se by test dataset not deterministic) ## What changes were proposed in this pull request? Fix the occasionally failing NaiveBayes unit test: set a seed for `BrzMultinomial.sample` so that `generateNaiveBayesInput` outputs a deterministic dataset. (Without a fixed seed the generated dataset is random, and the fitted model can exceed the tolerance in the test, which triggers the failure.) ## How was this patch tested? Manually ran the tests multiple times and checked that each run's output model contains the same values. Author: WeichenXu <[email protected]> Closes apache#19558 from WeichenXu123/fix_nb_test_seed. (cherry picked from commit 841f1d7) Signed-off-by: Joseph K. Bradley <[email protected]>
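The principle, as a tiny sketch (plain `scala.util.Random` here; the actual test seeds Breeze's `BrzMultinomial.sample`): a fixed seed makes the generated data identical on every run.

```scala
import scala.util.Random

val seed = 42L              // hypothetical seed value
val rng  = new Random(seed)
// The same 100 values are produced on every run, so a model fitted on them
// can no longer drift past the test tolerance by chance.
val data = Seq.fill(100)(rng.nextGaussian())
```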
## What changes were proposed in this pull request? Fix Java lint errors. ## How was this patch tested? Ran `./dev/lint-java`. Author: Andrew Ash <[email protected]> Closes apache#19574 from ash211/aash/fix-java-lint. (cherry picked from commit 5433be4) Signed-off-by: Marcelo Vanzin <[email protected]>
## What changes were proposed in this pull request? This PR proposes to revive the `stringsAsFactors` option in the collect API, which was mistakenly removed in apache@71a138c. Simply, it casts `character` to `factor` when the condition `stringsAsFactors && is.character(vec)` is met during primitive type conversion. ## How was this patch tested? Unit test in `R/pkg/tests/fulltests/test_sparkSQL.R`. Author: hyukjinkwon <[email protected]> Closes apache#19551 from HyukjinKwon/SPARK-17902. (cherry picked from commit a83d8d5) Signed-off-by: hyukjinkwon <[email protected]>
…ass fields When the given closure uses fields defined in a super class, `ClosureCleaner` can't find them and doesn't set them properly, so those fields end up with null values. Added a test. Author: Liang-Chi Hsieh <[email protected]> Closes apache#19556 from viirya/SPARK-22328. (cherry picked from commit 4f8dc6b) Signed-off-by: Wenchen Fan <[email protected]>
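A plain-Scala sketch of the closure shape involved (hypothetical classes; on its own this doesn't exercise `ClosureCleaner`, which only runs on closures shipped to executors):

```scala
abstract class Base extends Serializable {
  val prefix: String = "row-" // field defined on the *super* class
}

class Job extends Base {
  // The closure below captures `prefix` from the super class. Before the
  // fix, ClosureCleaner missed such fields when cleaning a closure like
  // this one, leaving them null on the executors.
  def run(items: Seq[Int]): Seq[String] = items.map(i => prefix + i)
}
```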
It's possible that users create a `Dataset` and call `collect` on it from many threads at the same time. Currently `Dataset#collect` just calls `encoder.fromRow` to convert Spark rows to objects of type `T`, and this encoder is per-dataset. This means `Dataset#collect` is not thread-safe, because the encoder uses a projection to output the object to a re-usable row. This PR fixes the problem by creating a new projection when calling `Dataset#collect`, so that each method call, rather than each Dataset, gets its own re-usable row. How was this patch tested? N/A. Author: Wenchen Fan <[email protected]> Closes apache#19577 from cloud-fan/encoder. (cherry picked from commit 5c3a1f3) Signed-off-by: gatorsmile <[email protected]>
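A simplified illustration of the hazard (generic Scala, not Spark's encoder internals): a single shared mutable buffer reused across threads corrupts results, while a fresh buffer per call, analogous to creating a new projection per `collect`, is safe.

```scala
final class Converter {
  // One shared buffer per instance, like the per-Dataset re-usable row:
  // two threads calling convertUnsafe at once overwrite each other.
  private val shared = new java.lang.StringBuilder

  def convertUnsafe(x: Int): String = {
    shared.setLength(0)
    shared.append("row-").append(x).toString
  }

  // A fresh buffer per call, like a new projection per collect(): safe.
  def convertSafe(x: Int): String =
    new java.lang.StringBuilder().append("row-").append(x).toString
}
```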
…s between data and partition schema This is a regression introduced by apache#14207. Since Spark 2.1, we store the inferred schema when creating the table, to avoid inferring the schema again at the read path. However, there is one special case: overlapping columns between the data and partition schemas. This case breaks the assumption about table schemas that there is no overlap between the data and partition schema and that partition columns are at the end. The result is that in Spark 2.1 the table scan has an incorrect schema that puts the partition columns at the end, while in Spark 2.2 the check added in `CatalogTable` to validate the table schema fails for this case. To fix this issue, a simple and safe approach is to fall back to the old behavior when overlapping columns are detected, i.e. store an empty schema in the metastore. Tested with a new regression test. Author: Wenchen Fan <[email protected]> Closes apache#19579 from cloud-fan/bug2.
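One way to end up with such a layout, as a hedged sketch (assumes a `SparkSession` named `spark`; paths are illustrative): writing data files that already contain the partition column directly into a partition directory makes that column appear in both schemas.

```scala
// The data files keep column `p`, and partition discovery also infers `p`
// from the directory name, so the data and partition schemas overlap.
spark.range(10).selectExpr("id", "1 AS p").write.parquet("/tmp/overlap/p=1")

val df = spark.read.parquet("/tmp/overlap") // schema overlaps on `p`
```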
…ginal column ## What changes were proposed in this pull request? This is a followup of apache#17075, fixing the same bug in the codegen path. ## How was this patch tested? New regression test. Author: Wenchen Fan <[email protected]> Closes apache#19576 from cloud-fan/bug. (cherry picked from commit 7fdacbc) Signed-off-by: gatorsmile <[email protected]>
This PR sets `java.io.tmpdir` for CRAN checks and also disables hsperfdata for the JVM when running the checks. Together these prevent files from being left behind in `/tmp`. ## How was this patch tested? Tested manually on a clean EC2 machine. Author: Shivaram Venkataraman <[email protected]> Closes apache#19589 from shivaram/sparkr-tmpdir-clean. (cherry picked from commit 1fe2761) Signed-off-by: Shivaram Venkataraman <[email protected]>
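For context, these are the kinds of JVM options involved (illustrative values, not the exact flags from the patch): `-Djava.io.tmpdir` redirects the JVM's temp files, and `-XX:-UsePerfData` stops it from writing `hsperfdata_<user>` files under `/tmp`.

```
-Djava.io.tmpdir=/path/to/session-local/tmp
-XX:-UsePerfData
```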
…uuid, inet and cidr to StringType in PostgreSQL ## What changes were proposed in this pull request? This PR fixes the conversion error when transforming array types of `uuid`, `inet` and `cidr` to `StringType` in PostgreSQL. ## How was this patch tested? Added test in `PostgresIntegrationSuite`. Author: Jen-Ming Chung <[email protected]> Closes apache#19604 from jmchung/SPARK-22291-FOLLOWUP.
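A sketch of the general mechanism (a custom dialect, not the actual `PostgresDialect` patch; the `_uuid`/`_inet`/`_cidr` array type names follow PostgreSQL's leading-underscore convention):

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types._

// Maps PostgreSQL-specific types that Spark does not understand natively
// to StringType (and their array forms to ArrayType(StringType)).
object PgStringlyTypesDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:postgresql")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    typeName match {
      case "uuid" | "inet" | "cidr"    => Some(StringType)
      case "_uuid" | "_inet" | "_cidr" => Some(ArrayType(StringType))
      case _                           => None
    }
}

// JdbcDialects.registerDialect(PgStringlyTypesDialect)
```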