Andrea zito and others added 12 commits October 25, 2017 10:10
…nnections

## What changes were proposed in this pull request?
This patch changes the order in which _acceptConnections_ starts the client thread and schedules the client timeout action, ensuring that the timeout has been scheduled before the client thread gets a chance to cancel it.
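The ordering fix can be illustrated with a small Python model. This is a sketch under assumptions, not Spark's code: the class and method names are hypothetical, and `threading.Timer` stands in for the scheduled timeout action. The timeout is scheduled first, and only then is the client thread started, so its `cancel()` always has a timer to act on.

```python
import threading

class ClientHandler:
    """Hypothetical model of accepting a connection with a timeout (not Spark's code)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.timed_out = threading.Event()
        self.timeout_timer = None

    def _on_timeout(self):
        # Fires only if the client thread never cancelled the timer.
        self.timed_out.set()

    def _client_loop(self):
        # The client thread cancels the pending timeout as soon as it runs.
        # This is safe only because the timer was created before this thread started.
        self.timeout_timer.cancel()

    def accept(self):
        # 1. Schedule the timeout first ...
        self.timeout_timer = threading.Timer(self.timeout_s, self._on_timeout)
        self.timeout_timer.start()
        # 2. ... then start the client thread that may cancel it.
        client = threading.Thread(target=self._client_loop)
        client.start()
        return client
```

With the two steps swapped, the client thread could run before the timer exists, fail to cancel it, and the timeout would later fire on a healthy connection.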

## How was this patch tested?
Due to the non-deterministic nature of the issue, I wasn't able to add a new test for it.

Author: Andrea zito <[email protected]>

Closes apache#19217 from nivox/SPARK-21991.

(cherry picked from commit 6ea8a56)
Signed-off-by: Marcelo Vanzin <[email protected]>
…files

Prior to this commit getAllBlocks implicitly assumed that the directories
managed by the DiskBlockManager contain only the files corresponding to
valid block IDs. In reality, this assumption was violated during shuffle,
which produces temporary files in the same directory as the resulting
blocks. As a result, calls to getAllBlocks during shuffle were unreliable.

The fix could be made more efficient, but this is probably good enough.
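The idea of treating only files whose names parse as block IDs as blocks can be sketched as follows. This is a hypothetical Python illustration: the regex list is a made-up subset, not Spark's actual `BlockId` parser, and the temporary-file name in the usage is invented.

```python
import re

# Hypothetical subset of block-ID name patterns, for illustration only;
# Spark's real BlockId parser recognizes more kinds.
_BLOCK_ID_PATTERNS = [
    re.compile(r"rdd_\d+_\d+"),
    re.compile(r"shuffle_\d+_\d+_\d+(\.data|\.index)?"),
    re.compile(r"broadcast_\d+(_\w+)?"),
]

def parse_block_id(name):
    """Return the name if it is a recognized block ID, otherwise None."""
    for pattern in _BLOCK_ID_PATTERNS:
        if pattern.fullmatch(name):
            return name
    return None

def get_all_blocks(file_names):
    """Keep only files whose names parse as block IDs, skipping everything else."""
    blocks = []
    for name in file_names:
        block_id = parse_block_id(name)
        if block_id is not None:  # temporary shuffle files are silently ignored
            blocks.append(block_id)
    return blocks
```

For example, `get_all_blocks(["rdd_1_0", "shuffle_0_1_2.data", "shuffle_0_1_2.data.3f2a-tmp"])` keeps the first two names and drops the temporary file instead of failing on it.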

Tested by `DiskBlockManagerSuite`.

Author: Sergei Lebedev <[email protected]>

Closes apache#19458 from superbobry/block-id-option.

(cherry picked from commit b377ef1)
Signed-off-by: Wenchen Fan <[email protected]>
…se by test dataset not deterministic)

## What changes were proposed in this pull request?

Fix occasional failures of the NaiveBayes unit test:
Set a seed for `BrzMultinomial.sample` so that `generateNaiveBayesInput` outputs a deterministic dataset.
(Without a seed, the generated dataset is random, and the fitted model can occasionally exceed the test's tolerance, which triggers this failure.)

## How was this patch tested?

Ran the tests manually multiple times and checked that the output models contain the same values each time.

Author: WeichenXu <[email protected]>

Closes apache#19558 from WeichenXu123/fix_nb_test_seed.

(cherry picked from commit 841f1d7)
Signed-off-by: Joseph K. Bradley <[email protected]>
## What changes were proposed in this pull request?

Fix Java lint errors.

## How was this patch tested?

Run `./dev/lint-java`

Author: Andrew Ash <[email protected]>

Closes apache#19574 from ash211/aash/fix-java-lint.

(cherry picked from commit 5433be4)
Signed-off-by: Marcelo Vanzin <[email protected]>
## What changes were proposed in this pull request?

This PR proposes to revive the `stringsAsFactors` option in the `collect` API, which was mistakenly removed in apache@71a138c.

It simply casts `character` to `factor` when the condition `stringsAsFactors && is.character(vec)` holds during primitive type conversion.
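The conversion rule can be mimicked in Python to make the condition concrete. This is an analogy, not SparkR code: `to_factor` and `convert_column` are hypothetical names, levels are sorted with Python's default ordering rather than R's locale-aware sort, and `NA` handling is ignored.

```python
def to_factor(values):
    """Mimic R's factor(): sorted unique levels plus 1-based integer codes."""
    levels = sorted(set(values))
    index = {level: i + 1 for i, level in enumerate(levels)}
    return {"levels": levels, "codes": [index[v] for v in values]}

def convert_column(vec, strings_as_factors):
    # Mirrors the condition `stringsAsFactors && is.character(vec)`.
    is_character = all(isinstance(v, str) for v in vec)
    if strings_as_factors and is_character:
        return to_factor(vec)
    return vec
```

A character column such as `["b", "a", "b"]` becomes levels `["a", "b"]` with codes `[2, 1, 2]` when the option is on; numeric columns, or any column when the option is off, pass through unchanged.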

## How was this patch tested?

Unit test in `R/pkg/tests/fulltests/test_sparkSQL.R`.

Author: hyukjinkwon <[email protected]>

Closes apache#19551 from HyukjinKwon/SPARK-17902.

(cherry picked from commit a83d8d5)
Signed-off-by: hyukjinkwon <[email protected]>
…ass fields

When a closure uses fields defined in a superclass, `ClosureCleaner` cannot identify them and does not set them properly, so those fields end up with null values.
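The difference between inspecting only a class's own fields and walking its whole hierarchy can be shown with a Python analogy. `ClosureCleaner` works on JVM bytecode, so this is only an illustration of the idea; the classes and helper functions are hypothetical.

```python
class Base:
    factor = 10        # field defined in the superclass

class Closure(Base):
    offset = 1         # field defined in the class itself

def own_fields(cls):
    """Buggy approach: look only at the class's own attributes."""
    return {k: v for k, v in vars(cls).items() if not k.startswith("__")}

def all_fields(cls):
    """Fixed approach: walk the hierarchy so superclass fields are included."""
    fields = {}
    for klass in reversed(cls.__mro__):  # object first, most-derived class last
        if klass is object:
            continue
        fields.update(own_fields(klass))  # subclasses override superclass values
    return fields
```

`own_fields(Closure)` sees only `offset`, so `factor` would be missed (the analogue of a field left null), while `all_fields(Closure)` returns both.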

Added test.

Author: Liang-Chi Hsieh <[email protected]>

Closes apache#19556 from viirya/SPARK-22328.

(cherry picked from commit 4f8dc6b)
Signed-off-by: Wenchen Fan <[email protected]>
Users may create a `Dataset` and call `collect` on it from many threads at the same time. Currently `Dataset#collect` just calls `encoder.fromRow` to convert Spark rows to objects of type T, and this encoder is per-Dataset. This means `Dataset#collect` is not thread-safe, because the encoder uses a projection that outputs the object to a reusable row.

This PR fixes this problem by creating a new projection when calling `Dataset#collect`, so that there is a reusable row per method call instead of per Dataset.
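The per-call projection pattern can be sketched in Python. The `Projection` and `Dataset` classes below are toy stand-ins, not Spark's classes, and `row * 2` is a placeholder for decoding a row into an object. The point is that the mutable reusable slot lives in an object created inside `collect`, so concurrent callers never share it.

```python
import threading

class Projection:
    """Hypothetical decoder that writes each result into one reusable slot."""

    def __init__(self):
        self.reusable_row = [None]  # mutable state: unsafe to share across threads

    def __call__(self, row):
        self.reusable_row[0] = row * 2  # stand-in for decoding a row into an object
        return self.reusable_row[0]

class Dataset:
    """Toy dataset; not Spark's Dataset class."""

    def __init__(self, rows):
        self.rows = rows

    def collect(self):
        # A fresh projection per call gives each caller its own reusable
        # slot, instead of one slot shared by every caller of this Dataset.
        projection = Projection()
        return [projection(r) for r in self.rows]

def collect_concurrently(dataset, n_threads):
    """Call collect() from several threads at once and gather the results."""
    results = []
    threads = [threading.Thread(target=lambda: results.append(dataset.collect()))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Had `Projection()` been created once in `Dataset.__init__`, every thread would write into the same slot, which is the sharing the fix removes.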

N/A

Author: Wenchen Fan <[email protected]>

Closes apache#19577 from cloud-fan/encoder.

(cherry picked from commit 5c3a1f3)
Signed-off-by: gatorsmile <[email protected]>
…s between data and partition schema

This is a regression introduced by apache#14207. Starting with Spark 2.1, we store the inferred schema when creating the table, to avoid inferring the schema again on the read path. However, there is one special case: columns that overlap between the data and partition schemas. This case breaks the table-schema assumption that the data and partition schemas do not overlap and that partition columns come last. As a result, in Spark 2.1 the table scan has an incorrect schema that puts the partition columns at the end. In Spark 2.2, we added a check in CatalogTable to validate the table schema, which fails in this case.

To fix this issue, a simple and safe approach is to fall back to the old behavior when overlapping columns are detected, i.e. store an empty schema in the metastore.
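The fallback decision can be sketched as a small Python function. This is a hypothetical illustration of the rule described above, not Spark's implementation; column names are plain strings compared case-sensitively, which is an assumption (Spark's default resolution is case-insensitive).

```python
def schema_to_store(data_cols, partition_cols):
    """Decide which table schema to persist in the metastore (sketch).

    Normally the inferred schema is stored as data columns followed by
    partition columns. If the two overlap, fall back to the old behavior
    and store an empty schema, so it is inferred again at read time.
    """
    # Assumption: names are compared case-sensitively in this sketch.
    overlap = set(data_cols) & set(partition_cols)
    if overlap:
        return []  # empty schema -> infer again on the read path
    return list(data_cols) + list(partition_cols)
```

For data columns `["a", "p"]` and partition column `["p"]`, the sketch stores an empty schema instead of a schema that violates the "no overlap, partition columns last" assumption.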

New regression test.

Author: Wenchen Fan <[email protected]>

Closes apache#19579 from cloud-fan/bug2.
…ginal column

## What changes were proposed in this pull request?

This is a follow-up to apache#17075 that fixes the bug in the codegen path.

## How was this patch tested?

New regression test.

Author: Wenchen Fan <[email protected]>

Closes apache#19576 from cloud-fan/bug.

(cherry picked from commit 7fdacbc)
Signed-off-by: gatorsmile <[email protected]>
This PR sets `java.io.tmpdir` for CRAN checks and also disables hsperfdata for the JVM when running CRAN checks. Together, these changes prevent files from being left behind in `/tmp`.
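The JVM side of this change amounts to two options, sketched here as a hypothetical Python helper. `-Djava.io.tmpdir=<dir>` and HotSpot's `-XX:-UsePerfData` (which stops the JVM from writing `hsperfdata_<user>` files) are standard JVM options; the helper name, and the assumption that the string is passed to the JVM via something like `spark.driver.extraJavaOptions`, are illustrative and not necessarily how the patch wires it up.

```python
def cran_check_java_options(tmp_dir):
    """Build extra JVM options for a CRAN-check run (hypothetical helper)."""
    return " ".join([
        f"-Djava.io.tmpdir={tmp_dir}",  # JVM temp files go to a dedicated directory
        "-XX:-UsePerfData",             # HotSpot flag: do not write hsperfdata files
    ])
```

Pointing `tmp_dir` at a directory that the check process cleans up itself keeps both kinds of JVM files out of the shared `/tmp`.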

## How was this patch tested?
Tested manually on a clean EC2 machine

Author: Shivaram Venkataraman <[email protected]>

Closes apache#19589 from shivaram/sparkr-tmpdir-clean.

(cherry picked from commit 1fe2761)
Signed-off-by: Shivaram Venkataraman <[email protected]>
…uuid, inet and cidr to StringType in PostgreSQL

## What changes were proposed in this pull request?

This PR fixes the conversion error when transforming array types of `uuid`, `inet` and `cidr` to `StringType` in PostgreSQL.
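A Python sketch of the general idea: convert each array element to its string form instead of assuming the elements already are strings. This is an assumption about the shape of the fix, not the patch itself; Python's `uuid` and `ipaddress` objects stand in for the non-string values a JDBC driver may return, and `array_to_strings` is a hypothetical name.

```python
import ipaddress
import uuid

def array_to_strings(elements):
    """Convert each array element to its string form, preserving NULLs.

    The elements are objects (e.g. UUIDs or network addresses) rather than
    strings, so they are converted with str() instead of being cast.
    """
    return [None if e is None else str(e) for e in elements]

# Python's uuid/ipaddress objects stand in for the values a JDBC driver returns.
sample = [uuid.UUID(int=0), ipaddress.ip_address("192.168.0.1"),
          ipaddress.ip_network("10.0.0.0/8"), None]
converted = array_to_strings(sample)
```

Each element keeps its textual representation (`10.0.0.0/8` for a CIDR value, for example), which is what a `StringType` array column needs.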

## How was this patch tested?

Added test in `PostgresIntegrationSuite`.

Author: Jen-Ming Chung <[email protected]>

Closes apache#19604 from jmchung/SPARK-22291-FOLLOWUP.
markhamstra merged commit 3f9d207 into alteryx:csd-2.2 on Oct 30, 2017