@davidnavas

As measured locally (performance on Amazon's VMs may vary), allocating a 12-million-entry BitSet and then clearing it a million times takes one minute. When you have a skewed query in which one key matches very many rows (say, 12 million entries against customer 0) and another 6 million keys match one entry each, the outer join code can wind up spending several minutes clearing bits that never needed clearing.
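To make the cost concrete, here is a minimal sketch, assuming a bit set backed by a fixed array of 64-bit words whose clear zeroes the whole array. At 12 million bits that is 187,500 words per clear, no matter how few bits were set. `SimpleBitSet`, `clearAll`, and `clearTouched` are hypothetical names used for illustration; this is neither Spark's BitSet class nor necessarily the change made in this pull request. It only shows one way to make the clearing cost proportional to the bits actually used.

```java
import java.util.Arrays;

public class BitSetClearSketch {
    /** Minimal fixed-size bit set backed by a long[]; a stand-in, not Spark's class. */
    static final class SimpleBitSet {
        private final long[] words;
        // Highest word index written since the last clear, or -1 if none.
        private int maxWordTouched = -1;

        SimpleBitSet(int numBits) {
            words = new long[(numBits + 63) >>> 6];
        }

        void set(int index) {
            int w = index >>> 6;
            words[w] |= 1L << (index & 63);
            if (w > maxWordTouched) {
                maxWordTouched = w;
            }
        }

        boolean get(int index) {
            return (words[index >>> 6] & (1L << (index & 63))) != 0;
        }

        /** Zeroes every word: cost is proportional to capacity. */
        void clearAll() {
            Arrays.fill(words, 0L);
            maxWordTouched = -1;
        }

        /** Zeroes only the words written since the last clear. */
        void clearTouched() {
            if (maxWordTouched >= 0) {
                Arrays.fill(words, 0, maxWordTouched + 1, 0L);
                maxWordTouched = -1;
            }
        }

        int wordCount() {
            return words.length;
        }
    }
}
```

With this shape, a key that matched a single buffered row would pay for clearing one word rather than the full array sized for the 12-million-row key.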

This doesn't seem to be the entire cause of the slowness in @yeweizhang's query (6 minutes isn't the whole measured slowdown, and I don't think CPU cache pollution will make up the balance), but it seems worth fixing.

@davidnavas
Author

How should I run the tests and do a release?

@markhamstra

I'll take a look at the tests -- essentially you want to run them locally and ignore a couple of known, benign test failures.

@markhamstra markhamstra merged commit e69d5c5 into alteryx:csd-1.6 Aug 15, 2016
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017
…yx#173)

* Support configuring SSL using PEM files.

* Address some missed comments

* Fix import ordering

* Slight rewording of comments

* Fix scalastyle

2 participants