@jkbradley
Member

What changes were proposed in this pull request?

The default stopwords were a Java object. They are now a plain Python list.

How was this patch tested?

Unit test which failed before the fix
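
For readers without the diff at hand, here is a minimal sketch of the behavior change described above. It is not the code from this patch: `FakeJavaArray` is a hypothetical stand-in for the Java-side object, and `to_python_list` is an illustrative helper, written in pure Python so it runs without Spark.

```python
class FakeJavaArray:
    """Hypothetical stand-in for a proxy to a JVM-side string array.

    This is NOT a PySpark or py4j class; it only mimics "indexable, has a
    length, but is not a Python list".
    """

    def __init__(self, items):
        self._items = tuple(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


def to_python_list(java_array):
    """Eagerly copy every element of the proxy into a plain Python list."""
    return [java_array[i] for i in range(len(java_array))]


# Assumed sample data; the real default stopword list is longer.
jvm_default = FakeJavaArray(["a", "the", "of"])

before_fix = jvm_default                  # default exposed as the Java-side object
after_fix = to_python_list(jvm_default)   # default exposed as a Python list

print(isinstance(before_fix, list))
print(isinstance(after_fix, list))
print(after_fix + ["custom"])             # ordinary list operations now work
```

A regression test in this spirit would assert that the default value is a `list`, which is the kind of check that fails when the default is still a Java object.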

@SparkQA

SparkQA commented Apr 15, 2016

Test build #55941 has finished for PR 12422 at commit e35d482.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Contributor

sethah commented Apr 15, 2016

@jkbradley It is unclear to me if this was a bug before or if it was by design. I think @holdenk mentioned that there is the benefit of leaving stopwords as a JavaObject so that if the user never accesses them, they don't need to be transferred from Java to a Python list. I don't think the benefit is high since stopwords is short, for now.

Aside from that, it LGTM.

@holdenk
Contributor

holdenk commented Apr 15, 2016

The original design was to allow us to not have to transfer it if the user was using the default stop words, but overall this change looks good to me (I'm not sure I'd call it a bug, but that's a minor point). That being said, the stopword list is small enough that this optimization isn't important, and this simplifies the work going on in #11939, so LGTM.

@jkbradley
Member Author

OK thanks for checking this
Merging with master and branch-1.6

@asfgit asfgit closed this in d6ae7d4 Apr 15, 2016
asfgit pushed a commit that referenced this pull request Apr 15, 2016
…pwords

The default stopwords were a Java object.  They are no longer.

Unit test which failed before the fix

Author: Joseph K. Bradley <[email protected]>

Closes #12422 from jkbradley/pyspark-stopwords.

(cherry picked from commit d6ae7d4)
Signed-off-by: Joseph K. Bradley <[email protected]>

Conflicts:
	python/pyspark/ml/feature.py
	python/pyspark/ml/tests.py
@jkbradley jkbradley deleted the pyspark-stopwords branch April 15, 2016 18:59
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 19, 2016
(cherry picked from commit 58dfba6)
lw-lin pushed a commit to lw-lin/spark that referenced this pull request Apr 20, 2016