Skip to content

Conversation

@szczeles
Copy link

@szczeles szczeles commented Aug 8, 2016

What changes were proposed in this pull request?

Ability to use KafkaUtils.createDirectStream with starting offsets in python 3 by using java.lang.Number instead of Long during param mapping in scala helper. This allows py4j to pass Integer or Long to the map and resolves ClassCastException problems.

How was this patch tested?

unit tests

@jerryshao - could you please look at this PR?

@MLnick
Copy link
Contributor

MLnick commented Aug 8, 2016

ok to test

@SparkQA
Copy link

SparkQA commented Aug 8, 2016

Test build #63363 has finished for PR 14540 at commit d2a4027.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor

holdenk commented Aug 8, 2016

Also cc @tdas who might be interested

@holdenk
Copy link
Contributor

holdenk commented Aug 8, 2016

I wonder if we should also look at the Python MLlib API because we use java longs for things like seeds.

@jerryshao
Copy link
Contributor

jerryshao commented Aug 9, 2016

Looks good to me~~, we should also change the unit tests accordingly. Currently several related tests are excluded from python3 test~~.

@szczeles
Copy link
Author

szczeles commented Aug 9, 2016

@holdenk I've checked setSeed methods in MLlib and it seems py4j handles them well. If function gets simple arguments (strings, numerric, bool), py4j applies conversion between types (see https://github.com/bartdag/py4j/blob/master/py4j-java/src/main/java/py4j/reflection/MethodInvoker.java#L99).

For setSeed(Long), if argument is mapped to Integer, it goes through toString and Long.parseLong (see https://github.com/bartdag/py4j/blob/master/py4j-java/src/main/java/py4j/reflection/TypeConverter.java#L88)

Apparently, this conversion does not work for complex types like fromOffsets map.

@holdenk
Copy link
Contributor

holdenk commented Aug 9, 2016

Ah interesting, we might want to report the bug upstream with Py4J - but this change looks good to me :) Thanks for getting this working in Python 3 :) cc @davies who can maybe take a look as well?

@davies
Copy link
Contributor

davies commented Aug 9, 2016

LGTM, merging this into master and 2.0 branch, thanks!

asfgit pushed a commit that referenced this pull request Aug 9, 2016
…reateDirectStream for python3

## What changes were proposed in this pull request?

Ability to use KafkaUtils.createDirectStream with starting offsets in python 3 by using java.lang.Number instead of Long during param mapping in scala helper. This allows py4j to pass Integer or Long to the map and resolves ClassCastException problems.

## How was this patch tested?

unit tests

jerryshao  - could you please look at this PR?

Author: Mariusz Strzelecki <[email protected]>

Closes #14540 from szczeles/kafka_pyspark.

(cherry picked from commit 29081b5)
Signed-off-by: Davies Liu <[email protected]>
@asfgit asfgit closed this in 29081b5 Aug 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants