Skip to content

Conversation

@pjfanning
Copy link
Member

https://issues.apache.org/jira/browse/HADOOP-18342

This is based on latest avro jar (1.11.1). The shaded avro jar also contains a shaded copy of jackson (based on version avro needs). This is jackson 2.12.7 which matches what hadoop uses but I think it might be safer not to rely on avro and hadoop needing the same version of jackson and jackson upgrades can prove problematic for downstream projects.

I have omitted these jars from the shaded jar:

  • slf4j-api
  • commons-compress

These 2 are more stable and might be tidier not to completely bloat the avro shaded jar.

@steveloughran
Copy link
Contributor

I'd prefer retaining jackson unshaded just to avoid the need to do another release of this every time there's a jackson CVE emergency

leaving out slf4j is good and flexible; though the dependencies of the shaded jar should declare that and compress as either provided or required.

now, the parquet lib includes the version. I'm not sure that is the right thing to do (as it complicates that update), but it would allow someone brave to have the different versions coexisting.

thoughts?

@pjfanning
Copy link
Member Author

Thanks @steveloughran - I'll look into changing the shaded avro jar and pom.xml as you suggest.

I'm not sure if I understand your point about parquet. Do you think parquet needs to be shaded as an extra hadoop-thirdparty jar? Or does hadoop use parquet-avro jar and if so, would we need to shade parquet-avro too?

@steveloughran
Copy link
Contributor

it's that because the parquet bit of the shaded jar has a version in it, upgrading is harder than just bumping up the build number, i need to change the module name etc too.

@steveloughran
Copy link
Contributor

no problem, great that you are working on this.

@pjfanning
Copy link
Member Author

pjfanning commented Sep 4, 2022

@steveloughran I have made some progress with the hadoop-avro jar. I still need to fully test it in a hadoop build though. I have noticed that hadoop also uses avro-maven-plugin to compile generated classes.

It does appear that we will need to shade this too because at the moment avro-maven-plugin generates classes that import org.apache.avro classes.

@pjfanning
Copy link
Member Author

@steveloughran I think it's easier to keep using the standard avro-maven-plugin and to postprocess the generated source to replace the org.apache.avro package names in the generated source. I have this approach working in a local hadoop build.

@pjfanning
Copy link
Member Author

@steveloughran could you review this? apache/hadoop#4854 uses a temp jar I published myself but can be adjusted when this hadoop-thirdparty PR is merged and released.

@steveloughran
Copy link
Contributor

lgtm; needs a newline at the end of the new pom.
+1 pending that

@steveloughran
Copy link
Contributor

do see #19 and discussion about whether artifacts need names. it may be good to use a version number here...

@pjfanning
Copy link
Member Author

do see #19 and discussion about whether artifacts need names. it may be good to use a version number here...

I can rename the module and jar to hadoop-shaded-avro_1_11 if that makes sense.

@steveloughran
Copy link
Contributor

let's do that. yes, upgrading is harder, but it's a lot more obvious what is in use.

(the other strategy would be to include it in the version itself, but that'd just confuse people)

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@steveloughran steveloughran merged commit 476dbc0 into apache:trunk Oct 23, 2022
@steveloughran
Copy link
Contributor

this is in; we need to get the protobuf update in and it'll be good for a release. though we may also want to update guava, which maybe we can do by adding a guava artifact with a version...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants