Skip to content

Conversation

@adeandrade
Copy link

TaskAttemptContext's constructor will clone the configuration instead of referencing it. Calling setConf after creating TaskAttemptContext makes any changes to the configuration made inside setConf unperceived by RecordReader instances.

As an example, Titan's InputFormat will change conf when calling setConf. They wrap their InputFormat around Cassandra's ColumnFamilyInputFormat, and append Cassandra's configuration. This change fixes the following error when using Titan's CassandraInputFormat with Spark:

java.lang.RuntimeException: org.apache.thrift.protocol.TProtocolException: Required field 'keyspace' was not present! Struct: set_key space_args(keyspace:null)

There's a discussion of this error here: https://groups.google.com/forum/#!topic/aureliusgraphs/4zpwyrYbGAE

@andrewor14
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Dec 1, 2015

Test build #46908 has finished for PR 10046 at commit 186dd2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Dec 3, 2015

LGTM.

@vanzin
Copy link
Contributor

vanzin commented Dec 3, 2015

Actually... what's actually changing here? conf is not modified between the old location of the code and the new location. The only place it's being used is in the call to configurable.setConf(conf); even ignoring the fact that only a single InputFormat in Hadoop's sources implements Configurable (DBInputFormat), I don't expect a method called setConf to modify its input argument.

Have you seen any actual issue that is fixed by moving this code around?

@adeandrade
Copy link
Author

Titan's InputFormat for example will change conf when calling setConf. They wrap their InputFormat around Cassandra's ColumnFamilyInputFormat, and append Cassandra's configuration based on their own.

This may be the best way to deal with nested InputFormat classes, unless Hadoop changes the interface.

@vanzin
Copy link
Contributor

vanzin commented Dec 3, 2015

Can you put that in the PR description and the bug itself? From the change, it's not immediately clear what's being fixed. Also, changing the conf in setConf sounds really sketchy, and in a similar vein, setConf is free to do other things that this change would not address (e.g. clone the config and change the clone).

I've seen input format classes that did sketchy things like this, and my suggestion to them was "use the configuration from the JobContext / TaskAttemptContext; that's how the API was meant to be used".

@adeandrade
Copy link
Author

Done.

I agree it is sketchy. If setConf were to change the configuration it should return a new copy of it.

Titan should probably redefine the configuration in createRecordReader instead.

@vanzin
Copy link
Contributor

vanzin commented Dec 4, 2015

Ok, merging to master / 1.6.

asfgit pushed a commit that referenced this pull request Dec 4, 2015
…tConf.

TaskAttemptContext's constructor will clone the configuration instead of referencing it. Calling setConf after creating TaskAttemptContext makes any changes to the configuration made inside setConf unperceived by RecordReader instances.

As an example, Titan's InputFormat will change conf when calling setConf. They wrap their InputFormat around Cassandra's ColumnFamilyInputFormat, and append Cassandra's configuration. This change fixes the following error when using Titan's CassandraInputFormat with Spark:

*java.lang.RuntimeException: org.apache.thrift.protocol.TProtocolException: Required field 'keyspace' was not present! Struct: set_key space_args(keyspace:null)*

There's a discussion of this error here: https://groups.google.com/forum/#!topic/aureliusgraphs/4zpwyrYbGAE

Author: Anderson de Andrade <[email protected]>

Closes #10046 from adeandrade/newhadooprdd-fix.

(cherry picked from commit f434f36)
Signed-off-by: Marcelo Vanzin <[email protected]>
@asfgit asfgit closed this in f434f36 Dec 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants