
Conversation

@hotou (Contributor) commented Feb 20, 2015

Fix an overflow bug in JdbcRDD when calculating partitions for large BIGINT ids

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

Maybe this would be more robust if we just increment by a fixed width every iteration?

Member

BigDecimal is overkill, and it feels funny to be introducing floating point. i and length are int already, so using longs here is sufficient. I really think this is solved by simply changing i to i.toLong and removing the superfluous .toLong at the end.

Contributor Author

@srowen No, that won't work. The problem here is this: length is a long, I think, since it's the diff of two longs, but (i * length) can overflow a long. In my test case, for example:

567279357766147899L * 20 = -7101156918386593636

So we need a type that can represent larger numbers.
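
For example, in a Scala REPL (a minimal sketch of the wraparound, using the numbers above):

// Long multiplication silently wraps around on overflow:
val length = 567279357766147899L
println(length * 20)         // -7101156918386593636 (wrapped)

// Promoting one operand to BigInt keeps the exact value:
println(BigInt(length) * 20) // 11345587155322957980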

Contributor Author

@rxin I actually favor the current partition algo; it's pretty neat in a way. For example:

lowerBound = 1
upperBound = 100
numPartitions = 8

With fixed-width increments you get
[1,13],[14,26],[27,39],[40,52],[53,65],[66,78],[79,91],[92,100]
in which you always end up with one small partition at the end.

With the current algo you get
[1,12],[13,25],[26,37],[38,50],[51,62],[63,75],[76,87],[88,100]
which gives more evenly distributed partitions.
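
Here is a small sketch (illustrative only, not the PR code; the helper names are made up) that reproduces both splits for these bounds:

// Proportional split, as in JdbcRDD: boundaries at i * length / numPartitions.
def proportional(lower: Long, upper: Long, n: Int): Seq[(Long, Long)] = {
  val length = BigInt(1) + upper - lower
  (0 until n).map { i =>
    val start = lower + (BigInt(i) * length / n).toLong
    val end = lower + (BigInt(i + 1) * length / n).toLong - 1
    (start, end)
  }
}

// Fixed-width split: every partition the same size, remainder in the last one.
def fixedWidth(lower: Long, upper: Long, n: Int): Seq[(Long, Long)] = {
  val width = ((BigInt(1) + upper - lower + n - 1) / n).toLong // ceiling division
  (0 until n).map { i =>
    val start = lower + i.toLong * width
    (start, math.min(start + width - 1, upper))
  }
}

println(proportional(1L, 100L, 8)) // (1,12), (13,25), (26,37), ..., (88,100)
println(fixedWidth(1L, 100L, 8))   // (1,13), (14,26), (27,39), ..., (92,100)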

Member

Ah right, I see that since the bounds are used for ID-like values, it's not unrealistic to think that length is near 2^63, and it's a long. Hm, does this overflow anyway in the final partition? If upperBound is near 2^63 then this can compute an end that doesn't fit back into a long even if the intermediate result does. PS: why not BigInteger?

While we're fixing it, though, I tend to think that @rxin is right and that this can just map to fixed intervals rather than perform this computation.

BTW, since the range is inclusive, I feel like the last partition should subtract 1 from its end? And the end should never be larger than upperBound? There might be a few reasons to revise this logic.

Member

It does, although the difference is +/-1 for all partitions but the last. Yea, I get your point about the final partition, but isn't it possible that its end is >= upperBound? I haven't thought it through, I admit. A version with BigInteger here is definitely an improvement, so I'd support that. I suppose I'm still concerned this fails in exactly the corner case you're trying to fix: very large length.

Contributor Author

@srowen Sorry, I don't understand how the last partition can get >= upperBound.

With this code:

val length = 1 + upperBound - lowerBound
(0 until numPartitions).map(i => {
  val start = lowerBound + ((BigInt(i) * length) / numPartitions).toLong
  val end = lowerBound + ((BigInt(i + 1) * length) / numPartitions).toLong - 1
  new JdbcPartition(i, start, end)
})

the last iteration has:
i = numPartitions - 1

So end = lowerBound + ((numPartitions - 1 + 1) * (1 + upperBound - lowerBound)) / numPartitions - 1
       = lowerBound + (numPartitions / numPartitions) * (1 + upperBound - lowerBound) - 1
       = lowerBound + 1 + upperBound - lowerBound - 1
       = upperBound
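
A quick numeric check of that algebra, with hypothetical bounds chosen small enough that length itself doesn't overflow:

val lowerBound = 567279357766147899L
val upperBound = lowerBound + 99L
val numPartitions = 20
val length = 1 + upperBound - lowerBound // = 100
val i = numPartitions - 1
val end = lowerBound + ((BigInt(i + 1) * length) / numPartitions).toLong - 1
assert(end == upperBound) // the last partition ends exactly at upperBound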

Member

Yep that's sound, good explanation. This is also why it can't overflow when toLong is called. I think this is the right fix with BigInt.

Contributor Author

I think there is a problem here:
length = 1 + upperBound - lowerBound
can overflow as well, if lowerBound = 0 and upperBound = Long.MAX_VALUE. I should fix this; it actually makes the change smaller.
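
Roughly, the change would look like this (a sketch against the snippet above; the key point is that length itself becomes a BigInt, so 1 + upperBound - lowerBound can't wrap even when the bounds span the full Long range):

// length as BigInt: the + 1 for the inclusive range can no longer overflow.
val length = BigInt(1) + upperBound - lowerBound
(0 until numPartitions).map(i => {
  // Stay in BigInt until the final conversion; start and end lie in
  // [lowerBound, upperBound] for non-empty partitions, so they fit in a Long.
  val start = lowerBound + ((i * length) / numPartitions)
  val end = lowerBound + (((i + 1) * length) / numPartitions) - 1
  new JdbcPartition(i, start.toLong, end.toLong)
})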

Contributor Author

Done

Member

Were these changes accidental?

Contributor Author

I just want to add a new test for this bug; is this not the right place to put it?

Member

Yes, but here it looks like you just indented existing code or something. The new test below looks right.

Contributor Author

Oh, that's because I need to create a new table, and I put that and the old table creation in one single

try {
  ...
} finally {
  conn.close()
}

Member

LGTM, except: can't we keep this catch block in common in the outer try block? I realize it would mean changing some of the variable names in the second block.

Contributor Author

There was a problem when I tried to do that. The original author uses the inner catch block to prevent re-creating the table:

catch {
  case e: SQLException if e.getSQLState == "X0Y32" =>
    // table exists
}

which means it has to exist for each table that is created. I was simply following that pattern; see the sketch below.

An alternative would be to drop and re-create the table each time, which produces cleaner code but may slow down the test suite a little bit.
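
For reference, here is the pattern with a hypothetical helper and made-up table names (each CREATE TABLE needs its own catch so that an already-existing first table doesn't skip creation of the second):

import java.sql.{Connection, SQLException}

// Hypothetical helper: issue a CREATE TABLE and ignore "table already exists".
def createTableIfMissing(conn: Connection, ddl: String): Unit = {
  try {
    conn.prepareStatement(ddl).executeUpdate()
  } catch {
    // X0Y32 is Derby's SQLState for "table already exists"
    case e: SQLException if e.getSQLState == "X0Y32" =>
  }
}

// Both tables are set up under one try/finally that closes the connection.
def setUp(conn: Connection): Unit = {
  try {
    createTableIfMissing(conn, "CREATE TABLE FOO(ID BIGINT NOT NULL, DATA INTEGER)")
    createTableIfMissing(conn, "CREATE TABLE BAR(ID BIGINT NOT NULL, DATA INTEGER)")
  } finally {
    conn.close()
  }
}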

Member

Oh, right, I get it; never mind. LGTM. I can merge in a bit if there are no further comments.

@asfgit closed this in 7683982 on Feb 21, 2015