Skip to content

Conversation

@ankurdave
Copy link
Contributor

RoutingTableMessage was used to construct routing tables to enable
joining VertexRDDs with partitioned edges. It stored three elements: the
destination vertex ID, the source edge partition, and a byte specifying
the position in which the edge partition referenced the vertex to enable
join elimination.

However, this was incompatible with sort-based shuffle (SPARK-2045). It
was also slightly wasteful, because partition IDs are usually much
smaller than 2^32, though this was mitigated by a custom serializer that
used variable-length encoding.

This commit replaces RoutingTableMessage with a pair of (VertexId, Int)
where the Int encodes both the source partition ID (in the lower 30
bits) and the position (in the top 2 bits).

RoutingTableMessage was used to construct routing tables to enable
joining VertexRDDs with partitioned edges. It stored three elements: the
destination vertex ID, the source edge partition, and a byte specifying
the position in which the edge partition referenced the vertex to enable
join elimination.

However, this was incompatible with sort-based shuffle (SPARK-2045). It
was also slightly wasteful, because partition IDs are usually much
smaller than 2^32, though this was mitigated by a custom serializer that
used variable-length encoding.

This commit replaces RoutingTableMessage with a pair of (VertexId, Int)
where the Int encodes both the source partition ID (in the lower 30
bits) and the position (in the top 2 bits).
@ankurdave
Copy link
Contributor Author

@mateiz

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA tests have started for PR 1553. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17059/consoleFull

@mateiz
Copy link
Contributor

mateiz commented Jul 23, 2014

@rxin you should take a look too

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA results for PR 1553:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17059/consoleFull

@rxin
Copy link
Contributor

rxin commented Jul 23, 2014

Would be great to have some measurement of the performance impact.

@ankurdave
Copy link
Contributor Author

Informal benchmark: I constructed a graph locally to compare the communication cost of building the routing table before and after. On a graph with 13M edges, this PR reduces the shuffle size slightly (2.3 MB to 2.04 MB). The shuffle time wasn't measurably different, but for a larger graph it should also go down slightly because of less memory allocation and a simpler serializer (fixed-size int instead of variable-length).

@rxin
Copy link
Contributor

rxin commented Jul 24, 2014

Merging this in master. Thanks.

@asfgit asfgit closed this in 2d25e34 Jul 24, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
RoutingTableMessage was used to construct routing tables to enable
joining VertexRDDs with partitioned edges. It stored three elements: the
destination vertex ID, the source edge partition, and a byte specifying
the position in which the edge partition referenced the vertex to enable
join elimination.

However, this was incompatible with sort-based shuffle (SPARK-2045). It
was also slightly wasteful, because partition IDs are usually much
smaller than 2^32, though this was mitigated by a custom serializer that
used variable-length encoding.

This commit replaces RoutingTableMessage with a pair of (VertexId, Int)
where the Int encodes both the source partition ID (in the lower 30
bits) and the position (in the top 2 bits).

Author: Ankur Dave <[email protected]>

Closes apache#1553 from ankurdave/remove-RoutingTableMessage and squashes the following commits:

697e17b [Ankur Dave] Replace RoutingTableMessage with pair
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants