Skip to content

Conversation

nnethercote
Copy link
Contributor

This reduces memory usage on some benchmarks because no space is wasted
for padding. For a check-clean build of keccak it reduces max-rss
by 20%.

r? @nikomatsakis, but I want to do a perf run. Locally, I had these results:

  • instructions: slight regression
  • max-rss: big win on "Clean" builds
  • faults: big win on "Clean" and "Nll" builds
  • wall-time: small win on "Clean" and "Nll" builds

So I want to see how a different machine compares.

This reduces memory usage on some benchmarks because no space is wasted
for padding. For a `check-clean` build of `keccak` it reduces `max-rss`
by 20%.
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 14, 2018
@nnethercote
Copy link
Contributor Author

@bors try

@bors
Copy link
Collaborator

bors commented Sep 14, 2018

⌛ Trying commit efae70c with merge 7da277b...

bors added a commit that referenced this pull request Sep 14, 2018
Split `Liveness::users` into three.

This reduces memory usage on some benchmarks because no space is wasted
for padding. For a `check-clean` build of `keccak` it reduces `max-rss`
by 20%.

r? @nikomatsakis, but I want to do a perf run. Locally, I had these results:
- instructions: slight regression
- max-rss: big win on "Clean" builds
- faults: big win on "Clean" and "Nll" builds
- wall-time: small win on "Clean" and "Nll" builds

So I want to see how a different machine compares.
// separate `Vec`s so that no space is wasted for padding.
users_reader: Vec<LiveNode>,
users_writer: Vec<LiveNode>,
users_used: Vec<bool>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth also making this a bitset/bitvec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#54208 (comment) discusses this. It would reduce memory more but increase instruction counts. Vec<bool> might be the best middle ground; let's see how the perf results look with it.

@bors
Copy link
Collaborator

bors commented Sep 14, 2018

☀️ Test successful - status-travis
State: approved= try=True

@nnethercote
Copy link
Contributor Author

@rust-timer build 7da277b

@rust-timer
Copy link
Collaborator

Success: Queued 7da277b with parent 4f921d7, comparison URL.

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me if the perf results look good; note that we would like to rewrite this completely to operate on MIR. But that's a bigger job (@wesleywiser was interested, I think).

@nikomatsakis
Copy link
Contributor

Perf results look like a small perf hit (5-6%) but a big memory use hit (20% etc in some cases). Interesting.

@nnethercote
Copy link
Contributor Author

I think this is a rare case where instruction counts are misleading!

Note that keccak is the one mostly clearly affected, and inflate and clap-rs are also affected, and nothing else is. So only look at the results for those three benchmarks; the rest is noise. Also keccak-check is the only one that measures NLL.

With all that in mind, here are the results just for keccak, including check, debug and opt.

  • cpu-clock: 0--6% better, 0.7% better for nll-check
  • cycles: 0--4% better, 0.6% better for nll-check
  • faults: 4--18% better, 6.6% better for nll-check
  • instructions: 0--3% worse, 0.3% worse for nll-check
  • max-rss: 0--20% better, no change for nll-check
  • task-clock: 0--6% better, 0.7% better for nll-check
  • wall-time: 0--6% better, 0.7% better for nll-check

Instructions gets worse, but everything else gets better. And we have a simple theoretical explanation for this: less memory traffic. So I think we should land this, but I am happy to defer to @nikomatsakis's decision.

@wesleywiser
Copy link
Member

I think the other important bit to note is that it seems to be mostly the clean incremental builds which are showing regressions. A 5% regression on the clean incremental time is usually a very, very small amount of clock time since clean incremental builds are usually very fast.

@nnethercote
Copy link
Contributor Author

A 5% regression on the clean incremental time is usually a very, very small

At the risk of laboring the point: it's a win, not a regression on the time. It's only a regression on instruction counts.

@wesleywiser
Copy link
Member

wesleywiser commented Sep 15, 2018 via email

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Collaborator

bors commented Sep 18, 2018

📌 Commit efae70c has been approved by nikomatsakis

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2018
@nikomatsakis
Copy link
Contributor

Sorry, forgot I hadn't already written that. Thanks @nnethercote for the extra details.

@bors
Copy link
Collaborator

bors commented Sep 20, 2018

⌛ Testing commit efae70c with merge 1d33aed...

bors added a commit that referenced this pull request Sep 20, 2018
…akis

Split `Liveness::users` into three.

This reduces memory usage on some benchmarks because no space is wasted
for padding. For a `check-clean` build of `keccak` it reduces `max-rss`
by 20%.

r? @nikomatsakis, but I want to do a perf run. Locally, I had these results:
- instructions: slight regression
- max-rss: big win on "Clean" builds
- faults: big win on "Clean" and "Nll" builds
- wall-time: small win on "Clean" and "Nll" builds

So I want to see how a different machine compares.
@bors
Copy link
Collaborator

bors commented Sep 20, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing 1d33aed to master...

@bors bors merged commit efae70c into rust-lang:master Sep 20, 2018
@nnethercote nnethercote deleted the keccak-Liveness-memory branch September 20, 2018 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants