Skip to content

Conversation

@arielb1
Copy link
Contributor

@arielb1 arielb1 commented Apr 20, 2017

Fix up some quite important performance "surprises" I've found running callgrind on rustc.

This really should land in 1.18.

@arielb1 arielb1 changed the title Performance audit, Spring 2017 [WIP] Performance audit, Spring 2017 Apr 20, 2017
@rust-highfive
Copy link
Contributor

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@KalitaAlexey
Copy link
Contributor

@arielb1,
Are you going to collect data to show improvement?

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2017

@KalitaAlexey

I'm doing measurements locally, but after this is done it'll show up on perf.rust-lang.org (or the mirror https://perf-rlo.herokuapp.com).

Ariel Ben-Yehuda added 2 commits April 20, 2017 15:14
This improves LLVM performance by 10% lost during the shimmir transition.
this improves typeck & trans performance by 1%. This looked hotter on
callgrind than it is on a CPU.
@arielb1 arielb1 force-pushed the rustc-spring-cleaning branch from ce14cf8 to c8fe505 Compare April 20, 2017 12:14
@michaelwoerister
Copy link
Member

this avoids parsing item attributes on each call to item_attrs, which takes
off 33% (!) of translation time and 50% (!) of trans-item collection time.

WOW! :)

@arielb1 arielb1 force-pushed the rustc-spring-cleaning branch from c8fe505 to 71d3270 Compare April 20, 2017 15:54
(self.tcx().mk_region(ty::ReStatic),
self.tcx().mk_region(ty::ReStatic))
(self.tcx().types.re_static,
self.tcx().types.re_static)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that we don't have these constants pre-interned. This is going to conflict like hell with one of my in-progress branches, but oh well. =)

Copy link
Contributor

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me on the stuff so far

value, amount);

value.fold_with(&mut RegionFolder::new(tcx, &mut false, &mut |region, _current_depth| {
tcx.mk_region(shift_region(*region, amount))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow. By-value shift_region may have been used by elision or something. Can we just kill it?


pub fn field<C: LayoutTyper<'tcx>>(&self, cx: C, i: usize) -> C::TyLayout {
cx.layout_of(self.field_type(cx, i))
cx.layout_of(cx.normalize_associated_type(self.field_type(cx, i)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to do it here, when layout_of does it at the start?

Copy link
Contributor Author

@arielb1 arielb1 Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it doesn't. types in trans are always normalized.

return TyLayout { ty: ty, layout: layout, variant_index: None };
}

self.tcx().infer_ctxt((), traits::Reveal::All).enter(|infcx| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is creating the infer_ctxt costly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Member

@eddyb eddyb Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you (trans-)normalize before checking the cache above? Would that solve the problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like to get rid of the normalize_associated_type method - can't field_of rely on layout_of normalizing before checking the cache? if !ty.has_projection_types() { is really fast, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except when there are projection types (nested binders) etc. This method is hot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is has_projection_types anything other than a flag check? I'm not sure I understand what's going on. Can the cache be hit with the unnormalized type if has_projection_types returns true?

ref item => bug!("trait_impl_polarity: {:?} not an impl", item)
}
} else {
self.sess.cstore.impl_polarity(id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the CrateStore method? They keep piling up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this commit does what it's supposed to.

}
}

impl<T> Rc<[T]> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, for the purposes of the compiler, does Rc<[T]> provide a measurable improvement over Rc<Vec<T>>?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may also be more easily implementable by consuming Vec<T> as you've got to copy data anyway. With Vec<T> the box_free also doesn't need to be exposed as you can just .set_len(0) to drop all the elements.

Copy link
Contributor Author

@arielb1 arielb1 Apr 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't bother checking. But Rc<Vec<T>> is too ugly to me.

@arielb1 arielb1 force-pushed the rustc-spring-cleaning branch from 82092ca to c357feb Compare April 20, 2017 19:15
@arielb1 arielb1 changed the title [WIP] Performance audit, Spring 2017 Performance audit, Spring 2017 Apr 20, 2017
@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2017

I think this is enough for one PR.

That method is *incredibly* hot, so this ends up saving 10% of trans
time.

BTW, we really should be doing dependency tracking there - and possibly be
taking the respective perf hit (got to find a way to make DTMs fast), but
`layout_cache` is a non-dep-tracking map.
@arielb1 arielb1 force-pushed the rustc-spring-cleaning branch from c357feb to f964da5 Compare April 20, 2017 19:56
@eddyb
Copy link
Member

eddyb commented Apr 20, 2017

@bors r=nikomatsakis,eddyb

@bors
Copy link
Collaborator

bors commented Apr 20, 2017

📌 Commit f964da5 has been approved by nikomatsakis,eddyb

frewsxcv added a commit to frewsxcv/rust that referenced this pull request Apr 20, 2017
…nikomatsakis,eddyb

Performance audit, Spring 2017

Fix up some quite important performance "surprises" I've found running callgrind on rustc.

This really should land in 1.18.
@eddyb
Copy link
Member

eddyb commented Apr 20, 2017

@bors r- Build failed.

unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
pub(crate) unsafe fn box_free<T: ?Sized>(ptr: *mut T) {
let size = size_of_val(&*ptr);
let align = min_align_of_val(&*ptr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two functions are not imported during testing (so this fails to compile then).

Ariel Ben-Yehuda added 2 commits April 21, 2017 02:13
this avoids parsing item attributes on each call to `item_attrs`, which takes
off 33% (!) of translation time and 50% (!) of trans-item collection time.
improves trans performance by *another* 10%.
Ariel Ben-Yehuda added 2 commits April 21, 2017 02:13
this improves trans performance by *another* 10%.
this is another one of these things that looks *much* worse on valgrind.
@arielb1 arielb1 force-pushed the rustc-spring-cleaning branch from f964da5 to dae49f1 Compare April 20, 2017 23:14
@arielb1
Copy link
Contributor Author

arielb1 commented Apr 20, 2017

@bors r=eddyb

@bors
Copy link
Collaborator

bors commented Apr 20, 2017

📌 Commit dae49f1 has been approved by eddyb

TimNN added a commit to TimNN/rust that referenced this pull request Apr 21, 2017
…eddyb

Performance audit, Spring 2017

Fix up some quite important performance "surprises" I've found running callgrind on rustc.

This really should land in 1.18.
@shepmaster shepmaster added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Apr 21, 2017
@arielb1 arielb1 force-pushed the rustc-spring-cleaning branch from 9cab5fd to 461bee5 Compare April 22, 2017 12:10
@arielb1
Copy link
Contributor Author

arielb1 commented Apr 22, 2017

So now I solved the "specialization caching" problem for real.

In some cases (e.g. <[int-var] as Add<[int-var]>>), selection can turn up
a large number of candidates. Bailing out early avoids O(n^2) performance.

This improves item-type checking time by quite a bit, resulting in ~2% of total
time-to-typeck.
@arielb1 arielb1 force-pushed the rustc-spring-cleaning branch from 461bee5 to 1b207ca Compare April 22, 2017 12:15
@eddyb
Copy link
Member

eddyb commented Apr 22, 2017

@bors r=nikomatsakis,eddyb

@bors
Copy link
Collaborator

bors commented Apr 22, 2017

📌 Commit 1b207ca has been approved by nikomatsakis,eddyb

@bors
Copy link
Collaborator

bors commented Apr 22, 2017

🔒 Merge conflict

@bors
Copy link
Collaborator

bors commented Apr 22, 2017

☔ The latest upstream changes (presumably #41464) made this pull request unmergeable. Please resolve the merge conflicts.

@arielb1
Copy link
Contributor Author

arielb1 commented Apr 22, 2017

Moving the branch over to arielb1/rust.

@arielb1 arielb1 closed this Apr 22, 2017
@arielb1 arielb1 deleted the rustc-spring-cleaning branch April 22, 2017 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants