Skip to content

Conversation

Pazzaz
Copy link
Contributor

@Pazzaz Pazzaz commented Jun 16, 2018

The current impl Sum for Duration uses fold to perform several adds (or really checked_adds) of durations. In doing so, it has to guarantee the number of nanoseconds is valid after every addition. If you squeese the current implementation into a single function it looks kind of like this:

fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut sum = Duration::new(0, 0);
    for rhs in iter {
        if let Some(mut secs) = sum.secs.checked_add(rhs.secs) {
            let mut nanos = sum.nanos + rhs.nanos;
            if nanos >= NANOS_PER_SEC {
                nanos -= NANOS_PER_SEC;
                if let Some(new_secs) = secs.checked_add(1) {
                    secs = new_secs;
                } else {
                    panic!("overflow when adding durations");
                }
            }
            sum = Duration { secs, nanos }
        } else {
            panic!("overflow when adding durations");
        }
    }
    sum
}

We only need to check if nanos is in the correct range when giving our final answer so we can have a more optimized version like so:

fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut total_secs: u64 = 0;
    let mut total_nanos: u64 = 0;

    for entry in iter {
        total_secs = total_secs
            .checked_add(entry.secs)
            .expect("overflow in iter::sum over durations");
        total_nanos = match total_nanos.checked_add(entry.nanos as u64) {
            Some(n) => n,
            None => {
                total_secs = total_secs
                    .checked_add(total_nanos / NANOS_PER_SEC as u64)
                    .expect("overflow in iter::sum over durations");
                (total_nanos % NANOS_PER_SEC as u64) + entry.nanos as u64
            }
        };
    }
    total_secs = total_secs
        .checked_add(total_nanos / NANOS_PER_SEC as u64)
        .expect("overflow in iter::sum over durations");
    total_nanos = total_nanos % NANOS_PER_SEC as u64;
    Duration {
        secs: total_secs,
        nanos: total_nanos as u32,
    }
}

We now only convert total_nanos to total_secs (1) if total_nanos overflows and (2) at the end of the function when we have to output a valid Duration. This gave a 5-22% performance improvement when I benchmarked it, depending on how big the nano value of the Durations in iter were.

@pietroalbini
Copy link
Member

r? @sfackler (someone from the libs team)

@pietroalbini pietroalbini added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 16, 2018
@pietroalbini
Copy link
Member

Ping from triage @sfackler! This PR needs your review.

@sfackler
Copy link
Member

@bors r+

Sorry for the delay!

@bors
Copy link
Collaborator

bors commented Jun 27, 2018

📌 Commit d22ad76 has been approved by sfackler

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 27, 2018
@bors
Copy link
Collaborator

bors commented Jun 27, 2018

⌛ Testing commit d22ad76 with merge 612c280...

bors added a commit that referenced this pull request Jun 27, 2018
Optimize sum of Durations by using custom function

The current `impl Sum for Duration` uses `fold` to perform several `add`s (or really `checked_add`s) of durations. In doing so, it has to guarantee the number of nanoseconds is valid after every addition. If you squeese the current implementation into a single function it looks kind of like this:
````rust
fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut sum = Duration::new(0, 0);
    for rhs in iter {
        if let Some(mut secs) = sum.secs.checked_add(rhs.secs) {
            let mut nanos = sum.nanos + rhs.nanos;
            if nanos >= NANOS_PER_SEC {
                nanos -= NANOS_PER_SEC;
                if let Some(new_secs) = secs.checked_add(1) {
                    secs = new_secs;
                } else {
                    panic!("overflow when adding durations");
                }
            }
            sum = Duration { secs, nanos }
        } else {
            panic!("overflow when adding durations");
        }
    }
    sum
}
````
We only need to check if `nanos` is in the correct range when giving our final answer so we can have a more optimized version like so:
````rust
fn sum<I: Iterator<Item = Duration>>(iter: I) -> Duration {
    let mut total_secs: u64 = 0;
    let mut total_nanos: u64 = 0;

    for entry in iter {
        total_secs = total_secs
            .checked_add(entry.secs)
            .expect("overflow in iter::sum over durations");
        total_nanos = match total_nanos.checked_add(entry.nanos as u64) {
            Some(n) => n,
            None => {
                total_secs = total_secs
                    .checked_add(total_nanos / NANOS_PER_SEC as u64)
                    .expect("overflow in iter::sum over durations");
                (total_nanos % NANOS_PER_SEC as u64) + entry.nanos as u64
            }
        };
    }
    total_secs = total_secs
        .checked_add(total_nanos / NANOS_PER_SEC as u64)
        .expect("overflow in iter::sum over durations");
    total_nanos = total_nanos % NANOS_PER_SEC as u64;
    Duration {
        secs: total_secs,
        nanos: total_nanos as u32,
    }
}
````
We now only convert `total_nanos` to `total_secs` (1) if `total_nanos` overflows and (2) at the end of the function when we have to output a valid `Duration`. This gave a 5-22% performance improvement when I benchmarked it, depending on how big the `nano` value of the `Duration`s in `iter` were.
@bors
Copy link
Collaborator

bors commented Jun 27, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 612c280 to master...

@bors bors merged commit d22ad76 into rust-lang:master Jun 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants