Skip to content

Conversation

@T-Gro
Copy link
Member

@T-Gro T-Gro commented Sep 7, 2023

Fixing my own mistake from the past.

When working on nullness PR, there is a situation where a 2nd optimization data resource can exist if compiling with optimize:+ (cc @vzarytovskii )
Depending on the contents, adding a hash of the two optimization resources can overflow an int32. Via a regular plus addition, it would just overflow and it would be ok for a hash result - so far so good.

However, the List.sumBy function is doing a Checked addition and therefore throw at runtime with an integeroverflow. I/We should be definitely more careful about the usage of .sum / .sumBy in the compiler codebase.

This was really strange to investigate, as only some projects have been failing - depending on the numeric value of their optimization data hashes.

@T-Gro T-Gro marked this pull request as ready for review September 7, 2023 11:54
@T-Gro T-Gro requested a review from a team as a code owner September 7, 2023 11:54
@T-Gro T-Gro merged commit 4dd0341 into main Sep 7, 2023
@T-Gro T-Gro deleted the prevent-optDataHash-crash branch September 7, 2023 14:46
@smoothdeveloper
Copy link
Contributor

wondering if it would make sense to have List.sumByUnchecked or simply doing the sum in the code?

Of course, int64 has far less chance to hit the same limitation, but checked overflow can still theoretically occur, AFAIU.

Nonetheless, a comment on why the int64 cast is used may not hurt.

@T-Gro
Copy link
Member Author

T-Gro commented Sep 7, 2023

I was thinking about this as well - I think having the default Sum being checked fits the F# philosophy. Silently overflowing and returning invalid data would be a lot worse.

List.sumByUnchecked - if there is enough interest, it could be added to the standard library.
I tried searching, but it seems noone has encountered the need for it so far.. ?

@smoothdeveloper
Copy link
Contributor

@T-Gro sorry for the confusion, I meant, should the fix do the sum with unchecked?

let optDataHash =
  optDataResources
  |> List.map (fun ilResource ->
      use s = ilResource.GetBytes().AsStream()
      let sha256 = System.Security.Cryptography.SHA256.Create()
      sha256.ComputeHash s
      |> hash
      |> int64
  )
  // remark: using the unchecked + operator, instead of List.sum
  // which may throw due to potential overflow
  |> List.fold (FSharp.Core.Operators.(+)) 0L
  |> hash

I agree that the default for List.sum* being checked being good, and it would be a suggestion to add unchecked operations to the module.

@T-Gro
Copy link
Member Author

T-Gro commented Sep 8, 2023

Now I see.

Each optimization IL resource get's a int32 hash. They are being summed up into a int64.
Even if each IL resource had a hash getting close to int32.MaxValue, we could fit 4,294,967,298 of them into a int64. (i.e. that many compiled resources)

=> I think that for this use case, the classical sumBy<_,int64> is fine :-)

@smoothdeveloper
Copy link
Contributor

Thanks, I decided to not enter the parallel universe where we'd have 4+ billions resources with hashes that are close to Int32.MaxValue, I hope inhabitants of that universe won't use the fsc implementation we have here 🙂.

All clear!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants