-
Notifications
You must be signed in to change notification settings - Fork 830
Determinism of the F# Compiler #1042 #2954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@davidglassborow, |
|
@dsyme I've yet to add tests for this (I'm having problems with the tests on master at the moment), but wanted to get your feedback on if this looks like the correct approach. A couple of questions:
|
@davidglassborow The code looks right to me!
I think that's the only way right now
That seems fine. I suppose the docs would get updated separately. |
b09bd3c to
becf624
Compare
|
@dsyme I've made good progress for the exe and for portable and embedded pdbs. The problem comes with debug:full or debug:pdbonly. For mono this is fine because it uses the mdb file which is deterministic. For windows it uses the native symbol writer, and this seems to generate guid and timestamps internally. Looks like the roslyn team made a version of the native symbol writer that supports deterministic pdb generation (https://www.nuget.org/packages/Microsoft.DiaSymReader.Native/). Given the push towards portable pdbs, I'm wondering about not supporting deterministic builds in the debug:pdbonly and debug:full scenarios, but am unsure how much of an issue that would be. Otherwise we could add a new package dependancy for the native reader, and use that (see rosyln's version. Any thoughts ? cc/ @jaredpar @KevinRansom @cartermp |
|
@davidglassborow Great work!
I've not strong opinions - certainly the work is valuable as is - I'm personally fine with the extra package dependency and it helps engineering to align with what Roslyn do. |
Note that it's not fully deterministic. True it allows us to control the PDB ID and other inherently non deterministic values. But the resulting PDB is still not yet byte for byte deterministic. It's being looked at but don't have a firm ETA for it. |
|
Thanks @jaredpar, is there a roslyn issue # I can use to track progress of this ? Also is the source to that nuget package open source ? |
|
We do not have Roslyn issue - since it's not code we own. The code is not open source. BTW, I'm working on extracting some of the native PDB writer code into a common library as public API (Microsoft.DiaSymReader.dll), so that other tools and compilers can use it directly without copying Roslyn internal implementation. You might want to wait for that: dotnet/roslyn#19297. |
|
There is one item in the check list that I wanted to comment on based on our Roslyn experience: tests Unit tests were excellent in helping us establish determinism in sub-systems of the compiler. As well as helping us lock down on bugs in those areas. But unit tests, no matter how many we added, were not enough to validate determinism. Determinism is a feature than is ultimately about the integration of all the various pieces of the compiler against all the possible inputs it can deal with. Hence it's much better suited to have full integration tests. The best integration test we found is the roslyn source code itself. On every PR we have a job dedicated to:
Until we had this job we were a bit falsely confident in the deterministic capabilities of the compiler. Adding this job was a humbling experience and revealed a host of corner case bugs we'd missed. Now though we have it in place and feel very confident about our feature. I highly suggest F# consider a similar approach to testing. |
|
@davidglassborow I think we could get some smoke tests in now (just a couple of simple ones) and take this PR - it's great work @jaredpar FOr some reason I'm not so pessimistic - AFAIK this PR really does address the sources of non-determinism. But we'll see :) |
I'm always inherently terrified I will break anything that I don't have a ton of testing for 😄 |
|
@dsyme okay I'll look at some simple smoke tests. Are there any examples in the current test suite of the sort of thing your thinking ? Also I want to specifically test the the guid inside the exe matches the one in the pdb files. Can you think of any existing tools or approach I could use for this ? (I'm wondering if there are standard CLR IL or debugging tools that would do the job) |
Perhaps compile something like tests\fsharp\core\libtest\test.fsx or one of the large similar files
I don't know of tools to crack the PDB, sorry. There must be some |
7014ed0 to
0d7ba5f
Compare
|
@dsyme I've finally had time to have a look at this again. I've added tests\fsharp\core\libtest\test.fsx as test case. Given the native dependancy on sorting out debug:full and debug:pdbonly (dotnet/roslyn#19297), and that that the embedded pdb uses compression which makes it very hard to sort out mvid without large scale restructuring of the code, I suggest we make --deterministic support only builds with no debug info. If your happy with this I'll add an appropriate error. |
|
I think Roslyn currently is more or less fully deterministic when creating portable pdbs, but when creating old-school pdbs, two compilations may produce equal assemblies but different pdbs. Disallowing PDB generation overall would make this feature, in my honest opinion, close to useless. Would it be possible to instead only allow --deterministic in combination with portable pdbs in the first step? Support for native pdbs may or may not be added later then, npdbs are in the process of being phased out and replaced with portable anyway. |
This is mostly correct. Let me state the actual rules. When given deterministic input and the
There are rare occasions you will find the native PDB produced is fully deterministically. That should be considered "by luck" than "by design".
Agree. In particular because I'm very hopeful the fix for native PDBs is coming. If F# does the work now to be fully deterministic for portable and as detererministic as possible for native. Then once this fix is available it should be a simple NuGet package upgrade for F#. |
|
Re Windows PDBs: As far as I can tell F# is currently using the SymWriter that's globally registered by .NET Framework: That version is not being updated, other than security fixes. To get the version that supports determinism (modulo gaps between fields) F# will need to start using the SymWriter from Microsoft.DiaSymReader.Native.nupkg, like Roslyn does. Once the native implementation is fixed to be fully deterministic this package will be updated and F# can get full determinism for free. But work needs to be done to move to this package first. Coincidentally, I'm currently working on extracting some of the code that Roslyn uses for Windows PDB writing to Microsoft.DiaSymReader.dll making it easier for other PDB producers to write correct PDBs. See https://github.com/dotnet/symreader/tree/master/src/Microsoft.DiaSymReader/Writer. Let me know if you have any questions. |
|
Okay I'll plan to get deterministic portable pdbs in this PR, and leave the native pub support to later on when the native sym writer is fully deterministic (and use the Microsoft.DiaSymReader nuget package to do so). @tmat @jaredpar did either of you know of any command line tools I can use as part of the tests to confirm the MVID identifiers between the exe and pdbs match ? |
|
For Portable and Embedded PDBs: mdv.exe Prints out debug info and metadata for a PE file. Prints out info for Portable PDB. Prints out info for PDB embedded into Lib.dll. |
|
Thanks @tmat. I'm probably being thick, but I can't see the mdv.exe in that nuget package, just the Microsoft.Metadata.Visualizer.dll |
|
@davidglassborow Ah... my fault. It's not there! Forgot to publish it. Let me fix that... |
Thread a new option through the options to the code writing the IL. For the moment just use a constant for deterministic timestamp, and use 0 as the timestamp in the MVID
…es and pausing to prevent race-condition
If compiling deterministically, use deterministic id provider support in making a pdb builder. see https://github.com/dotnet/roslyn/blob/master/src/Compilers/Core/Portable/PEWriter/PeWriter.cs#L132
418e4b6 to
edd46c6
Compare
Make the checks for finding embedded guids more strict and unique
|
Ok I've made determinism work only for portable pdbs (--debug:portable and --debug:embedded). |
|
This is great, I approve! :) |
|
As I understand the current implementation, the combination Would it be possible to instead allow this combination, with the caveat that the assembly will be deterministic, but the pdb not? That's basically the behaviour of Roslyn (ref #2954 (comment)). We would like to move to portable pdb as fast as possible (and hopefully will, in the near future), but can't at this exact moment now, because a few build tools rely on native pdbs. |
|
@0x53A unfortunately that's a lot more complicated to do. The native PDB contains the MVID in it, and currently we don't calculate the deterministic MVID until after the PDB has been generated. This means recording in the memory stream where it is written and updating it (like the PR does with the metadata). However the native PDB is written using Windows native DLL currently. It probably could be done, but to be inline with Roslyn we would have to change the code to use the Microsoft.DiaSymReader.Native dll. I'm not going to have any time soon to do it, but happy to try and help if you want to have a crack. |
|
@davidglassborow Ok then that makes sense, thank you. That deterministic will work at all will be great, and as you said, better to ship it now and improve it later. =) |
KevinRansom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
|
@cartermp I'll raise a PR on the docs to document this new option, but newbie question, should I be looking at https://github.com/Microsoft/visualfsharpdocs/ or https://github.com/dotnet/docs for the fsc reference docs ? |
|
The latter! That's where the active docs are. This is very cool. |
|
is There is one usage: b4901a5#diff-5b9ab9dd9d7133aaf23add1048742031R10637 and I'm about to introduce some more, should I make a shim for it and hardcode it to edit: scratch that, this is in compile error path, deterministic build of non compilable code doesn't matter. |
No. Caring about this value implies that you believe builds will be identical across OS that differ in line ending. It's a good ideal but in practice it's not a reality. |
Add --deterministic+ option to fsc.exe to make compilation deterministic given exactly the same inputs, using similar logic to that of Roslyn.
See discussions on #1042 for background for what they did for C# and VB.net
Progress
Background reference:
C# Behaviour
Wildcard vs. Deterministic
C# error message is generic version format, see example below when compiling a cs file
MVID and Timestamp