Skip to content

Conversation

@stefan-sf-ibm
Copy link
Contributor

While raising image_unload event ep_rt_mono_write_event_module_unload is called which itself calls get_module_event_data which accesses image->image_info. Thus set image->image_info to NULL after free.

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-VM-meta-mono labels Mar 25, 2022
@vargaz
Copy link
Contributor

vargaz commented Mar 25, 2022

Might be better to call image_unloaded earlier instead of calling it on a basically freed structure.

@stefan-sf-ibm
Copy link
Contributor Author

I'm not sure what the exact distinction between MONO_PROFILER_RAISE (image_unloaded, (image)) and MONO_PROFILER_RAISE (image_unloading, (image)) should be. Whether it is only about image information contained in the event or also about the exact timing when the unload materialized. If the timing is not of interest then one could just move MONO_PROFILER_RAISE (image_unloaded, (image)) close to MONO_PROFILER_RAISE (image_unloading, (image)). Then storage is still available and e.g. module_il_pdb_path can be computed. If MONO_PROFILER_RAISE (image_unloaded, (image)) should be triggered as late as possible, then I guess one has to precompute all the required information prior basically freeing image and use that in the end. Any thoughts?

@lambdageek
Copy link
Member

@lateralusX @josalem What's the difference between the ModuleUnloading and ModuleUnloaded events? Is the timing important?

I think mono_profiler_module_unloaded probably shouldn't call get_module_event_data (or maybe do it in a special mode where it only accesses the bare minimum information - the module path). Alternately if we know in mono that ModuleUnloading is always followed by ModuleUnloaded, maybe we can access the information we need early and store it away in "Unloading" until the corresponding "Unloaded" event needs it.

@lambdageek
Copy link
Member

I would also not be opposed to a new MonoProfiler image_unloaded_with_details event where we just pass the path and guid as event arguments.

@lateralusX
Copy link
Member

lateralusX commented Mar 25, 2022

I guess the main relationship between the events is that one should come before the other in the stream. The second things they can be used it to get an idea about how long it takes to unload something, so putting them closer will reduce ability to calculate that time, but don't think that is super important. If the Mono profiler API hands over something as part of the callback, it needs to include live data, or NULL:ed out to indicate that its is not available.

@lateralusX
Copy link
Member

lateralusX commented Mar 25, 2022

For the unload profiler event it looks like a lot of things can be freed up in the image struct, and most logic in mono_image_close_except_pools is mainly releasing memory, so moving the event up to a location where data is still valid is probably the more safe alternative.

@lateralusX
Copy link
Member

lateralusX commented Mar 25, 2022

I think mono_profiler_module_unloaded probably shouldn't call get_module_event_data (or maybe do it in a special mode where it only accesses the bare minimum information - the module path)

I think the problem is the design of the profiler API, if it pass a data structure as part of its API, that needs to be accessible or NULL:ed out, if not, it should use a different API altogether, not passing a MonoImage * as parameter, or we should fire the event at a location where the data is still around.

@lambdageek
Copy link
Member

For the unload profiler event it looks like a lot of things can be freed up in the image struct, and most logic in mono_image_close_except_pools is mainly releasing memory, so moving the event up to a location where data is still valid is probably the more safe alternative.

Sounds good to me

@stefan-sf-ibm could you update this PR to move the image_unloaded event earlier. Setting image_info to null is good, too, so leave that in there.

@vargaz
Copy link
Contributor

vargaz commented Mar 25, 2022

image_unloaded should probably come after other events caused by the unloading, i.e. references to other images are dropped causing them to be unloaded as well.

@stefan-sf-ibm stefan-sf-ibm force-pushed the fix-image-unload-event branch from 90d0ac1 to fe59d6b Compare March 28, 2022 07:39
@stefan-sf-ibm
Copy link
Contributor Author

Pushed the unloaded event upwards which means it is pretty close to the unloading event, now. Anyhow, it should come before mono_image_storage_close I guess. At that point in time at least name, filename etc. are also available.

@lateralusX
Copy link
Member

lateralusX commented Mar 28, 2022

Looks like mono_image_close_except_pools_all will trigger more unload events, so if we are going to capture the right order of unload events, it looks like it can't move much at all.

@lateralusX
Copy link
Member

lateralusX commented Mar 28, 2022

So what get_module_event_data looks at during the unloaded event is mainly the following image fields:

image->assembly (just converting it to uint64_t)
image->dynamic
image->aot_module
image->filename
image->image_info
image->guid

and for the EventPipe unloaded event, we are pretty much only using:

pointer to image itself, that should still be valid.
image guid
image->filename, if not NULL.
and image->image_info needs to be NULL:ed out if freed.

so one option could be to make sure that atleast these fields are still available when we do the profiler callback.

@lateralusX
Copy link
Member

So maybe delay parts of this,

if (debug_assembly_unload) {
until after profiler callback and set image->image_info to NULL as you already did, will at least take care of the fields currently in use, but as previously said, the underlying issue is with the profiler API, handing out access to a struct with unknown state.

@stefan-sf-ibm
Copy link
Contributor Author

I think after

mono_image_storage_close (image->storage);

pointer image->raw_data gets invalidated which is used in get_module_event_data in order to determine module_il_pdb_{age,path}. So just delaying

if (debug_assembly_unload) {

probably won't help.

As you already proposed it might be better to just change the API. I gave it a try in the following commit and basically copied get_module_event_data into mono_profiler_get_module_event_data where I added

module_data->guid = g_strdup (mono_image_get_guid (image));

since the guid is required by mono_profiler_module_unloaded.

Benefit would be that the time between events unloading and unloaded would have a meaning and beside that the order of further events, possibly triggered by mono_image_close_except_pools_all, is kept.

Any thoughts about this approach? I'm not sure whether it is worthwhile to change the API for all other image related events like image_loaded in order to keep the API consistent. If we could settle for an API change I would go on and replace all calls get_module_event_data by mono_profiler_get_module_event_data.

@stefan-sf-ibm stefan-sf-ibm force-pushed the fix-image-unload-event branch from fe59d6b to 5063024 Compare March 29, 2022 08:39
@stefan-sf-ibm stefan-sf-ibm marked this pull request as draft March 29, 2022 08:41
@lateralusX
Copy link
Member

lateralusX commented Mar 29, 2022

Problem with changing existing API is that this is a public API, so that would break other profiler providers, if we are going down that route we will need to add a new API for the unloaded event, aligned with what @lambdageek suggested above and then let EventPipe profiler provider hook up to the new API, instead of using the old. Profiler still needs to fire both events (inexpensive if there are no listeners). If we are going to change to this route, then we should add a new API, that takes more details around image, probably all the data needed by FireEtwModuleUnload_V2, close to the change you already did.

An alternative to simplify to profiler API, is adding one new image unloading/unloaded API, and then have a enum telling if its unloading or unloaded, first call to unloading pass in Image *, and take back a void * returned from profiler provider, second call to pass NULL as Image and return back void * to profiler provider. That way, the provider could implement the logic on data it would like to preserve between calls, and then free that up on second call. Putting both into one API will prevent an implementer to only subscribe to one of them, since both are needed, or we could have two new API's and then its up to the provider to subscribe to both if it returns data that should go into the second call.

if we add two new API's, it could look like this:

mono_image_unloading_v2(size_t id, Image *, void **);
mono_image_unloaded_v2(size_t id, void *);

and we can keep all logic around what data that needs to be preserved in EventPipe.

pointer image->raw_data gets invalidated which is used in get_module_event_data in order to determine module_il_pdb_{age,path}.

get_module_event_data checks that image->image_info is not NULL before it tries to calculate module_il_pdb_path and module_il_pdb_age,

if (image && image->image_info) {
, so if we NULL that field when we close the underlying image, that code should not try to access underlying data structures.

@stefan-sf-ibm
Copy link
Contributor Author

Right I somehow managed to neglect the fact that this is a public API. Sorry for the noise.

I will restore the old commit and delay freeing of image->filename and image->guid only in case of !debug_assembly_unload (as previously).

@stefan-sf-ibm stefan-sf-ibm force-pushed the fix-image-unload-event branch from 5063024 to cb28f00 Compare March 29, 2022 14:49
@stefan-sf-ibm stefan-sf-ibm marked this pull request as ready for review March 30, 2022 06:20
@lateralusX
Copy link
Member

@stefan-sf-ibm
Copy link
Contributor Author

I cannot reproduce the failure. Running the testsuite locally I get

Passed!  - Failed:     0, Passed:   646, Skipped:     0, Total:   646, Duration: 31 m 27 s - /runtime/artifacts/bin/DebuggerTestSuite/x64/Debug/DebuggerTestSuite.dll (net6.0)

Interestingly the local summary mentions far more test cases than the one from CI. Just to be sure: Is this the correct test suite src/mono/wasm/debugger/DebuggerTestSuite?

@lateralusX
Copy link
Member

lateralusX commented Apr 7, 2022

It could be a flaky test, just saw that test seemed to have code that read out the image GUID and did some lazy loading:

[xUnit.net 00:02:53.57] DebuggerTests.BreakpointTests.DebugHotReloadMethodUnchanged [FAIL]
Failed DebuggerTests.BreakpointTests.DebugHotReloadMethodUnchanged [30 ms]
Error Message:
Assert.True() Failure
Expected: True
Actual: False
Stack Trace:
at DebuggerTests.DebuggerTestBase.LoadAssemblyAndTestHotReload(String asm_file, String pdb_file, String asm_file_hot_reload, String class_name, String method_name) in //src/mono/wasm/debugger/DebuggerTestSuite/DebuggerTestBase.cs:line 1360
at DebuggerTests.BreakpointTests.DebugHotReloadMethodUnchanged() in /
/src/mono/wasm/debugger/DebuggerTestSuite/BreakpointTests.cs:line 351

its in src/mono/wasm/debugger/DebuggerTestSuite/BreakpointTests.cs so should be part of the test suite you run locally.

@thaystg Any thoughts around the failure above on this PR, or just flaky test? PR does set image GUID to NULL after freeing it, compared to just freeing it but keep dangling pointer in Image * type and test seems to do some lazy assembly loading and looks like the test at least access the image GUID, but looks like the raw binary version and not the string version, so should be unrelated.

@lateralusX
Copy link
Member

/azp run runtime-wasm

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@lateralusX lateralusX force-pushed the fix-image-unload-event branch from cb28f00 to f091933 Compare May 6, 2022 10:58
@lateralusX
Copy link
Member

Rebased on main as well as testing to temporary remove setting image->guid to NULL to investigate if WASM tests still fails.

@lateralusX
Copy link
Member

All pass, when not setting image->guid to NULL, let's try to revert it back and see test results.

@lateralusX lateralusX merged commit cfa18c9 into dotnet:main May 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Diagnostics-mono community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants