Skip to content

Conversation

@michaelnebel
Copy link
Contributor

@michaelnebel michaelnebel commented Nov 1, 2023

In this PR we try to narrow which dependencies to include in standalone compilation.
Prior to this PR we took all dll's in all the packages that were downloaded as a part of the dotnet restore process and included them in the standalone compilation.

As a result

  • It often led to conflicts as Packages might contain the assemblies for multiple versions of .NET (which then conflict with each other). The deduplication of dll's can only fix this if the dll's have identical names.
  • Way to many assemblies not required for compilation got included.

We try to solve this issue by inspecting the content of all project.assets.json files after dotnet restores have been executed:

  • In the targets section we try to find all package references and only include the dll's that are required for compilation (under the compile tag). If a package reference is to a set of .NET framework reference assemblies we still include the entire package folder (for that package).

The added testcases contain examples of project.assets.json content (some context to the above).
Please note, that this work is based on reverse engineering (guessing) what the content of a project.assets.json file means.

Furthermore, this means that we have changed the logic in the DependencyManager to be addition based instead of subtraction based (that is we add packages, if we find information in the assets file about them otherwise we don't add them).

Implementation notes

  • .NET Frameworks are handled a bit differently. If we discover that framework dll's are downloaded as a part of the dotnet restore process we included the relevant NuGet package (based on a hardcoded list of priorities). If none of these packages have been downloaded, we add the assemblies from the .NET installation directory instead.
  • If the ASP.NET core NuGet package is downloaded and if the project files gives us an indication that this is needed for compilation we include this package (otherwise we include the ASP.NET from the .NET installation directory).
  • In the project.assets.json file there are sometimes a reference to a file named _._. This is an empty fail that may or may not exist and is sometimes there due to issues with empty directories (according to https://stackoverflow.com/questions/36338052/what-do-files-mean-in-nuget-packages). In this case we just add the entire directory as a dependency.
  • The AssemblyCache has been made more robust in case the information from the assets files turn out to be different than expected.
  • We might not need to the DownloadMissingPackages anymore. If we assume that we process all project files then if a package is not downloaded, it might be because it is not included in any of the assets files. At least for the integration tests, it appears that this part adds unwanted dlls <-- This happens because the logic in FileContent doesn't exclude package references that are commented out (I will fix this in another PR).

With these changes we reduce the number of extra dll's that is taken into the compilation for the standalone extractor compared to the tracing extractor.

Before

Results for nopSolutions/nopCommerce
 Query Dependencies.ql:
  Result count - both/traced/standalone: 446/0/35

Results for tobyash86/WebGoat.NET
 Query Dependencies.ql:
  Result count - both/traced/standalone: 367/0/17

Results for AAEmu/AAEmu
 Query Dependencies.ql:
  Result count - both/traced/standalone: 234/3/48

After

Results for nopSolutions/nopCommerce
  Query Dependencies.ql:
  Result count - both/traced/standalone: 446/0/1

Results for tobyash86/WebGoat.NET
 Query Dependencies.ql:
  Result count - both/traced/standalone: 367/0/2

Results for AAEmu/AAEmu
 Query Dependencies.ql:
  Result count - both/traced/standalone: 237/0/4

@github-actions github-actions bot added the C# label Nov 1, 2023
Comment on lines +153 to +142
catch (Exception e)
{
progressMonitor.LogDebug($"Failed to parse assets file (unexpected error): {e.Message}");
return false;
}

Check notice

Code scanning / CodeQL

Generic catch clause

Generic catch clause.
@michaelnebel michaelnebel force-pushed the csharp/projectassetspackages branch 4 times, most recently from c0deb0c to 415638f Compare November 2, 2023 10:20
@michaelnebel
Copy link
Contributor Author

@tamasvajk : Could you take a look at this PR? This is the re-write that only includes the package logic based on project.assets.json, but it also changes the logic in the dependencies manager from "subtraction" to "addition" (see the description above) <-- I think it will be hard to avoid that in conjunction with the package logic.
Please note that
(1) The integration tests are currently failing (but this is due to faulty logic in FileContent, which I will fix in another PR).
(2) I think that the windows integration test where microsoft.windowsdesktop.app.ref was included is most likely because this framework is also "just" downloaded during the project restore on windows machines (no matter whether it is needed or not). An indication of that is found here, where it is stated that microsoft.windowsdesktop.app.ref is specifically included in the .NET distribution package on windows machines - so it probably makes sense that on windows the .NET restore project also downloads this package. If you prefer, we can try to hardcode that we include the package if it is present, but I think it might overlap with some of the content from .NET Core.

Copy link
Contributor

@tamasvajk tamasvajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I've added some comments.

@michaelnebel michaelnebel force-pushed the csharp/projectassetspackages branch 2 times, most recently from 737732e to ef52629 Compare November 2, 2023 17:56
@michaelnebel michaelnebel force-pushed the csharp/projectassetspackages branch 2 times, most recently from 660748c to 8cb4fd0 Compare November 3, 2023 12:13
@michaelnebel michaelnebel force-pushed the csharp/projectassetspackages branch from 8cb4fd0 to 31f602c Compare November 3, 2023 12:35
@michaelnebel michaelnebel added the no-change-note-required This PR does not need a change note label Nov 3, 2023
@michaelnebel michaelnebel marked this pull request as ready for review November 3, 2023 13:24
@michaelnebel michaelnebel requested a review from a team as a code owner November 3, 2023 13:24
@michaelnebel
Copy link
Contributor Author

@tamasvajk : Ready for review.

I will also trigger a DCA run.

Copy link
Contributor

@tamasvajk tamasvajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I added some comments.

Comment on lines 106 to 122
return references
.Aggregate(dependencies, (deps, r) =>
{
var info = r.Value;
var name = r.Key.ToLowerInvariant();
if (info.Type != "package")
{
return deps;
}

// If this is a .NET framework reference then include everything.
return netFrameworks.Any(framework => name.StartsWith(framework))
? deps.Add(name, "")
: info
.Compile
.Aggregate(deps, (d, p) => d.Add(name, p.Key));
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this hard to read; dependencies, deps, and d all refer to the same variable, don't they?
Is this code equivalent to the below? (I find the below a lot easier to read)

foreach (var r in references)
{
    var info = r.Value;
    var name = r.Key.ToLowerInvariant();
    if (info.Type != "package")
    {
        continue;
    }
    // If this is a .NET framework reference then include everything.
    if (netFrameworks.Any(name.StartsWith))
    {
        dependencies.Add(name, "");
        continue;
    }
    // Otherwise only include the files mentioned in the compile section.
    foreach (var p in info.Compile)
    {
        dependencies.Add(name, p.Key);
    }
}
return dependencies;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree that Aggregate is not the most suitable way of doing this (I usually think in terms of functional programming and aggregation is just "folding", but since the data is collected as a side effect it doesn't make sense to use aggregate).
I know, we have a bit different preference when comes to implementing loops and I will try and do this using the ForeEach extension method instead - then the implementation also become similar to your suggestion :-)
Or should I just cave in and start using the foreach more? :-D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably try to convince me that I should start using ForEach more. :-)

I think it doesn't matter much which variant is used, I prefer readability over shortness. Lambdas oftentimes help with readability, but I'm not (yet) convinced that this is the case with simple foreach loops. Also, with ForEach I could always argue that there's an unnecessary lambda allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, I will try 😸

I like to avoid using continue and break statements in foreach loops - it doesn't take that many of those before the readability starts to deteriorate and they have a tendency to start popping up as the code becomes more complicated. In that aspect I also prefer the return statement instead of a continue statement, which is what we will get, if we use lambdas (this also makes it easier to re-factor the entire lambda out into a separate method, which encourages refactors when needed).
Furthermore, it is not possible to use break inside ForEach (ForEach means for each (unless an exception is thrown 😄 )). It is my impression that break usually means that we are searching for a specific element or collecting elements up until a specific condition or something similar and at the same time we and may or may not map these elements as well (in one go): I generally prefer to do filtering and then processing (if possible). The approach of doing lots of things at the same time also has a tendency to make the code more complicated in the long run.

In terms of performance, I agree that it is sometimes better to use the native C# foreach construct for performance reasons and I am sure the performance difference between foreach and ForEach is measurable if it is a loop that is executed 100k+ times. However, in most cases the difference in performance is completely insignificant.

Copy link
Contributor

@tamasvajk tamasvajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Let's wait for the DCA analysis to finish, but even the previous run shows significant reduction in the count of cs/compilation-error tuples. There are some projects where the compilation error count increased, these are good candidates for investigation for upcoming improvements.

@michaelnebel
Copy link
Contributor Author

michaelnebel commented Nov 6, 2023

Looks good to me. Let's wait for the DCA analysis to finish, but even the previous run shows significant reduction in the count of cs/compilation-error tuples. There are some projects where the compilation error count increased, these are good candidates for investigation for upcoming improvements.

In the second DCA run, the security-extended query suite was used instead of nightly (just to see, if we are loosing any of the most interesting results).
As a next step it is probably worth looking to why we loose some of the results for rapPayne/WebGoat.net.
However, overall DCA looks good.
(1) For the security extended query suite we get 47 new alerts but we loose 21 alerts (if we use counting as a metric we get an increase of 26 alerts).
(2) For all the projects in standalone suite, we get approximately 3k less compiler errors (in total this is approximately a reduction of 25% in compiler errors as we go from approximately 12k errors -> 9k errors).

Merging now.

@michaelnebel michaelnebel merged commit 3f0be47 into github:main Nov 6, 2023
@michaelnebel michaelnebel deleted the csharp/projectassetspackages branch November 6, 2023 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C# no-change-note-required This PR does not need a change note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants