Skip to content
This repository was archived by the owner on Jul 19, 2025. It is now read-only.
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions lib/cc/engine/analyzers/analyzer_base.rb
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@ class Base
::RubyParser::SyntaxError,
]

DEFAULT_MASS_THRESHOLD = 28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to clarify that this is intentional and vetted: previously these were different between PHP & Python/Javascript, and this change makes them the same, but keeps Ruby different.

The previous values were, I believe, picked after a lot of manual testing to get something that felt "right". What the "right" default threshold choice is for a language is dependent on how the parser the engines uses for that language ends up expressing ASTs. I.e. small parser differences can change how verbose a mass threshold ends up seeming.

Based on the classic code, it looks like this value was taken directly from there (and it didn't apply to Python in classic). I'd personally be pretty surprised if the AST results of all the languages was the same between classic & this engine: at the very least I'm pretty sure the JS parser behavior is different.

For that reason, I'm not sure the classic thresholds will make sense here, because they are likely to produce different results here than they did in classic. Do you have a sample corpus you've been using to verify same results for each of these languages between classic/these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfleming Thanks for the tip about the thought behind the thresholds and parsing. I can investigate that.

I reference the classic rubric you mentioned in my commit, along with the Python source.

I've been keeping track of test repos here. (removed link)

In general, it seemed that on classic, duplication and complexity issues had much greater grade impact than they did on Platform.

I tested the current duplication setup on app - but could do some further testing with this exact setup on additional repos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's a question of effort vs accuracy, but if the intent is to get results similar to classic, we should probably have a small corpus of files for each language, get the overall mass of each file as calculated both by classic & by this engine, and compare those to see if the thresholds (and to a lesser extent per_point values) need to be scaled to get matching results.

BASE_POINTS = 1_500_000
POINTS_PER = 50_000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gordondiggs I'm slightly worried that using inferred constants makes the code harder to follow for future devs. Now that you see it, any doubts?

If you're not worried, I'm not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's any more confusing than using the inherited methods, which I don't find confusing at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay cool. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of BASE_POINTS and MASS_OVERAGE_POINTS instead of POINTS_PER?

I didn't find the current name intuitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds okay to me. Only arguments to consider against:

  1. The per naming is consistent with some of our usages in engines and classic:
    classic: cost_per
    eslint: cost_per
    rubocop: per_violation_points

  2. the total amount of overage points is really this number times the overage (mass - threshold).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps points_per_overage?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That works for me 👍


def initialize(engine_config:)
@engine_config = engine_config
end
Expand All @@ -37,6 +41,10 @@ def base_points
self.class::BASE_POINTS
end

def points_per
self.class::POINTS_PER
end

private

attr_reader :engine_config
Expand Down
2 changes: 0 additions & 2 deletions lib/cc/engine/analyzers/javascript/main.rb
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,6 @@ class Main < CC::Engine::Analyzers::Base
"**/*.jsx"
]
LANGUAGE = "javascript"
DEFAULT_MASS_THRESHOLD = 40
BASE_POINTS = 3000

private

Expand Down
2 changes: 0 additions & 2 deletions lib/cc/engine/analyzers/php/main.rb
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,6 @@ class Main < CC::Engine::Analyzers::Base
"**/*.inc",
"**/*.module"
]
DEFAULT_MASS_THRESHOLD = 10
BASE_POINTS = 4_000

private

Expand Down
2 changes: 0 additions & 2 deletions lib/cc/engine/analyzers/python/main.rb
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,6 @@ module Python
class Main < CC::Engine::Analyzers::Base
LANGUAGE = "python"
DEFAULT_PATHS = ["**/*.py"]
DEFAULT_MASS_THRESHOLD = 40
BASE_POINTS = 1000

private

Expand Down
8 changes: 2 additions & 6 deletions lib/cc/engine/analyzers/reporter.rb
Original file line number Diff line number Diff line change
Expand Up @@ -60,19 +60,15 @@ def flay

attr_reader :engine_config, :language_strategy, :io

def mass_threshold
@mass_threshold ||= language_strategy.mass_threshold
end

def new_violation(issue)
hashes = flay.hashes[issue.structural_hash]
Violation.new(language_strategy.base_points, issue, hashes)
Violation.new(language_strategy, issue, hashes)
end

def flay_options
{
diff: false,
mass: mass_threshold,
mass: language_strategy.mass_threshold,
summary: false,
verbose: false,
number: true,
Expand Down
2 changes: 1 addition & 1 deletion lib/cc/engine/analyzers/ruby/main.rb
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ class Main < CC::Engine::Analyzers::Base

]
DEFAULT_MASS_THRESHOLD = 18
BASE_POINTS = 10_000
POINTS_PER = 100_000
TIMEOUT = 300

private
Expand Down
22 changes: 17 additions & 5 deletions lib/cc/engine/analyzers/violation.rb
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,12 @@ module Analyzers
class Violation
attr_reader :issue

def initialize(base_points, issue, hashes)
@base_points = base_points
DEFAULT_POINTS = 1_500_000

def initialize(language, issue, hashes)
@base_points = language.base_points
@points_per = language.points_per
@threshold = language.mass_threshold
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to assign the whole strategy here and call methods on the strategy elsewhere in the class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call

@issue = issue
@hashes = hashes
end
Expand All @@ -32,9 +36,17 @@ def report_name
"#{current_sexp.file}-#{current_sexp.line}"
end

def calculate_points
if issue.mass >= threshold
base_points + (overage * points_per)
else
DEFAULT_POINTS
end
end

private

attr_reader :base_points, :hashes
attr_reader :base_points, :points_per, :threshold, :hashes

def current_sexp
@location ||= sorted_hashes.first
Expand All @@ -56,8 +68,8 @@ def name
end
end

def calculate_points
base_points * issue.mass
def overage
issue.mass - threshold
end

def format_location
Expand Down
2 changes: 1 addition & 1 deletion spec/cc/engine/analyzers/javascript/main_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
"path" => "foo.js",
"lines" => { "begin" => 1, "end" => 1 },
})
expect(json["remediation_points"]).to eq(297000)
expect(json["remediation_points"]).to eq(6400000)
expect(json["other_locations"]).to eq([
{"path" => "foo.js", "lines" => { "begin" => 2, "end" => 2} },
{"path" => "foo.js", "lines" => { "begin" => 3, "end" => 3} }
Expand Down
4 changes: 2 additions & 2 deletions spec/cc/engine/analyzers/php/main_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@
"path" => "foo.php",
"lines" => { "begin" => 2, "end" => 6 },
})
expect(json["remediation_points"]).to eq(176000)
expect(json["remediation_points"]).to eq(3450000)
expect(json["other_locations"]).to eq([
{"path" => "foo.php", "lines" => { "begin" => 10, "end" => 14} },
])
Expand All @@ -61,7 +61,7 @@
end

def printed_issue
issue = {"type":"issue","check_name":"Identical code","description":"Similar code found in 1 other location","categories":["Duplication"],"location":{"path":"foo.php","lines":{"begin":2,"end":6}},"remediation_points":176000,"other_locations":[{"path":"foo.php","lines":{"begin":10,"end":14}}],"content":{"body": read_up}}
issue = {"type":"issue","check_name":"Identical code","description":"Similar code found in 1 other location","categories":["Duplication"],"location":{"path":"foo.php","lines":{"begin":2,"end":6}},"remediation_points":3450000,"other_locations":[{"path":"foo.php","lines":{"begin":10,"end":14}}],"content":{"body": read_up}}
issue.to_json + "\0\n"
end

Expand Down
2 changes: 1 addition & 1 deletion spec/cc/engine/analyzers/python/main_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
"path" => "foo.py",
"lines" => { "begin" => 1, "end" => 1 },
})
expect(json["remediation_points"]).to eq(54000)
expect(json["remediation_points"]).to eq(4000000)
expect(json["other_locations"]).to eq([
{"path" => "foo.py", "lines" => { "begin" => 2, "end" => 2} },
{"path" => "foo.py", "lines" => { "begin" => 3, "end" => 3} }
Expand Down
2 changes: 1 addition & 1 deletion spec/cc/engine/analyzers/ruby/main_spec.rb
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@
"path" => "foo.rb",
"lines" => { "begin" => 1, "end" => 5 },
})
expect(json["remediation_points"]).to eq(360000)
expect(json["remediation_points"]).to eq(3300000)
expect(json["other_locations"]).to eq([
{"path" => "foo.rb", "lines" => { "begin" => 9, "end" => 13} },
])
Expand Down
59 changes: 59 additions & 0 deletions spec/cc/engine/analyzers/violation_spec.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
RSpec.describe CC::Engine::Analyzers::Violation, in_tmpdir: true do
include AnalyzerSpecHelpers

describe "#calculate_points" do
context "when issue mass exceeds threshold" do
it "calculates mass overage points" do
language = stub_language(base_points: 100, points_per: 5, mass_threshold: 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of changing the name to strategy and inlining instead of extracting a method?

strategy = double(base_points: 100, points_per: 5, mass_threshold: 10)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that much better. Thanks!

issue = double(:issue, mass: 15)
hashes = []

expected_points = 100 + 5 * (issue.mass - 10)

violation = CC::Engine::Analyzers::Violation.new(language, issue, hashes)
points = violation.calculate_points

expect(points).to eq(expected_points)
end
end

context "when issue mass is less than threshold" do
it "uses default" do
language = stub_language(base_points: 100, points_per: 5, mass_threshold: 18)
issue = double(:issue, mass: 15)
hashes = []

expected_points = CC::Engine::Analyzers::Violation::DEFAULT_POINTS

violation = CC::Engine::Analyzers::Violation.new(language, issue, hashes)
points = violation.calculate_points

expect(points).to eq(CC::Engine::Analyzers::Violation::DEFAULT_POINTS)
end
end

context "when issue mass equals threshold" do
it "calculates remediation points" do
language = stub_language(base_points: 100, points_per: 5, mass_threshold: 18)
issue = double(:issue, mass: language.mass_threshold)
hashes = []

expected_points = 100 + 5 * (issue.mass - language.mass_threshold)

violation = CC::Engine::Analyzers::Violation.new(language, issue, hashes)
points = violation.calculate_points

expect(points).to eq(expected_points)
end
end

def stub_language(args)
double(
:language,
base_points: args[:base_points],
points_per: args[:points_per],
mass_threshold: args[:mass_threshold]
)
end
end
end