-
Notifications
You must be signed in to change notification settings - Fork 25
Tune remediation points to match Classic #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3fbc621
to
771cf09
Compare
@codeclimate/review Ready for review! Fixed specs and lowered python threshold to 28 to match classic. With these tuning changes and those to See updated commit message for full summary of changes. Reference to rubric in classic for PHP, Ruby and JavaScript: And python: |
@@ -15,8 +15,9 @@ class Main < CC::Engine::Analyzers::Base | |||
"**/*.jsx" | |||
] | |||
LANGUAGE = "javascript" | |||
DEFAULT_MASS_THRESHOLD = 40 | |||
BASE_POINTS = 3000 | |||
DEFAULT_MASS_THRESHOLD = 28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it work to define the ones that are the same across the board in Base
?
[1] pry(main)> class Base
[1] pry(main)* FOO = 5
[1] pry(main)*
[1] pry(main)* def foo
[1] pry(main)* self.class::FOO
[1] pry(main)* end
[1] pry(main)* end
=> :foo
[2] pry(main)> class A < Base
[2] pry(main)* end
=> nil
[3] pry(main)> A.new.foo
=> 5
[4] pry(main)> class B < Base
[4] pry(main)* FOO = 10
[4] pry(main)* end
=> 10
[5] pry(main)> B.new.foo
=> 10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure that sounds good.
In general, it felt to me like there were opportunities to refactor in this code, but I wanted to avoid additional changes in this PR and refactor separately.
I'm happy with the threshold one though, adding that now!
base_points * issue.mass | ||
if issue.mass > threshold | ||
base_points + overage * points_per | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unnecessary (DEFAULT_POINTS
seems unnecessary in general). If the issue mass is not greater or equal than the threshold, it's not an issue at all, and doesn't need points. Is this maybe vestigial from how classic decided if something was worth reporting?
492ea23
to
cf5313b
Compare
Changes include: - Update PHP, Python, and JavaScript thresholds to match Classic - Increase base_points for all four languages - Add `points_per` value to each language - change calculation formula from - `base_points * issue.mass_threshold` to - `base_points + overage * points_per` Reference to rubric in classic for PHP, Ruby and JavaScript: https://github.com/codeclimate/analyzer/blob/master/lib/quality/rubric.rb#L12c And python: https://github.com/codeclimate/analyzer/blob/master/lib/quality/profile.rb#L41
e476954
to
5a3fad4
Compare
Punt on I'd be interested in doing some additional refactoring of existing duplication code in a separate PR to follow. |
describe "#calculate_points" do | ||
context "when issue mass exceeds threshold" do | ||
it "calculates mass overage points" do | ||
language = stub_language(base_points: 100, points_per: 5, mass_threshold: 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of changing the name to strategy
and inlining instead of extracting a method?
strategy = double(base_points: 100, points_per: 5, mass_threshold: 10)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that much better. Thanks!
A few nits. Looking good 👍 |
@@ -11,6 +11,10 @@ class Base | |||
::RubyParser::SyntaxError, | |||
] | |||
|
|||
DEFAULT_MASS_THRESHOLD = 28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to clarify that this is intentional and vetted: previously these were different between PHP & Python/Javascript, and this change makes them the same, but keeps Ruby different.
The previous values were, I believe, picked after a lot of manual testing to get something that felt "right". What the "right" default threshold choice is for a language is dependent on how the parser the engines uses for that language ends up expressing ASTs. I.e. small parser differences can change how verbose a mass threshold ends up seeming.
Based on the classic code, it looks like this value was taken directly from there (and it didn't apply to Python in classic). I'd personally be pretty surprised if the AST results of all the languages was the same between classic & this engine: at the very least I'm pretty sure the JS parser behavior is different.
For that reason, I'm not sure the classic thresholds will make sense here, because they are likely to produce different results here than they did in classic. Do you have a sample corpus you've been using to verify same results for each of these languages between classic/these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wfleming Thanks for the tip about the thought behind the thresholds and parsing. I can investigate that.
I reference the classic rubric you mentioned in my commit, along with the Python source.
I've been keeping track of test repos here. (removed link)
In general, it seemed that on classic, duplication and complexity issues had much greater grade impact than they did on Platform.
I tested the current duplication setup on app
- but could do some further testing with this exact setup on additional repos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's a question of effort vs accuracy, but if the intent is to get results similar to classic, we should probably have a small corpus of files for each language, get the overall mass of each file as calculated both by classic & by this engine, and compare those to see if the thresholds (and to a lesser extent per_point
values) need to be scaled to get matching results.
This looks pretty great to me pending green builds |
@wfleming Thanks for feedback. After posting a question in dev, I might break this change up, and just ship the ruby update. I did some additional screening on our own (ruby) repos and those grades look consistent with Classic. I hadn't realized that the settings in Classic wouldn't translate directly for the other languages. Sounds like it requires more data gathering to assess, and determine desired grades. On the positive side: users have some flexibility to configure their own thresholds in |
Closing in favor of piecemeal approach, starting with Ruby: https://github.com/codeclimate/codeclimate-duplication/pull/67/files |
WIP - opening early for feedback
Changes include:
base_points
for all four languagespoints_per
value to each languagebase_points * issue.mass_threshold
to
base_points + overage * points_per
Reference to rubric in classic:
https://github.com/codeclimate/analyzer/blob/master/lib/quality/rubric.rb#L12c
Note: python is handled separately in classic. It's not included in
rubric file above ^.