Skip to content

Tune remediation points to match Classic #61

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

ABaldwinHunter
Copy link
Contributor

WIP - opening early for feedback

Changes include:

  • Increase base_points for all four languages
  • Add points_per value to each language
  • change calculation formula from
    base_points * issue.mass_threshold
    to
    base_points + overage * points_per

Reference to rubric in classic:
https://github.com/codeclimate/analyzer/blob/master/lib/quality/rubric.rb#L12c

Note: python is handled separately in classic. It's not included in
rubric file above ^.

@ABaldwinHunter ABaldwinHunter force-pushed the abh-points branch 2 times, most recently from 3fbc621 to 771cf09 Compare December 29, 2015 16:37
@ABaldwinHunter
Copy link
Contributor Author

@codeclimate/review Ready for review!

Fixed specs and lowered python threshold to 28 to match classic.

With these tuning changes and those to codeclimate-rubocop, app is a 3.31.

See updated commit message for full summary of changes.

Reference to rubric in classic for PHP, Ruby and JavaScript:
https://github.com/codeclimate/analyzer/blob/master/lib/quality/rubric.rb#L12c

And python:
https://github.com/codeclimate/analyzer/blob/master/lib/quality/profile.rb#L41

@@ -15,8 +15,9 @@ class Main < CC::Engine::Analyzers::Base
"**/*.jsx"
]
LANGUAGE = "javascript"
DEFAULT_MASS_THRESHOLD = 40
BASE_POINTS = 3000
DEFAULT_MASS_THRESHOLD = 28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it work to define the ones that are the same across the board in Base?

[1] pry(main)> class Base
[1] pry(main)*   FOO = 5
[1] pry(main)*   
[1] pry(main)*   def foo
[1] pry(main)*     self.class::FOO
[1] pry(main)*   end  
[1] pry(main)* end  
=> :foo
[2] pry(main)> class A < Base
[2] pry(main)* end  
=> nil
[3] pry(main)> A.new.foo
=> 5
[4] pry(main)> class B < Base
[4] pry(main)*   FOO = 10
[4] pry(main)* end  
=> 10
[5] pry(main)> B.new.foo
=> 10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure that sounds good.

In general, it felt to me like there were opportunities to refactor in this code, but I wanted to avoid additional changes in this PR and refactor separately.

I'm happy with the threshold one though, adding that now!

@ABaldwinHunter ABaldwinHunter changed the title WIP - Tune remediation points to match Classic Tune remediation points to match Classic Dec 29, 2015
base_points * issue.mass
if issue.mass > threshold
base_points + overage * points_per
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unnecessary (DEFAULT_POINTS seems unnecessary in general). If the issue mass is not greater or equal than the threshold, it's not an issue at all, and doesn't need points. Is this maybe vestigial from how classic decided if something was worth reporting?

Changes include:
- Update PHP, Python, and JavaScript thresholds to match Classic
- Increase base_points for all four languages
- Add `points_per` value to each language
- change calculation formula from
    - `base_points * issue.mass_threshold`
  to
   - `base_points + overage * points_per`

Reference to rubric in classic for PHP, Ruby and JavaScript:
https://github.com/codeclimate/analyzer/blob/master/lib/quality/rubric.rb#L12c

And python:
https://github.com/codeclimate/analyzer/blob/master/lib/quality/profile.rb#L41
@ABaldwinHunter
Copy link
Contributor Author

@codeclimate/review Ready for re-review!

Updates:

  • Refactor Violation class to initialize with language_strategy obj (fewer args)*
  • Add calculation specs

* thanks @dblandin / @wfleming for suggestion

@ABaldwinHunter
Copy link
Contributor Author

Punt on #format spec because already tested by main_specs and a bit hairy for integration.

I'd be interested in doing some additional refactoring of existing duplication code in a separate PR to follow.

describe "#calculate_points" do
context "when issue mass exceeds threshold" do
it "calculates mass overage points" do
language = stub_language(base_points: 100, points_per: 5, mass_threshold: 10)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of changing the name to strategy and inlining instead of extracting a method?

strategy = double(base_points: 100, points_per: 5, mass_threshold: 10)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that much better. Thanks!

@dblandin
Copy link
Contributor

A few nits. Looking good 👍

@@ -11,6 +11,10 @@ class Base
::RubyParser::SyntaxError,
]

DEFAULT_MASS_THRESHOLD = 28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to clarify that this is intentional and vetted: previously these were different between PHP & Python/Javascript, and this change makes them the same, but keeps Ruby different.

The previous values were, I believe, picked after a lot of manual testing to get something that felt "right". What the "right" default threshold choice is for a language is dependent on how the parser the engines uses for that language ends up expressing ASTs. I.e. small parser differences can change how verbose a mass threshold ends up seeming.

Based on the classic code, it looks like this value was taken directly from there (and it didn't apply to Python in classic). I'd personally be pretty surprised if the AST results of all the languages was the same between classic & this engine: at the very least I'm pretty sure the JS parser behavior is different.

For that reason, I'm not sure the classic thresholds will make sense here, because they are likely to produce different results here than they did in classic. Do you have a sample corpus you've been using to verify same results for each of these languages between classic/these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wfleming Thanks for the tip about the thought behind the thresholds and parsing. I can investigate that.

I reference the classic rubric you mentioned in my commit, along with the Python source.

I've been keeping track of test repos here. (removed link)

In general, it seemed that on classic, duplication and complexity issues had much greater grade impact than they did on Platform.

I tested the current duplication setup on app - but could do some further testing with this exact setup on additional repos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's a question of effort vs accuracy, but if the intent is to get results similar to classic, we should probably have a small corpus of files for each language, get the overall mass of each file as calculated both by classic & by this engine, and compare those to see if the thresholds (and to a lesser extent per_point values) need to be scaled to get matching results.

@gdiggs
Copy link
Contributor

gdiggs commented Dec 30, 2015

This looks pretty great to me pending green builds

@ABaldwinHunter
Copy link
Contributor Author

@wfleming Thanks for feedback. After posting a question in dev, I might break this change up, and just ship the ruby update.

I did some additional screening on our own (ruby) repos and those grades look consistent with Classic.

I hadn't realized that the settings in Classic wouldn't translate directly for the other languages.
In preliminary testing, flask gets: a 2.7 on engines now, 2.38 on classic, 1.93 with point updates here only, 1.7 with point updates and threshold change.

Sounds like it requires more data gathering to assess, and determine desired grades.

On the positive side: users have some flexibility to configure their own thresholds in .codeclimate.yml.

@ABaldwinHunter
Copy link
Contributor Author

Closing in favor of piecemeal approach, starting with Ruby: https://github.com/codeclimate/codeclimate-duplication/pull/67/files

@ABaldwinHunter ABaldwinHunter deleted the abh-points branch January 20, 2016 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants