Skip to content

Move Rule into a dataclass #1029

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Mar 24, 2021
Merged

Move Rule into a dataclass #1029

merged 33 commits into from
Mar 24, 2021

Conversation

rw-access
Copy link
Contributor

Issues

No issues, just trying to clean some things up

Summary

I moved rules and the schemas into a dataclass. I also changed the way hashes are built, so that you never modify the original rule file. I commented out some tests that are no longer relevant.

Also bumped to Python 3.8 so I could use typing.Literal and a few other additions.

Since this changes a lot of stuff under the hood, I'm hoping that we can get it merged sooner. I'm okay with temporarily suppressing testing if it means that we can avoid broken builds.

There's still a good bit to go.

Contributor checklist

@rw-access rw-access added test-suite unit and other testing components python Internal python for the repository labels Mar 10, 2021
@rw-access rw-access marked this pull request as ready for review March 10, 2021 18:18
Copy link
Contributor

@brokensound77 brokensound77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks for starting this!

As we talked about the other day, we should jump on a call and discuss this to see what options make the most sense and make sure we are aligned.

I def like that we are doing this in stages and think disabling some tests is fine, but I don't think we should merge it to main in a broken state. There is a lot that still needs to be verified/tested concerning packaging, the CLI, and the remaining tests. I think we should most likely merge this to an intermediate base branch so that we can keep to the scope of the intermediate PRs.

@@ -10,17 +10,18 @@
import os
import re
from collections import OrderedDict
from typing import Dict, List
from pathlib import Path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some other changes for rule_loader that seem in scope and make sense to update with this

 import functools
-import glob
-import io
 import os
 import re
 from collections import OrderedDict
@@ -47,7 +45,7 @@ def reset():
 
 
 @cached
-def load_rule_files(verbose=True, paths=None):
+def load_rule_files(verbose=True, paths=None) -> Dict[Path, dict]:
     """Load the rule YAML files, but without parsing the EQL query portion."""
     file_lookup = {}  # type: dict[str, dict]
 
@@ -55,15 +53,14 @@ def load_rule_files(verbose=True, paths=None):
         print("Loading rules from {}".format(RULES_DIR))
 
     if paths is None:
-        paths = sorted(glob.glob(os.path.join(RULES_DIR, '**', '*.toml'), recursive=True))
+        paths: List[Path] = sorted(Path(get_path(RULES_DIR)).rglob('*.toml'))
 
     for rule_file in paths:
         try:
             # use pytoml instead of toml because of annoying bugs
             # https://github.com/uiri/toml/issues/152
             # might also be worth looking at https://github.com/sdispater/tomlkit
-            with io.open(rule_file, "r", encoding="utf-8") as f:
-                file_lookup[rule_file] = pytoml.load(f)
+            file_lookup[rule_file] = pytoml.loads(rule_file.read_text())
         except Exception:
             print(u"Error loading {}".format(rule_file))
             raise
@@ -118,7 +115,10 @@ def load_rules(file_lookup=None, verbose=True, error=True):
 
         except Exception as e:
             failed = True
-            err_msg = "Invalid rule file in {}\n{}".format(rule_file, click.style(str(e), fg='red'))
+            relative_rule_path = rule_file.relative_to(RULES_DIR)
+            rule_id = rule_contents.get('rule', {}).get('rule_id', '')
+            colored_error = click.style(str(e), fg='red')
+            err_msg = f"Invalid rule file in: {relative_rule_path} - {rule_id}\n{colored_error}"
             errors.append(err_msg)
             if error:
                 if verbose:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

those are good changes, and the loader is next.
i'll punt until a follow up PR

Copy link
Contributor

@brokensound77 brokensound77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:soclose:

I added comments on changes from new commits


@classmethod
@cached
def __schema(cls):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from marshmallow import Schema
Suggested change
def __schema(cls):
def __schema(cls: ClassT) -> Type[Schema][ClassT]:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

40d9e58.
i changed it slightly, because i return the instantiated class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally was going to suggest that, but it has this error: Class 'SchemaMeta' does not define '__getitem__', so the '[]' operator cannot be used on its instances. It doesn't look like my suggestion is correctly hinting either, but it is an instantiated type

@threat-punter
Copy link
Contributor

I saw this in the tests. Just making sure it's on your radar.

================================================================================================================================= warnings summary ==================================================================================================================================
env/detection-rules-build/lib/python3.8/site-packages/marshmallow/fields.py:195: 35 warnings
tests/test_packages.py: 3 warnings
  /Users/threatpunter/Documents/GitHub/detection-rules/env/detection-rules-build/lib/python3.8/site-packages/marshmallow/fields.py:195: RemovedInMarshmallow4Warning: Passing field metadata as a keyword arg is deprecated. Use the explicit `metadata=...` argument instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================================================================================================= 93 passed, 38 warnings in 29.71s ==========================================================================================================================
(.venv) $ 

@rw-access
Copy link
Contributor Author

@threat-punter looks like that'll be fixed when lovasoa/marshmallow_dataclass#119 is updated and we pull in the dependencies.

Copy link
Contributor

@brokensound77 brokensound77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. pull in updates from main and make release
  2. from main, make release

If those two package hashes match, I think that means we are good and I ✔️

Also, left a few other final comments as well

Thanks for enduring through this 🎈

self.assertEqual(1, len(changed_rules), 'Package version bumping is not detecting changed rules')
self.assertEqual(0, len(new_rules), 'Package version bumping is improperly detecting new rules')
self.assertEqual(2, package.rules[0].contents['version'], 'Package version not bumping on changes')
# def test_versioning_diffs(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make this test work again by loading an actual rule vs a test rule

Comment on lines +70 to +78
#
# def test_format_of_all_rules(self):
# """Test all rules."""
# rules = rule_loader.load_rules().values()
#
# for rule in rules:
# is_eql_rule = isinstance(rule.contents.data, EQLRuleData)
# self.compare_formatted(
# rule.rule_format(formatted_query=False), callback=nested_normalize, kwargs={'eql_rule': is_eql_rule})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from detection_rules import rule_loader
from detection_rules.rule import EQLRuleData
Suggested change
#
# def test_format_of_all_rules(self):
# """Test all rules."""
# rules = rule_loader.load_rules().values()
#
# for rule in rules:
# is_eql_rule = isinstance(rule.contents.data, EQLRuleData)
# self.compare_formatted(
# rule.rule_format(formatted_query=False), callback=nested_normalize, kwargs={'eql_rule': is_eql_rule})
def test_format_of_all_rules(self):
"""Test all rules."""
rules = rule_loader.load_rules().values()
for rule in rules:
is_eql_rule = isinstance(rule.contents.data, EQLRuleData)
self.compare_formatted(
rule.contents.to_dict(), callback=nested_normalize, kwargs={'eql_rule': is_eql_rule})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i changed this a different way, so i think we're good now 🤞

@rw-access
Copy link
Contributor Author

Package hash unchanged and no version bumps detected from master.
Current package hash was 826f795f49ef0731a15ddd945d5977adc789ecb3926e655578fbd0d570a19561

@rw-access rw-access merged commit c0af222 into elastic:main Mar 24, 2021
@rw-access rw-access deleted the rule-refactor branch March 24, 2021 16:24
This was referenced Apr 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Internal python for the repository test-suite unit and other testing components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants