Skip to content

[Bug] Query validation failing to capture InSet edge case with ip field types #3540

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Mikaayenson opened this issue Mar 26, 2024 · 4 comments · Fixed by #3572
Closed

[Bug] Query validation failing to capture InSet edge case with ip field types #3540

Mikaayenson opened this issue Mar 26, 2024 · 4 comments · Fixed by #3572
Assignees
Labels
bug Something isn't working

Comments

@Mikaayenson
Copy link
Contributor

Mikaayenson commented Mar 26, 2024

Summary

Related to #3534

When ip fields are used with in_set fields that are type keyword, failures occur in elasticsearch.

For example:

network where destination.address in ("127.0.0.1", "::1")

Our internal validation does not catch this.

The validation issue is a combination of two things:

  1. The type of ip fields are internally set to String here within detection-rules
  2. We don't yet have an ip type_hint to correlate in eql here class TypeHint(Enum):

After fixing those two small updates, we then would run into issues where all ip types can not be compared to type string, which should not occur. Rather the edge case is that the combination of the in operation with mismatched types needs to be handled, most likely in def in_sets, which later performs validation in validate_type.

The dilemma is that If we set correct the type hint for ip fields, the validation we fail as expected for in_set checks, however by field type hint information is already set and will also fail in other locations where this check is relaxed and allows ip / string type comparisons. On the other hand, if we leave type hints as-is, and try to cover this edge case within in_set, the type hint is already

Note: ATM, the only occurance of this within our repos was resolved in 3534.
Note: I observed there are some integrations where fields like destination.ip are type keyword and some are type ip. Look at the integrations-schemas zip to see examples. They are all type ip in the network integration.
Note: To get the error to finally bubble up, we also need to make sure calls to elasticsearch_type_family in integrations doesn't auto convert the type of ip fields to keyword.

Additional Details

The proposed solution is to do two things:

  1. Allow the regular way of validating to occur with no changes to the field type_hint which converts ip's to keywords.
  2. Within the in_set validation, pull the original field value from the schema, and get its actual type which is a type ip. Then use the ip type_hint to validate the fields within the in_set.
  3. Normalize the kql_schema to just schema
  4. Ensure the integration schemas built dont modify the original field type.

I've attached git diff's to help with the implementation.

Detection Rules Git Diff

diff --git a/detection_rules/ecs.py b/detection_rules/ecs.py
index 1418c9647..878e03175 100644
--- a/detection_rules/ecs.py
+++ b/detection_rules/ecs.py
@@ -176,19 +176,24 @@ class KqlSchema2Eql(eql.Schema):
     }
 
     def __init__(self, kql_schema):
-        self.kql_schema = kql_schema
         eql.Schema.__init__(self, {}, allow_any=True, allow_generic=False, allow_missing=False)
+        self.schema = kql_schema
+
 
     def validate_event_type(self, event_type):
         # allow all event types to fill in X:
         #   `X` where ....
         return True
 
+
+    def flatten(self):
+        pass
+
     def get_event_type_hint(self, event_type, path):
         from kql.parser import elasticsearch_type_family
 
         dotted = ".".join(path)
-        elasticsearch_type = self.kql_schema.get(dotted)
+        elasticsearch_type = self.schema.get(dotted)
         es_type_family = elasticsearch_type_family(elasticsearch_type)
         eql_hint = self.type_mapping.get(es_type_family)
 
diff --git a/detection_rules/rule_validators.py b/detection_rules/rule_validators.py
index 6631db83e..eea122ffc 100644
--- a/detection_rules/rule_validators.py
+++ b/detection_rules/rule_validators.py
@@ -214,7 +214,7 @@ class EQLValidator(QueryValidator):
     def text_fields(self, eql_schema: Union[ecs.KqlSchema2Eql, endgame.EndgameSchema]) -> List[str]:
         """Return a list of fields of type text."""
         from kql.parser import elasticsearch_type_family
-        schema = eql_schema.kql_schema if isinstance(eql_schema, ecs.KqlSchema2Eql) else eql_schema.endgame_schema
+        schema = eql_schema.schema if isinstance(eql_schema, ecs.KqlSchema2Eql) else eql_schema.endgame_schema
 
         return [f for f in self.unique_fields if elasticsearch_type_family(schema.get(f)) == 'text']
 

EQL Git Diff

diff --git a/eql/parser.py b/eql/parser.py
index 8df9cf1..52ebc1e 100644
--- a/eql/parser.py
+++ b/eql/parser.py
@@ -808,6 +808,19 @@ class LarkToEQL(Interpreter):
             if not inner.validate_type(outer):
                 raise self._type_error(inner, outer, error_message)
 
+        if self._elasticsearch_syntax and hasattr(outer, "type_info"):
+            # Check edge case of in_set and ip/string comparison
+            outer_type = outer.type_info
+            type_hint = self._schema.schema.get(str(outer.node), "unknown")
+            if hasattr(self._schema, "type_mapping") and type_hint == "ip":
+                outer.type_info = TypeHint.IP
+                for inner in container:
+                    if not inner.validate_type(outer):
+                        raise self._type_error(inner, outer, error_message)
+
+            # reset the type
+            outer.type_info = outer_type
+
         # This will always evaluate to true/false, so it should be a boolean
         term = ast.InSet(outer.node, [c.node for c in container])
         nullable = outer.nullable or any(c.nullable for c in container)
diff --git a/eql/types.py b/eql/types.py
index 07d6af2..3f09646 100644
--- a/eql/types.py
+++ b/eql/types.py
@@ -7,6 +7,7 @@ class TypeHint(Enum):
 
     Array = "array"
     Boolean = "boolean"
+    IP = "ip"
     Numeric = "number"
     Null = "null"
     Object = "object"

@Mikaayenson Mikaayenson added the bug Something isn't working label Mar 26, 2024
@eric-forte-elastic eric-forte-elastic self-assigned this Mar 26, 2024
@brokensound77
Copy link
Contributor

does elasticsearch (or endpoint) support IP_TYPE for in?

@Mikaayenson
Copy link
Contributor Author

@brokensound77 Here's the original thread which I forgot to add. Elasticsearch supports IP_TYPE. Believe the actual is just that the types of the key:value need to match. all IP_TYPE or all keywords etc.

@Mikaayenson
Copy link
Contributor Author

Speaking with @brokensound77 lets try to create an eql.Types.IP = str and check the type name in InSet

@Mikaayenson
Copy link
Contributor Author

As an alternative to subclassing and overriding all the methods from eql, we could monkey patch the in_set function where the original patch suggestions existed.

Here's an example:

in_set Patch

diff --git a/detection_rules/rule_validators.py b/detection_rules/rule_validators.py
index 6631db83e..61a785a81 100644
--- a/detection_rules/rule_validators.py
+++ b/detection_rules/rule_validators.py
@@ -4,7 +4,7 @@
 # 2.0.
 
 """Validation logic for rules containing queries."""
-from functools import cached_property
+from functools import cached_property, wraps
 from typing import List, Optional, Tuple, Union
 
 import eql
@@ -21,6 +21,9 @@ from .rule import (EQLRuleData, QueryRuleData, QueryValidator, RuleMeta,
                    TOMLRuleContents, set_eql_config)
 from .schemas import get_stack_schemas
 
+from eql.parser import _parse as base_parse, LarkToEQL, NodeInfo, TypeHint
+from eql import ast
+
 EQL_ERROR_TYPES = Union[eql.EqlCompileError,
                         eql.EqlError,
                         eql.EqlParseError,
@@ -31,6 +34,42 @@ EQL_ERROR_TYPES = Union[eql.EqlCompileError,
 KQL_ERROR_TYPES = Union[kql.KqlCompileError, kql.KqlParseError]
 
 
+BaseInSetMethod = LarkToEQL.in_set  # preserve the original method for later use
+
+def custom_in_set(self, node):
+    """Override and address the limitations of the eql in_set method."""
+    print("custom_in_set")
+    # return BaseInSetMethod(self, node)
+    outer, container = self.visit(node.child_trees)  # type: (NodeInfo, list[NodeInfo])
+
+    if not outer.validate_type(TypeHint.primitives()):
+        # can't compare non-primitives to sets
+        raise self._type_error(outer, TypeHint.primitives())
+
+    # Check that everything inside the container has the same type as outside
+    error_message = "Unable to compare {expected_type} to {actual_type}"
+    for inner in container:
+        if not inner.validate_type(outer):
+            raise self._type_error(inner, outer, error_message)
+
+    # This will always evaluate to true/false, so it should be a boolean
+    term = ast.InSet(outer.node, [c.node for c in container])
+    nullable = outer.nullable or any(c.nullable for c in container)
+    return NodeInfo(term, TypeHint.Boolean, nullable=nullable, source=node)
+
+def custom_base_parse_decorator(func):
+    """Override and address the limitations of the eql in_set method."""
+    @wraps(func)
+    def wrapper(query, start=None, **kwargs):
+        LarkToEQL.in_set = custom_in_set
+        print("custom_base_parse_decorator")
+        result = func(query, start=start, **kwargs)
+        LarkToEQL.in_set = BaseInSetMethod
+        return result
+    return wrapper
+
+eql.parser._parse = custom_base_parse_decorator(base_parse)
+
 class KQLValidator(QueryValidator):
     """Specific fields for KQL query event types."""
 

This is much smaller of a change.

Note: There are still other things to do:

  1. Make the change in the new in_set method to fix the issue.
  2. Probably clean up the names and how its written.
  3. May not want this in rule_validators. Not sure where the best place is if not here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
3 participants