Skip to content

[Ingest][convert-processor] Added MAC type in convert processor #96183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

vinit-chauhan
Copy link

This PR here is to add a new mac type in the convert processor. It works similarly to an IP processor which would assert the format of the MAC Address.

It is seen that Elasticsearch encourages users to store the MAC Address in the RFC 7042 format Here. This PR will add the capability to assert and convert the MAC Address to the suggested format and throws an error if MAC Address is invalid.

@github-actions
Copy link
Contributor

Documentation preview:

@elasticsearchmachine elasticsearchmachine added v8.9.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels May 16, 2023
@pxsalehi pxsalehi added :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed needs:triage Requires assignment of a team area label labels Jun 2, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Jun 2, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo
Copy link
Contributor

joegallo commented Jun 9, 2023

Thank you for the pull request, @vinit-chauhan!

I did a bit of history spelunking, and the reason there's an ip type for convert at all is a little bit of a journey. #36145 is the original issue for it, and #36145 (comment) in particular is illuminating. #69989 was the pull request that added the functionality.

Long story short, it was added as a convenience, yes, but more importantly it was added for feature parity with the pre-existing ip type on the beats version of convert. That's also the source of our slightly niche behavior here, specifically, that we're only validating the format of the ip address string and not actually doing a data-type conversion (i.e. to a proper java.net.InetAddress).

Another consideration is that Elasticsearch has a proper ip mapping type, but there's no such support for mac addresses in Elasticsearch (they'd just be strings, I suppose).

With the history/origin of the ip type for convert processors and the lack of mac address support elsewhere in Elasticsearch in mind, I don't think this is an enhancement we want to consider at this time.


On the bright side, that doesn't mean you can't validate whether Strings look like mac addresses in an ingest pipeline! Here's a script processor that's a reasonably faithful translation of your code:

PUT _ingest/pipeline/test-pipeline-1
{
  "processors" : [
    {
      "script": {
        "description": "Validate that the field 'foo' is a mac address, converting the format slightly along the way",
        "lang": "painless",
        "source": """
        def val = ctx.foo;
        def sb = new StringBuilder();
        val = val.toUpperCase();
        val = val.replaceAll(/[^a-fA-F0-9]/, m -> "");
        if ((val.length() == 12 || val.length() == 16) == false) {
          throw new IllegalArgumentException('[' + val + '] is not a valid MAC Address');
        }
        
        for (int i = 0; i < val.length(); i++) {
          if (i != 0 && i % 2 == 0) {
            sb.append('-');
          }
          sb.append(val.charAt(i));
        }
        
        ctx.foo = sb.toString();
        """
      }
    }
  ]
}

invoking it like this:

POST /_ingest/pipeline/test-pipeline-1/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "foo": "0123456789ab"
      }
    }
  ]
}

gives the following result:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_id" : "id",
        "_version" : "-3",
        "_source" : {
          "foo" : "01-23-45-67-89-AB"
        },
        "_ingest" : {
          "timestamp" : "2023-06-09T19:44:56.809747Z"
        }
      }
    }
  ]
}

@joegallo joegallo closed this Jun 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management Meta label for data/management team v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants