Top level domain extract ingest processor in the default and final pipeline #65722
Labels
:Data Management/Ingest Node
Execution or management of Ingest Pipelines including GeoIP
>enhancement
Hi,
Security use cases require operators to reliably find every relevant log when searching for a domain during incident response. However, in some feeds, such as web proxy logs, the domain may be prefixed with "www.", whereas in others, such as DNS logs, it is not. Because domain.name and dns.question.name are exact-match keyword fields, operators have to run multiple searches to make sure they find all the hits.
Packetbeat performs top-level domain extraction to populate ECS fields such as *.registered_domain. This is harder to do for non-Beats feeds that are ingested via Logstash or a custom importer.
There is a Logstash plugin, but it's a bit of a mess. As noted in the GitHub issue below, its documentation looks like it was written for a different plugin.
logstash-plugins/logstash-filter-tld#11
https://www.elastic.co/guide/en/logstash/current/plugins-filters-tld.html#plugins-filters-tld-periodic_flush
It would be good to have an ingest processor that could do TLD extraction in the default or final pipeline. This would make it a lot easier to normalise domains in non-Beats feeds, and would allow operators to find all hits through the *.registered_domain fields.
The processor would probably have to use the public suffix list to do TLD extract.
https://publicsuffix.org/
The rules for doing TLD extract with the public suffix list can be found here
https://publicsuffix.org/list/
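To illustrate the kind of matching such a processor would need, here is a minimal Python sketch of the public suffix list algorithm (longest matching rule wins, `*` wildcards, `!` exception rules). It is not how Packetbeat or any existing plugin does it. The names `SAMPLE_RULES`, `public_suffix_length` and `registered_domain` are made up for this example, and the rule set is a tiny hand-picked subset; a real implementation would load the full list and would also need to handle IDN/punycode.

```python
from typing import List, Optional

# Hypothetical, tiny subset of rules for illustration only.
# A real implementation would load the full public suffix list.
SAMPLE_RULES = ["com", "uk", "co.uk", "*.ck", "!www.ck"]


def _matches(rule_labels: List[str], domain_labels: List[str]) -> bool:
    """Check a rule against a domain, comparing labels right to left."""
    if len(rule_labels) > len(domain_labels):
        return False
    for rule_label, domain_label in zip(reversed(rule_labels), reversed(domain_labels)):
        if rule_label != "*" and rule_label != domain_label:
            return False
    return True


def public_suffix_length(domain_labels: List[str], rules: List[str]) -> int:
    """Return how many trailing labels form the public suffix."""
    best = 1  # implicit default rule "*" when nothing else matches
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if not _matches(rule_labels, domain_labels):
            continue
        if is_exception:
            # An exception rule prevails; the suffix is the rule minus its leftmost label.
            return len(rule_labels) - 1
        best = max(best, len(rule_labels))
    return best


def registered_domain(name: str, rules: List[str] = SAMPLE_RULES) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is none."""
    labels = name.lower().rstrip(".").split(".")
    if not all(labels):
        return None  # empty label, e.g. "" or "a..b"
    suffix_len = public_suffix_length(labels, rules)
    if len(labels) <= suffix_len:
        return None  # the name is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])
```

With this, `registered_domain("www.example.com")` and `registered_domain("example.com")` both normalise to the same value, which is what would let a single search on a *.registered_domain field find hits from both the web proxy and DNS feeds.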