
Allow the creation of custom data streams #1152

Open
leandrojmp opened this issue Sep 25, 2023 · 3 comments

Comments


leandrojmp commented Sep 25, 2023

Hello,

Currently the Elasticsearch output in Logstash does not allow the creation of a custom data stream type. If you use the data_stream_* settings of the output, it validates data_stream_type and only allows the following values:

  • logs
  • metrics
  • synthetics
  • traces

All of those types are also used by Elastic Agent and have system-managed templates and lifecycle policies. So to use data streams in Logstash today, you would need to create a template for the type you want while making sure that this template does not override the system templates. This makes things more complex, and there is always the risk of human error overriding the templates used by Elastic Agent and breaking things.

To be able to use custom data streams in Logstash, you need a workaround in the output, like the example below:

output {
    elasticsearch {
        hosts => ["HOSTS"]
        index => "data-stream-name"    # name of the target data stream
        action => "create"             # data streams only accept "create" operations
        http_compression => true
        data_stream => false           # bypass the data_stream_type validation
        manage_template => false       # the index template is managed outside Logstash
        ilm_enabled => false
        cacert => 'ca.crt'
        user => 'USER'
        password => 'PASSWORD'
    }
}
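For this workaround to actually create a data stream (rather than a plain index), a matching index template with data_stream enabled has to exist in Elasticsearch before the first event is indexed. A minimal sketch in Dev Tools syntax; the template name, priority, and settings here are hypothetical examples, not from the original report:

```
PUT _index_template/data-stream-name-template
{
  "index_patterns": ["data-stream-name"],
  "data_stream": {},
  "priority": 200,
  "template": {
    "settings": { "number_of_shards": 1 }
  }
}
```

With a template like this in place, the first "create" request from the output above makes Elasticsearch create "data-stream-name" as a data stream instead of a regular index.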

While this works, Logstash should natively allow the creation of data streams with custom types, which is not possible today.

Contributor

yaauie commented Oct 12, 2023

Thoughts on implementation options:

  1. migrate the config option's validation to a regexp like \A(?!\.{1,2}$)[[:lower:][:digit:]][[:lower:][:digit:]\._+]{0,252}\Z that successfully rejects known-invalid index prefixes while letting likely-valid ones through (limitation: the length of the composed index name cannot be validated solely from a single component).
  2. add a validator to the Validator Support mixin that does the same as (1) more readably/efficiently
  3. validate the composed index name for data streams in #initialize or #register, if and only if data_stream is effectively true
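Option (1) could be sketched in Ruby like this. A rough sketch only: the constant and method names are hypothetical, and the regexp is taken verbatim from the proposal above, so it validates just a single name component:

```ruby
# Rough sketch of implementation option (1): replace the fixed allow-list
# (logs / metrics / synthetics / traces) with a regexp that rejects
# known-invalid index-name components while letting likely-valid custom
# types through. Regexp copied verbatim from the comment above.
VALID_COMPONENT = /\A(?!\.{1,2}$)[[:lower:][:digit:]][[:lower:][:digit:]\._+]{0,252}\Z/

# Hypothetical helper name; returns true when a single component looks valid.
# Per the stated limitation, the composed index name's total length cannot
# be checked from one component alone.
def valid_data_stream_component?(value)
  !!(value =~ VALID_COMPONENT)
end
```

Note that as written the character class does not include a hyphen, so a component like "my-dataset" would be rejected; that may need adjusting depending on the naming rules being targeted.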

@robbavey
Contributor

@jsvd Is this an actual issue that we should prioritize/work on, or is there a workaround that we can use?

@leandrojmp
Author

Hello @robbavey, just some feedback as a user.

Currently Logstash is only capable of creating data streams that follow the Elastic naming scheme, <type>-<dataset>-<namespace>. So with data_stream set to true, the user is limited to creating data streams using one of the available types, and also needs to provide a dataset and a namespace.

In my case my data streams follow a different naming pattern: I do not use a type or a namespace, some data streams have a prefix in the name, and others are just the dataset name.

So the configuration I shared works because Logstash just sends the request, and the index is created as a data stream because that is defined in the index template.

To be honest, just having these steps in the documentation could be enough.
