
unable to read the whole file when pipeline get reload #290

Closed
kaisecheng opened this issue Apr 14, 2021 · 0 comments · Fixed by #307
kaisecheng commented Apr 14, 2021

When Logstash starts with --config.reload.automatic, the file input can ingest all data as long as no reload occurs.
However, if the pipeline is reloaded in the middle of ingestion, say after 300 of 600 lines have been read, Logstash reads the first 300 lines again and leaves the rest unread.
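For context, read mode is expected to behave like checkpointed reading: persist the byte offset (what Logstash keeps in its sincedb) and seek back to it after a restart instead of starting over. A minimal Python sketch of that expected behavior, not the plugin's actual code; the file names and the `limit` parameter are illustrative only:

```python
import os
import tempfile

def read_with_checkpoint(path, sincedb, limit=None):
    """Read `path` starting at the offset stored in `sincedb`,
    then persist the new offset so a restart can resume."""
    offset = 0
    if os.path.exists(sincedb):
        with open(sincedb) as f:
            offset = int(f.read() or 0)
    lines = []
    with open(path) as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip("\n"))
            if limit is not None and len(lines) >= limit:
                break
        offset = f.tell()  # checkpoint: where to resume next time
    with open(sincedb, "w") as f:
        f.write(str(offset))
    return lines

# Demo: a "reload" after 300 of 600 lines should resume, not restart.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "merged.csv")
sincedb_path = os.path.join(workdir, "sincedb")
with open(csv_path, "w") as f:
    f.writelines(f"row-{i}\n" for i in range(600))

first = read_with_checkpoint(csv_path, sincedb_path, limit=300)  # before reload
second = read_with_checkpoint(csv_path, sincedb_path)            # after reload
print(len(first), len(second))  # → 300 300
```

The bug reported here is that after a reload the offset is effectively not honored: the head of the file is re-read while the tail is skipped.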

  • Version: 4.2.4
  • LS Version: 7.12
  • Operating System: macOS
  • Config File (if you have sensitive info, please remove it):
- pipeline.id: SDH_650
  pipeline.workers: 1
  pipeline.batch.size: 5
  config.string: |
    input {
        file {
            path => "/650/merged.csv"
            mode  => "read"
            start_position => "beginning"
        }
    }

    filter {
        csv {
            separator => ","
            columns => ["id", "host", "fqdn", "IP", "mac", "role", "type", "make", "model", "oid", "fid", "time"]
            remove_field => ["path", "host", "message", "@version" ]   
        }
    }

    output {
        elasticsearch { index => "650" }
        stdout { codec => rubydebug }
    }
  • Sample Data:
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-09 02:36:17.154791"
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-10 02:36:17.154791"
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-11 02:36:17.154791"
  • Steps to Reproduce:
  1. run the pipeline in 7.12 with auto-reload: bin/logstash --config.reload.automatic
  2. change pipeline.workers from 1 to 2 during ingestion
  3. change pipeline.workers multiple times during ingestion
  4. check the data in elasticsearch. You will find the head of the csv is duplicated, while the tail of the csv is missing
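To make step 4 concrete, the symptom can be checked by comparing the source CSV with what actually landed in Elasticsearch (e.g. after exporting the ingested documents one per line). A hypothetical helper; `diff_ingested` and the simulated data below are illustrative, not part of Logstash:

```python
from collections import Counter

def diff_ingested(source_lines, ingested_lines):
    """Compare source lines with ingested lines.

    Returns (duplicated, missing): lines ingested more often than they
    appear in the source, and source lines that never arrived."""
    src = Counter(source_lines)
    got = Counter(ingested_lines)
    duplicated = {line: n for line, n in got.items() if n > src.get(line, 0)}
    missing = {line: n - got.get(line, 0)
               for line, n in src.items() if got.get(line, 0) < n}
    return duplicated, missing

# Simulate the reported symptom: the head is read twice, the tail never.
source = [f"row-{i}" for i in range(600)]
ingested = source[:300] + source[:300]  # reload re-read the first half
duplicated, missing = diff_ingested(source, ingested)
print(len(duplicated), len(missing))  # → 300 300
```

With a healthy pipeline both dictionaries come back empty; under this bug the duplicated set matches the head of the file and the missing set matches its tail.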

Currently, the workaround is to use tail mode.
