
unable to read the whole file when pipeline get reload #290

Closed
kaisecheng opened this issue Apr 14, 2021 · 0 comments · Fixed by #307
kaisecheng commented Apr 14, 2021

When Logstash starts with --config.reload.automatic, the file input can ingest all data as long as no reload occurs.
However, if the pipeline is reloaded in the middle of ingestion, say after 300 of 600 lines have been read, Logstash reads the first 300 lines again and leaves the rest unread.
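For context, read mode is expected to behave like checkpointed reading: persist the byte offset (what Logstash keeps in its sincedb) and seek back to it after a restart instead of starting over. A minimal Python sketch of that expected behavior, not the plugin's actual code; the file names and the `limit` parameter are illustrative only:

```python
import os
import tempfile

def read_with_checkpoint(path, sincedb, limit=None):
    """Read `path` starting at the offset stored in `sincedb`,
    then persist the new offset so a restart can resume."""
    offset = 0
    if os.path.exists(sincedb):
        with open(sincedb) as f:
            offset = int(f.read() or 0)
    lines = []
    with open(path) as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            lines.append(line.rstrip("\n"))
            if limit is not None and len(lines) >= limit:
                break
        offset = f.tell()  # checkpoint: where to resume next time
    with open(sincedb, "w") as f:
        f.write(str(offset))
    return lines

# Demo: a "reload" after 300 of 600 lines should resume, not restart.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "merged.csv")
sincedb_path = os.path.join(workdir, "sincedb")
with open(csv_path, "w") as f:
    f.writelines(f"row-{i}\n" for i in range(600))

first = read_with_checkpoint(csv_path, sincedb_path, limit=300)  # before reload
second = read_with_checkpoint(csv_path, sincedb_path)            # after reload
print(len(first), len(second))  # → 300 300
```

The bug reported here is that after a reload the offset is effectively not honored: the head of the file is re-read while the tail is skipped.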

  • Version: 4.2.4
  • LS Version: 7.12
  • Operating System: macOS
  • Config File (if you have sensitive info, please remove it):
- pipeline.id: SDH_650
  pipeline.workers: 1
  pipeline.batch.size: 5
  config.string: |
    input {
        file {
            path => "/650/merged.csv"
            mode  => "read"
            start_position => "beginning"
        }
    }

    filter {
        csv {
            separator => ","
            columns => ["id", "host", "fqdn", "IP", "mac", "role", "type", "make", "model", "oid", "fid", "time"]
            remove_field => ["path", "host", "message", "@version" ]   
        }
    }

    output {
        elasticsearch { index => "650" }
        stdout { codec => rubydebug }
    }
  • Sample Data:
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-09 02:36:17.154791"
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-10 02:36:17.154791"
"464783b9468bed39b19aff0c98128af4f26c3b972092cb26ede33b28ace57bad","aff4.bc","aff4.bc.org","127.0.0.1","cb:91:bc:28:3b:be","MOBILE DEVICE","TABLET","make","model","DHS","","2000-03-11 02:36:17.154791"
  • Steps to Reproduce:
  1. run the pipeline in 7.12 with auto-reload: bin/logstash --config.reload.automatic
  2. change pipeline.workers from 1 to 2 during ingestion
  3. change pipeline.workers multiple times during ingestion
  4. check the data in elasticsearch. You will find the head of the csv is duplicated, while the tail of the csv is missing
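To make step 4 concrete, the symptom can be checked by comparing the source CSV with what actually landed in Elasticsearch (e.g. after exporting the ingested documents one per line). A hypothetical helper; `diff_ingested` and the simulated data below are illustrative, not part of Logstash:

```python
from collections import Counter

def diff_ingested(source_lines, ingested_lines):
    """Compare source lines with ingested lines.

    Returns (duplicated, missing): lines ingested more often than they
    appear in the source, and source lines that never arrived."""
    src = Counter(source_lines)
    got = Counter(ingested_lines)
    duplicated = {line: n for line, n in got.items() if n > src.get(line, 0)}
    missing = {line: n - got.get(line, 0)
               for line, n in src.items() if got.get(line, 0) < n}
    return duplicated, missing

# Simulate the reported symptom: the head is read twice, the tail never.
source = [f"row-{i}" for i in range(600)]
ingested = source[:300] + source[:300]  # reload re-read the first half
duplicated, missing = diff_ingested(source, ingested)
print(len(duplicated), len(missing))  # → 300 300
```

With a healthy pipeline both dictionaries come back empty; under this bug the duplicated set matches the head of the file and the missing set matches its tail.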

Currently, the workaround is to use tail mode.
