Lost data on rotation #151

PhaedrusTheGreek opened this issue Jan 10, 2017 · 5 comments
@PhaedrusTheGreek

  • Version: LS 5.1.1
  • Operating System: RHEL 7.3
  • Steps to Reproduce:
  1. Create this file /etc/logrotate.d/test:
rotate 4
/var/log/test
{
        missingok
        size 1k
        notifempty
        sharedscripts
}
  2. Create script.sh to generate some data and rotate it:
for (( i=0 ; i <= 320000 ; i++ ))
do
 echo "$i this is a bunch of test data blah blah"  >> /var/log/test

 #slow it down a little
 if ! ((i % 10000)); then
  sleep 1
 fi

 if ! ((i % 100000)); then
   /usr/sbin/logrotate /etc/logrotate.d/test &
 fi
done
  3. touch /var/log/test


  4. Start Logstash with the following config:

input {
    file {
           path => "/var/log/test"
           sincedb_path => "/dev/null"
           start_position => "beginning"
           ignore_older => 0
           close_older => 5
           discover_interval => 1
    }
}

output {
 file {
   path => "/tmp/output"
   codec =>  line { format => "custom format: %{message}" }
 }
}
  5. Once Logstash is started, start script.sh and wait for it to finish.

  6. Compare the size of the logs with the ingested data:

[root]# cat test* | wc -l
320001
[root]# cat /tmp/output | wc -l
318503

*** note that Filebeat tested without data loss ***

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/test
output.file:
  path: "/tmp/filebeat"
  filename: filebeat
  number_of_files: 500
@LionelCons

Any news/progress on this one?

@jordansissel

This is likely due to the fact that Logstash will only look for new files when it reaches the end of the current file.

In this scenario, the script writes 100,000 records before each rotation. On my machine, this takes roughly 11 seconds between rotations (there's a sleep in your script every 10,000 iterations, so this makes sense).

This means Logstash needs to process at least 10,000 lines per second in order to keep up. If it cannot keep up, the new-file checks will stall.
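A quick back-of-the-envelope check of those numbers (using only the figures quoted above, nothing measured by Logstash itself):

```python
# Values from this comment: the script rotates every 100,000 lines,
# and rotations were observed roughly 11 seconds apart.
lines_per_rotation = 100_000
seconds_per_rotation = 11

required_rate = lines_per_rotation / seconds_per_rotation
print(round(required_rate))   # ~9091 lines/second just to break even
```

With processing overhead on top of the ten one-second sleeps per rotation, the break-even rate rounds up to the ~10,000 lines/second figure above.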

When I tested this, I noticed that Logstash would miss rotations, probably because it was running behind.

I ran the script to demonstrate this. Logstash had log.level: debug, and I saw this:

% tail -f logs/logstash-plain.log| grep 'old inode'
[2017-06-28T18:20:12,998][DEBUG][logstash.inputs.file     ] each: new inode: /tmp/test.log: old inode was ["1452461", 0, 36], new is ["1452470", 0, 36]
[2017-06-28T18:20:31,506][DEBUG][logstash.inputs.file     ] each: new inode: /tmp/test.log: old inode was ["1452470", 0, 36], new is ["1452472", 0, 36]


Only two new rotations were observed in the above run.

inodes for the logs:

% ls -i1 /tmp/test.log*
1452472 /tmp/test.log
1452471 /tmp/test.log.1
1452470 /tmp/test.log.2
1451500 /tmp/test.log.3
1452461 /tmp/test.log.4

The first entry in our log was this:

old inode was ["1452461", 0, 36], new is ["1452470", 0, 36]

This "old" is test.log.4, and the "new" is test.log.2. You can see test.log.3 was missed here.

The second entry:

old inode was ["1452470", 0, 36], new is ["1452472", 0, 36]

The old here is test.log.2, and the new is test.log. test.log.1 was missed.
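Those two log entries can be illustrated with a small sketch (hypothetical, not the actual file-input code): if rotation is detected by comparing the inode of the watched path between polls, any inode that rotates in and out between two polls is simply never observed.

```python
# Hypothetical sketch of poll-based rotation detection, as suggested by the
# "old inode was ..., new is ..." log lines above.
def rotations_seen(polled_inodes):
    """polled_inodes: inode of the watched path at each poll.
    Returns the (old, new) inode changes actually observed."""
    changes = []
    last = polled_inodes[0]
    for ino in polled_inodes[1:]:
        if ino != last:
            changes.append((last, ino))
            last = ino
    return changes

# Inodes from the listing above: the path rotated through five inodes, but
# only two changes were caught because polls were slower than rotations.
print(rotations_seen([1452461, 1452470, 1452472]))
# -> [(1452461, 1452470), (1452470, 1452472)]
```

test.log.3 (inode 1451500) and test.log.1 (inode 1452471) never appear in the polls, which matches the two misses described above.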


If we ask Logstash how fast it's going using the stats API, we can see it processing (at least on my laptop, the test environment) more slowly than script.sh emits lines into the log file:

% while true; do curl -s localhost:9600/_node/stats?pretty | jq .pipeline.events.out; sleep 1; done | awk 'a { print $1 - a } { a = $1 }'
8413
9142
5127
4554
9436
9127
8927
9056
9065
8434

The above numbers are roughly events per second. script.sh writes roughly 10,000 events per second and rotates every 100,000 events; Logstash is not keeping up on my workstation and misses rotations because of the way the file input is implemented.
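Averaging the sampled deltas above makes the deficit concrete (hypothetical arithmetic; the 10,000/s write rate is the figure from earlier in this comment):

```python
# Per-second event deltas sampled from the stats API above.
samples = [8413, 9142, 5127, 4554, 9436, 9127, 8927, 9056, 9065, 8434]

avg = sum(samples) / len(samples)
write_rate = 10_000   # approximate rate at which script.sh produces lines
print(f"avg consume rate: {avg:.0f}/s, deficit: {write_rate - avg:.0f}/s")
```

At an average of roughly 8,100 events per second against a ~10,000/s producer, the reader falls further behind with every rotation.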

We can try to improve the implementation problem (file discovery happens only after file reads), but ultimately Logstash is processing more slowly than you are producing events. Even if we move file discovery outside the main file-input thread, Logstash would still be consuming more slowly than you are producing. That means your log rotation will not really save you space: while logrotate might delete test.log.4, Logstash will still be reading from it (keeping it from being freed) until it hits EOF and the close_older timeout has elapsed. This may or may not be a problem for you, depending on how far behind Logstash is.

I would note that Logstash should be able to read files faster than this, but ultimately we cannot, in software, behave well in a scenario where Logstash consumes at a lower rate than the data is produced and the produced data is deleted quickly. In those scenarios there will always be either data loss or undesirable behavior (deleted files still occupying disk space because Logstash holds them open with the intent to read them fully).
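The "deleted files still occupying disk space" point is ordinary POSIX unlink semantics, easy to demonstrate (a minimal sketch; the temp file stands in for a rotated log):

```python
import os
import tempfile

# Create a small "log file".
fd, path = tempfile.mkstemp()
os.write(fd, b"held open\n")
os.close(fd)

r = open(path)      # a reader (think: Logstash) still has the file open ...
os.unlink(path)     # ... when "logrotate" deletes it
data = r.read()     # the data -- and its disk blocks -- are still there
print(data)
r.close()           # only after the last close can the kernel free the space
```

Until that final `close()`, the file name is gone from the directory but the space is not reclaimed.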

@jordansissel

Any news/progress on this one?

@LionelCons I need more details if you want a good answer; I don't know what progress you are asking about. If you believe you are encountering this issue, providing details on what you observe, along with your configuration(s), would help identify whether your issue is the same as the one reported in this ticket.

@jordansissel

jordansissel commented Jun 29, 2017

note that Filebeat tested without data loss

Filebeat probably doesn't lose data here because it has file discovery ("prospecting") executing separately (in a goroutine) from file reading ("harvesting"). This allows it (as I said in my comment above) to find new files while still processing existing known files.

It would still have the problem I described: a prospected-but-not-yet-harvested file will still occupy disk space after logrotate has deleted it, until Filebeat finishes reading it and closes it as idle.
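The prospector/harvester split can be sketched like this (hypothetical Python, standing in for Filebeat's Go internals): because discovery runs on its own thread, a file that appears mid-harvest is still found.

```python
import queue
import threading
import time

def prospector(paths, out_q, stop):
    """Discovery loop: scans `paths` (standing in for a glob) on its own thread."""
    seen = set()
    while not stop.is_set():
        for p in list(paths):
            if p not in seen:
                seen.add(p)
                out_q.put(p)          # hand newly found files to the harvester
        time.sleep(0.01)

def harvester(in_q, results, stop):
    """Reading loop: consumes files as they are discovered."""
    while not stop.is_set():
        try:
            results.append(in_q.get(timeout=0.01))   # stand-in for "read the file"
        except queue.Empty:
            pass

paths, found_q, results = ["a.log"], queue.Queue(), []
stop = threading.Event()
threads = [threading.Thread(target=prospector, args=(paths, found_q, stop)),
           threading.Thread(target=harvester, args=(found_q, results, stop))]
for t in threads:
    t.start()
time.sleep(0.05)
paths.append("a.log.1")   # "rotation" creates a new file while harvesting continues
time.sleep(0.05)
stop.set()
for t in threads:
    t.join()
print(sorted(results))    # the file created mid-run was still discovered
```

Contrast this with a single loop that only discovers files after reaching EOF on the current one: there, discovery frequency is coupled to read speed, which is exactly the stall described earlier in this thread.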

@LionelCons

It seems that my problem (incomplete lines read from Kafka logs) most likely comes from inodes being re-used.

This seems to be a known issue: jordansissel/ruby-filewatch#79.
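The inode-reuse failure mode can be sketched as follows (hypothetical; a real sincedb records more than this, but the principle is the same): if reading progress is keyed by inode, a brand-new file that happens to reuse a deleted file's inode is mistaken for the old file, and reading resumes mid-file.

```python
# Hypothetical sketch: reading position ("sincedb") keyed by inode alone.
def resume_offset(sincedb, inode, size):
    offset = sincedb.get(inode, 0)
    if offset > size:   # file smaller than our offset: detectable as truncation
        return 0
    return offset       # same inode and a plausible offset: bytes are skipped

sincedb = {1452461: 300}   # old file was read up to byte 300, then deleted
# A brand-new 500-byte file happens to reuse inode 1452461:
print(resume_offset(sincedb, 1452461, 500))   # -> 300: its first 300 bytes are never read
```

Note the asymmetry: a reused inode with a *smaller* file is caught by the truncation check, but a reused inode with a *larger* file silently loses its head, which would also produce partial lines at the resume point.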

@jordansissel: Could you comment on this PR from your ruby-filewatch?
