"Do something" when the file is done being read #52

Closed
talevy opened this issue Jun 16, 2015 · 25 comments

@talevy
Contributor

talevy commented Jun 16, 2015

Either rename the file when done processing, or add ability to specify a path for them to be moved to.

(possible options)

@jsvd
Member

jsvd commented Jun 16, 2015

What are the criteria for considering a file fully processed? Isn't it normal to be hitting EOF constantly while the file is being appended to?

@jordansissel
Contributor

"done processing" currently isn't in the vocabulary of the file input. Files, today, are treated as data streams that live forever.

We'll have to figure out what "done" really means. I also don't necessarily want to turn Logstash into a log rotator, since that's less about "inputting files" and more about "managing log file lifecycles" - I'm open to discussion, though. :)

@jordansissel
Contributor

I will say, though, that many users request this kind of feature. I think there may be similar tickets elsewhere asking for things like:

  • "logstash should exit when it finishes processing files"
  • "logstash should delete files when done with them"
  • "logstash should close files when done with them"
  • "logstash should {some task} when done reading files"

We currently have no definition of "done" for an infinite stream. I wonder if the behavioral differences mean we'll need a new plugin to handle "read this file once and delete it when done"-style tasks.

@dev-head

dev-head commented Jul 2, 2015

Perhaps "done" could be a set of configurable options that let users specify when something is considered done.

Using the config below, the file is "done" when the last event received from the watched file is older than ten minutes; once it is "done", execute /my_script.sh, perhaps passing the file path to the script automatically. This could open up some interesting new use cases.

done_when: 'last_event'
done_by: '10 minutes'
done_do: '/my_script.sh'
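
A sketch of how these hypothetical options might sit inside a file input block (the option names are this proposal's, not an existing API):

input {
  file {
    path      => "/var/log/app/*.log"
    # hypothetical options from the proposal above; not part of the real plugin
    done_when => "last_event"      # which condition marks a file "done"
    done_by   => "10 minutes"      # threshold for that condition
    done_do   => "/my_script.sh"   # script to run, receiving the file path
  }
}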

@jordansissel jordansissel changed the title from 'Mark files as read when done processing' to '"Do something" when the file is done being read' on Jul 7, 2015
@yehosef

yehosef commented Jul 7, 2015

Instead of looking at when the events are received, you could define a "done" timeframe for a log file and just look at the modified timestamp. Say I'm starting to process a directory with lots of logs: if a log file hasn't been modified for 10 days, once I hit EOF I can assume I'm done - I shouldn't have to wait.

There may also need to be other rules. E.g., if logs are being rotated, once I finish an old log I could mark it as done after 20 minutes of inactivity, because there is another file receiving the current writes. But if there is only one file, then even after 20 minutes of inactivity I may not want to mark it as "done" and trigger the done action on it.
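
A sketch of the modification-age idea as a file input option. For what it's worth, ignore_older later became a real option in logstash-input-file along roughly these lines, though it skips stale files rather than marking them "done", so treat the exact semantics here as an assumption:

input {
  file {
    path         => "/var/log/archive/*.log"
    ignore_older => 864000   # seconds; files not modified in the last 10 days are skipped
  }
}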

@morallo

morallo commented Jul 16, 2015

In my case, I want to process a lot of one-line JSON reports that are totally static, and just have logstash monitor a folder to catch new files.

What about an option "static" => true to specify that the files are created once and won't change over time? When this option is enabled, processing can be considered done once EOF is reached.

@dev-head

@morallo

File size might be a concern, though: on a large file, logstash might reach EOF before the write has really finished.

@MarkusMayer

@morallo @dev-head we have a similar situation here: we're watching a relatively large directory structure where our app is continuously creating files (500k/day). Each file is written only once and is between 2k and ~1M in size. At the moment logstash isn't stable in that scenario (at least in our setup under Windows), which I assume is because it has to keep watching all the files at the same time...
If you could, for instance, have something like @morallo's static parameter, and logstash would only start ingesting files older than X seconds (or with a modified timestamp older than X seconds), this would be really cool for our use case.

@morallo

morallo commented Jul 18, 2015

What about static and timeout parameters? Consider a file done when you reach EOF, the file size hasn't changed, and no new events have arrived for timeout seconds.
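
Putting the two hypothetical parameters together (neither exists in the plugin; this is a sketch of the proposal only):

input {
  file {
    path    => "/reports/*.json"
    static  => true   # hypothetical: the file never changes after creation
    timeout => 30     # hypothetical: seconds of inactivity before EOF counts as "done"
  }
}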

@jsvd
Member

jsvd commented Sep 8, 2015

@MarkusMayer what I did at my old job was have logrotate rename old files to .old after 1 day, so I set up the file input as:

file { path => "/srv/data/*.log" }

And then I had files like /srv/data/20150905.log, which were being monitored by logstash, and rotated ones like /srv/data/20150901.log.old, which were not.

By removing the file from logstash's watch, the resources associated with it will be freed; only the inode record will persist in sincedb.
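
The same effect can be made explicit with the file input's exclude option (which does exist); a sketch assuming the same .old rotation scheme:

input {
  file {
    path    => "/srv/data/*.log*"
    exclude => "*.old"   # rotated files drop out of the watch and their fds are closed
  }
}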

@magnusbaeck
Contributor

By removing the file from logstash's watch, the resources associated with it will be freed; only the inode record will persist in sincedb.

Won't filewatch in fact either

  • notice that /srv/data/20150905.log no longer exists and delete the sincedb entry, or
  • notice that the inode number of /srv/data/20150905.log doesn't match the one in sincedb and delete and recreate the entry with the current inode number?

@jsvd
Member

jsvd commented Sep 8, 2015

Delete should close the fd and remove it from some of the data structures, but not the sincedb: https://github.com/jordansissel/ruby-filewatch/blob/master/lib/filewatch/tail.rb#L95-L102

I don't understand the second hypothesis. To better fit @MarkusMayer's scenario, /srv/data would have tons of /srv/data/20150905.#{random_id}.log files (multiple files a day) and rotated /srv/data/20150905.#{random_id}.log.old files that don't match file { path => "/srv/data/*.log" }.

@magnusbaeck
Contributor

Ah, you're right! I only read the producing end in watch.rb and didn't study what actually happens when a delete request is received.

The second bullet applies in the scenario you described (except that, again, the original sincedb entry won't be deleted), not in Markus's.

@MarkusMayer

@jsvd @magnusbaeck thanks for your feedback and your idea, jsvd. When I came across my scenario, ingesting the files continuously with the file plugin did not work (it just kept stopping after some time and refused to ingest new files). However, at that time I used 1.5.rc2 (I filed an issue, elastic/logstash#2882, which apparently got solved). To my shame I have to admit that I never got around to retesting it with a current release. After reading a bit on how the file plugin works, I figured my scenario wasn't what the plugin was intended for, and followed a completely different path. We're still using the file input for our other "regular" log files, though.

@syepes

syepes commented Oct 5, 2015

Absolutely, one of those key missing features.

@jamesblackburn

Exiting when 'done' is also useful for end-to-end testing of a large logstash config. It would be nice to start it up on a directory of canned data and assert the output is as we expect.

@magnusbaeck
Contributor

Exiting when 'done' is also useful for end-to-end testing of a large logstash config. It would be nice to start it up on a directory of canned data and assert the output is as we expect.

True, but wouldn't the stdin input be pretty useful for that already?

(Within a week or two I hope to open source a tool to assist with exactly that, feeding Logstash canned data and asserting that we get the expected results.)

@jamesblackburn

I've used stdin for now, there are a few issues:

  • Have to frig the type or tags differently for different inputs (for logstash config that expects them)
  • Have to fix config that expects a particular %{path}

I've written a few lines of Python to drive logstash over a directory of logs and assert the json output is as we would expect:
https://gist.github.com/jamesblackburn/2e895f8b843011709094

@psaiz

psaiz commented Nov 10, 2015

I have the same issue described here. I used to have logstash get the data from a file and logrotate handle the renaming/removal. Then I ran into trouble: if, for whatever reason, logstash died or got too slow, logrotate would continue to happily rotate the input files, and that turned into events being lost.

It would be great to have something like what @dev-head suggested, so that when logstash finishes consuming a file, it could trigger an action.

@mlsquires

My use case is the same as other people have described - for logstash configuration testing I want to process one or more specific static files - I know ahead of time that they won't be open-ended streams.

@magnusbaeck
Contributor

I wonder, instead of making the file input capable of doing a million things upon hitting EOF, should it instead emit a separate event of a particular type? Then we could use the existing arsenal of Logstash plugins to act upon that event and e.g. delete the file (a sketch follows at the end of this comment). In fact, we could emit events for non-EOF progress too, allowing progress feedback without having to monitor the sincedb files and correlate them to files via inode numbers. This kind of out-of-band metaevent could probably apply to other plugins too.

(Within a week or two I hope to open source a tool to assist with exactly that, feeding Logstash canned data and asserting that we get the expected results.)

This tool is now available here: https://github.com/magnusbaeck/logstash-filter-verifier
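
A sketch of how such a metaevent might be consumed downstream, assuming the file input one day tags an end-of-file event; the tag name and the use of the community logstash-output-exec plugin are illustrative assumptions:

output {
  if "file_eof" in [tags] {               # hypothetical tag emitted at EOF
    # naive illustration only: interpolating paths into shell commands is unsafe
    exec { command => "rm -f %{path}" }
  }
}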

@guyboertje
Contributor

I think this feature belongs to the Batch File Processing requirement.

@yehosef

yehosef commented Jan 25, 2016

I just commented on that ticket - #48 (comment) - where I explain why (IMO) this should not be in a different plugin.

I think the batch use case is a simple "when you hit EOF, do something". We don't need time limits or anything fancier - it could be as simple as an "eof_script" option that points to a bash script to run (passing in the filename) when the EOF of the file is hit; a sketch follows at the end of this comment. You could also cover other common simple use cases: delete the file, emit an event, etc. The script is a catch-all that could cover any other need and would be easy to implement (it seems).

The advantage of keeping it in the file plugin is that you don't have to duplicate logic (file types, multiple lines, keeping track of the read location - even for a single file you need to keep the current pointer in case logstash dies...).

I haven't considered using logstash in a long time because I wrote my own script for handling batch files. I recently had a need that was a little outside my script's use case, so I thought I'd look at whether Logstash 2 had fixed this.
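
A minimal sketch of the eof_script idea above, with the option name and calling convention as assumptions:

input {
  file {
    path       => "/batch/*.csv"
    eof_script => "/usr/local/bin/archive.sh"   # hypothetical: run at EOF with the file path as $1
  }
}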

@rodgermoore

jordansissel commented on Jun 16, 2015

I will say, though, that many users request this kind of feature, I think there may be similar tickets elsewhere asking for things like:

  • "logstash should exit when it finishes processing files"
  • "logstash should delete files when done with them"
  • "logstash should close files when done with them"
  • "logstash should {some task} when done reading files"

+1 for this!

Desperately waiting for this kind of functionality. For me "logstash should exit when it finishes processing files" is most valuable.

Don't get me wrong, without these features Logstash still is a kick-log tool 😄

@suyograo
Contributor

suyograo commented Apr 26, 2016

This will be implemented as part of #48 as a new plugin
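
For what such read-once behavior can look like, here is a sketch using option names from later versions of logstash-input-file; they postdate this thread, so treat the exact names as an assumption:

input {
  file {
    path                  => "/batch/*.csv"
    mode                  => "read"     # read each file once to EOF instead of tailing it
    file_completed_action => "delete"   # or "log" / "log_and_delete"
  }
}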
