Skip to content

Add better support for file rotation #192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Jul 5, 2018

Conversation

guyboertje
Copy link
Contributor

@guyboertje guyboertje commented May 24, 2018

TIL The difference in Ruby between IO#stat and IO.stat, former takes a stat on the io (file) object and the latter takes a stat on a path string (as does File::Stat.new and Pathname#stat).

When an open file is rotated the watched_file path may be pointing at a different inode with new content, and if so, the opened file is still pointing to the original inode and content. We have used IO.stat or File::Stat.new (since 2012) on the path regardless of whether the file is open or closed - this is not ideal.

This PR adds the ability to do path based stat on closed files and path plus io based stat on open files. By incrementing pending_inode_count when we detect inode changes between the two kinds of stat calls and we also cache the previous size of the old content and the new content.
When the inode is different between the two stat structures we know that the file is rotated. When we detect this we can fully read the old content as we still have an open file handle. At the end of the full read we close the file io and reset the watched_file and sincedb.
NOTE: we do detect inode changes for watched_files that are not active (i.e. closed, ignored, watched) but we can't know the renamed path of the old content in order to read any content we may not have processed while the watched_file was not active. A WARN message is logged in this case. In practice this only likely to occur on not active files when tailing more than 4095 (or max_open_files if set) files AND the file is rotated before active files can be put into the closed state - files become eligible for close when the last read was more thanclose_older ago AND the file is fully read to the last stat size.
Handle race conditions better:

  1. stat is taken bang in the middle of the rotation rename cascade, f -> f.1 | stat | tmp -> f
  2. Discover finds the renamed path before the delete reaction can be done (this only occurs when the renamed path is inside the path glob pattern- rare).

Most of this code has been tested by a discuss user who had problems with rotations.

Fixes #188

@guyboertje
Copy link
Contributor Author

I am not sure about Windows and io.stat. I need to test.

@guyboertje
Copy link
Contributor Author

@colinsurprenant
Please suspend review, half of this code is useless under Windows.
I need to circle back. 🤠 🔴 🚶

new_discovery = true
watched_file = WatchedFile.new(pathname, pathname.stat, @settings)
logger.debug("Discoverer new discovery: #{watched_file.path}, inode: #{watched_file.sincedb_key.inode}, discover_files: #{path} (exclude is #{@exclude.inspect})")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: you are using the structured logs notations in most places, maybe here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started to change the logging to structured but its a PR all of its own.

@colinsurprenant
Copy link
Contributor

I just did a shallow review and changes LGTM. I haven't run the tests nor tried to more deeply analyse because there are lots of changes and complexity is high. I will trust your testing and spec coverage.
Just a really minor comment about the debug logging format.

@colinsurprenant
Copy link
Contributor

@guyboertje ah damn ... I did not see your last comment before starting. oh well.

@guyboertje
Copy link
Contributor Author

After some investigation in GCP on a Windows Server 2008 box I now know:

  • Currently we hold the file open for 1 hour (AND the file must be fully read) unless the use specified a different close_older.
  • JRuby reports increases in size when doing a stat on the IO object.
  • Windows does not allow renaming or deleting a file when an open file handle exists - so no rotation is possible until we close the file (we don't have burning fires on this).
  • In Windows the stat struct returned by either path or io based method always has the inode at 0 - this is not good and the reason we have the FFI functions in winhelper.rb.
  • I am looking into a mechanism that can get the windows inode equivalent for a path and io using JRuby jnr. If successful then most of this PR will not need to change.

Guy Boertje added 8 commits June 19, 2018 10:16
to better handle when a file is rotated that has never been active.
Change the way @previous_inode_size is cached, it needs to be set only
when an inode change is detected.
Change discover associate behaviour if the matched watched_file is open.
Add spec for when a file is rotated that has never been active.
Alter specs because we can rely on ordering with "file_sort_by"
especially in the case where the rotated files are discoverable - in
this case we were not the moving the state and the sincedb record
correctly.
Next - abstract the *nix and Windows stat calls and structure into
two classes with the same API.
Move all tail processing state iteration into its own method
Clean up rspec sequencing usage.
Remove all Mutex use in favour of AtomicBoolean, AtomicArray, AtomicHash
Add wait for completely_stopped before removing sincedb in file_tail spec
@guyboertje
Copy link
Contributor Author

guyboertje commented Jun 29, 2018

Arrrrrggghhhhhhh!!!! @£$%$%$
(J)Ruby IO stat

So I backed out the IO stat code. Now travis is green, biggest yay evar!

Now to run the tests on Windows to make sure the windows execution path is not broken.

@colinsurprenant Nearly there.

@guyboertje
Copy link
Contributor Author

@colinsurprenant
Please resume reviewing.

ENV LOGSTASH_SOURCE 1
RUN jruby -S gem install bundler
RUN jruby -S bundle install --jobs=3 --retry=3
RUN jruby -S bundle exec rake vendor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we be using Gradle ? also why is this needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied from @jsvd work after the jruby S3 bucket went AWOL on the weekend.
Behind the scenes it does use gradle so I suppose we could run gradle here.
I don't want to touch it now, it works, we can circle back after confirming with Joao about this. It is a pattern we need to apply to all (most) plugins - probably scripted.

Step 11/11 : RUN jruby -S bundle exec rake vendor
 ---> Running in a4c0913880da
Downloading https://services.gradle.org/distributions/gradle-4.5.1-bin.zip
.....................................................................
Unzipping /usr/share/logstash/.gradle/wrapper/dists/gradle-4.5.1-bin/a5vbgfvpwtoqz8v2cdivxz28k/gradle-4.5.1-bin.zip to /usr/share/logstash/.gradle/wrapper/dists/gradle-4.5.1-bin/a5vbgfvpwtoqz8v2cdivxz28k
Set executable permissions for: /usr/share/logstash/.gradle/wrapper/dists/gradle-4.5.1-bin/a5vbgfvpwtoqz8v2cdivxz28k/gradle-4.5.1/bin/gradle
Starting a Gradle Daemon (subsequent builds will be faster)
:cleanGemjar UP-TO-DATE
:clean UP-TO-DATE
:compileJava
Download https://repo.maven.apache.org/maven2/org/jruby/jruby-complete/9.1.13.0/jruby-complete-9.1.13.0.pom
Download https://repo.maven.apache.org/maven2/org/jruby/jruby-artifacts/9.1.13.0/jruby-artifacts-9.1.13.0.pom
Download https://repo.maven.apache.org/maven2/org/jruby/jruby-parent/9.1.13.0/jruby-parent-9.1.13.0.pom
Download https://repo.maven.apache.org/maven2/org/sonatype/oss/oss-parent/7/oss-parent-7.pom
Download https://repo.maven.apache.org/maven2/org/jruby/jruby-complete/9.1.13.0/jruby-complete-9.1.13.0.jar
:processResources NO-SOURCE
:classes
:jar
:sourcesJar
:copyGemjar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I see.
I just created elastic/logstash/issues/9823 - we can move that discussion over there (or open a specific issue about it and link it there)

require "java"
require_relative "../../lib/jars/filewatch-#{jar_version}.jar"
fullpath = gem_root_dir.join("lib/jars/filewatch-#{jar_version}.jar").expand_path.to_path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we rely on jar-dependencies to load this jar file? I believe this is what filter-date does to load its own compiles jar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look but TBH jar-dependencies is so opaque/obscure to me that I think it is fine for external deps from maven but I trust the current mechanism more.

@@ -0,0 +1,8 @@
# encoding: utf-8

ENV["LOG_AT"].tap do |level|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this documented somewhere so that future devs know about this if useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. I suppose it should be in devutils but until now I have been reluctant to make changes in that direction.
It is useful in this plugin with --fail-fast and trace logging to see what is going on. Before, the tests might fail in the suite but not when run individually.
I am not sure how useful it is to other plugins that don't have extensive trace logging.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure - please don't touch devutils if only for getting rid of it. What I was saying is that since this could be useful to other developers that will work on this plugin and will need to run tests, it could be documented somewhere like a local CONTRIBUTING file or unsure where else but the idea is that if it can be useful to other, they must know about it in some way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand now. I will add something when building the final changelog PR.

@colinsurprenant
Copy link
Contributor

Same disclaimer: for the exception of a few minor comments, given the amount and complexity of the changes this is shallow review LGTM and I will rely on @guyboertje extensive testing.

@guyboertje
Copy link
Contributor Author

Ok. I'm going to squash merge this.

@guyboertje guyboertje merged commit 498eb5e into logstash-plugins:master Jul 5, 2018
@guyboertje guyboertje deleted the fix/rotation branch July 5, 2018 18:05
@fbaligand
Copy link

Great news @guyboertje !
Could you release it ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Associating a new watched_file, need to check if old watched_file is open or buffer has txt
3 participants