
Deleted output files consuming filesystem space #81

Open
@bpschuck

Description

If an output file is deleted (in our case, because gzip removes the original after compressing it), Logstash continues to hold the file open. Over time these zombie files consume filesystem space until Logstash is restarted.

  • Version: 5.5.1
  • Operating System: Ubuntu 16.04 (xenial)
  • Config File (if you have sensitive info, please remove it):
    Input is a group of Kafka queues/topics. The output file is defined as
    path => "${OUTPUTDIR}/%{kf_topic}/%{table_name}-%{kf_topic}-%{logstash_host}-%{+YYYY-MM-dd-HH}.json"
    (a minimal pipeline sketch follows this list).
  • Steps to Reproduce:
    1. A cron job runs at 4 minutes past every hour. For each output file that is not from the current hour, the job renames the file with the current date appended (YYmmdd_HHMMSS), compresses it with gzip, and transfers it to Amazon S3 (a stripped-down script sketch follows this list).
    2. The Kafka topics carry log data from devices, so each hourly output file can grow quite large. Over time, if Logstash is not restarted, the filesystem fills with deleted files still held open by the Logstash process.
    3. After the files have been deleted, run lsof -nP -p $(pgrep -d , java) | grep '(deleted)' to list them. df shows the filesystem space as utilized, yet du does not include the sizes of these files.
    4. Restart Logstash: the lsof command now shows zero deleted files, and df no longer reports the space consumed by the zombie files.
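For reference, a minimal pipeline sketch of the setup described above. Only the file output path is taken verbatim from our config; the kafka input settings (bootstrap_servers, topics) and the event fields kf_topic, table_name, and logstash_host stand in for details removed as sensitive.

    input {
      kafka {
        # Placeholder connection details; the real pipeline subscribes to a
        # group of topics, and events carry the kf_topic, table_name, and
        # logstash_host fields referenced in the output path below.
        bootstrap_servers => "kafka:9092"
        topics => ["example-topic"]
      }
    }
    output {
      file {
        # Verbatim from our config: one file per topic/table/host/hour.
        path => "${OUTPUTDIR}/%{kf_topic}/%{table_name}-%{kf_topic}-%{logstash_host}-%{+YYYY-MM-dd-HH}.json"
      }
    }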
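And a stripped-down sketch of the hourly cron job that triggers the leak, followed by the diagnostic from the steps above. The script path, OUTPUTDIR default, and S3 bucket are hypothetical placeholders; the lsof pipeline is the exact command we run.

    #!/bin/sh
    # Cron entry (hypothetical path): 4 * * * * /usr/local/bin/rotate-logstash-output.sh
    OUTPUTDIR=${OUTPUTDIR:-/var/log/logstash-output}   # placeholder default
    current_hour=$(date +%Y-%m-%d-%H)                  # matches %{+YYYY-MM-dd-HH} in the output path
    stamp=$(date +%y%m%d_%H%M%S)                       # YYmmdd_HHMMSS suffix

    for f in "$OUTPUTDIR"/*/*.json; do
      [ -e "$f" ] || continue
      # Skip the file Logstash is still writing for the current hour.
      case "$f" in *"$current_hour"*) continue ;; esac
      mv "$f" "$f.$stamp"
      gzip "$f.$stamp"                                 # unlinks the original after compressing
      aws s3 cp "$f.$stamp.gz" "s3://example-bucket/"  # placeholder bucket
    done

    # Logstash still holds the unlinked files open; list the zombies:
    lsof -nP -p "$(pgrep -d , java)" | grep '(deleted)'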
