refactor(filesystem): fix recursive expand of nodes #957


Merged
merged 16 commits into from
Jul 8, 2023

Conversation

ghostbuster91
Contributor

@ghostbuster91 ghostbuster91 commented May 28, 2023

This relates to #701

This is not a full fix for the built-in expand_all_nodes method, but rather a fix that enables fixing it. It also enables fixing the function from the wiki that recursively expands nodes under the cursor (more important, in my opinion).

List of changes broken down by commit:

  1. Add a callback to toggle_directory - this is the most important change. Luckily it is not a breaking change, because the callback was previously nil anyway.
    Justification:
    fs.toggle_directory calls fs_scan.get_items which, for
    scan_mode == "shallow", calls sync_scan and then job_complete.

    Here is where the problem starts - job_complete checks whether git_status_async is set and,
    if so, registers a callback in the git.mark_ignored method. That callback is important
    because it eventually calls render_context, which updates the nodes. The callback will be
    invoked at some point later in time, but execution continues in the meantime. Because
    of that, it is not guaranteed that we will have updated results by the time we start
    expanding nodes.

  2. Small refactoring - method extraction. Can be removed from the list of changes if desired.

  3. Small fix for the recursive traversals when deep strategy was used.

I did not fix the expand_all_nodes because I didn't know how to obtain the root element of the whole filesystem tree. The root element is needed so that we can start expanding from it rather than from its children, as it does now:

  for _, node in ipairs(state.tree:get_nodes()) do
    expand_node(node)
  end

This is important so that we can have a single callback that we pass to toggle_directory. Otherwise, this method will spawn as many processes as there are nodes at the top level. Tbh I would remove this method.

And here is a fixed method to recursively expand nodes under the cursor, recursive_open:

    local function expand_all(state, root)
        local toggle_directory = function(_, node)
            node:expand()
        end

        local expand_node
        expand_node = function(node)
            local id = node:get_id()
            if node.type == "directory" and not node:is_expanded() then
                toggle_directory(state, node)
                node = state.tree:get_node(id)
            end
            local children = state.tree:get_nodes(id)
            if children then
                for _, child in ipairs(children) do
                    if child.type == "directory" then
                        expand_node(child)
                    end
                end
            end
        end

        expand_node(root)
        renderer.redraw(state)
    end

    local function open_dir(state, dir_node, callback)
        local fs = require "neo-tree.sources.filesystem"
        fs.toggle_directory(state, dir_node, nil, true, true, callback)
    end

    -- Expand a node and all its children
    -- does not handle max_depth
    local function recursive_open(state, node, max_depth)
        open_dir(state, node, function()
            expand_all(state, node)
        end)
    end

Tested both with deep and shallow strategies.

@ghostbuster91 ghostbuster91 changed the base branch from v2.x to main May 28, 2023 19:29
@cseickel
Contributor

cseickel commented May 29, 2023

I did not fix the expand_all_nodes because I didn't know how to obtain the root element of the whole filesystem tree.

I think you can get the root with state.tree:get_node(state.path), because the IDs of the nodes are just absolute paths.

Would you like to continue and fix that expand all nodes built-in as well or just merge what you have now?

BTW - I do agree with all of your function extractions. Some of the functions have grown way too big and breaking them up into smaller functions with descriptive names is always good.

@ghostbuster91
Contributor Author

I think you can get the root with state.tree:get_node(state.path), because the IDs of the nodes are just absolute paths.

👍 Thanks

Would you like to continue and fix that expand all nodes built-in as well or just merge what you have now?

I am happy to keep working on it. In fact, I realized that there might be some cases where the expansion might not work correctly.

@ghostbuster91
Contributor Author

ghostbuster91 commented May 30, 2023

Ok, so here is what I discovered/realized:
Even when expanding nodes under the cursor, some child nodes might already be loaded/expanded.
Consider the following situation:

dir1 <- cursor here
\- child1 - loaded
\- child2 - not loaded
\- child3 - loaded and expanded
  \- file.lua
  \- grandchild1 - not loaded

If dir1 is unloaded then the situation is simple, we enter this path:

  if node.loaded == false then
    local id = node:get_id()
    state.explicitly_opened_directories[id] = true
    renderer.position.set(state, nil)
    fs_scan.get_items(state, id, path_to_reveal, callback, true, recursive)

If dir1 is loaded, then we basically need to call that method for each of its unloaded children. The problem is that this method is async.

I am still quite new to Lua and its concurrency features/gotchas, but from what I have seen, this seems like a job for plenary.async.

I tried rewriting expand_all_nodes to use it, but for some reason the execution of the coroutines seems to stop before all of them have a chance to finish.

M.expand_all_nodes = function(state, toggle_directory)
  local async = require("plenary.async")

  -- toggle_directory = async.wrap(toggle_directory, 3)
  local expand_node

  local expand_async = function (node)
    local id = node:get_id()
    local toggle_async = async.wrap(function (callbackF)
      if node.type == "directory" and not node:is_expanded() then
          log.debug("before toggle " .. node.name)
          toggle_directory(state, node, function ()
              log.debug("after toggle " .. node.name)
              node = state.tree:get_node(id)
              callbackF(node)
          end)
      else
        callbackF(node)
      end
    end,1)
    toggle_async()
    log.debug("after toggle_async " .. node.name)
    local children = state.tree:get_nodes(id)
    local tasks = {}
    if children then
      log.debug("children of " .. node.name .. " - "  .. #children)
      for _, child in ipairs(children) do
        if child.type == "directory" then
          local task = async.wrap(function (callback2)
            expand_node(child, callback2)
          end, 1)
          table.insert(tasks, task)
        else
          log.debug("skipping child " .. vim.inspect(child.name))
        end
      end
    else
      log.debug("no kids")
    end
    if #tasks > 0 then
      async.util.join(tasks)
    end
  end

  expand_node = function(node, callback)
    async.run(function ()
        log.debug("async run in expand_node " .. node.name )
        return expand_async(node)
    end, callback)
  end

  local tasks = {}
  for _, node in ipairs(state.tree:get_nodes()) do
    local task = async.wrap(function (callback)
        log.debug("async expand node" .. node.name)
        expand_node(node, callback)
    end, 1)
    table.insert(tasks, task)
  end
    log.debug("run all on top")
    async.util.run_all(
      tasks,
      function ()
        log.debug("run all on top - finish")
        vim.schedule_wrap(function()
          renderer.redraw(state)
        end)
      end
    )
end

Here is how the log ends:

[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:123: after toggle scripts
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:132: after toggle_async scripts
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:136: children of scripts - 1
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:144: skipping child "test.sh"
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:123: after toggle plugin
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:132: after toggle_async plugin
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:136: children of plugin - 1
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:144: skipping child "neo-tree.vim"
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:123: after toggle lua
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:132: after toggle_async lua
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:136: children of lua - 2
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:144: skipping child "neo-tree.lua"
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:159: async run in expand_node neo-tree
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:121: before toggle neo-tree
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:123: after toggle doc
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:132: after toggle_async doc
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:136: children of doc - 1
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:144: skipping child "neo-tree.txt"
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:123: after toggle tests
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:132: after toggle_async tests
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:136: children of tests - 3
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:144: skipping child "mininit.lua"
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:159: async run in expand_node neo-tree
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:121: before toggle neo-tree
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:159: async run in expand_node utils
[DEBUG Tue 30 May 2023 10:43:39 AM CEST] .//lua/neo-tree/sources/common/commands.lua:121: before toggle utils

So the final callback, log.debug("run all on top - finish"), never runs.

Any idea what could be the reason or how to do it without plenary.async?
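For reference, here is a minimal pure-Lua sketch (no plenary, names are illustrative) of the suspension mechanism that async.wrap builds on: a wrapped call suspends its coroutine until the callback fires, so any callback that is never invoked leaves the whole chain suspended forever — one way a final handler can silently never run.

```lua
-- Minimal sketch (plain Lua, no plenary) of callback-to-coroutine wrapping.
-- If `cb` below were never called, the coroutine would stay suspended forever.
local unpack = table.unpack or unpack

local function wrap(async_fn)
  -- Return a "blocking-style" function: it appends a callback that resumes
  -- the current coroutine, then yields until that callback is invoked.
  return function(...)
    local co = assert(coroutine.running(), "must be called inside a coroutine")
    local args = { ... }
    table.insert(args, function(...) coroutine.resume(co, ...) end)
    async_fn(unpack(args))
    return coroutine.yield()
  end
end

local pending = {} -- simulated event loop: callbacks fired "later"

-- an "async" primitive that completes via callback
local function double_async(n, cb)
  table.insert(pending, function() cb(n * 2) end)
end

local double = wrap(double_async)
local result
coroutine.wrap(function() result = double(21) end)()

assert(result == nil)  -- callback has not fired yet; coroutine is suspended
for _, fire in ipairs(pending) do fire() end
assert(result == 42)   -- callback resumed the coroutine and yield returned
```

This also illustrates why a callback must be called exactly once: a second call would resume a coroutine that is no longer suspended at that yield.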

My further plan is to rewrite expand_all_nodes all the way down to git.ignore, simplifying everything along the way, and see if that helps (right now it uses some very generic functions, which makes it harder to reason about).

@cseickel
Contributor

TBH I don't have enough time left today to load recursive async code into my working memory and reason about where the problem is. My gut tells me it would be better to add directories to be scanned to a collection as they are discovered and keep looping until that collection is exhausted.

The async part probably breaks that technique... Maybe every time an async method adds a new directory to that collection, the loop can be restarted if it had previously been exhausted and completed. This may be a situation where OOP is better than functional. Neo-tree has mostly functional-style code, but we do use OOP at times where it makes sense.
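The collection-based loop described above, ignoring the async part for a moment, might look like this (plain Lua sketch; `children_of` is a hypothetical stand-in for the real directory scan):

```lua
-- Plain-Lua sketch of the suggested approach: discovered directories go
-- into a central collection and we loop until it is exhausted.
local function expand_from_queue(root, children_of)
  local queue, expanded = { root }, {}
  while #queue > 0 do
    local dir = table.remove(queue)       -- pop the next discovered directory
    if not expanded[dir] then
      expanded[dir] = true
      for _, child in ipairs(children_of(dir) or {}) do
        table.insert(queue, child)        -- new discoveries re-enter the loop
      end
    end
  end
  return expanded
end

-- toy tree: a -> {b, c}, b -> {d}
local tree = { a = { "b", "c" }, b = { "d" } }
local seen = expand_from_queue("a", function(d) return tree[d] end)
assert(seen.a and seen.b and seen.c and seen.d)
```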

Anyway, I would be thrilled if you rewrote it because I know you write good code. This section of the code has gotten messy and confusing.

@ghostbuster91
Contributor Author

Anyway, I would be thrilled if you rewrote it because I know you write good code. This section of the code has gotten messy and confusing.

Crap, now I will have to deliver on these expectations xD

The async part probably breaks that technique...

Yeah, that wouldn't work. I need to dive deeper into plenary async and threading in Lua.

This may be a situation where OOP is better than functional. Neo-tree has more functional style code but we do use OOP at times where it makes sense.

Idk. I have been writing functional code for the last couple of years and this part seems as fine as any other. I am glad that you took that approach. It is much easier to reason about this codebase than the nvim-tree one (at least for me). Sure, there are some messy parts, but nothing that cannot be cleaned up.

In general I am happy to continue working on it but it will take more time than I initially anticipated.

@ghostbuster91
Contributor Author

just a small update from my side:

Kooha-2023-06-02-10-46-16.webm

The code seems to work now, though I have made some significant changes. I am not pushing it yet, as I need to clean it up.

After that I will publish it and describe my changes, with recommendations for moving forward. In general I would avoid using callbacks, as they are hard to follow and error-prone (e.g. calling one twice or forgetting to call it at all).

@ghostbuster91
Contributor Author

@cseickel I have it almost ready. Just one small bug left. Could you tell me how node.is_loaded gets changed?

@cseickel
Contributor

cseickel commented Jun 3, 2023

@cseickel I have it almost ready. Just one small bug left. Could you tell me how node.is_loaded gets changed?

@ghostbuster91 that is_loaded property is junk that slipped in somehow. It could be that it was just a typo. You are probably looking for node.loaded, which is meaningful and used in many places. That is set here:

node.loaded = true

@ghostbuster91
Contributor Author

ghostbuster91 commented Jun 3, 2023

A general comment to my change:

I think that we should migrate the codebase to plenary.async, as relying on callbacks is too error-prone and not easy to follow.

Thanks to that, we should be able to get rid of context, which is just a bunch of variables that are set in some contexts and not in others.

This will make the function signatures clearer and allow us to reason better about the codebase.

Now, I don't expect anyone else to do that work. However, I like the neo-tree idea enough that I am willing to do most of the heavy lifting, provided that you are interested in such changes.

Update:
I forgot to add a comment about plenary.scan. A lot of the functions that we have here to scan directories already exist in plenary. In fact, judging by comments in the code, some of the functions in the neo-tree codebase were written based on the ones in plenary.

That is probably because the ones in plenary are not flexible enough. For example, you can scan_dir with respect to git_ignore, but you will only get results that are not ignored, while neo-tree needs all the results, just enriched with information about git_ignore.

I would like to propose contributing the following changes to plenary:

  1. add a new feature to scan_dir so that it uses git_ignore to enrich its results
  2. add a new feature to scan_dir so that it can stop scanning when a given predicate is met. This will be useful for implementing functions like "expand 5 levels deep".
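A sketch of what the predicate-based variant (point 2) could look like — plain Lua, with `children_of` as a hypothetical stand-in for the real scan, not the actual plenary API:

```lua
-- Sketch of a scan with a stop predicate (hypothetical API shape).
-- Descent stops at any directory for which `stop(dir, depth)` is true,
-- which makes "expand N levels deep" a one-line predicate.
local function scan_until(root, children_of, stop)
  local results = {}
  local function go(dir, depth)
    table.insert(results, dir)
    if stop(dir, depth) then return end
    for _, child in ipairs(children_of(dir) or {}) do
      go(child, depth + 1)
    end
  end
  go(root, 0)
  return results
end

-- "expand 2 levels deep" over a toy chain: a -> b -> c -> d
local tree = { a = { "b" }, b = { "c" }, c = { "d" } }
local got = scan_until("a", function(d) return tree[d] end,
  function(_, depth) return depth >= 2 end)
assert(table.concat(got, ",") == "a,b,c")  -- d is never visited
```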

wdyt?

And btw the PR is ready for review.

@cseickel
Contributor

cseickel commented Jun 3, 2023

One general note about plenary: it's a great library and I appreciate the work that has been put into it, but be careful that you don't think of it as a "standard library" like what you would get from other languages. It's not the .Net Standard Library or Lodash, it's a young project maintained by a handful of very busy open source contributors.

Plenary's async methods seem to be pretty good and I wouldn't mind a refactor that leaned on those, but be careful and make sure that performance does not suffer. I usually use the Linux project to test out changes to the filesystem scanning code when I am worried about performance. Also, be sure to consider that there may be some shared code with other sources that needs to be maintained.


On the subject of the plenary dir-scanning functions: they are banned from this project. The first version used them, of course, but over time I realized that they are slow and just too simple for what we need. Also, the gitignored filter is completely wrong. Gitignore is a very complicated thing, and I stand by the decision to just delegate that logic to the git cmd and not try to re-implement it ourselves.

I think that for a plugin that revolves around scanning the filesystem, it makes sense that we would roll our own functions for that one thing. We just need that level of control and customization.

Contributor

@cseickel cseickel left a comment


Looks pretty good, just a few comments. The important thing is to return the common.commands.expand_all_nodes function to being a generic implementation.

@@ -228,7 +257,9 @@ local function async_scan(context, path)
local scan_tasks = {}
for _, p in ipairs(context.paths_to_load) do
local scan_task = async.wrap(function(callback)
scan_dir_async(context, p, callback)
async.run(function ()
Contributor


I haven't seen this before. Is this like a pipe() function from rxjs?

Contributor Author


Not really, it is rather like subscribe. It allows you to execute an "async" function from a regular (non-async) context.

@ghostbuster91
Contributor Author

One general note about plenary: it's a great library and I appreciate the work that has been put into it, but be careful that you don't think of it as a "standard library" like what you would get from other languages. It's not the .Net Standard Library or Lodash, it's a young project maintained by a handful of very busy open source contributors.

I understand that, but there seems to be no alternative, and my mental capacity is not big enough to hold all the callbacks at once and reason about the code.

Plenary's async methods seem to be pretty good and I wouldn't mind a refactor that leaned on those, but be careful and make sure that performance does not suffer. I usually use the Linux project to test out changes to the filesystem scanning code when I am worried about performance.

That is a great idea!

On the subject of the plenary dir-scanning functions: they are banned from this project. The first version used them, of course, but over time I realized that they are slow and just too simple for what we need. Also, the gitignored filter is completely wrong. Gitignore is a very complicated thing, and I stand by the decision to just delegate that logic to the git cmd and not try to re-implement it ourselves.

Understood, thanks for the explanation!

I think that for a plugin that revolves around scanning the filesystem, it makes sense that we would roll our own functions for that one thing. We just need that level of control and customization.

Yep, that makes sense. At some point we might contribute it back to plenary if they will be generic enough.

@cseickel
Contributor

cseickel commented Jun 4, 2023

I understand that, but there seems to be no alternative, and my mental capacity is not big enough to hold all the callbacks at once and reason about the code.

I completely appreciate that. I've been there many times where there are sections of a project that are too complicated to fit in my working memory and I just have to refactor it. TBH - this code has been incrementally changed many times and by multiple people and is overdue for a refactor. The last time I had to change something it took me a while to get my bearings too.

local mark_ignored_async = async.wrap(function (_state, _all_items, _callback)
  git.mark_ignored(_state, _all_items, _callback)
end, 3)
local all_items = mark_ignored_async(state, context.all_items)
Contributor Author

@ghostbuster91 ghostbuster91 Jun 4, 2023


QUESTION: here I am always using the async version of git_ignore, in contrast to job_complete, which respects require("neo-tree").config.git_status_async and acts accordingly.

I could merge the two together, but I am hesitant, as the old one calls render_context, which in the context of my function is unnecessary (at least in this place).

Nevertheless, I could still add config.git_status_async to job_complete_async, though I am not sure what the benefits would be, as everything is "async" in this call stack. Wdyt?

Contributor


I don't remember how everything works off the top of my head and I don't think I have enough time to dig in now, but I can say that if the filesystem is being loaded async it probably doesn't make sense to load git_status synchronously. I can see the other way around where the fs is scanned sync but git status is async.

Contributor Author


That sounds reasonable 👍

@ghostbuster91
Contributor Author

So far, all of my attempts at running it on a big repository like qmk or nixpkgs have failed due to running out of RAM. I am quite surprised by that, so I will take a closer look at it.

@cseickel
Contributor

cseickel commented Jun 5, 2023

So far, all of my attempts at running it on a big repository like qmk or nixpkgs have failed due to running out of RAM. I am quite surprised by that, so I will take a closer look at it.

I've hit many difficult memory leaks in this project. I start with looking for closures on state, git status/ignored info, and diagnostics.

There have also been memory leaks before in treesitter, neovim, and Nui. If this is redrawing after every recursive load, you could be hitting memory leaks external to your code.

@ghostbuster91
Contributor Author

ghostbuster91 commented Jun 7, 2023

I found the memory issue; it was actually my fault, as I was doing some redundant recursion.

Now, I am facing another problem. My current solution works like that:

  1. async scan recursively
  2. mark-gitignored
  3. expand all nodes
  4. render

There is a bit of a lag during the render, as there is a lot to render (I am testing on the qmk repository, which has 50k items in the keyboard directory). I guess it would feel more responsive if we rendered more frequently. I think this is how it was before, so maybe I will revert some of my changes.

However, the bigger issue is with mark-gitignored. Neovim refuses to spawn that many jobs and kills the neo-tree thread. Batching elements into smaller chunks before passing them to mark_gitignored solves this problem to some degree, but Neovim is still unresponsive during that time.

So first I will try to redesign my approach a bit, and then I will look into mark-gitignored, as I think we could spawn fewer processes.
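The batching mentioned above can be as simple as this plain-Lua sketch (the chunk size and helper name are illustrative, not what the PR actually uses):

```lua
-- Plain-Lua sketch of batching: split a flat item list into fixed-size
-- chunks so each spawned job gets a bounded argument list.
local function chunk(items, size)
  local batches = {}
  for i = 1, #items, size do
    local batch = {}
    for j = i, math.min(i + size - 1, #items) do
      table.insert(batch, items[j])
    end
    table.insert(batches, batch)
  end
  return batches
end

local batches = chunk({ "a", "b", "c", "d", "e" }, 2)
assert(#batches == 3)
assert(#batches[1] == 2 and #batches[3] == 1)
```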

The good thing is that I feel that I am making some progress, so hopefully this will be finished one day :)

@cseickel
Contributor

cseickel commented Jun 7, 2023

There is certainly a point at which a recursive expand is just unreasonable. The important thing is that it performs well in normal operation, where the next level may have a lot of items, and with group_empty_dirs enabled.

For a full expand of a huge tree, it's OK if it takes a long time, so long as there is some protection against errors. It would be OK to bail out with a warning at some point if the task has been running for too long or if you exceed some number of folders. You could also report progress back to the neo-tree window (not exactly sure what would be the right way to do that) or to a new floating window that you pop up just for this task after x seconds.

I would be hesitant to render the incremental progress for each folder because the renders would probably take up a lot of resources themselves. Maybe you can render once per level each time you reach a new max depth for the overall tree.

@pysan3
Collaborator

pysan3 commented Jun 9, 2023

Hi @ghostbuster91, sorry for the interruption.

I've been working on issue #906, where the core problem is that the mechanism for deciding whether to dig into child dirs when group_empty_dirs == true stops when there are filtered_items.never_show files.

This PR, #908, is a fix for this specific problem, but as @cseickel mentioned, it would be nicer if the new algorithm implemented here fixed the problem in the first place.

Do you think it is possible?



@ghostbuster91
Contributor Author

@pysan3 For now I am mostly focused on recursive expansion of all nodes to the very bottom. For that purpose I created a separate chain of methods that performs that expansion in a more efficient way.

I guess once I finish it, it will be possible to unify the way we toggle a single directory with the recursive expansion, though I would say this will be a different work stream.

Ideally, we would have expand_until(node => node.is_not_empty) and other high-level methods. But that is rather far away.

Given the scope of your PR, I think you should not wait for me, since I am not sure when I will be able to finish (I still want to dig into the git_ignore stuff). And porting/adapting your changes will be rather easy.

@ghostbuster91
Contributor Author

I would be hesitant to render the incremental progress for each folder because the renders would probably take up a lot of resources themselves.

I agree.

Maybe you can render once per level each time you reach a new max depth for the overall tree.

That could be an interesting feature, but let's save it for later.

The important thing is that it performs well in normal operation, where the next level may have a lot of items, and with group_empty_dirs enabled.

👍 Let's stick to that.

@cseickel I came up with two ideas that could speed up recursive expand process that I would like to discuss:

Scan parent instead of children

Given the following directory structure:

- dir1
\- dir2
\- dir3
\- dir4

It is no surprise that it is faster to perform a single fs_scan on dir1, loading it and all of its children recursively, than to spawn 3 separate fs_scan coroutines to scan each of its children separately. This is especially noticeable for directories with a lot of direct child directories.

The current algorithm works like this:
If dir1 is loaded, it will check each of its children and start a recursive fs_scan for any that is not loaded. If dir1 is not loaded, it will start a recursive fs_scan only for it.

Instead, we could always perform the recursive fs_scan on the parent. The obvious downside is that we'd perform that fs_scan every time we expand the node recursively. This could be mitigated by introducing a new flag at the node level - fully_loaded - denoting whether the node and all of its children have been loaded. However, invalidating such a flag can be problematic, so I am not sure we want to go that route.

Smarter git_ignore

The current way of checking gitignored files works like this:
First we group items by directory, and then for each directory we call git -C $directory check-ignore $items (which spawns a system process - heavy). We do this because the git repository doesn't necessarily have to be at the root level; it can be arbitrarily nested within the directory tree, and there can also be multiple git repositories.

However, once we detect that a given directory belongs to a git repository, I think we could check all descendant directories against that one repository. That would greatly decrease the number of jobs to spawn in certain situations (e.g. when the root directory is the git root).

Now the question is how to detect that. I see two options: we could either save that information during the fs_scan as a node-level flag - inside_git_repo - or we could spawn the jobs starting from the top-level directories and only descend if they are not git repositories.
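The payoff of either option is grouping items by a known repo root instead of by directory, so one check-ignore job can cover a whole subtree. A plain-Lua sketch (the `roots` input is assumed to have been discovered elsewhere; names are illustrative):

```lua
-- Plain-Lua sketch: once repo roots are known, group every path under its
-- root so a single `git -C <root> check-ignore` job can cover the subtree.
local function group_by_repo_root(paths, roots)
  local groups = {}
  for _, p in ipairs(paths) do
    local best
    for _, r in ipairs(roots) do
      -- the longest root that is a prefix of the path wins
      if p:sub(1, #r) == r and (not best or #r > #best) then best = r end
    end
    if best then
      groups[best] = groups[best] or {}
      table.insert(groups[best], p)
    end
  end
  return groups
end

local g = group_by_repo_root(
  { "/repo/a.lua", "/repo/sub/b.lua", "/other/c.lua" },
  { "/repo", "/other" })
assert(#g["/repo"] == 2 and #g["/other"] == 1)
```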

Summary

The second optimization is much more worth looking into, as it can save a ton of resources. Having said that, I see both as additional improvements to the recursive expand feature and don't consider them critical.

Apart from testing the changes against the group_empty_dirs improvements made by @pysan3, would you like me to do anything else before we merge?

@cseickel
Contributor

Scan parent instead of children

...
Instead, we could always perform the recursive fs_scan on the parent. The obvious downside is that we'd perform that fs_scan every time we expand the node recursively. This could be mitigated by introducing a new flag at the node level - fully_loaded - denoting whether the node and all of its children have been loaded. However, invalidating such a flag can be problematic, so I am not sure we want to go that route.

One thing you have to consider is that there is no way to know whether files have changed since the last scan, so every node has to be loaded from disk every time, with no caching of any kind. The loaded flag is only for lazy loading.

I think if we wanted to cut down the number of workers, I would refactor to have a specific number of parallel workers that simply work from a central queue of discovered directories, popping the next directory to scan whenever they finish, rather than creating exactly one worker per directory. I didn't personally add the one-worker-per-directory strategy, but I think it was just a simple way of parallelizing the job.
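The bounded-worker idea could be sketched like this in plain Lua, with coroutines simulating the async jobs and `scan` as a hypothetical stand-in for the real directory scan (a sketch of the shape, not the actual neo-tree code):

```lua
-- Plain-Lua sketch of bounded workers: a fixed number of workers share one
-- queue of discovered directories instead of spawning one worker per dir.
local function run_workers(n_workers, queue, scan)
  local workers, scanned = {}, {}
  for i = 1, n_workers do
    workers[i] = coroutine.create(function()
      while #queue > 0 do
        local dir = table.remove(queue)      -- pop next discovered directory
        table.insert(scanned, dir)
        for _, child in ipairs(scan(dir) or {}) do
          table.insert(queue, child)         -- discoveries feed the shared queue
        end
        coroutine.yield()                    -- hand control to the next worker
      end
    end)
  end
  -- round-robin "scheduler": resume workers until all are done
  local done = false
  while not done do
    done = true
    for _, w in ipairs(workers) do
      if coroutine.status(w) ~= "dead" then
        done = false
        coroutine.resume(w)
      end
    end
  end
  return scanned
end

local tree = { root = { "a", "b" }, a = { "c" } }
local scanned = run_workers(2, { "root" }, function(d) return tree[d] end)
assert(#scanned == 4)  -- root, a, b, c each scanned exactly once
```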

Smarter git_ignore

The current way of checking gitignored files works like this: First we group items by directory, and then for each directory we call git -C $directory check-ignore $items (which spawns a system process - heavy). We do this because the git repository doesn't necessarily have to be at the root level; it can be arbitrarily nested within the directory tree, and there can also be multiple git repositories.

However, once we detect that a given directory belongs to a git repository, I think we could check all descendant directories against that one repository. That would greatly decrease the number of jobs to spawn in certain situations (e.g. when the root directory is the git root).

Now the question is how to detect that. I see two options: we could either save that information during the fs_scan as a node-level flag - inside_git_repo - or we could spawn the jobs starting from the top-level directories and only descend if they are not git repositories.

The git-ignore code was certainly not written with recursive expand in mind. That being said, this part is complicated and I would hate to complicate it further for a situation I see as an edge case. It's OK if it is slow to recursively expand a very large repo. I'd rather that than possibly add bugs for the average use case of a single-level expand.

Summary

The second optimization is much more worth looking into, as it can save a ton of resources. Having said that, I see both as additional improvements to the recursive expand feature and don't consider them critical.

Apart from testing the changes against the group_empty_dirs improvements made by @pysan3, would you like me to do anything else before we merge?

I agree that you don't need to go further on optimization. My only concern is to ensure that if someone does recursively expand on a large repo, that neovim does not crash or completely freeze for a long period of time.

@ghostbuster91
Contributor Author

ghostbuster91 commented Jun 13, 2023

Thanks for the detailed response, I appreciate that!

One thing you have to consider is that there is no way to know if files have changed since the last scan so every node has to be loaded from disk every time with no caching of any kind. The loaded flag is only for lazy loading.

Don't we spawn an fs-watcher for every loaded directory that informs us when anything changes? That way we could invalidate the fully_loaded flag when changes are detected.

The git-ignore code was certainly not written with recursive expand in mind. That being said, this part is complicated and I would hate to complicate it further for a situation I see as an edge case. It's OK if it is slow to recursively expand a very large repo; I'd rather have that than possibly add bugs to the average use case of a single-level expand.

That is a fair point. I might give it a try later and attempt to simplify it, because as you said, it is complicated.

I agree that you don't need to go further on optimization. My only concern is to ensure that if someone does recursively expand on a large repo, that neovim does not crash or completely freeze for a long period of time.

Well, to be fair, the current behavior is exactly like that when you use git_async=false with recursive expand on a big repository. So we are not introducing any regression; we are simply fixing the case when git_async=true.

I can give it a try later and make both of them more user-friendly, but I would prefer not to grow the scope of this PR that much.

@cseickel
Contributor

Don't we spawn a fs-watcher for every loaded directory that should inform us if anything changes? That way we could invalidate fully_loaded flag if changes were detected.

I suppose that is true when that option is enabled, and even if it's not, it's no change to the current behavior since the nodes will be discarded on each explicit refresh.


Do you think this is ready for final review and testing?

@ghostbuster91
Contributor Author

Do you think this is ready for final review and testing?

I didn't have time yet to test it with group_empty and changes made by @pysan3. I will let you know once I finish it.

@cseickel
Contributor

cseickel commented Jul 5, 2023

Do you think this is ready for final review and testing?

I didn't have time yet to test it with group_empty and changes made by @pysan3. I will let you know once I finish it.

Hey @ghostbuster91 I just want to check in on this. If you're out of time you don't need to worry about the fixes for group_empty as they can be handled separately. I originally called that out just to stop @pysan3 from doing too much work with a code base that was being refactored at the same time. We can easily just land this and then continue work on #908 separately.

@pysan3
Collaborator

pysan3 commented Jul 5, 2023

We can easily just land this and then continue work on #908 separately.

Indeed, sorry for the interruption @ghostbuster91.

I now think this PR is not that related to mine, so please just ignore my comments on this thread lol. I am thinking of revising my PR after v3.0 is released, so it'll probably be quite far in the future.

@ghostbuster91
Contributor Author

ghostbuster91 commented Jul 5, 2023

Yes, sorry guys, I got totally absorbed by other things. I will try to finish this by the end of the week. At this point I just wanted to test it with the current implementation of group_empty. Hopefully I won't find any issues and we will be able to close it 🤞

@ghostbuster91
Contributor Author

Just tested it with current group_empty - didn't spot any problems. I also pushed two small refactors.

@ghostbuster91
Contributor Author

ghostbuster91 commented Jul 8, 2023

One more thing: when there are not enough file descriptors left in the system, a plenary job will fail with the following error:

Error executing luv callback:
.../myNeovimPackages/start/plenary.nvim/lua/plenary/job.lua:404: Failed to spawn process: {
  _additional_on_exit_callbacks = {},
  _shutdown_check = <userdata 1>,
  _stderr_results = {},
  _stdout_results = {},
  _user_on_exit = <function 1>,
  _user_on_start = <function 2>,
  args = { "-C", "/home/kghost/workspace/diffx/core/target/js-2.13/streams/_global/update/_global/streams", "check-ignore", "--stdin" },
  command = "git",
  enable_handlers = true,
  enable_recording = true,
  interactive = true,
  pid = "EMFILE: too many open files",
  stderr = <userdata 2>,
  stdin = <userdata 3>,
  stdout = <userdata 4>,
  user_data = {},
  writer = { "/home/kghost/workspace/diffx/core/target/js-2.13/streams/_global/update/_global/streams/out" },
  <metatable> = <1>{
    __index = <table 1>,
    _create_uv_options = <function 3>,
    _execute = <function 4>,
    _pipes_are_closed = <function 5>,
    _prepare_pipes = <function 6>,
    _reset = <function 7>,
    _shutdown = <function 8>,
    _stop = <function 9>,
    add_on_exit_callback = <function 10>,
    after = <function 11>,
    after_failure = <function 12>,
    after_success = <function 13>,
    and_then = <function 14>,
    and_then_on_failure = <function 15>,
    and_then_on_failure_wrap = <function 16>,
    and_then_on_success = <function 17>,
    and_then_on_success_wrap = <function 18>,
    and_then_wrap = <function 19>,
    chain = <function 20>,
    chain_status = <function 21>,
    co_wait = <function 22>,
    is_job = <function 23>,
    join = <function 24>,
    new = <function 25>,
    pid = <function 26>,
    result = <function 27>,
    send = <function 28>,
    shutdown = <function 29>,
    start = <function 30>,
    stderr_result = <function 31>,
    sync = <function 32>,
    wait = <function 33>
  }
}
stack traceback:
	.../myNeovimPackages/start/plenary.nvim/lua/plenary/job.lua:404: in function '_execute'
	.../myNeovimPackages/start/plenary.nvim/lua/plenary/job.lua:447: in function 'start'
	../neo-tree.nvim/lua/neo-tree/git/ignored.lua:138: in function 'mark_ignored'
	...ree.nvim/lua/neo-tree/sources/filesystem/lib/fs_scan.lua:113: in function 'returned_function'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:31: in function 'callback_or_next'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:45: in function 'saved_callback'
	...ackages/start/plenary.nvim/lua/plenary/async/control.lua:126: in function 'tx'
	...imPackages/start/plenary.nvim/lua/plenary/async/util.lua:66: in function 'callback'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:25: in function 'callback_or_next'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:45: in function 'callback'
	...ree.nvim/lua/neo-tree/sources/filesystem/lib/fs_scan.lua:195: in function <...ree.nvim/lua/neo-tree/sources/filesystem/lib/fs_scan.lua:186>
stack traceback:
	[C]: in function 'error'
	.../myNeovimPackages/start/plenary.nvim/lua/plenary/job.lua:404: in function '_execute'
	.../myNeovimPackages/start/plenary.nvim/lua/plenary/job.lua:447: in function 'start'
	../neo-tree.nvim/lua/neo-tree/git/ignored.lua:138: in function 'mark_ignored'
	...ree.nvim/lua/neo-tree/sources/filesystem/lib/fs_scan.lua:113: in function 'returned_function'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:31: in function 'callback_or_next'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:45: in function 'saved_callback'
	...ackages/start/plenary.nvim/lua/plenary/async/control.lua:126: in function 'tx'
	...imPackages/start/plenary.nvim/lua/plenary/async/util.lua:66: in function 'callback'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:25: in function 'callback_or_next'
	...mPackages/start/plenary.nvim/lua/plenary/async/async.lua:45: in function 'callback'
	...ree.nvim/lua/neo-tree/sources/filesystem/lib/fs_scan.lua:195: in function <...ree.nvim/lua/neo-tree/sources/filesystem/lib/fs_scan.lua:186>

We could match on this field in the error
pid = "EMFILE: too many open files",
display a nice, informative message to the user, and gracefully tear down the whole process. Probably an idea for another PR though.
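A minimal sketch of what that could look like. This is hypothetical: the pcall/notify wrapper is illustrative, and `cancel_pending_jobs` is an assumed helper, not an existing neo-tree function:

```lua
-- Hypothetical sketch: detect the EMFILE failure when starting a job
-- and surface a readable message instead of a raw stack trace.
local ok, err = pcall(function()
  job:start()
end)
if not ok and tostring(err):find("EMFILE") then
  vim.notify(
    "neo-tree: too many open files; try expanding fewer directories at once",
    vim.log.levels.ERROR
  )
  cancel_pending_jobs() -- hypothetical: gracefully tear down remaining scans
end
```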

@cseickel
Contributor

cseickel commented Jul 8, 2023

One more thing: when there are not enough file descriptors left in the system, a plenary job will fail with the following error:

Yeah, that's a known issue for another PR. #997

@cseickel cseickel changed the title Fix recursive expanding of nodes refactor(filesystem): fix recursive expand of nodes Jul 8, 2023
@cseickel cseickel merged commit d77fd94 into nvim-neo-tree:main Jul 8, 2023