refactor(filesystem): fix recursive expand of nodes #957
I think there is a way to get the root. Would you like to continue and fix the built-in expand-all-nodes as well, or just merge what you have now? BTW, I do agree with all of your function extractions. Some of the functions have grown way too big, and breaking them up into smaller functions with descriptive names is always good.
👍 Thanks
I am happy to work on it further. In fact, I realized that there might be some cases where the expanding might not work correctly.
Ok, so here is what I discovered/realized:

If dir1 is unloaded, the situation is simple; we enter this path:

```lua
if node.loaded == false then
  local id = node:get_id()
  state.explicitly_opened_directories[id] = true
  renderer.position.set(state, nil)
  fs_scan.get_items(state, id, path_to_reveal, callback, true, recursive)
```

I am still quite new to Lua and its concurrency features/gotchas, but from what I saw this seems like a job for an async approach. I tried rewriting it:

```lua
M.expand_all_nodes = function(state, toggle_directory)
  local async = require("plenary.async")
  -- toggle_directory = async.wrap(toggle_directory, 3)
  local expand_node
  local expand_async = function(node)
    local id = node:get_id()
    local toggle_async = async.wrap(function(callbackF)
      if node.type == "directory" and not node:is_expanded() then
        log.debug("before toggle " .. node.name)
        toggle_directory(state, node, function()
          log.debug("after toggle " .. node.name)
          node = state.tree:get_node(id)
          callbackF(node)
        end)
      else
        callbackF(node)
      end
    end, 1)
    toggle_async()
    log.debug("after toggle_async " .. node.name)
    local children = state.tree:get_nodes(id)
    local tasks = {}
    if children then
      log.debug("children of " .. node.name .. " - " .. #children)
      for _, child in ipairs(children) do
        if child.type == "directory" then
          local task = async.wrap(function(callback2)
            expand_node(child, callback2)
          end, 1)
          table.insert(tasks, task)
        else
          log.debug("skipping child " .. vim.inspect(child.name))
        end
      end
    else
      log.debug("no kids")
    end
    if #tasks > 0 then
      async.util.join(tasks)
    end
  end
  expand_node = function(node, callback)
    async.run(function()
      log.debug("async run in expand_node " .. node.name)
      return expand_async(node)
    end, callback)
  end
  local tasks = {}
  for _, node in ipairs(state.tree:get_nodes()) do
    local task = async.wrap(function(callback)
      log.debug("async expand node" .. node.name)
      expand_node(node, callback)
    end, 1)
    table.insert(tasks, task)
  end
  log.debug("run all on top")
  async.util.run_all(
    tasks,
    function()
      log.debug("run all on top - finish")
      vim.schedule_wrap(function()
        renderer.redraw(state)
      end)
    end
  )
end
```

The log ends before the final callback fires. Any idea what could be the reason, or how to do it differently? My plan further is to rewrite the remaining parts in the same style.
TBH I don't have enough time left today to load recursive async code into my working memory and reason about where the problem is. My gut is telling me it would be better to add directories to be scanned to a collection as they are discovered and keep looping until that collection is exhausted. The async part probably breaks that technique... Maybe every time an async method adds a new directory to that collection, the loop can be restarted if it had previously been exhausted and completed. This may be a situation where OOP is better than functional. Neo-tree has more functional-style code, but we do use OOP at times where it makes sense. Anyway, I would be thrilled if you rewrote it, because I know you write good code. This section of the code has gotten messy and confusing.
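That central-queue idea could be sketched roughly like this, in plain Lua callbacks. This is a minimal sketch, not neo-tree's actual API: `scan_dir` stands in for whatever function scans one directory and reports newly discovered subdirectories to its callback.

```lua
-- A sketch of n_workers workers draining a shared queue of directories.
local function scan_all(root, scan_dir, n_workers, on_done)
  local queue = { root }   -- central queue of discovered directories
  local active = 0         -- number of scans currently in flight

  local function pump()
    -- Start work on queued directories, up to the worker limit.
    while active < n_workers and #queue > 0 do
      local dir = table.remove(queue)
      active = active + 1
      scan_dir(dir, function(subdirs)
        -- Newly discovered directories go back on the queue.
        for _, d in ipairs(subdirs) do
          table.insert(queue, d)
        end
        active = active - 1
        if active == 0 and #queue == 0 then
          on_done()
        else
          pump()  -- restart the loop now that a slot is free
        end
      end)
    end
  end

  pump()
end
```

The "restart the loop" concern from above is handled by calling `pump()` again from every completion callback: even if the queue had drained, a callback that discovers new directories re-enters the loop.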
Crap, now I will have to deliver on these expectations xD
Yeah, that wouldn't work. I need to dive more into plenary async and threading in lua.
Idk. I have been writing functional code for the last couple of years, and this seems fine, like every other place. I am glad that you took that approach. It is much easier to reason about this codebase than the nvim-tree one (at least for me). Sure, there are some messy parts, but nothing that cannot be cleaned up. In general I am happy to continue working on it, but it will take more time than I initially anticipated.
Just a small update from my side: Kooha-2023-06-02-10-46-16.webm (screen recording). The code seems to work now, though I have made some significant changes. I am not pushing it yet as I need to clean it up. After that I will publish it and describe my changes with recommendations for moving forward. In general I would avoid using callbacks, as they are hard to follow and error-prone (e.g. calling one twice, or forgetting to call it at all).
@cseickel I have it almost ready. Just one small bug left. Could you tell me how this works?
@ghostbuster91 that's this: neo-tree.nvim/lua/neo-tree/ui/renderer.lua, line 1091 (commit 8c89efb).
A general comment on my change: I think that we should migrate the codebase to plenary.async, as relying on callbacks is too error-prone and not easy to follow. Thanks to that, we should be able to get rid of most of the callbacks. This will make the function signatures clearer and allow us to better reason about the codebase. Now, I don't expect anyone to do that work. However, I like the neo-tree idea to the degree that I am willing to do most of the heavy lifting, provided that you are interested in such changes.

Update: That is probably because the ones in plenary are not flexible enough. For example, you can scan_dir, and you can do that with respect to git_ignore, but you will only get the results that are not ignored, while neo-tree needs all the results, just enriched with information about git_ignore. I would like to propose contributing the following changes to plenary.

wdyt? And btw, the PR is ready for review.
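For illustration, here is the kind of migration being proposed, sketched on a hypothetical callback-style function. The names (`get_items`) are illustrative, not the actual neo-tree API.

```lua
local async = require("plenary.async")

-- Today: callback style. The result only exists inside the callback,
-- and forgetting to call it (or calling it twice) fails silently.
local function get_items(state, path, callback)
  callback({ path })
end

-- After wrapping: direct style. async.wrap(fn, 3) says the callback is
-- fn's third argument; inside an async context the wrapped function
-- simply returns that callback's arguments as its result.
local get_items_async = async.wrap(get_items, 3)

async.run(function()
  local items = get_items_async({}, "/tmp")
  return items  -- no nesting required; errors propagate normally
end, function(items)
  print(vim.inspect(items))
end)
```

The payoff is mostly in call sites that chain several asynchronous steps: what would be three nested callbacks becomes three sequential lines.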
One general note about plenary: it's a great library and I appreciate the work that has been put into it, but be careful that you don't think of it as a "standard library" like what you would get from other languages. It's not the .NET Standard Library or Lodash; it's a young project maintained by a handful of very busy open source contributors. The plenary async methods seem to be pretty good, and I wouldn't mind a refactor that leaned on those, but be careful and make sure that performance does not suffer. I usually use the Linux project to test out changes to the filesystem scanning code when I am worried about performance. Also, be sure to consider that there may be some shared code with other sources that needs to be maintained.

On the subject of the plenary dir scanning functions, they are banned from this project. The first version used them, of course, but over time I realized that they are slow and just too simple to work for what we need. Also, the gitignored filter is completely wrong. Gitignore is a very complicated thing, and I stand by the decision to just delegate that logic to the git cmd and not try to re-implement it ourselves. I think that for a plugin that revolves around scanning the filesystem, it makes sense that we would roll our own functions for that one thing. We just need that level of control and customization.
Looks pretty good, just a few comments. The important thing is to return the `common.commands.expand_all_nodes` function to being a generic implementation.
```
@@ -228,7 +257,9 @@ local function async_scan(context, path)
  local scan_tasks = {}
  for _, p in ipairs(context.paths_to_load) do
    local scan_task = async.wrap(function(callback)
      scan_dir_async(context, p, callback)
      async.run(function ()
```
I haven't seen this before. Is this like the `pipe()` function from rxjs?
Not really, it is rather like `subscribe`. It allows you to execute an "async" function from the regular (non-async) context.
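A minimal sketch of that distinction, assuming plenary.nvim is installed (`compute` is a made-up function, not part of any API):

```lua
local async = require("plenary.async")

-- A function that must run inside an async context, because it awaits.
local function compute()
  async.util.sleep(10)  -- yields the coroutine without blocking the editor
  return 42
end

-- async.run enters the async context from regular (synchronous) code
-- and invokes the second argument with the return value when the
-- computation completes -- much like subscribing to an observable.
async.run(compute, function(result)
  print(result)
end)
```

So `async.run` is the bridge at the boundary, while `async.wrap` converts individual callback-style functions for use inside that context.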
I understand that, but there seems to be no other alternative, and my mental capacity is not big enough to load all the callbacks at once to reason about the code.
That is a great idea!
Understood, thanks for the explanation!
Yep, that makes sense. At some point we might contribute it back to plenary, if it ends up generic enough.
I completely appreciate that. I've been there many times, where there are sections of a project that are too complicated to fit in my working memory and I just have to refactor them. TBH, this code has been incrementally changed many times, by multiple people, and is overdue for a refactor. The last time I had to change something, it took me a while to get my bearings too.
```lua
local mark_ignored_async = async.wrap(function (_state, _all_items, _callback)
  git.mark_ignored(_state, _all_items, _callback)
end, 3)
local all_items = mark_ignored_async(state, context.all_items)
```
QUESTION: here I am always using the async version of git_ignore, in contrast to `job_complete`, which respects `require("neo-tree").config.git_status_async` and acts accordingly. I could merge these two together, but I am hesitant, as the old one calls `render_context`, which in the context of my function is unnecessary (at least in this place). Nevertheless, I could still add `config.git_status_async` to `job_complete_async`, though I am not sure what the benefits of that would be, as everything is "async" in this call stack. Wdyt?
I don't remember how everything works off the top of my head and I don't think I have enough time to dig in now, but I can say that if the filesystem is being loaded async it probably doesn't make sense to load git_status synchronously. I can see the other way around where the fs is scanned sync but git status is async.
That sounds reasonable 👍

So far, all of my attempts at running it on a big repository like qmk or nixpkgs have failed due to running out of RAM. I am quite surprised by that, so I will take a closer look at it.
I've hit many difficult memory leaks in this project. I start by looking for closures. There have also been memory leaks before in treesitter, neovim, and Nui. If this is redrawing after every recursive load, you could be hitting memory leaks external to your code.
I found the memory issue; it was actually my fault, as I was doing some redundant recursion. Now I am facing another problem with my current solution.

There is a bit of a lag during the render, as there is a lot to render (I am testing it on the qmk repository, which has around 50k items). However, there is a bigger issue as well. So, first I will try to redesign my approach a bit, and then I will look into it. The good thing is that I feel I am making some progress, so hopefully this will be finished one day :)
There is certainly a point at which a recursive expand is just unreasonable. The important thing is that it performs well in normal operation, where the next level may have a lot of items, and with group_empty_dirs enabled. For a full expand of a huge tree, it's OK if it takes a long time, so long as there is some protection against errors. It would be OK to bail out with a warning at some point if the task has been running for too long, or if you just exceed some number of folders. You could also report progress back to the neo-tree window (not exactly sure what would be the right way to do that) or to a new floating window that you pop up just for this task after x seconds. I would be hesitant to render the incremental progress for each folder, because the renders would probably take up a lot of resources themselves. Maybe you can render once per level, each time you reach a new max depth for the overall tree.
Hi @ghostbuster91, sorry for the interruption. I've been working on issue #906, where the core problem is that the mechanism for detecting whether to dig into child dirs is not perfect. PR #908 is a fix for this specific problem, but as @cseickel mentioned, it would be nicer if the new algorithm implemented here could fix the problem in the first place. Do you think it is possible? More details are in #906.
@pysan3 For now, I am mostly focused on the recursive expansion of all nodes to the very bottom. For that purpose I created a separate chain of methods that performs that expansion in a more efficient way. I guess once I finish it, it will be possible to unify the way we toggle a single directory with the recursive expansion, though I would say this is a different work stream.

Given the scope of your PR, I think that you should not wait for me, since I am not sure when I will be able to finish (I still want to dig into the git_ignore stuff), and porting/adapting your changes will be rather easy.
I agree.
That could be an interesting feature, but let's save it for later.
👍 Let's stick to that.

@cseickel I came up with two ideas that could speed up the recursive expand process that I would like to discuss:

Scan parent instead of children

It is no surprise that it is faster to perform a single recursive scan of a directory tree than to scan each directory separately. Instead of the current approach, we could always perform a recursive fs_scan on the parent. The obvious downside of that is that we'd perform that fs_scan every time we expand that node recursively. This could be mitigated by introducing a new flag at the node level.

Smarter git_ignore

Once we detect that a given directory belongs to a git repository, I think we could check all descendant directories against that one repository. That would greatly decrease the number of jobs to spawn in certain situations (e.g. when the root directory is the git root). Now the question is how to detect that. I see two options: one is to save that information during the fs_scan as a node-level flag.

Summary

The second optimization is much more worth looking into, as it can save a ton of resources. Having said that, I see them as additional improvements to the recursive expand feature and don't consider them critical. Apart from that, I still need to test the changes.
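The batching idea maps naturally onto `git check-ignore --stdin`, which lets a single git process answer for any number of paths under one repository root instead of one job per directory. A self-contained demonstration in a throwaway repository (all paths here are illustrative):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'build/\n*.log\n' > .gitignore
mkdir -p build src
touch build/out.o src/main.lua app.log

# One git process answers for all candidate paths at once; it echoes
# back only the paths that are ignored (here: build/out.o and app.log).
printf '%s\n' build/out.o src/main.lua app.log | git check-ignore --stdin
```

With `-z` the same command handles paths containing newlines, and `--verbose` reports which gitignore rule matched, which can help debugging.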
One thing you have to consider is that there is no way to know if files have changed since the last scan, so every node has to be loaded from disk every time, with no caching of any kind.

I think if we wanted to cut down the number of workers, I would refactor to have a specific number of parallel workers that simply work out of a central queue of discovered directories and pop the next arbitrary directory to scan out of that queue whenever they finish, rather than creating exactly one per directory. I didn't personally add the one-worker-per-directory strategy, but I think it was just a simple way of parallelizing the job.
The git-ignore code was certainly not written with recursive expand in mind. That being said, this part is complicated, and I would hate to complicate it further for a situation I see as an edge use case. It's OK if it is slow to recursively expand a very large repo. I'd rather that than possibly add bugs for the average use case of a single-level expand.
I agree that you don't need to go further on optimization. My only concern is to ensure that if someone does recursively expand a large repo, neovim does not crash or completely freeze for a long period of time.
Thanks for the detailed response, I appreciate it!

Don't we spawn an fs-watcher for every loaded directory that should inform us if anything changes? That way we could invalidate cached results.
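Such invalidation could look roughly like this with libuv's fs_event handles (via Neovim's `vim.uv`); `loaded_dirs` is a hypothetical cache, not an existing neo-tree structure:

```lua
local uv = vim.uv or vim.loop

-- Hypothetical cache of scanned directories: path -> { fresh = bool }
local loaded_dirs = {}

-- Watch `path` and mark its cache entry stale on any change, so the
-- next recursive expand can decide whether a rescan is actually needed.
local function watch(path)
  loaded_dirs[path] = { fresh = true }
  local handle = uv.new_fs_event()
  handle:start(path, {}, function(err)
    if not err and loaded_dirs[path] then
      loaded_dirs[path].fresh = false
    end
  end)
  return handle  -- keep a reference; call handle:stop() when unloading
end
```

Note that fs_event watches a single directory, not its subtree, so this only works if a watcher really is spawned per loaded directory.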
That is a fair point. I might give it a try later and simplify it, because, as you said, it is complicated.
Well, to be fair, the current behavior is exactly like that already. I can later give it a try and make both of them more user-friendly, but I would like not to grow the scope of this PR that much.
I suppose that is true when that option is enabled, and even if it's not, it's no change to the current behavior, since the nodes will be discarded on each explicit refresh. Do you think this is ready for final review and testing?
I didn't have time yet to finish testing it.
Hey @ghostbuster91, I just want to check in on this. If you're out of time, you don't need to worry about the remaining fixes.
Indeed, sorry for the interruption @ghostbuster91. I now think this PR is not that related to mine, so please just ignore my comments on this thread lol. I am thinking of revising my PR after v3.0 is released, so it'll probably be quite far in the future.
Yes, sorry guys, I got totally absorbed by other things. I will try to finish this by the end of the week. At this point I just wanted to test it against the current implementation.
Just tested it with the current version.
One more thing: when there are not enough file descriptors left in the system, the plenary job will fail with an error. We could match on that field of the error.
Yeah, that's a known issue for another PR. #997 |
This relates to #701.

This is not a full fix for the builtin `expand_all_nodes` method, but rather a fix that enables fixing it. It also enables fixing the function from the wiki that recursively expands nodes under the cursor (more important in my opinion).

List of changes broken down by commit:

- Add a callback to `toggle_directory` - this is the most important change. Luckily it is not a breaking change, because the callback was previously `nil` anyway.

  Justification: `fs.toggle_directory` calls `fs_scan.get_items`, which calls `sync_scan`, which for `scan_mode == "shallow"` calls `sync_scan` and then `job_complete`. Here is where the problem starts: `job_complete` checks if `git_status_async` is set and, if so, registers a callback in the `git.mark_ignored` method. That callback is important because it eventually calls `render_context`, which updates the nodes. That callback will be called at some point later in time, but the execution continues in the meantime. Because of that, it is not guaranteed that we will have updated results by the time we start expanding nodes.
- Small refactoring - method extraction. Can be removed from the list of changes if desired.
- Small fix for the recursive traversals when the `deep` strategy is used.

I did not fix `expand_all_nodes` because I didn't know how I can obtain the root element of the whole filesystem tree. The root element is needed so that we can start expanding from it rather than expanding from its children like it is now. This is important so that we can have a single callback to pass to `toggle_directory`; otherwise, this method will spawn as many processes as there are nodes on the top level. Tbh, I would remove this method.

And here is a fixed method to recursively expand nodes under the cursor, `recursive_open`.

Tested with both the `deep` and `shallow` strategies.
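The fan-out problem described above (one callback per top-level node instead of a single callback on the root) is usually handled with a countdown. A minimal sketch with a hypothetical helper, not neo-tree code:

```lua
-- Run `fn(item, done)` for each item and invoke `on_all_done` exactly
-- once, after every per-item `done` callback has fired.
local function for_each_async(items, fn, on_all_done)
  local remaining = #items
  if remaining == 0 then
    return on_all_done()
  end
  for _, item in ipairs(items) do
    fn(item, function()
      remaining = remaining - 1
      if remaining == 0 then
        on_all_done()
      end
    end)
  end
end
```

This sidesteps the need for a root node, at the cost of one more layer of callback bookkeeping; having a real root, as the description argues, keeps it to a single callback.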