Address #10134 OOM/timeout #10173
Conversation
Thank you, the changelog looks good to me :D

This helps with my synthetic minimized example indeed, but doesn't help at all with my actual project. I'll see if I can create a more representative minimized example that OOMs with this branch; or, if you'd prefer to look yourself, just run

The problem is caused by the domain of the analysis being stored per basic block. With the current implementation this results in the memory usage being
A fix would be to use a copy-on-write bitset. A hybrid between a small sparse bitset and the full COW bitset would probably be best.
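The copy-on-write idea suggested above can be sketched with `Rc::make_mut`. This is a hypothetical illustration using `Rc<Vec<u64>>` as storage, not Clippy's actual bitset types: clones share the underlying words, and a mutation copies them only when they are still shared.

```rust
use std::rc::Rc;

/// Sketch of a copy-on-write bitset (hypothetical, for illustration).
#[derive(Clone, Default)]
struct CowBitSet {
    words: Rc<Vec<u64>>,
}

impl CowBitSet {
    fn insert(&mut self, bit: usize) {
        let (word, mask) = (bit / 64, 1u64 << (bit % 64));
        // `Rc::make_mut` clones the storage only if another handle still
        // points at it -- this is the copy-on-write step.
        let words = Rc::make_mut(&mut self.words);
        if words.len() <= word {
            words.resize(word + 1, 0);
        }
        words[word] |= mask;
    }

    fn contains(&self, bit: usize) -> bool {
        self.words
            .get(bit / 64)
            .map_or(false, |w| w & (1u64 << (bit % 64)) != 0)
    }

    /// True while two sets still share storage (no copy has happened yet).
    fn shares_storage(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.words, &other.words)
    }
}
```

A clone is O(1) until one side mutates, which is exactly the pattern a per-basic-block dataflow domain benefits from when most blocks never change their state.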
Not

So, to be more precise, you mean instead of

Yes. That should have been

Something like that, yes. Each basic block is unlikely to produce many borrows, so this will significantly cut down the number of cloned bitsets.

I'll give it a shot.
In the branch from #10144, I pushed what I think you were describing @Jarcho. This approach helps with the memory, but seemingly not with the time required. On my laptop, @mwkmwkmwk's first example takes about 10 seconds. For her second example, I kill it after a couple of minutes. Part of the problem with her first example is that there are many tiny borrower sets. So the loop that is currently in
That is why I think we need an approach where
There could be something obvious that I am not seeing, though.

You could try optimizing a join on the bottom value into a clone instead of looping over every value. Might be worth reference counting the map itself. Going by the MIR output of the second example, less than half the blocks result in a borrow. Using
Reference counting only the dense bit set might also provide some performance benefit, but it could also make things worse.
The second example is bordering on unreasonable. Clippy should still run quickly on it, but the single function takes over two seconds for me to compile. Adding one more
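The first suggestion, turning a join against the bottom value into a wholesale clone rather than an element-by-element union, might look roughly like this. The type names are hypothetical stand-ins; the real analysis uses rustc's dataflow types:

```rust
use std::collections::{HashMap, HashSet};

type Local = u32;
// Hypothetical domain: local -> set of possible borrowers.
type State = HashMap<Local, HashSet<Local>>;

/// Join `other` into `this`, returning whether `this` changed.
/// If `this` is still bottom (empty), take one wholesale clone of `other`
/// instead of unioning entry by entry.
fn join(this: &mut State, other: &State) -> bool {
    if this.is_empty() {
        if other.is_empty() {
            return false; // bottom ⊔ bottom: nothing changed
        }
        *this = other.clone(); // bottom ⊔ x = x, done in one clone
        return true;
    }
    // Otherwise fall back to the element-wise union.
    let mut changed = false;
    for (local, borrowers) in other {
        let entry = this.entry(*local).or_default();
        for b in borrowers {
            changed |= entry.insert(*b);
        }
    }
    changed
}
```

Since every block's state starts at bottom, the first propagation into each block becomes a single clone rather than a loop over every local.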
These sound like good suggestions, but I am afraid they wouldn't address the time required by @mwkmwkmwk's first example. What if

What do you think of either of these ideas? (EDIT: I'm considering the memory issue essentially resolved by your suggestions here.)

To make the idea concrete, I implemented the first bullet and pushed it to the branch from #10144.

☔ The latest upstream changes (presumably #10192) made this pull request unmergeable. Please resolve the merge conflicts.

Close this? (FWIW, I think reverting #9701 was the right decision.)

If it can be made to work it's worth doing.
One other idea I had was this (see `clippy_utils/src/mir/possible_borrower.rs`, lines 255 to 256 at 4fe3727):

We could require the caller to provide an upper bound on the number of borrowers (e.g., 2), and incorporate this into the analysis. This would effectively take the complexity from `nm^2` to `nm`.

The downside would be it could restrict
What do you think?
I don't see how that could work. Just because a type has only two borrowers at a single point in time doesn't mean it didn't have hundreds of borrowers earlier. We still need to keep track of all known borrowers to know when something would drop down to only a specific number. You could limit the locals tracked to only the one we care about. This would mean rerunning the analysis for each local we need to check, but it might end up being faster.
I'd rather not take this approach, given that the current approach does handle processing these cases.
Sorry for not being clear. What I meant was: the state for each local would hold up to, say, two borrowers, but there would be an additional representation to mean "three or more." Something like:

```rust
struct PossibleBorrowerState {
    map: FxIndexMap<Local, Option<FxHashSet<Local>>>, // each set's size is at most `max_borrowers`;
                                                      // `None` means `max_borrowers` was exceeded
    max_borrowers: usize,
}
```

Joining "three or more" with anything else results in another "three or more." At any given program point, when we ask, "Are the borrowers of X at most Y and Z?" if X's state is "three or more," the answer is immediately "no."

(Come to think of it, if a future lint did need the full set of borrowers, it could just set the limit exorbitantly high.)
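For illustration, the join for that bounded representation might look like the following sketch, using std's `HashMap`/`HashSet` in place of `FxIndexMap`/`FxHashSet` (the function and type names here are hypothetical):

```rust
use std::collections::HashSet;

type Local = u32;

/// `Some(set)` holds at most `max` borrowers; `None` means "more than `max`",
/// a top element that absorbs anything it is joined with.
type Borrowers = Option<HashSet<Local>>;

fn join(a: &Borrowers, b: &Borrowers, max: usize) -> Borrowers {
    match (a, b) {
        (Some(x), Some(y)) => {
            let u: HashSet<Local> = x.union(y).copied().collect();
            // Collapse to the top element once the bound is exceeded.
            if u.len() <= max { Some(u) } else { None }
        }
        // "More than max" joined with anything stays "more than max".
        _ => None,
    }
}
```

The key property is that once a local collapses to the top element, no further per-element work is done for it, which is where the `nm^2`-to-`nm` improvement would come from.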
I'm hoping we don't have to go this route.
👍

That's how I interpreted your suggestion. It misses any of the more interesting cases where something was at one point borrowed, and now no longer is. Since borrowers are currently not removed once added, this would break quite a few cases.
I'm not following. I don't think the code did this before or after #9701. In both cases, the borrower sets only grew. What am I missing? EDIT: I will try to implement it. Either it will work, or I will find my mistake. |
The borrower set is trimmed down afterwards with

I see what you are saying now.

It might be better to look into getting the borrow checker usable from clippy rather than rewriting a limited version of it.

Do you have any thoughts on how one would go about that, or what the result would look like?

The borrow checker starts at

@Jarcho I could use your input. I have been looking into how we might incorporate
In particular, a
One way out would be to change
Another way out I think would be to use pinning. (I haven't played with pinning before, so I'm speculating a bit.) But I think that would also require unsafe code. I can't find any unsafe code in Clippy currently. So a question I have is: is adding unsafe code a "red line" not to be crossed in Clippy?
Alternatively, do you think
Or is there another option I haven't thought of?

Have you checked if

I'm still trying to figure that out. I was coding up what I thought might work when I ran into this problem. I will reply again when I know more.
@Jarcho The code I just pushed does not compile; it requires changes to rustc. Those changes can be found in this branch. Before I open a PR to the Rust repo, could I get a preliminary, "yes, this stands a chance of being merged into Clippy"? (There is no rush.) This code reconstructs the MIR the borrow checker is run on, and from that reconstructs some artifacts the borrow checker would have produced, notably:

These artifacts are used most crucially here (e.g., if there were a bug that would cause this approach to fall apart, I would expect it to be in that code). As alluded to above, the borrow checker is run on an earlier version of the MIR than the one returned by
I tried running the borrow checker on the
It would seem unreasonable to expect every
I think that hits the high points.
I'll try to look at that this week.
Given that this requires rebuilding the MIR, I'd rather not merge this as is. I think the better approach here is to add proper MIR lint passes into rustc (or something resembling them). That is, however, a rather large project which would require coordinating with the compiler team. I'm not totally against the idea, but I would rather investigate alternatives first. I'll bring this up at the next clippy meeting in two weeks. Anyways, thanks for looking into this.

Back to the initial problem, you could try merging the
The basic idea here is that once the local's storage is dead there can't be any live borrows, so we can stop keeping track of them. Since the majority of borrowed locals will have a rather narrow live range, this should result in significantly less work on each join, and possibly also fewer joins in general (loops are pretty much always run over at least twice now).
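The storage-liveness idea, dropping borrower sets for locals whose storage is dead, could be sketched as follows. The names are hypothetical, and in the real analysis the live set would come from a storage-liveness dataflow result rather than a plain `HashSet`:

```rust
use std::collections::{HashMap, HashSet};

type Local = u32;

/// After a transfer function or join, drop entries for locals whose
/// storage is no longer live: a storage-dead local can have no live
/// borrows, so tracking its borrowers is wasted work.
fn prune_dead(
    borrowers: &mut HashMap<Local, HashSet<Local>>,
    storage_live: &HashSet<Local>,
) {
    borrowers.retain(|local, _| storage_live.contains(local));
}
```

Because most borrowed locals have narrow live ranges, the maps being joined stay small, which is the source of the expected speedup.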
You probably know this already, but there's already something like this: https://doc.rust-lang.org/beta/nightly-rustc/rustc_mir_transform/pass_manager/trait.MirLint.html

I worry that using an additional lint pass in Clippy could create churn, though. For example, an idea I was considering for the future was to use the borrow check results to identify unnecessary lifetime annotations (e.g.,
Another idea to avoid reconstruction could be to add a flag to the compiler configuration to say "preserve the borrow check results" and then expose those structures through a query.
👍
This idea makes total sense. But if it's okay, I'd like to wait and see how we will proceed using the borrow check results before continuing with the existing

☔ The latest upstream changes (presumably #10313) made this pull request unmergeable. Please resolve the merge conflicts.

Result of the meeting: Assuming the compiler team is ok with the
Thanks, @Jarcho. I will try to push the compiler changes through. |
Did some quick perf testing. Rebuilding and borrow checking every function in
Worst case of 25% slower should be fine. Given that this will get better when
This doesn't include any impact from the rustc changes, so this could end up being worse.

Thanks, @Jarcho! I hope it wasn't too onerous to run these tests.

@Jarcho This is still a WIP. No action is needed at this time.

I'm going to close this. I may revisit this at some point in the future.
This is an attempt to address the OOM/timeout exhibited by @mwkmwkmwk's example in #10134.

As I see it, the reasons for the OOM/timeout are the following: `possible_borrower`'s work at each `join` in `iterate_to_fixpoint` (essentially, at each basic block) was proportional to the number of locals.

One way out of this situation is to make the work done at each `join` not proportional to the number of locals. The current PR proposes a data structure that achieves this for certain usage patterns, e.g., certain MIR dataflow problems. In particular, it avoids an OOM/timeout on @mwkmwkmwk's example and does not seem to degrade performance on Clippy's existing tests.

The data structure supports an in-place union operation similar to `BitSet`'s `union`. However, unlike `BitSet`'s `union`, which returns `true` only when new bits were added, this data structure's `union` will sometimes delay computation and return `true` immediately even when no new elements were added. Delayed computations are represented in a tree-like data structure, and when certain conditions are met, trees are flattened into actual sets.

Additional details are provided in `lazy_set`'s module-level documentation.

r? @Jarcho
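As a rough illustration of the delayed-union idea (not the actual `lazy_set` implementation, whose `union` operates in place and returns a `bool`), a union can be recorded as a tree node in O(1) and only materialized when the set is actually inspected:

```rust
use std::collections::HashSet;

/// Sketch: a union builds a tree node in O(1); the concrete set is only
/// materialized ("flattened") on demand.
enum LazySet {
    Leaf(HashSet<u32>),
    Union(Box<LazySet>, Box<LazySet>),
}

impl LazySet {
    fn union(self, other: LazySet) -> LazySet {
        // Delay the work; just record that a union happened.
        LazySet::Union(Box::new(self), Box::new(other))
    }

    fn flatten(&self) -> HashSet<u32> {
        match self {
            LazySet::Leaf(s) => s.clone(),
            LazySet::Union(a, b) => {
                let mut s = a.flatten();
                s.extend(b.flatten());
                s
            }
        }
    }
}
```

In a dataflow setting, many joins never end up being queried before the next join overwrites them, so deferring the set arithmetic this way can avoid most of it.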
(@xFrednet I copied your edits to #10144's changelog, but changed the PR number.)
changelog: Enhancement: [`redundant_clone`]: Improved performance and memory usage