Cabal takes 3 minutes to resolve dependencies #7466
I hope somebody comes up with a hint to speed it up, particularly when repeating the builds. I'd also try with a nightly build of the dev version of cabal (#7458 (comment)). If there's no easy solution, it's definitely worth investigating where the time is spent (unless it's widely known for projects of that size with freeze files). I'd guess it's in the solver, because checking satisfiability of constraints (the freeze file) is a special, but not too special, kind of constraint solving, which is costly even without backtracking. (BTW, did you try …)
The cached build plan should be reused if no flags or metadata changed, and if it isn't, that's a bug.
This may be the reason, possibly exacerbated by the lack of bounds in dependencies too. Why is it needed?
What do you mean by this?
"I'm noticing that the cabal solver takes 2-3 minutes while stack is pretty much instantaneous. I figure I must not be aware of some cabal feature that stack is leveraging." There is no such feature. Rather, stack is simply not running a solver at all. Note that your procedure is asking cabal to resolve unrestricted dependencies on an entire stack LTS, not just the packages you actually depend on. The correct way to convert a stack project to cabal imho is simply to use the generated cabal file directly. You can get the current bounds by entering a stack shell and running Further you only need to handle the direct dependencies in a coherent way -- the indirect dependencies will have their correct version picked by the solver automatically. |
Wait, really? I mean, the constraints are there, but I thought the goals were only the stuff in the dep tree. I tried running …
@Mikolaj, I ran …
This doesn't sound right to me. I thought the point of having …
How does "not running a solver at all" work? |
Pinning all transitive dependency versions is a valid use case, even if an extreme one. However, it does not require pinning the versions of all LTS libraries, nor explicitly requiring that all transitive dependencies get built, if any of that is going on here.

@dfithian: would you care to share more of your project structure with us? A GitHub repository with a minimal reproduction example would be best -- just a skeleton that is enough to exhibit the problem.
"How does "not running a solver at all" work?" It works because stackage sets out fixed version bounds for everything. By adding allow-older and allow-newer you're rejecting those bounds, thus obviating the point of using the freeze file at all. The command to generate and freeze working bounds is something resembling what I gave above -- using It seems you're suggesting your project literally has 200 packages using one cabal.project file? In this case I see why editing each cabal file directly could be somewhat irritating and you would prefer a freeze file. In that case, you can keep the freeze file, but simply do not add allow-older and allow-newer. Or once you have a working build, generate a new freeze file that captures the dependencies as calculated (using Again, the main point is that if you want to pin all the deps, you should not have allow-older and allow-newer since their purpose is to "unpin" everything again. |
I managed to get a freeze file with …

One thing I noticed is that if I disallowed backtracking (…), …

I'll work on getting a reproduction case. I suppose if I had one I'd probably have a better issue than just "help!" 😛
Sorry, I still don't understand what you're trying to do here. Why are you worried about the runtime of … ?

So yes, cabal may take some time to resolve dependencies, but what workflow do you have where you want it to keep resolving dependencies repeatedly? If you don't have such a workflow, then there is no bug or issue here, just the general fact that running a solver over a lot of dependencies takes time, and one may wish to adopt any number of workflows that avoid doing so repeatedly.

Edit: It seems like the issue you may be getting at is just that when you have a cabal.project that spans 200 packages with a lot of dependencies, resolution takes more time than you would expect, even with everything ostensibly pinned through a freeze file? In any case, at this point I'll stand by for a reproduction case, since I confess I'm having a hard time understanding the precise issue.

(Basically, a lot of work has been put into solver performance over the years, and there may be specific issues that could use improvement, but "solver is slow" is not a ticket that can lead to much concrete action, whereas "solver is being run in cases X, Y, Z and should be bypassed" may stand a better chance. I also wonder how much of the issue is a function of the number of deps, and how much is a function of having, again, as I understand it, 200 packages controlled in a single cabal project?)
Sorry, I mean running …
Okay, I can see where you're coming from when you say "yeah, resolving dependencies takes a long time, but it shouldn't get re-run". However, I see this problem in two main places for us: …
All said, it's fine if it's not a bug, since we currently have a working path with stack with the same dependencies. I guess I am mostly just confused as to why I can't get the same performance as stack. Then again, that may be the magic of stack!
Thank you for further specifying. I think there is an issue here from what you've described. It seems that even when dependencies are fully pinned via a freeze file, generating a build plan can take a surprising amount of time, at least in your specific setting. We think this is not due to the number of dependencies pinned, as some tests seem to show otherwise. But it is not clear whether it is due to the number of transitive deps, or due to having, by your description, some hundreds of packages sharing a single cabal.project. I think that's something very much worth improving!

Again, the reason stack does not encounter this is that it literally does not run a solver. It just assumes the pinned dependencies are correct, and if some issue occurs in the course of using them, it lets the tooling it invokes complain.

Here, meanwhile, it seems the issue is that the cabal solver appears to run even when there is nothing to solve. For most projects that run seems to be pretty efficient, but, again, potentially due to the number of packages grouped under a single cabal.project (or due to their dependencies), here it is not. I think this is worth figuring out, and having a reproducible public test case is the best way to get there.
Hi, I'd be interested to know whether this might be related to #7472, especially since … Do you have … ?
@gbaz @Mikolaj We're going to create a project with anonymized packages (and no source code) to reproduce. It may take a little while.

@michaelpj Indeed we do: … => 5
@dfithian, I think this project replicates the issue: https://github.com/peterbecich/cabal-resolver-issue#readme
@peterbecich: yay, this is impressive, thank you. Do I read correctly that not using … ?

Some low-hanging fruit, if anybody would like to help: try with the master branch, try with/without the freeze file.
@michaelpj Yes, indeed we do have at least one package that has a …

@Mikolaj One of the areas where we're seeing slowness is in our nix builds, which use haskell.nix. As part of its setup, it uses …
Ah. The ticket I linked hits haskell.nix much worse (input-output-hk/haskell.nix#1156). We're working on a fix; I'll be interested to see if that helps you.
@dfithian: perhaps you know the answers? Anybody else having a problem with … ?
Hopefully this is a good summary. Both …

After taking a look at @peterbecich's repo, I wrote the script in this PR, but we haven't audited the output yet. I experienced about 40 seconds of resolve time during …
Alright, we audited the output and merged it in: https://github.com/peterbecich/cabal-resolver-issue
I made an attempt at something here, but I don't think it's going to yield significant results: #7491

If one considers how much work is actually being done here, even if one thinks of "solving" purely as "validation", 30s seems entirely reasonable to me -- this is walking over 200+ interlinked dep graphs, ensuring they're all coherent, and then producing configurations/build plans for all of them. 3 minutes seemed concerning, but the 30s cost in the repro repository seems ok. I suspect there's no low-hanging algorithmic fruit to be plucked here, especially since with that PR there are drastically fewer local backjumps but the same-ish solving cost (i.e. the "solving" cost really is just validation cost on a large universe of packages).

That is to say: I'm interested in whether the experts think the PR is worth it, but outside of that I imagine the next step is just the usual micro-optimizations on the solver and related code -- cutting down allocations by unpacking data structures, using more efficient representations, guided by profiling, etc.

The repo is still a very good testbed for the solver, and much appreciated, but unless there's some condition in which that 30s turns into 3m, I don't see any immediate problems here outside of "everything could be somewhat faster with a lot of careful optimization and profiling".
Is it possible to disable the solver entirely? I know currently the only option for …
Mmm, stack actually does it, but it has Stackage (and explicit stack.yaml versions for Hackage packages) behind it to provide a coherent set of dependencies.

EDIT: I am sure this idea has been discussed before, so I guess it was not implemented for some reason 🤔
Yes, exactly my argument. In the example (#7466 (comment)), we froze every package we knew about. Our intention is to use a Stackage LTS (and in lieu of that, pin dependencies manually).
Right. The problem is that stack does not validate that the plan actually works. As my experiment shows (although I would be curious if it yields an improvement on the actual case and not just the test case!), the cost of just constructing and validating a plan for a large universe of packages and dependencies is itself not trivial. Stack simply doesn't validate the plan -- it just assumes that everything is fine, and you only notice a failure if something actually fails in the middle of executing the plan and performing the actual builds.

It might be possible to teach cabal how to avoid building/validating a plan altogether, as suggested, but for any but power-user cases I'd very much want to discourage this.
I ran the reproduction with testing disabled, and it seemed like more than half of the solver's time was spent descending the search tree without needing to backtrack. So the solver is probably taking a long time simply to validate the large number of dependencies and constraints, as @gbaz suggested. I think this would require profiling next.

One section of the log did show backtracking: the solver first tried using Cabal-3.4.0.0 for servant-openapi3's setup, descended over 600 levels, and then had to backtrack and use the installed Cabal-3.2.1.0. The solver should never need to backtrack with a freeze file. The problem is that #3502 wasn't fully implemented, and the freeze file doesn't contain separate constraints for each setup or build tool dependency. In this case, adding "servant-openapi3:setup.Cabal ==3.2.1.0" to cabal.project.freeze avoids the backtracking.

The 3 minute run time with build tool dependencies may also be explained by #3502, if they significantly complicate the dependency solving problem.
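In freeze-file syntax, the workaround described above amounts to one added line (the constraint itself is quoted from the comment; the surrounding file is illustrative):

```
-- cabal.project.freeze (excerpt): pin the Setup.hs dependency explicitly,
-- since the generated freeze file does not cover setup dependencies
constraints: servant-openapi3:setup.Cabal ==3.2.1.0
```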
Some profiling of a run on the test repo: …
I think the cata implementation seems potentially inefficient -- it uses clever recursion patterns but might more profitably be written directly? Ditto with ana.

The two costly PSQ operations tend to indicate to me that our PSQ usage might be improved if we switched to vectors rather than lists as the underlying structure. Or maybe just arrays, to avoid growing the dep footprint...

Some low-hanging fruit appears to be … The call to …
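For illustration only (generic toy types, not cabal's actual solver code), here is the kind of rewrite being suggested: a fold expressed through an explicit catamorphism versus the same fold written with direct recursion, which avoids an `fmap` and a closure per node:

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- A generic catamorphism over a fixed point, recursion-schemes style:
newtype Fix f = Fix (f (Fix f))

cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg (Fix f) = alg (fmap (cata alg) f)

data TreeF a r = LeafF | NodeF a r r deriving Functor
type Tree a = Fix (TreeF a)

-- A fold written via cata: every node goes through 'fmap' and a closure.
sizeViaCata :: Tree a -> Int
sizeViaCata = cata step
  where
    step LeafF         = 0
    step (NodeF _ l r) = 1 + l + r

-- The same fold written directly, which GHC can often optimize better:
sizeDirect :: Tree a -> Int
sizeDirect (Fix LeafF)         = 0
sizeDirect (Fix (NodeF _ l r)) = 1 + sizeDirect l + sizeDirect r
```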
That invariant check does a full traversal of the whole index! I'm guessing we're not building cabal with assertions explicitly turned off, which could account for this?
Ok, so adding GHC options to disable assertions and adding optimization levels did not seem to fix this?? However, explicitly turning the invariant check into a no-op did work, so I wonder if there's a problem here that's led to some of the slowdown...
Finally, on …
Note that there's an autoderived Binary instance here, which could well be pretty inefficient. In this particular case, maybe moving to a handwritten Binary instance could help?
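A minimal sketch of that idea, with a hypothetical payload type standing in for the actual cached data: replace the derived/Generic instance with an explicit one that just serializes the fields in sequence.

```haskell
import Data.Binary (Binary (..))

-- Hypothetical stand-in for the cached value; not cabal's actual type.
data CacheEntry = CacheEntry !Int !String

-- Handwritten instance: sequential puts/gets, no Generic machinery.
instance Binary CacheEntry where
  put (CacheEntry n s) = do
    put n
    put s
  get = CacheEntry <$> get <*> get
```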
Nice find! We could call `expensiveAssert` so that the invariant is only enabled with the package flag `debug-expensive-assertions`, which is used in CI.
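Roughly, the pattern being described looks like this (the CPP symbol name and wiring are assumptions, not verbatim from cabal's source):

```haskell
{-# LANGUAGE CPP #-}

-- The check costs a full index traversal, so it is compiled away unless
-- the package was built with a flag that defines this CPP symbol.
expensiveAssert :: Bool -> a -> a
#ifdef DEBUG_EXPENSIVE_ASSERTIONS
expensiveAssert False _ = error "expensive assertion failed"
expensiveAssert True  x = x
#else
expensiveAssert _ x = x
#endif
```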
Vis-à-vis …
Further testing indeed reveals that if I get rid of the …

The sizes of said files in the cache: …
So, hrm, I suspect the cost of deserializing them is maybe unavoidable. That said, I think the logic for checking when cached things are out of date could more effectively short-circuit reading these files when it knows they'll be invalid regardless. It just so happens that when these files are not so lolhueg, this additional cost probably wasn't noticed. Will investigate further...
Does … ? I'm getting these sorts of errors: …
@newhoggy sounds like it may be a bug -- this ticket is not exactly the right place to discuss it, though. I suggest you create a small repro and file a new ticket?
Done #7504 |
The above PR significantly reduces the binary deserialization time (attributed in the above profile to …). With that fix (and the insert-assert fix), the profile now looks like this: …
So the only reasonable thing to handle next is going to be handling …
With that last merge, I think I'm comfortable closing this ticket. There might be more performance to gain, but certainly this is enough for "one round" of work.
It appears that when the check at cabal/cabal-install/src/Distribution/Client/FileMonitor.hs, lines 466 to 468 (at c4fb167), is changed like so: …, the issue disappears. With that change, running … I do not know the ramifications of disabling … (GHC 8.10.7)
@peterbecich: thank you for the experiment. This is important: people worldwide suffer from these rebuilds (and supposedly stack rebuilds in this case too, so it's really a universal problem). Does anybody know what the …
BTW, as written in #3343, the worry is that cabal clobbers files when building with different flags, and that is why it needs to rebuild them from scratch just in case. I guess there are cases where adding tests changes dependencies, so the other components are rebuilt, their products clobbered, and they then need to be rebuilt afterwards. I'm not sure if cabal tries to mitigate that somehow, e.g., by checking build plans, hashes, etc., to avoid rebuilding at a smaller scale than whole build profiles.
The fix in #7516, as well as others, is about as much performance as I think can reasonably be gained. I think what you're proposing here is wrong -- it simply does not regenerate the build plan when the tests-enabled flag is changed. But the desired behavior is that enabling or disabling tests should regenerate the plan, because the dependencies chosen are different depending on whether test stanzas are taken into account. (I.e., it's not that checking the value for change is expensive; it's that when we don't check the value for change, we just use a cached plan generated under different flags -- which is a wrong one.)
I will note there's one future idea that could improve things still further, which is to have not just a single cache file but, for some subset of flags, a cache file per flag set. That way we would still need to solve at least once in each case, but toggling the flag back and forth beyond that could make use of the correct cached plans. This would need some careful architecture/thinking, as currently cache files don't have a way to be "per-X" for various input parameters. This work could be tackled in conjunction with #7502.
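A back-of-the-envelope sketch of what "one cache per flag set" could look like (all names here are hypothetical, not cabal's actual architecture):

```haskell
import Data.Hashable (hash)
import System.FilePath ((</>))

-- Derive the plan-cache path from the solver-relevant flags, so that
-- toggling e.g. tests on/off selects a different cache file instead of
-- overwriting a single shared one.
planCachePath :: FilePath -> [(String, Bool)] -> FilePath
planCachePath distDir flags =
  distDir </> "cache" </> ("plan-" ++ show (abs (hash flags)) ++ ".cache")
```

With something like this, `tests: True` and `tests: False` would each keep their own solved plan, at the cost of some disk space and of still solving once per distinct configuration.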
@gbaz: however, can't we generate both build plans, take the intersection of the packages they depend on, hash the versions of those packages in both build plans, and, if the hashes match, not wipe out the artifacts when switching between the build plans?
How do I put this politely? That's completely infeasible given my understanding of how the solver and build plans work. Further, even if it could work, it would still be less nice on all axes than simply caching a plan with and without the flag.
@gbaz: to be clear, we are blue-sky brainstorming, as opposed to considering re-opening this issue you so skilfully put to death and demanding that you realise our deranged dreams ;D In fact, I was hoping something like the hashing of deps and deciding whether we can just let them stay is already present in some dark corner of cabal, just not firing in this particular case. @grayjay: is there any such functionality hidden somewhere, or a ready toolbox from which @peterbecich could concoct a correct solution along these lines?
I'll leave this issue be after this, but again, there's no reason to do any of that complicated hash-sharing stuff when we could just have one cache per flag set for some common flags. It's a much simpler architectural approach and gets the same benefit, if not more. Under the proposed more complicated solution, one would need to run the solver once in each configuration anyway.
You are right that after one recompilation with tests and one without, we are better off just caching both separately. And that if the intersection of the build plans differs, we've only wasted time computing the hashes. However, I think users care about the common case of compatible build plans: building once without tests (generating some executables or packages), once with tests (running the tests), and then wiping out the repo altogether. I imagine some CI scripts or OS package builders do just that.
Well, in that case they can always configure with `--enable-tests`.
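That is, a workflow along these lines (assuming the v2 commands; `cabal configure` records the flags in cabal.project.local):

```
$ cabal configure --enable-tests   # record tests-enabled in cabal.project.local
$ cabal build all                  # solve once, with test stanzas included
$ cabal test all                   # reuses the same plan; no re-solve
```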
Agreed. Perhaps we should publicize that. I think people may 1. not think ahead, 2. be scared that …

BTW, about hashes vs caches: I guess these are not mutually exclusive solutions. One could have a separate cache for builds with tests enabled, but just copy over (preserving timestamps) artifacts from no-tests builds. However, copying files takes time, so this is less attractive. I agree I'd rather have caches over hashes, and making …
BTW, I'm probably spewing nonsense: it seems we already have fine-grained recompilation avoidance. I've just tested the happy path and no file was rebuilt (so copying artifacts would not be needed); only the build plan was regenerated. Is this optimization failing sometimes, and is that why I hear people complaining? @peterbecich: where is the "long delay" that you managed to hack around coming from? Is any file rebuilt in the bad and in the good case?
Describe the bug
I'm converting a stack project to cabal. The project has over 200 packages. I'm noticing that the cabal solver takes 2-3 minutes while stack is pretty much instantaneous. I figure I must not be aware of some cabal feature that stack is leveraging, and that this in fact is not a bug, but I thus far have been unable to contact the maintainers.
To Reproduce

…
Expected behavior
There should be a way to resolve dependencies quickly. In lieu of that, perhaps there's a way that I'm missing to tell cabal to resolve dependencies once and then cache the result (because the "Resolving dependencies" step happens whenever `cabal build all` is run).

System information

…
Additional context

- With `cabal v2-freeze`, the dependency resolution time is 90 seconds. That's a lot better, but still very slow.
- `allow-newer`/`allow-older` has no effect.
- `reject-unconstrained-dependencies: all` fails because `rts` is not pinned, but it's not available on Hackage as far as I can tell 😕
- `max-backjumps: 0` does not improve the performance.