
Cabal takes 3 minutes to resolve dependencies #7466


Closed
dfithian opened this issue Jun 29, 2021 · 53 comments

Comments

@dfithian

dfithian commented Jun 29, 2021

Describe the bug
I'm converting a stack project to cabal. The project has over 200 packages. I'm noticing that the cabal solver takes 2-3 minutes while stack is pretty much instantaneous. I figure I must not be aware of some cabal feature that stack is leveraging, and that this in fact is not a bug, but I thus far have been unable to contact the maintainers.

To Reproduce

$ curl https://www.stackage.org/lts-17.10/cabal.config > cabal.project.freeze
$ cabal v2-update
# add `allow-older: all` and `allow-newer: all` to `cabal.project.freeze` and iterate on the solver output
$ cabal v2-build all
Resolving dependencies # takes 2-3 min

Expected behavior
There should be a way to resolve dependencies quickly. In lieu of that, perhaps there's a way that I'm missing to tell cabal to resolve dependencies once, and then cache the result (because the "Resolving dependencies" step happens whenever cabal build all is run).

System information

  • Mac OS Catalina
  • Cabal is 3.4.0.0 (installed with ghcup)
  • GHC is 8.10.4 (installed with ghcup)

Additional context

  • When I run cabal v2-freeze, the dependency resolution time is 90 seconds. That's a lot better, but still very slow.
  • Removing wildcards from allow-newer/allow-older has no effect.
  • Adding reject-unconstrained-dependencies: all fails because rts is not pinned, but it's not available in Hackage as far as I can tell 😕
  • Adding max-backjumps: 0 does not improve the performance
@Mikolaj
Member

Mikolaj commented Jun 29, 2021

I hope somebody comes up with a hint to speed it up, particularly when repeating the builds. I'd also try with a nightly build of the dev version of cabal (#7458 (comment)).

If there's no easy solution, it's definitely worth investigating where the time is spent (unless it's widely known for projects of that size with freeze files). My guess is the solver, because checking satisfiability of the constraints in a freeze file is a special, but not too special, kind of constraint solving, which is costly even without backtracking.

(BTW, did you try cabal build . or cabal build exe:foo? I seem to remember there was a difference once I used it.)

@fgaz
Member

fgaz commented Jun 29, 2021

There should be a way to resolve dependencies quickly. In lieu of that, perhaps there's a way that I'm missing to tell cabal to resolve dependencies once, and then cache the result (because the "Resolving dependencies" step happens whenever cabal build all is run).

The cached build plan should be reused if no flags or metadata changed, and if it isn't that's a bug.
What did you change to cause the second planning?

add allow-older: all to cabal.project.freeze

This may be the reason, possibly exacerbated by the lack of bounds in dependencies too. Why is it needed?

update any static deps in cabal.project.freeze

What do you mean by this?

@gbaz
Collaborator

gbaz commented Jun 29, 2021

"I'm noticing that the cabal solver takes 2-3 minutes while stack is pretty much instantaneous. I figure I must not be aware of some cabal feature that stack is leveraging."

There is no such feature. Rather, stack is simply not running a solver at all. Note that your procedure is asking cabal to resolve unrestricted dependencies on an entire stack LTS, not just the packages you actually depend on.

The correct way to convert a stack project to cabal imho is simply to use the generated cabal file directly. You can get the current bounds by entering a stack shell and running cabal gen-bounds among other ways. Having done so, you can then put those bounds directly into your cabal file, and then alter them upwards as you desire.

Further you only need to handle the direct dependencies in a coherent way -- the indirect dependencies will have their correct version picked by the solver automatically.
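For concreteness, the gen-bounds step looks roughly like this (package names, versions, and the exact wording of the output are illustrative and may differ between cabal versions):

$ cabal gen-bounds
Resolving dependencies...
For component lib:
  aeson >= 1.5.6 && < 1.6,
  text >= 1.2.4 && < 1.3,

Those suggested ranges can then be pasted into the build-depends section of the corresponding .cabal file and relaxed upwards later.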

@fgaz
Member

fgaz commented Jun 29, 2021

@gbaz

Note that your procedure is asking cabal to resolve unrestricted dependencies on an entire stack LTS, not just the packages you actually depend on.

Wait, really? I mean, the constraints are there, but I thought the goals were only the stuff in the dep tree.

I tried running curl https://www.stackage.org/lts-17.10/cabal.config > cabal.project.freeze; cabal build --dry on a project with a small dep tree and it took 4 seconds on an ancient machine (exactly the same as without the freeze file).

@dfithian
Author

@Mikolaj, I ran cabal build all --dry-run, since I wanted to build all the libraries and executables. Running cabal build <package> --dry-run takes the same amount of time as all.

The correct way to convert a stack project to cabal imho is simply to use the generated cabal file directly.

This doesn't sound right to me. I thought the point of having cabal.project, along with a freeze file, was to guarantee that you knew each version pin. Traversing 200+ directories and pinning each version in each cabal file is a bit of a blocker.

There is no such feature. Rather, stack is simply not running a solver at all.

How does "not running a solver at all" work?

@Mikolaj
Member

Mikolaj commented Jun 30, 2021

Pinning the versions of all transitive dependencies is a valid use case, even if extreme. However, it does not require pinning versions of all LTS libraries nor explicitly requiring that all transitive dependencies get built, if any of that is going on here.

@dfithian: would you care to share more of your project structure with us? A github repository with a minimal reproduction example would be best. Just a skeleton that is enough to exhibit the problem.

@gbaz
Collaborator

gbaz commented Jun 30, 2021

"How does "not running a solver at all" work?"

It works because stackage sets out fixed version bounds for everything. By adding allow-older and allow-newer you're rejecting those bounds, thus obviating the point of using the freeze file at all. The command to generate and freeze working bounds is something resembling what I gave above -- using gen-bounds or the like.

It seems you're suggesting your project literally has 200 packages using one cabal.project file? In this case I see why editing each cabal file directly could be somewhat irritating and you would prefer a freeze file.

In that case, you can keep the freeze file, but simply do not add allow-older and allow-newer. Or once you have a working build, generate a new freeze file that captures the dependencies as calculated (using cabal freeze or the like).

Again, the main point is that if you want to pin all the deps, you should not have allow-older and allow-newer since their purpose is to "unpin" everything again.
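For illustration, that setup would look roughly like this (package names and versions are placeholders; a file written by cabal freeze may use qualified constraints such as any.aeson):

-- cabal.project
packages: ./pkg-a
          ./pkg-b

-- cabal.project.freeze (note: no allow-newer/allow-older lines)
constraints: aeson ==1.5.6.0,
             attoparsec ==0.13.2.5,
             base ==4.14.1.0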

@dfithian
Author

I managed to get a freeze file with allow-newer: none and allow-older: none, but it didn't help performance at all. When I tried cabal freeze it took the same amount of time to resolve as it did yesterday (around 90 seconds).

One thing I noticed is that if I disallowed backtracking (max-backjumps: 0) the solver step failed, but it didn't actually tell me what it tried to backjump with (even with --verbose). Perhaps there's some cycle that's confusing it and this info could help.

I'll work on getting a reproduction case. I suppose if I had one I'd probably have a better issue than just "help!" 😛 .

@gbaz
Collaborator

gbaz commented Jun 30, 2021

Sorry, I still don't understand what you're trying to do here. Why are you worried about the runtime of cabal freeze at all? That should only need to be run once and for all, not repeatedly. Note that if you have a cabal.config as described, this is a constraint set generated by v1-freeze. Running cabal freeze from a modern cabal will do a v2-freeze that captures a very similar constraint set into a cabal.project.freeze file. You only need one or the other of these files, not both. And once you have one and you're happy with it, you don't need to keep generating it.

So yes, cabal may take some time to resolve dependencies, but what workflow do you have where you want it to keep resolving dependencies repeatedly?

If you don't have such a workflow, then there is no bug or issue here, just the general fact that running a solver over a lot of dependencies takes time, and one may wish to adopt any number of workflows that avoid doing so repeatedly.

Edit: It seems like the issue you may be getting at is just that when you have a cabal.project that spans 200 packages with a lot of dependencies, resolution takes more time than you would expect, even with everything ostensibly pinned through a freeze file? In any case, at this point I'll stand by for a reproduction case, since I confess I'm having a hard time understanding the precise issue...

(Basically, a lot of work has been put into solver performance over the years, and there may be specific issues that could use improvement, but "solver is slow" is not a ticket that can lead to much concrete action, whereas 'solver is being run in case X Y Z and should be bypassed' may stand a better chance. I also wonder how much of the issue is a function of the number of deps, and how much is a function of having, again, as I understand it, 200 packages controlled in a single cabal project?)

@dfithian
Author

Sorry, I mean running cabal build all --dry-run after a cabal freeze took the same amount of time, 90 seconds.

@dfithian
Author

Okay, I can see where you're coming from when you say "yeah resolving dependencies takes a long time, but it shouldn't get re-run". However, I see this problem in a few main places for us.

  1. The solver re-runs when we run --enable-tests versus --disable-tests. That one seems kind of minor, because we could just always run with --enable-tests.
  2. The solver runs in our CI build as well (run using haskell.nix). The solver is getting run whenever CI runs. I think your response to that might be "that's haskell.nix's problem, not ours", which I think is fine, but my reasoning is that if I can get the solver time down it doesn't matter anymore.
  3. Correct me if I'm wrong, but I think the solver is going to run when a new package is added to cabal.project, or a new dependency is added to a .cabal file, or someone rebases (and someone else did one of those things). This happens daily or more per developer in a company of our size.

All said, it's fine if it's not a bug, since we currently have a working path with stack with the same dependencies. I guess I am mostly just confused how I can't get to the same performance as stack. Then again, that may be the magic of stack!

@gbaz
Collaborator

gbaz commented Jul 1, 2021

Thank you for further specifying. I think there is an issue here from what you've described. And it seems to be that even when dependencies are fully pinned via a freeze file, generating a build plan can take a surprising amount of time, at least in your specific setting. We think this is not due to the number of dependencies pinned, as some tests seem to show otherwise. But it is not clear if it is due to the number of transitive deps, or if it is due to having, by your description, some hundreds of packages sharing a single cabal.project. I think that's something very much worth improving!

Again, the reason stack does not encounter this is it literally does not run a solver. It just assumes the pinned dependencies are correct, and if there is some issue that occurs in the course of using them, it lets the tooling it invokes complain. Meanwhile, here it seems the issue is that the cabal solver appears to run even when there is nothing to solve. In most packages, that runtime seems to be pretty efficient, but again, potentially due to the number of packages grouped under a single cabal file (or due to their dependencies), it is not efficient. I think this is worth figuring out, and having a reproducible public test-case is the best way to get there.

@michaelpj
Collaborator

Hi, I'd be interested to know whether this might be related to #7472, especially since cabal freeze didn't help much.

Do you have build-tool-depends dependencies on local packages? And do you see messages like the one in that issue in the log of cabal configure -v3?
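For anyone following along, a build-tool-depends dependency looks roughly like this in a test-suite stanza (hspec-discover is the tool confirmed below; the rest of the stanza is illustrative). The syntax is the same whether the tool comes from Hackage or from a local package in the same cabal.project, which is the case that issue is about:

test-suite spec
  type:               exitcode-stdio-1.0
  main-is:            Spec.hs
  hs-source-dirs:     test
  build-depends:      base, hspec
  build-tool-depends: hspec-discover:hspec-discover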

@dfithian
Author

dfithian commented Jul 2, 2021

@gbaz @Mikolaj We're going to create a project with anonymized packages (and no source code) to reproduce. It may take a little while.

@michaelpj Indeed we do, hspec-discover.

cabal configure -v3 > out.txt
cat out.txt | grep 'stanza "test"' | wc -l

=> 5

@peterbecich
Member

@dfithian, I think this project replicates the issue: https://github.com/peterbecich/cabal-resolver-issue#readme

@Mikolaj
Member

Mikolaj commented Jul 5, 2021

@peterbecich: yay, this is impressive, thank you.

Do I read correctly that not using cabal configure, but using cabal build all --dry, solves your problem altogether? BTW, what purpose does cabal configure serve in your use case? How does it fit into your workflow?

Some low-hanging fruit, if anybody would like to help: try with the master branch, and try with/without the freeze file.

@purefn

purefn commented Jul 9, 2021

@michaelpj Yes, indeed we do have at least one package that has a build-tool-depends where one or more of the tools is in our cabal project. One package has 5 of them in the dependencies of build-tool-depends.

@Mikolaj One of the areas where we're seeing slowness is in our nix builds, which use haskell.nix. As part of its setup, it uses cabal v2-freeze: https://github.com/input-output-hk/haskell.nix/blob/8e146f8d7deeaccf269af73aed80c4bf8f790fe6/lib/call-cabal-project-to-nix.nix#L479

@michaelpj
Collaborator

Ah. The ticket I linked hits haskell.nix much worse (input-output-hk/haskell.nix#1156). We're working on a fix, I'll be interested to see if that helps you.

@Mikolaj
Member

Mikolaj commented Jul 9, 2021

@dfithian: perhaps you know the answers?

Anybody else having a problem with cabal configure?

@peterbecich: yay, this is impressive, thank you.

Do I read correctly that not using cabal configure, but using cabal build all --dry, solves your problem altogether? BTW, what purpose does cabal configure serve in your use case? How does it fit into your workflow?

Some low-hanging fruit, if anybody would like to help: try with the master branch, and try with/without the freeze file.

@dfithian
Author

dfithian commented Jul 9, 2021

Hopefully this is a good summary.

Both cabal configure and cabal build all --dry-run have the same initial performance issue of a very long time spent resolving dependencies. As mentioned above, this wouldn't be a huge issue except that we observe it while both switching between --enable-tests and --disable-tests and (this is the bigger issue) as @purefn mentioned in haskell.nix.

After taking a look at @peterbecich's repo, I wrote the script in this PR, but we haven't audited the output yet. I experienced about 40 seconds of resolve time during cabal build all --dry-run. Notably, my script overwrites build-tool-depends, so I'm not sure if it would be even longer if I also obfuscated those fields instead of setting them to the empty list. Maybe, since it's 40 seconds already, we could try to figure out those performance issues first. I'm hoping one of us can audit the output soon, push it up, and then start iterating with the master branch and with/without the freeze file.

@dfithian
Author

dfithian commented Jul 14, 2021

Alright, we audited the output and merged it in: https://github.com/peterbecich/cabal-resolver-issue

Test case                                                    time cabal build all --dry-run
existing cabal.project.freeze file                           33.64s user 6.09s system 70% cpu 56.381 total
regenerating cabal.project.freeze with cabal freeze          31.57s user 3.24s system 73% cpu 47.132 total
with main cabal branch                                       31.07s user 2.47s system 81% cpu 41.134 total
with main cabal branch, regenerating cabal.project.freeze    38.07s user 3.28s system 85% cpu 48.574 total

@gbaz
Collaborator

gbaz commented Jul 23, 2021

I made an attempt at something here, but I don't think it's going to yield significant results: #7491

If one considers how much work is actually being done here even if one thinks of "solving" purely as "validation", 30s seems entirely reasonable to me -- this is walking over 200+ interlinked dep graphs and ensuring they're all coherent, and then producing configurations/build-plans for all of them. 3 minutes seemed concerning, but the 30s cost in the repro repository seems ok. I suspect there's no low-hanging algorithmic fruit to be plucked here, especially since with that PR there are drastically fewer local backjumps but the sameish solving cost. (I.e. the "solving" cost really is just validation cost on a large universe of packages).

That is to say that I'm interested in if the experts think the PR is worth it, but outside of that I imagine the next step is just the usual micro-optimizations on the solver and related code -- i.e. cutting down allocations by unpacking data structures, using more efficient representations, guided by profiling, etc.

The repo is still a very good testbed for the solver, and much appreciated, but unless there's some condition in which that 30s turns into 3m, I don't see any immediate problems here outside of "everything could be somewhat faster with a lot of careful optimization and profiling".

@dfithian
Author

Is it possible to disable the solver entirely? I know currently the only option for --solver is modular. Could there also be --solver=none? Does that even make sense?

@jneira
Member

jneira commented Jul 23, 2021

Mmm, stack actually does it, but it has stackage (and explicit stack.yaml versions for hackage packages) behind it to provide a coherent set of dependencies.
Maybe cabal could do the same, providing a freeze file with all constraints fixed?

EDIT: I am sure this idea has been discussed before, so I guess it wasn't implemented for some reason 🤔

@dfithian
Author

Yes, exactly my argument. In the example (#7466 (comment)), we froze every package we knew about. Our intention is to use stackage LTS (and in lieu of that, pin dependencies manually).

@gbaz
Collaborator

gbaz commented Jul 24, 2021

Right. The problem is that stack does not validate that the plan actually works. As my experiment shows (although I would be curious whether it yields an improvement on the actual, and not just the test, case!), the cost of just constructing and validating a plan for a large universe of packages and dependencies is itself not trivial.

Stack simply doesn't validate the plan -- it just assumes that everything is fine, and then you only notice a failure if something actually fails in the middle of executing the plan and performing the actual builds.

It might be possible to teach cabal how to avoid building/validating a plan altogether as suggested, but for any but power-user cases I'd very much want to discourage this.

@grayjay
Collaborator

grayjay commented Jul 29, 2021

I ran the reproduction with testing disabled, and it seemed like more than half of the solver's time was spent descending the search tree without needing to backtrack. So the solver is probably taking a long time to simply validate the large number of dependencies and constraints, as @gbaz suggested. I think this would require profiling next.

One section of the log showed backtracking, when the solver first tried using Cabal-3.4.0.0 for servant-openapi3's setup, descended over 600 levels, and then had to backtrack and use the installed Cabal-3.2.1.0. The solver should never need to backtrack with a freeze file. The problem is that #3502 wasn't fully implemented, and the freeze file doesn't contain separate constraints for each setup or build tool dependency. In this case, adding "servant-openapi3:setup.Cabal ==3.2.1.0" to cabal.project.freeze avoids the backtracking.
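Concretely, that is the constraint quoted above written into the freeze file's existing constraints list, roughly:

constraints: servant-openapi3:setup.Cabal ==3.2.1.0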

The 3 minute run time with build tool dependencies may also be explained by #3502 if they significantly complicate the dependency solving problem.

@gbaz
Collaborator

gbaz commented Jul 30, 2021

Some profiling of a run on the test repo:

cata             Distribution.Solver.Modular.Tree        src/Distribution/Solver/Modular/Tree.hs:188:1-44                 11.9   19.6
insert           Distribution.Solver.Types.PackageIndex  src/Distribution/Solver/Types/PackageIndex.hs:(190,1)-(198,36)   10.7    5.2
srcpkgPackageId  Distribution.Solver.Types.SourcePackage src/Distribution/Solver/Types/SourcePackage.hs:21:5-19            9.4    0.0
binaryGetMD5     Distribution.Utils.MD5                  src/Distribution/Utils/MD5.hs:(54,1)-(57,28)                      8.4   18.1
mapWithKey       Distribution.Solver.Modular.WeightedPSQ src/Distribution/Solver/Modular/WeightedPSQ.hs:(72,1)-(73,71)     7.5    5.9
structuredEncode Distribution.Utils.Structured           src/Distribution/Utils/Structured.hs:271:1-52                     6.4    5.4
ana              Distribution.Solver.Modular.Tree        src/Distribution/Solver/Modular/Tree.hs:199:1-36                  4.7    4.4
maximumBy        Distribution.Solver.Modular.PSQ         src/Distribution/Solver/Modular/PSQ.hs:(87,1)-(88,47)             2.5    0.0

I think the cata implementation seems potentially inefficient -- it uses clever recursion patterns but might more profitably be written directly? Ditto with ana.

The two PSQ operations that are costly tend to indicate to me that our PSQ usage might be improved if we switch to using vectors rather than lists as the underlying structure? Or maybe just arrays to avoid growing the dep footprint...

Some low-hanging fruit appears to be srcpkgPackageId -- it's just a record accessor, but it's called 36,817,506 times, which seems an awful lot since these are invocations from insert on the pkgindex, and that only gets called 306 times!

The call to insert itself confuses me. It should be incredibly cheap since it's just a map insert, but when assertions are on you get the extremely expensive invariant check on each insert:

invariant :: Package pkg => PackageIndex pkg -> Bool

That invariant check does a full traversal of the whole index!

I'm guessing we're not building cabal with assertions explicitly turned off, which could account for this?
I would suggest that we audit the code such that we turn off or make optional all invariant checks.

@gbaz
Collaborator

gbaz commented Jul 30, 2021

Ok, so adding ghc options to disable assertions and adding optimization levels to ghc opts did not seem to fix this?? However, explicitly turning the invariant check into a no-op did work, so I wonder if there's a problem here that's led to some of the slowdown...

@gbaz
Collaborator

gbaz commented Jul 30, 2021

Finally, on binaryGetMD5: I strongly suspect that's a red herring that's actually eating the cost of the entire structured decode, in particular of the large fileMonitorState for the huge repo.

Note that there's an autoderived binary instance here, which could well be pretty inefficient. In this particular case, maybe moving to a handwritten binary instance could help?

@grayjay
Collaborator

grayjay commented Jul 31, 2021

The call to insert itself confuses me. It should be incredibly cheap since its just a map insert, but when assertions are on you get the extremely expensive invariant to check on each insert:

Nice find! We could call expensiveAssert so that the invariant is only enabled with the package flag debug-expensive-assertions, which is used in CI.
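As a rough sketch of that pattern (the helper and CPP define below are illustrative, not necessarily identical to what cabal-install actually uses; only the flag name debug-expensive-assertions comes from the comment above):

{-# LANGUAGE CPP #-}
-- Hypothetical helper module, for illustration only.
module ExpensiveAssert (expensiveAssert) where

#ifdef DEBUG_EXPENSIVE_ASSERTIONS
import Control.Exception (assert)
#endif

-- Run a potentially O(n) sanity check only when the package is built with
-- the debug-expensive-assertions flag; otherwise it compiles to a no-op.
expensiveAssert :: Bool -> a -> a
#ifdef DEBUG_EXPENSIVE_ASSERTIONS
expensiveAssert p x = assert p x
#else
expensiveAssert _ x = x
#endif

The .cabal wiring would then be a manual flag that adds something like cpp-options: -DDEBUG_EXPENSIVE_ASSERTIONS.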

@gbaz
Collaborator

gbaz commented Jul 31, 2021

Vis-à-vis WeightedPSQ.mapWithKey, I'm actually now dubious it's a genuine cost center -- it really is just a list map. I suspect the function given as an argument is getting attributed to it. The two central costs are in validateLinking and validateTree, and it's just part of the cata, recurring down further into branches. So it's back to the cata recursion pattern itself being rather expensive. I wonder if it also does too much work here since it's a bottom-up traversal, which may get in the way of "short cuts", and I also wonder if cata is the right choice at all given that these are actually monadic traversals in the Validate monad.

@gbaz
Collaborator

gbaz commented Aug 2, 2021

Testing further reveals indeed that if I get rid of the structured stuff, the full cost getting attributed to binaryGetMD5 doesn't go away; instead it's just attributed to structuredDecode directly -- so the structured stuff is a red herring, but the binary instances as a whole are still a cost. Also, I now recognize that this isn't just the FileMonitor structure or whatever that's getting (de)serialized -- it's a whole variety of cache files.

The sizes of said files in the cache:

28M Aug  2 19:04 improved-plan
15M Aug  2 19:04 elaborated-plan
1.2M Aug  2 19:04 plan.json
457K Aug  2 19:04 source-hashes
9.7M Aug  2 19:04 solver-plan
12K Aug  2 18:53 compiler
2.1M Aug  2 18:53 config

So, hrm, I suspect the cost of deserializing them is maybe unavoidable. That said, I think the logic for checking when cached things are out of date could more effectively short-circuit reading these files when it knows they'll be invalid regardless. It just so happens that when these files are not so lolhueg this additional cost probably wasn't noticed. Will investigate further...

@newhoggy

newhoggy commented Aug 3, 2021

Does cabal gen-bounds work for multi-projects?

I'm getting these sorts of errors:

$ cabal gen-bounds
Resolving dependencies...
cabal: Could not resolve dependencies:
[__0] trying: cardano-ledger-core-0.1.0.0 (user goal)
[__1] unknown package: small-steps (dependency of cardano-ledger-core)
[__1] fail (backjumping, conflict set: cardano-ledger-core, small-steps)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: cardano-ledger-core, small-steps

@gbaz
Collaborator

gbaz commented Aug 3, 2021

@newhoggy sounds like it may be a bug -- this ticket is not exactly the right place to discuss it though. I suggest you create a small repro and file a new ticket?

@newhoggy

newhoggy commented Aug 3, 2021

Done #7504

@gbaz
Collaborator

gbaz commented Aug 6, 2021

The above PR significantly reduces the binary deserialization time (in the above profile attributed to binaryGetMD5).

With that fix (and the insert assert fix) the profile now looks like this:

cata               Distribution.Solver.Modular.Tree        src/Distribution/Solver/Modular/Tree.hs:188:1-44                15.7   25.7
mapWithKey         Distribution.Solver.Modular.WeightedPSQ src/Distribution/Solver/Modular/WeightedPSQ.hs:(72,1)-(73,71)   10.1    7.8
structuredEncode   Distribution.Utils.Structured           src/Distribution/Utils/Structured.hs:273:1-52                    9.3    6.3
ana                Distribution.Solver.Modular.Tree        src/Distribution/Solver/Modular/Tree.hs:199:1-36                 6.7    5.8
maximumBy          Distribution.Solver.Modular.PSQ         src/Distribution/Solver/Modular/PSQ.hs:(87,1)-(88,47)            3.8    0.0
debugNoWrap        Distribution.Simple.Utils               src/Distribution/Simple/Utils.hs:(540,1)-(548,17)                3.6    3.5
binaryGetMD5       Distribution.Utils.MD5                  src/Distribution/Utils/MD5.hs:(54,1)-(57,28)                     3.4    5.2

So the only reasonable thing to tackle next is cata itself, replacing it (at least in important places) with a direct traversal.
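As a toy illustration of the trade-off (this is a minimal rose tree, not Cabal's actual Distribution.Solver.Modular.Tree, which is built on a base functor with cata/ana):

-- A tiny tree standing in for the solver's search tree.
data Tree a = Node a [Tree a]

-- Generic fold ("cata"): every traversal goes through one higher-order
-- combinator, which is elegant but allocates a closure per node.
cata :: (a -> [r] -> r) -> Tree a -> r
cata f (Node x ts) = f x (map (cata f) ts)

sizeViaCata :: Tree a -> Int
sizeViaCata = cata (\_ rs -> 1 + sum rs)

-- The same computation as a direct, first-order traversal, which is usually
-- easier for GHC to optimize and can be specialized per use site.
sizeDirect :: Tree a -> Int
sizeDirect (Node _ ts) = 1 + sum (map sizeDirect ts)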

@gbaz
Collaborator

gbaz commented Aug 12, 2021

With that last merge, I think I'm comfortable closing this ticket. There might be more performance to gain, but certainly this is enough for "one round" of work.

@peterbecich
Member

peterbecich commented Sep 26, 2021

It appears that when checkValueChange cachedKey is disabled here

          = checkValueChange cachedKey
              `mplusMaybeT`
            checkFileChange cachedFileStatus cachedKey cachedResult

like so

          = -- checkValueChange cachedKey
            --   `mplusMaybeT`
            checkFileChange cachedFileStatus cachedKey cachedResult

the issue disappears.

With that change, running cabal build all --enable-tests and then cabal build all --disable-tests does not incur a long delay. This is also true for running cabal build all --disable-tests and then cabal build all --enable-tests. I have tested this with https://github.com/peterbecich/cabal-resolver-issue

I do not know the ramifications of disabling checkValueChange cachedKey. Building the cabal-resolver-issue example project appears to complete without error.


GHC 8.10.7
Cabal built from c4fb167

@Mikolaj
Member

Mikolaj commented Sep 27, 2021

@peterbecich: thank you for the experiment. This is important: people worldwide suffer from these rebuilds (and supposedly stack rebuilds in this case too, so it's really a universal problem). Does anybody know what checkValueChange does here and whether there are conditions under which it could be safely skipped?

@Mikolaj
Member

Mikolaj commented Sep 27, 2021

BTW, as written in #3343, the worry is that cabal clobbers files when building with different flags, and that is why it needs to rebuild them from scratch just in case. I guess there are cases where adding tests changes dependencies, so the other components are rebuilt, their products clobbered, and they then need to be rebuilt afterwards. Not sure if cabal tries to mitigate that somehow, e.g., by checking build plans, hashes, etc. to avoid rebuilding on a smaller scale than whole build profiles.

@gbaz
Collaborator

gbaz commented Sep 27, 2021

The fix in #7516, as well as others, is about as much performance as I think can reasonably be gained.

I think what you're proposing here is wrong -- it simply does not regenerate the build plan when the tests enabled flag is changed. But the desired behavior is that enabling or disabling tests should regenerate the plan, because the dependencies chosen are different if test stanzas are taken into account or not.

(i.e. it's not that checking the value for change is expensive; it's that when we don't check the value for change, we just use a cached plan generated under different flags -- which is a wrong one.)

@gbaz
Collaborator

gbaz commented Sep 27, 2021

I will note there's one future idea that could improve things still further, which is to have not just a single cache file, but for some subset of flags, a cache file per flag set. That way we would still need to solve at least once in each case, but toggling the flag back and forth further could make use of the correct cached plans. This would need some careful architecture/thinking as currently cache files don't have a way to be "per-X" for various input parameters. This work could be tackled in conjunction with #7502

@Mikolaj
Member

Mikolaj commented Sep 27, 2021

@gbaz: however, can't we generate both buildplans, take the intersection of packages they depend on, hash versions of these packages in both buildplans and if the hashes match, not wipe out the stuff when switching between the buildplans?

@gbaz
Collaborator

gbaz commented Sep 27, 2021

How do I put this politely? That's completely not feasible given my understanding of how the solver and buildplans work. Further, even if it could work, it would still be less nice in all axes than simply caching a plan with and without the flag.

@Mikolaj
Member

Mikolaj commented Sep 27, 2021

@gbaz: to be clear, we are blue sky brainstorming, as opposed to considering re-opening this issue you so skilfully put to death and demanding that you realise our deranged dreams. ;D

In fact, I was hoping something like the hashing of deps and deciding if we can just let them stay is already present in some dark corner of cabal, just not firing up in this particular case.

@grayjay: is there any such functionality hidden somewhere, or a ready toolbox from which @peterbecich could concoct a correct solution along these lines?

@gbaz
Collaborator

gbaz commented Sep 27, 2021

I'll leave this issue be after this, but again, there's no reason to do any of that complicated hash-sharing stuff when we could just have one cache per flag set for some common flags. It's a much simpler architectural approach, and gets the same benefit, if not more. Under this proposed more complicated solution, one would need to run the solver once in each configuration anyway.

@Mikolaj
Member

Mikolaj commented Sep 27, 2021

You are right that after one recompilation with tests and one without, we are better off just caching both separately. And that if the intersection of the build plans differs, we've only wasted time computing the hashes.

However, I think users care about the common case of compatible buildplans and of building once without tests (and generating some executables or packages), once with tests (and running the tests) and wiping out the repo altogether. I imagine some CI scripts or OS package builders doing just that.

@gbaz
Collaborator

gbaz commented Sep 27, 2021

Well in that case they can always configure with --enable-tests and never switch -- there's no downside to it! It doesn't run the tests, it just creates a buildplan that enables them. In fact, in general, most people may as well configure with --enable-tests most of the time, so the constant need to switch buildplans here is a bit odd to me as a use case at all.
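If I'm not mistaken, that preference can also be made the project default so nobody has to remember the flag, e.g.:

-- cabal.project (or cabal.project.local)
tests: True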

@Mikolaj
Member

Mikolaj commented Sep 27, 2021

Well in that case they can always configure with --enable-tests and never switch -- there's no downside to it! It doesn't run the tests, it just creates a buildplan that enables them. In fact, in general, most people may as well configure with --enable-tests most of the time, so the constant need to switch buildplans here is a bit odd to me as a use case at all.

Agreed. Perhaps we should publicize that. I think people may 1. not think ahead 2. be scared that --enable-tests would run the tests.

BTW, about hashes vs caches, I guess these are not mutually exclusive solutions: one can have a separate cache for builds with tests enabled, but just copy over (preserving timestamps) artifacts from no-tests builds. However, copying files takes time, so this is less attractive. I agree I'd rather have caches over hashes, and making --enable-tests standard (perhaps make it the default?) is the best of both worlds.

@Mikolaj
Member

Mikolaj commented Sep 27, 2021

BTW, I'm probably spewing nonsense: it seems we already have fine-grained recompilation avoidance. I've just tested the happy path and no file was rebuilt (so copying artifacts would not be needed); only the build plan was regenerated. Is this optimization sometimes failing, which is why I hear people complaining?

@peterbecich: where is the "long delay" that you managed to hack around coming from? Is any file rebuilt in the bad and in the good case?
