
Update gh actions #1118


Closed · wants to merge 123 commits

Conversation

@mauwii (Contributor) commented Oct 16, 2022

Dios mío - it was such a pain in the ... to get this working, but damn does it feel good now that it has worked out ^^

The test-invoke-conda.yml workflow now works on the latest Ubuntu and macOS runners, activating the conda environments and attaching the generated pictures as artifacts for each OS. It would be a good idea to set your main branch protection to only allow merging pull requests when this action succeeds, since it confirms usability on a clean system. I would even go so far as to say that anyone who still has problems then has one of two issues: 1) an incompatible machine, or 2) a dirty setup (messed-up libraries, packages built for the wrong architecture, ...).
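
For illustration, such a matrix job could look roughly like this (a rough sketch only - the step names, environment name, prompt file, and artifact path are placeholders, not the exact contents of test-invoke-conda.yml):

# Rough illustrative sketch, not the actual test-invoke-conda.yml from this PR.
name: Test invoke with conda
on:
  push:
    branches: [main, development]
  pull_request:
    branches: [main, development]
jobs:
  test-matrix:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - name: Create conda environment
        # environment-mac.yml would be used on macOS; simplified here
        run: conda env create -f environment.yml
      - name: Generate test images
        run: |
          # "invokeai" env name and the prompt file are placeholders
          source "$CONDA/etc/profile.d/conda.sh"
          conda activate invokeai
          python scripts/invoke.py --from_file tests/prompts.txt
      - name: Attach generated pictures
        uses: actions/upload-artifact@v3
        with:
          name: results-${{ matrix.os }}
          path: outputs/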

My next step is to include Windows in the OS matrix as well, but since I have almost never used Windows for CI/CD, I did not want to hold the PR back until it is included - I don't even know yet whether it will work out.

@lstein: I tried to merge this into the development branch but got merge conflicts :-/ Since there are no conflicts when merging it into main, and this PR does not change code but only confirms its functionality, I thought it would be a good idea to merge it directly into main. I hope you see it the same way 🙈

The mkdocs-material action now deploys the whole docs without breaking emojis and such 🤓 It runs on pushes and pull requests for the main and development branches, but only deploys when running from refs/heads/main, so imho this could also be included in the main branch protection.
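
Roughly, that "build always, deploy only from main" pattern looks like this (a sketch with placeholder names, Python version, and mkdocs commands, not the exact action file from this PR):

# Rough illustrative sketch of the deploy guard, not the actual mkdocs workflow.
name: mkdocs-material
on:
  push:
    branches: [main, development]
  pull_request:
    branches: [main, development]
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install mkdocs-material
      - name: Build docs on every run
        run: mkdocs build --strict
      - name: Deploy docs, but only from the main branch
        if: github.ref == 'refs/heads/main'
        run: mkdocs gh-deploy --force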

Hope you like this, have a good one!

mauwii and others added 30 commits October 12, 2022 05:54
* fix mkdocs deployment

* update path to python bin

* add trigger for current branch

* change path to python_bin for mac as well

* try to use setup-python@v4 instead of setting env

* remove setup conda action

* try to use $CONDA

* remove overseen action

* change branch from master to main

* sort out if then else for faster syntax

* remove more if functions

* add updates to create-caches as well

* eliminate the rest of if functions

* try to unpin pytorch and torchvision

* restore pinned versions

* try switching from set-output to use env

* update test-invoke-conda as well

* fix env var creation

* quote variable

* add second equal to compare

* try another way to use outputs

* fix outputs

* pip install for mac before creating conda env

* fix output variable

* fix python bin path

* remove pip install for before creating conda env

* unpin streamlit version in conda mac env

* try to make git-workflows better readable

@mauwii (Contributor, Author) commented Oct 16, 2022

Ah, I forgot to clarify why I unpinned all versions from the packages in environment-mac.yml:

First of all, this was the only way I could get it working. But beyond that, one of conda's functions is to resolve conflicts between packages, and another is the ability to update an environment - both of which are of course impossible while every version is pinned. The conda documentation also recommends using a separate file to pin versions if it is necessary to prevent a package from updating...

So imho this was the best solution for the problem (and it should even solve the problem of people on Macs who can't build the environment). And I think there is no better proof than the passing workflow ;P
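
For illustration, a loosely-pinned conda environment could look roughly like this (just a sketch with placeholder packages and version ranges, not the actual environment-mac.yml):

# Rough sketch of loose pins, not the actual environment-mac.yml from this repo.
name: invokeai
channels:
  - pytorch
  - conda-forge
dependencies:
  - python>=3.9
  - pytorch>=1.12        # a range instead of a strict pin leaves conda room to resolve conflicts
  - torchvision>=0.13
  - pip
  - pip:
      - streamlit        # unpinned, as was done for the mac environment in this PR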

@mauwii (Contributor, Author) commented Oct 16, 2022

I may already have managed to integrate a Windows runner, but unfortunately it fails because it runs out of RAM 😕

* Initializing, be patient...

>> GFPGAN Initialized
>> CodeFormer Initialized
>> ESRGAN Initialized
>> Loading model from models/ldm/stable-diffusion-v1/model.ckpt
>> Calculating sha256 hash of weights file
>> sha256 = fe4efff1e174c627256e44ec2991ba279b3816e364b49f9be2abc0b3ff3f8556 (14.07s)
Traceback (most recent call last):
  File "D:\a\stable-diffusion\stable-diffusion\scripts\invoke.py", line 586, in <module>
    main()
  File "D:\a\stable-diffusion\stable-diffusion\scripts\invoke.py", line 107, in main
    gen.load_model()
  File "d:\a\stable-diffusion\stable-diffusion\ldm\generate.py", line 660, in load_model
    model = self._load_model_from_config(self.config, self.weights)
  File "d:\a\stable-diffusion\stable-diffusion\ldm\generate.py", line 799, in _load_model_from_config
    pl_sd = torch.load(io.BytesIO(weight_bytes), map_location='cpu')
  File "C:\Miniconda3\envs\invoke\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "C:\Miniconda3\envs\invoke\lib\site-packages\torch\serialization.py", line 1046, in _load
    result = unpickler.load()
  File "C:\Miniconda3\envs\invoke\lib\site-packages\torch\serialization.py", line 1016, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "C:\Miniconda3\envs\invoke\lib\site-packages\torch\serialization.py", line 997, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 3276800 bytes.

@lstein (Collaborator) commented Oct 16, 2022

I'm so glad that someone has done this. I was afraid to touch these files! I'm OK with it, but would like to see confirmation from the other reviewers before I go ahead and merge into main.

@lstein (Collaborator) left a comment

Insofar as I understand how the workflows work, this looks good to me. I also think it appropriate to merge into main. Once it goes into main, I'll cherry-pick into development to avoid the conflicts.

Actually this is an awful lot of commits. Could you rebase the whole thing into one or two big commits?

@lstein (Collaborator) left a comment

Please rebase your PR to merge the 122 commits into a single one. To do this at the command line, find the last commit from before your first commit, and run git rebase <last commit>

@mauwii (Contributor, Author) commented Oct 16, 2022

Please rebase your PR to merge the 122 commits into a single one. To do this at the command line, find the last commit from before your first commit, and run git rebase <last commit>

Are you sure? There are also commits from other people in between. Wouldn't it be easier if you squash-merged it onto main?

And yes, sorry for the high number of commits, but since you cannot test CI/CD stuff locally, I was forced to "commit early" ;P

@tildebyte (Contributor)

@lstein I think in this instance we're OK to squash merge

@mauwii (Contributor, Author) commented Oct 16, 2022

@lstein I think in this instance we're OK to squash merge

Thx - there should not even be merge conflicts that way, whereas rebasing it with all the other commits in between creates conflicts on my side.

@tildebyte (Contributor)

@mauwii;

I unpinned all versions from the packages in environment-mac.yml

Please do so for environment.yml as well

Also, 🌮, this is something which really needed doing. I plan to follow up the installer work which cmdr2 is doing with something which will unpin everything on dev, and have everything properly pinned on main for releases (pip compile, if you want to take a look)

@mauwii (Contributor, Author) commented Oct 16, 2022

I do not get the point of pinning while using Anaconda - couldn't you then just use requirements.txt and venv instead (both already ship with Python, so there is no need to install conda or anything else)?

I didn't want to touch environment.yml since it actually builds, but of course I could look into this as well.

I planned to look into the Docker things, since I have learned a lot about Kubernetes and want to grow my knowledge in that direction, so if that also plays into your cards it would be nice 👯

@tildebyte (Contributor)

@mauwii;

you cannot test CI/CD stuff locally

Right, so what you do is open a PR against your fork, and directly merge into your 'main'. Once everything works as you want it to, you clean up your branch and open a PR against the upstream repo's 'main' (don't forget to git reset --hard your 'main'!)

${{ env.cache-name }}
restore-keys: ${{ env.cache-name }}

- name: Use Cached Stable Diffusion v1.4 Model
Contributor:

During my (unfortunately very lengthy and time-consuming 🙄) testing when I was trying to get Windows working, without success, I noticed that on average it's much faster to download the model every time because when you stash something in the lookaside cache to reuse, it gets zipped.

So instead of a straightforward download from Hugging Face's servers, there's a much more complicated download/unzip/move/not sure what else which usually takes longer (and yeah, this is across all three platforms).
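
For reference, the two variants being compared look roughly like this (a rough sketch; the cache key mirrors the excerpt above, while the secret name and download URL variable are just placeholders):

# Option A: restore the checkpoint from the actions cache (it gets zipped/unzipped on the way)
- name: Use Cached Stable Diffusion v1.4 Model
  uses: actions/cache@v3
  with:
    path: models/ldm/stable-diffusion-v1/model.ckpt
    key: ${{ env.cache-name }}
    restore-keys: ${{ env.cache-name }}

# Option B: download the checkpoint on every run (plain HTTPS, no zip/unzip step)
- name: Download Stable Diffusion v1.4 Model
  run: |
    curl -L -o models/ldm/stable-diffusion-v1/model.ckpt \
      -H "Authorization: Bearer ${{ secrets.HUGGINGFACE_TOKEN }}" \
      "$MODEL_URL"   # HUGGINGFACE_TOKEN and MODEL_URL are placeholder names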

Contributor Author:

To be honest, I didn't figure out how to create the secret with a token in a way that stays valid as long as the token itself - I cheated a bit on my side with this secret (while still obtaining it from the Hugging Face site). But that download ran at about 50% of the speed I get when pulling the model from the cache (and the cache would still be valid after the secret has already lost its authorization 🙈)

Contributor:

Right, I forgot about that - I had to set up my token for it to work properly. The main repo should never have this issue, and if it does, it's something which we'd need to fix immediately

Contributor Author:

So should I disable the caching and always download - are you sure? Imho it can be really helpful when there are many runs, and it saves backbone bandwidth for the people watching cat videos on YT ;P

Contributor:

imho it can be really helpful when there are many runs

That's what I'm saying - it isn't. I was absolutely stunned by this at first, so I went down a total rabbit hole, stopping work on getting Windows running just to focus on the model download.

ON AVERAGE, downloading the model is faster (by a lot) than caching it.

OTOH, "saves backbone bandwidth" is another reason why we want to limit the number of times we run the suite.

Contributor Author:

For me the download from Hugging Face usually ran at between 50-100 MB/s, while the GH cache managed about 150 MB/s. The cache is also handled internally rather than over the web (and I am not sure whether the Hugging Face download is slower precisely because it has to be routed externally), which would mean that using the cache doesn't bother the backbone at all ;P

@tildebyte (Contributor)

@mauwii;

I do not get the point of pinning while using Anaconda

I agree with you entirely about Anaconda, but best practice is to keep all dependencies unpinned during development; then, as the absolute last action before release, after the final merge to 'main', you run a utility to pin every dependency (and these days every dependency should be hashed as well), and that is your release.

It doesn't make sense (for a lot of reasons) to release with unpinned dependencies; one big one for us is simply that no environment is reproducible at that point, because pip (which IIUC is also conda's resolver) will resolve all dependencies on every install. Given the pace of modern software development, you WILL be in a situation where someone gets a working install one day, wipes it (for whatever reason) the next day, and installs again, only to find that their environment is completely broken because the dependencies can no longer be resolved.

TL;DR the burden of resolving (and pinning!) all dependencies to a known state is on us, and should be done at the end of a development cycle.

@tildebyte (Contributor)

@mauwii; I don't know why I didn't just link to this in the first place https://medium.com/packagr/using-pip-compile-to-manage-dependencies-in-your-python-packages-8451b21a949e

Even if you only read the "The problem" section, it should be clear why pinning releases is best practice

@mauwii (Contributor, Author) commented Oct 16, 2022

Not sure why I cannot reply to your last comment @tildebyte, but just to mention it: I am very aware of CI/CD, since I have been working in DevOps for some years 🤓

I just meant that for a release it would be easier to stick with requirements.txt instead of using Conda, since nobody needs to install anything besides a compatible version of Python and can then simply run:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Why a production release should have pinned requirements is pretty obvious - but the pins would e.g. also differ between x86, ARM, and other architectures (which the current environment YAMLs do not account for, and which therefore causes trouble, since one party always seems to have problems).

But I am totally with you about version pinning for releases!

@tildebyte (Contributor)

We're actually agreeing, but I'm not communicating effectively 😁

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

There's a step before that (which should happen as the absolute last thing before release) that generates a fully resolved, pinned, hashed 'requirements.txt', which should (with some luck) work on all platforms (pip hashes all of the wheels it finds for a given package, but sometimes things don't work as they should :D)

So my proposal (in agreement with you) is to completely drop Conda and replace it with 'pip-compile'/'pip'/'venv', which currently (still a work in progress) looks like this (it would live in a script):

# Commented lines are development-time only - end users should never run these. Also note
# that 'pip-tools' IS NOT an end-user dependency
# pip install pip-tools
# pip-compile -r --annotate --annotation-style split --generate-hashes --emit-index-url --allow-unsafe --reuse-hashes requirements.in
python -m venv invokeai
# IMO it will be easier for users to remember this, and we avoid any possibility of collision
source invokeai/Scripts/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
# A source directory cannot be hashed, so pip will refuse to install it from a
# hashed 'requirements.txt' - it must be done "manually"
pip install -e .

@mauwii (Contributor, Author) commented Oct 16, 2022

@tildebyte: Yes, I just tried to point out that we agree on why it should be pinned, but what is still not clear to me: where would you get the different requirement files from (for Linux, Windows, and Mac, each in at least ARM and x64 imho), to end up with something like

  • requirements-linux-arm.txt
  • requirements-linux-x64.txt (or amd64)
  • requirements-macos-arm.txt
  • ...

since you will be out of luck trying to have one almighty requirements.txt

I think this should also be done with GH Actions, but I am not sure whether they have ARM runners available for free, since I usually work on Azure DevOps (fun fact: I just learned that when you use a macOS agent in DevOps, it actually refers to GitHub runners xD)

@mauwii closed this Oct 16, 2022
@mauwii deleted the update-gh-actions branch October 16, 2022 17:23
@mauwii mentioned this pull request Oct 16, 2022
@tildebyte (Contributor)

Where would you get the different requirement files from

It shouldn't be necessary to have them, because "pip hashes all of the wheels which it finds for a given package" - but it's true that it doesn't always work out in practice. We might have to rely on having a contributor with e.g. ARM hardware who could generate the 'requirements.txt' for us. I think, at least to get started, we should generate a separate 'requirements.txt' for each platform (OS + hardware) and then diff them

Basically, this

If we are forced to go the route of per-platform requirements files, there's a spec for that (wish I'd known sooner) e.g. 'win32-py3.7-requirements.txt', 'macos-py3.10-requirements.txt'
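
Roughly, generating them could itself be a matrix job, something like this (a sketch only; the pip-compile invocation and file names are placeholders loosely following the naming idea above):

# Rough sketch: generate one pinned, hashed requirements file per runner platform.
jobs:
  pin-requirements:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install pip-tools
      - name: Compile pinned, hashed requirements for this platform
        run: pip-compile --generate-hashes --output-file requirements-${{ runner.os }}-py3.10.txt requirements.in
      - uses: actions/upload-artifact@v3
        with:
          name: requirements-${{ runner.os }}
          path: requirements-*.txt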

@mauwii (Contributor, Author) commented Oct 16, 2022

Just to mention it again: you are already using Conda environments with pinned versions, one for Mac and one for Linux.

Since macOS imho has the largest share of users with ARM CPUs, it already shows the pattern: one of the two factions always has problems installing the conda environment; then new versions get pinned in the environment, and somehow the other half starts getting errors.

I have an Azure DevOps instance with 10 agents available (where I can also run on ARM or amd64) and could of course create a project for Invoke-AI in it, but it would then maybe make more sense to "pull" me into the organisation, so that we can do this directly against the main repo and not automate something like main -> fork -> main.

Besides that, it would also be possible to achieve something with Docker, where you could use Moby to build e.g. an ARM64 container on x64 hardware, install the conda env, freeze the pip state, export it, ...
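
Sketched roughly as workflow steps, that idea could look like this (the image tag, Dockerfile, and output file name are just placeholders, and it assumes pip exists inside the image):

# Rough sketch: build an arm64 image on an x64 runner and export its resolved packages.
- uses: docker/setup-qemu-action@v2      # arm64 emulation on the amd64 runner
- uses: docker/setup-buildx-action@v2
- name: Build arm64 image
  run: docker buildx build --platform linux/arm64 --load -t invokeai:arm64 .
- name: Freeze the pip state inside the image
  run: docker run --rm --platform linux/arm64 invokeai:arm64 pip freeze > requirements-linux-arm64.txt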

@tildebyte (Contributor)

You are already using Conda environments with pinned versions, one for Mac and one for Linux

I might be misunderstanding what you're trying to say here, but from what I can see the problem is Conda, not pinning, especially given the fact that we have a separate file just for Mac.

People who create venvs directly, or (like me) use pipenv, and then install directly via 'pip', have far fewer problems than people using Conda in any environment.

@mauwii (Contributor, Author) commented Oct 16, 2022

No, Conda solves a problem here, since it is able to resolve conflicts between packages by itself. But that ability is taken away from Conda when every package is consistently pinned to a strict version (instead of e.g. using packagename==1.2.* or packagename>=1.0,<2.0).

But yeah, in the end Conda is also totally overpowered for the use case here (at least from what I have found out about Conda so far).
