Add node-based invocation system #1047
Conversation
Great work! I love the pipe syntax in the CLI. I'm worried about the comment that looping constructs are not supported, and I'd like to dig into this limitation a bit more deeply. Is there any reason that what the user sees is this:
and what happens under the covers is this loop (pseudo-code)?
Basically, I want to preserve the user experience as much as possible. I'll happily deprecate the magic post-processing switches (
Upscale only accepts a single image as input, so you'd end up having to expand every node into a loop iteration until you either close the loop with something like a "gather" node or stop executing (leaving leaf nodes as results). This would be functionally equivalent to just setting up N copies of your txt2img -> upscale nodes (i.e. unrolling the loop yourself). I guess another way of putting it is: do we want looping supported in the core execution/state management, or can the UI/CLI handle that for us by manipulating the graph? I think for things like grid I'll probably have to implement a gather type of mapping (i.e. allow
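To make the "unroll the loop yourself" idea concrete, here is a rough Python sketch of how a UI/CLI could expand a txt2img -> upscale chain into N independent copies before handing the graph to the executor. The node/link dictionaries are simplified stand-ins, not the actual session schema.

```python
import copy

def unroll(nodes, links, n):
    """Duplicate a small txt2img -> upscale chain n times, giving each copy fresh ids.

    `nodes` is a list of dicts with an "id" key; `links` is a list of
    {"from_node_id", "from_field", "to_node_id", "to_field"} dicts. Both are
    simplified stand-ins for whatever the real session format ends up using.
    """
    expanded_nodes, expanded_links = [], []
    for i in range(n):
        suffix = f"_{i}"
        for node in nodes:
            clone = copy.deepcopy(node)
            clone["id"] = node["id"] + suffix
            if "seed" in clone:
                clone["seed"] += i  # vary the seed per copy, for example
            expanded_nodes.append(clone)
        for link in links:
            clone = dict(link)
            clone["from_node_id"] += suffix
            clone["to_node_id"] += suffix
            expanded_links.append(clone)
    return expanded_nodes, expanded_links


nodes = [
    {"id": "txt2img", "type": "txt2img", "prompt": "a cat", "seed": 0},
    {"id": "upscale", "type": "upscale"},
]
links = [{"from_node_id": "txt2img", "from_field": "image",
          "to_node_id": "upscale", "to_field": "image"}]

all_nodes, all_links = unroll(nodes, links, n=4)  # four independent txt2img -> upscale copies
```

Under this view, a "gather" node would simply be a node that accepts a list input, with the graph builder linking every copy's output into it.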
I had a brainwave when I read this. There's already a model for how to handle processing/editing/modification metadata ("capture the process"): the sidecar file (a .xmp file next to the original image), which Lightroom has used since forever. The advantages here are:
Obviously that doesn't all map 1-to-1 to our use case, but I think it's analogous. In my mind, the prompt and basic settings (sampler/scheduler, steps, CFG scale, etc.) define the image. Upscaling, inpainting, outpainting, etc. are operations performed on the existing base generation. Being able to download a small XMP-like text file and apply it to an existing image would be a UX win.

Now that I think of it, it also solves the "preset" problem! There's nothing that says sidecar files must have an originating image. Again, obviously, this doesn't cover ALL scenarios or corner cases, but I think it solves the problem in a relatively elegant way, without having to shoehorn tons of non-standard text data into text records in PNG files.

P.S. I can already hear "but now you have to manage TWO files"... If you're a "serious" artist you should already have a workflow, and for digital artists (I'd even consider "straight" photographers who do minimal retouching to be "digital artists" in this sense) that had better already include file management. Casual users probably don't care as much about this stuff...
Here's an example of a session JSON, which contains all of the metadata for an image generation:
```
{
"id":"3wBcdcqdRdm95ZPShvhLww==",
"invocations":{
"0":{
"id":"0",
"type":"txt2img",
"prompt":"a cat wearing a funny hat",
"seed":0,
"steps":10,
"width":512,
"height":512,
"cfg_scale":7.5,
"sampler_name":"k_lms",
"seamless":false,
"model":"",
"progress_images":false
},
"1":{
"id":"1",
"type":"show_image",
"image":{
"image_type":"results",
"image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
}
}
},
"links":{
"0":[],
"1":[
{
"from_node_id":"0",
"from_field":"image",
"to_field":"image"
}
]
},
"invocation_results":{
"0":{
"invocation":{
"id":"0",
"type":"txt2img",
"prompt":"a cat wearing a funny hat",
"seed":0,
"steps":10,
"width":512,
"height":512,
"cfg_scale":7.5,
"sampler_name":"k_lms",
"seamless":false,
"model":"",
"progress_images":false
},
"outputs":{
"type":"image",
"image":{
"image_type":"results",
"image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
}
}
},
"1":{
"invocation":{
"id":"1",
"type":"show_image",
"image":{
"image_type":"results",
"image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
}
},
"outputs":{
"type":"image",
"image":{
"image_type":"results",
"image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
}
}
}
},
"history":[
"0",
"1"
]
}
```
This needs a bit more work (there's some duplication in there that's not necessary, and we may want to either pre-compute the seed on a node or output the seed that was used), but this should have everything you'd need to re-create the image. It gets stored in a JSON file in the sessions output directory.
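Since everything needed to re-create the image is in that file, a downstream tool could simply parse it back. A minimal sketch, assuming the session is read from the sessions output directory (the exact path below is illustrative):

```python
import json

# The session id comes from the example above; the path is illustrative, since
# sessions are stored as JSON files in the sessions output directory.
with open("outputs/sessions/3wBcdcqdRdm95ZPShvhLww==.json") as f:
    session = json.load(f)

# Walk the execution history and pull out generation parameters and outputs.
for node_id in session["history"]:
    result = session["invocation_results"][node_id]
    invocation = result["invocation"]
    if invocation["type"] == "txt2img":
        print("prompt:", invocation["prompt"])
        print("seed:", invocation["seed"], "steps:", invocation["steps"])
        print("size:", invocation["width"], "x", invocation["height"])
    outputs = result["outputs"]
    if outputs.get("type") == "image":
        print("output image:", outputs["image"]["image_name"])
```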
Please see this proposal for plugins in AUTOMATIC1111 for some ideas on this topic. InvokeAI is the next best project for this kind of middleware AI platform, as it has a reasonably large community as well. I like the invocations, but we should make sure we group things by model; for example, the txt2img and img2img invocations should specify StableDiffusion, since we could support other diffusion models or even VQGAN+CLIP. I'm guessing the node-based invocation app is different from your main web UI? If so, do you plan on updating your web UI to work more as an invocation tool palette, as described in the proposal? Thus the pre-packaged web UI would act as the most minimal
I believe the plan is to have the node-based UI available for advanced use, with the crafted UI adapted to use the node backend while still offering a fairly curated experience. I've already anticipated different models for txt2img and img2img, and it's also easy to add additional nodes to support different scenarios. The goal is to avoid packing too much functionality into a single node. While this will make some things more complex to use, it will also make the system as a whole more powerful.
Can we provide installation procedures for our invocations? I.e., do the StableDiffusion nodes handle installation of the model, or does it still come as part of the core?
It's still part of core. Most of "core" needs refactoring though. I'd love to support plugins for nodes (and it should be pretty straightforward to support), but I haven't got there yet.
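For a sense of what a node plugin could look like: today a new invocation is just a class added under `/ldm/invoke/app/invocations`, and a plugin mechanism would presumably discover the same kind of class from an external package. The sketch below is illustrative only; the base class, field helpers, and output type are assumptions about the current code rather than its confirmed API.

```python
from typing import Literal

from pydantic import Field

# Assumed imports: module and class names may differ in the actual
# ldm/invoke/app/invocations package.
from .baseinvocation import BaseInvocation
from .image import ImageField, ImageOutput


class BlurInvocation(BaseInvocation):
    """Hypothetical plugin node that blurs its input image."""

    type: Literal["blur"] = "blur"

    # Inputs
    image: ImageField = Field(default=None, description="The image to blur")
    radius: float = Field(default=2.0, description="Blur radius in pixels")

    def invoke(self, context) -> ImageOutput:
        # A real implementation would fetch the input image from the image
        # storage service, apply the blur, save the result, and return an
        # ImageOutput referencing the saved image.
        raise NotImplementedError
```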
Definitely something we have to look at. Installing models and having them usable is the kind of thing people only ever want to do once in their lifetime. As a developer, I would much rather use MiDaS as part of a repository that handles all of its checkpoint management, model loading, repo cloning, that kind of shit. In the future, if I want to use the latest model or technique, it should be done by pulling an InvokeAI plugin, not by manually git-cloning some repository, doing pip installs, and running some command-line script.
There's currently a series of command-line arguments in the invoke script
for loading, editing and switching among models:
- !import_model -- imports and configures a model using its weights file
- !edit_model -- edit the configuration of the model (e.g. set default
image size and description)
- !del_model -- delete the model
- !switch -- quickly switch among models without leaving the script
The backend code was designed to make it easy to expose this functionality
in the WebGUI.
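For illustration, a model-management exchange in the interactive CLI might look something like the following; the prompt, model name, and weights path are all made up for the example:
```
invoke> !import_model models/ldm/stable-diffusion-v1/my-finetune.ckpt
invoke> !edit_model my-finetune
invoke> !switch my-finetune
invoke> !del_model my-finetune
```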
Lincoln
While working on moving my work on the UI to this branch, I ran into a very confusing error. Writing here in case anybody else runs into this. The error, when running `python scripts/invoke-new.py --api --host 0.0.0.0`:
```
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 256, in _add_reader
key = self._selector.get_key(fd)
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/selectors.py", line 193, in get_key
raise KeyError("{!r} is not registered".format(fileobj)) from None
KeyError: '7 is not registered'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/spencer/Documents/Code/stable-diffusion/scripts/invoke-new.py", line 20, in <module>
main()
File "/Users/spencer/Documents/Code/stable-diffusion/scripts/invoke-new.py", line 12, in main
invoke_api()
File "/Users/spencer/Documents/Code/stable-diffusion/ldm/invoke/app/api_app.py", line 152, in invoke_api
loop = asyncio.new_event_loop()
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/events.py", line 761, in new_event_loop
return get_event_loop_policy().new_event_loop()
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/events.py", line 659, in new_event_loop
return self._loop_factory()
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/unix_events.py", line 54, in __init__
super().__init__(selector)
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 56, in __init__
self._make_self_pipe()
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 107, in _make_self_pipe
self._add_reader(self._ssock.fileno(), self._read_from_self)
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 258, in _add_reader
self._selector.register(fd, selectors.EVENT_READ,
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/selectors.py", line 523, in register
self._selector.control([kev], 0, 0)
TypeError: changelist must be an iterable of select.kevent objects
Exception ignored in: <function BaseEventLoop.__del__ at 0x1054f9c10>
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/base_events.py", line 688, in __del__
self.close()
File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/unix_events.py", line 63, in close
if self._signal_handlers:
AttributeError: '_UnixSelectorEventLoop' object has no attribute '_signal_handlers'
```
The solution is to … Apparently there is some issue with …
- NEVER overwrite user's existing models.yaml
- Instead, merge its contents into new config file, and rename original to models.yaml.orig (with message)
- models.yaml has been removed from repository and renamed models.yaml.example
- Faster startup for command line switch processing
- Specify configuration file to modify using --config option: ./scripts/preload_models.py --config models/my-models-file.yaml
- fix model download path for sd-v1-4.ckpt
- copy configs/models.yaml.example to configs/models.yaml
...to save some resources, since V1.5 is the default now
Complete re-write of the prompt parsing logic to be more readable and logical, and therefore also hopefully easier to debug, maintain, and augment. In the process it has also become more robust to badly-formed prompts.

Squashed commit of the following (all by Damian at mba <[email protected]>):
- commit 8fcfa88 (Sun Oct 30 17:05:57 2022 +0100): further cleanup
- commit 1a1fd78 (Sun Oct 30 16:07:57 2022 +0100): cleanup and document
- commit 099c965 (Sun Oct 30 15:54:58 2022 +0100): works fully
- commit 5e6887e (Sun Oct 30 15:24:31 2022 +0100): further...
- commit 492fda1 (Sun Oct 30 14:08:57 2022 +0100): getting there...
- commit c6aab05 (Fri Oct 28 14:29:03 2022 +0200): wip doesn't compile
- commit 5e533f7 (Fri Oct 28 13:21:43 2022 +0200): working with CrossAttentionControl but no Attention support yet
- commit 9678348 (Fri Oct 28 13:04:52 2022 +0200): wip rebuilding prompt parser
... to address required changes
- remove realesrgan
- add git+https://github.com/invoke-ai/Real-ESRGAN.git
- remove git+https://github.com/CompVis/taming-transformers.git
- add taming-transformers-rom1504
- change TencentARC/GFPGAN to invoke-ai/GFPGAN
This reverts commit d05b1b3.
This reverts commit 82d4904.
- Works best with the runwayML inpainting model
- Numerous code changes required to propagate the seed to final metadata. The original code was predicated on the image being generated within InvokeAI.
- When outcropping an image you can now add a `--new_prompt` option, to specify a new prompt to be used instead of the original one used to generate the image.
- Similarly, you can provide a new seed using `--seed` (or `-S`). A seed of zero will pick one randomly.
- This PR also fixes the crash that happened when trying to outcrop an image that does not contain InvokeAI metadata.
- Script will now offer the user the ability to create a minimal models.yaml and then gracefully exit.
- Closes #1420
- Add/update documentation and rename results router module to images.
- Adding additional invocations to support in-/out-painting
- Updating outpainting test in test.html
- Reducing outpainting expansion rate
- Add linking to the CLI commands
- Added 'history' command to node CLI
- Adding CLI command to set default input values
- Adding node-based invocation apps
- Add/update documentation and rename results router module to images.
- Reducing outpainting expansion rate
- Adding CLI command to set default input values
- More outpainting support nodes
- Adding node-based invocation apps
- Fixing some differences from development
- Adding some prep notes for loop support
- Add image upload API
- Add image upload api
Force-pushed from f5f75e6 to 7694132.
Add inpaint invocation.
Superseded by #1650
This PR adds the core of the node-based invocation system first discussed in https://github.com/invoke-ai/InvokeAI/discussions/597 and implements it through a basic CLI and API. This supersedes #1047, which was too far behind to rebase.

## Architecture

### Invocations

The core of the new system is **invocations**, found in `/ldm/invoke/app/invocations`. These represent individual nodes of execution, each with inputs and outputs. Core invocations are already implemented (`txt2img`, `img2img`, `upscale`, `face_restore`) as well as a debug invocation (`show_image`). To implement a new invocation, all that is required is to add a new implementation in this folder (there is a markdown document describing the specifics, though it is slightly out-of-date).

### Sessions

Invocations and links between them are maintained in a **session**. These can be queued for invocation (either the next ready node, or all nodes). Some notes:

* Sessions may be added to at any time (including after invocation), but may not be modified.
* Links are always added with a node, and are always links from existing nodes to the new node. These links can be relative "history" links, e.g. `-1` to link from a previously executed node, and can link either specific outputs, or can opportunistically link all matching outputs by name and type by using `*`.
* There are no iteration/looping constructs. Most needs for this could be solved by either duplicating nodes or cloning sessions. This is open for discussion, but is a difficult problem to solve in a way that doesn't make the code even more complex/confusing (especially regarding node ids and history).

### Services

These make up the core of the invocation system, found in `/ldm/invoke/app/services`. One of the key design philosophies here is that most components should be replaceable when possible. For example, if someone wants to use cloud storage for their images, they should be able to replace the image storage service easily.

The services are broken down as follows (several of these are intentionally implemented with an initial simple/naïve approach):

* Invoker: Responsible for creating and executing **sessions** and managing the services used to do so.
* Session Manager: Manages session history. An on-disk implementation is provided, which stores sessions as JSON files on disk and caches recently used sessions for quick access.
* Image Storage: Stores images of multiple types. An on-disk implementation is provided, which stores images on disk and retains recently used images in an in-memory cache.
* Invocation Queue: Used to queue invocations for execution. An in-memory implementation is provided.
* Events: An event system, primarily used with socket.io to support future web UI integration.

## Apps

Apps are available through the `/scripts/invoke-new.py` script (to be integrated/renamed).

### CLI

```
python scripts/invoke-new.py
```

Implements a simple CLI. The CLI creates a single session, and automatically links all inputs to the previous node's output. Commands are automatically generated from all invocations, with command options being automatically generated from invocation inputs. Help is also available for the CLI and for each command, and is very verbose. Additionally, the CLI supports command piping for single-line entry of multiple commands.

Example:

```
> txt2img --prompt "a cat eating sushi" --steps 20 --seed 1234 | upscale | show_image
```

### API

```
python scripts/invoke-new.py --api --host 0.0.0.0
```

Implements an API using FastAPI with Socket.io support for signaling. API documentation is available at `http://localhost:9090/docs` or `http://localhost:9090/redoc`. This includes the OpenAPI schema for all available invocations, session interaction APIs, and image APIs. Socket.io signals are per-session, and can be subscribed to by session id. These aren't currently auto-documented, though the code for event emission is centralized in `/ldm/invoke/app/services/events.py`.

A very simple test HTML page and script are available at `http://localhost:9090/static/test.html`. This demonstrates creating a session from a graph, invoking it, and receiving signals from Socket.io.

## What's left?

* There are a number of features not currently covered by invocations. I kept the set of invocations small during core development in order to simplify refactoring as I went. Now that the invocation code has stabilized, I'd love some help filling those out!
* There's no image metadata generated. It would be fairly straightforward (and would make good sense) to serialize either a session and node reference into an image, or the entire node into the image. There are a lot of questions to answer around source images, linked images, etc. though. This history is all stored in the session as well, and with complex sessions, the metadata in an image may lose its value. This needs some further discussion.
* We need a list of features (both current and future) that would be difficult to implement without looping constructs, so we can have a good conversation around it. I'm really hoping we can avoid needing looping/iteration in the graph execution, since it'll necessitate separating an execution of a graph into its own concept/system, and will further complicate the system.
* The API likely needs further filling out to support the UI. I think using the new API for the current UI is possible, and potentially interesting, since it could work like the new/demo CLI in a "single operation at a time" workflow. I don't know how compatible that will be with our UI goals though. It would be nice to support only a single API though.
* Deeper separation of systems. I intentionally tried not to touch Generate or other systems too much, but a lot could be gained by breaking those apart. Even breaking Args into two pieces (command-line arguments and the parser for the current CLI) would make it easier to maintain. This is probably in the future though.
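Since the API is built on FastAPI, one quick way to explore what the server exposes (without reading the code) is to pull its OpenAPI schema; FastAPI serves it at `/openapi.json` by default. A small sketch, assuming the server is running on the host and port used above:

```python
import json
from urllib.request import urlopen

# FastAPI publishes the schema backing /docs and /redoc at /openapi.json.
with urlopen("http://localhost:9090/openapi.json") as response:
    schema = json.load(response)

print("API paths:")
for path in sorted(schema["paths"]):
    print("  ", path)

# Invocation and graph models show up as component schemas; the exact names
# depend on what the application registers.
print("Schemas:")
for name in sorted(schema.get("components", {}).get("schemas", {})):
    print("  ", name)
```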