
Add node-based invocation system #1047


Closed
wants to merge 165 commits into from

Conversation

@Kyle0654 (Contributor) commented Oct 11, 2022

This PR adds the core of the node-based invocation system first discussed in https://github.com/invoke-ai/InvokeAI/discussions/597 and implements it through a basic CLI and API.

Architecture

Invocations

The core of the new system is invocations, found in /ldm/invoke/app/invocations. These represent individual nodes of execution, each with inputs and outputs. Core invocations are already implemented (txt2img, img2img, upscale, face_restore) as well as a debug invocation (show_image). To implement a new invocation, all that is required is to add a new implementation in this folder (there is a markdown document describing the specifics, though it is slightly out-of-date).
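For a rough sense of what adding one looks like, here is a minimal sketch of a hypothetical "blur" node in that style; the class and field names and the invoke(context) signature are assumptions rather than the actual base-class API, and the markdown document in that folder remains the reference:

```python
from typing import Literal
from pydantic import BaseModel, Field

# Hypothetical sketch only -- the real base classes and field conventions
# live under /ldm/invoke/app/invocations; names here are illustrative.

class ImageField(BaseModel):
    """Reference to a stored image, as seen in the session JSON later in this thread."""
    image_type: str = "results"
    image_name: str = ""

class BlurOutput(BaseModel):
    """Output of the hypothetical blur node: a single image reference."""
    type: Literal["image"] = "image"
    image: ImageField = Field(default_factory=ImageField)

class BlurInvocation(BaseModel):
    """Hypothetical node: blur an input image. Fields become CLI options and API schema."""
    type: Literal["blur"] = "blur"
    image: ImageField = Field(default_factory=ImageField, description="Image to blur")
    radius: float = Field(default=2.0, description="Blur radius in pixels")

    def invoke(self, context) -> BlurOutput:
        # Load the input via the image storage service, apply the blur,
        # save the result, and return a reference to it.
        raise NotImplementedError
```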

Sessions

Invocations and links between them are maintained in a session. These can be queued for invocation (either the next ready node, or all nodes). Some notes:

  • Sessions may be added to at any time (including after invocation), but may not be modified.
  • Links are always added with a node, and are always links from existing nodes to the new node. These links can be relative "history" links, e.g. -1 to link from a previously executed node, and can link either specific outputs, or can opportunistically link all matching outputs by name and type by using *. A small sketch follows this list.
  • There are no iteration/looping constructs. Most needs for this could be solved by either duplicating nodes or cloning sessions. This is open for discussion, but is a difficult problem to solve in a way that doesn't make the code even more complex/confusing (especially regarding node ids and history).
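As a concrete illustration of the linking rules above (a sketch only; the dict field names follow the session JSON example posted later in this thread):

```python
# Hypothetical link payloads for a node being added to a session.
# "-1" is a relative "history" link to the most recently executed node;
# "*" opportunistically links every output that matches the new node's
# inputs by name and type.
explicit_link = {"from_node_id": "-1", "from_field": "image", "to_field": "image"}
wildcard_link = {"from_node_id": "-1", "from_field": "*", "to_field": "*"}
```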

Services

These make up the core of the invocation system, found in /ldm/invoke/app/services. One of the key design philosophies here is that most components should be replaceable when possible. For example, if someone wants to use cloud storage for their images, they should be able to replace the image storage service easily (a rough sketch follows the list below).

The services are broken down as follows (several of these are intentionally implemented with an initial simple/naïve approach):

  • Invoker: Responsible for creating and executing sessions and managing services used to do so.
  • Session Manager: Manages session history. An on-disk implementation is provided, which stores sessions as json files on disk, and caches recently used sessions for quick access.
  • Image Storage: Stores images of multiple types. An on-disk implementation is provided, which stores images on disk and retains recently used images in an in-memory cache.
  • Invocation Queue: Used to queue invocations for execution. An in-memory implementation is provided.
  • Events: An event system, primarily used with socket.io to support future web UI integration.
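To illustrate the replaceability point above, a swapped-in image storage backend might look roughly like this. The class and method names are assumptions for illustration; the actual service interfaces live in /ldm/invoke/app/services:

```python
from abc import ABC, abstractmethod
from PIL import Image

# Hypothetical sketch of a swappable image-storage service; names and
# signatures here are assumptions, not the actual API.

class ImageStorageBase(ABC):
    @abstractmethod
    def get(self, image_type: str, image_name: str) -> Image.Image:
        """Load an image previously stored under (image_type, image_name)."""

    @abstractmethod
    def save(self, image_type: str, image_name: str, image: Image.Image) -> None:
        """Persist an image so later nodes (or the API) can retrieve it."""

class S3ImageStorage(ImageStorageBase):
    """Example cloud-backed replacement for the on-disk implementation."""
    def __init__(self, bucket: str):
        self.bucket = bucket

    def get(self, image_type: str, image_name: str) -> Image.Image:
        ...  # fetch f"{image_type}/{image_name}" from the bucket

    def save(self, image_type: str, image_name: str, image: Image.Image) -> None:
        ...  # upload to the bucket
```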

Apps

Apps are available through the /scripts/invoke-new.py script (to-be integrated/renamed).

CLI

python scripts/invoke-new.py

Implements a simple CLI. The CLI creates a single session, and automatically links all inputs to the previous node's output. Commands are automatically generated from all invocations, with command options being automatically generated from invocation inputs. Help is also available for the CLI and for each command, and is very verbose. Additionally, the CLI supports command piping for single-line entry of multiple commands. Example:

> txt2img --prompt "a cat eating sushi" --steps 20 --seed 1234 | upscale | show_image

API

python scripts/invoke-new.py --api --host 0.0.0.0

Implements an API using FastAPI with Socket.io support for signaling. API documentation is available at http://localhost:9090/docs or http://localhost:9090/redoc. This includes OpenAPI schema for all available invocations, session interaction APIs, and image APIs. Socket.io signals are per-session, and can be subscribed to by session id. These aren't currently auto-documented, though the code for event emission is centralized in /ldm/invoke/app/services/events.py.

A very simple test HTML page and script are available at http://localhost:9090/static/test.html. This demonstrates creating a session from a graph, invoking it, and receiving signals from Socket.io.
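For a rough sense of how a client could drive the same flow from a script, here is a sketch; the endpoint paths, payload shape, and event names below are assumptions for illustration only, and the generated docs at /docs are the source of truth:

```python
import requests
import socketio  # pip install "python-socketio[client]"

BASE = "http://localhost:9090"

# NOTE: endpoint paths and event names below are hypothetical.
graph = {
    "invocations": {
        "0": {"id": "0", "type": "txt2img", "prompt": "a cat eating sushi", "steps": 20},
        "1": {"id": "1", "type": "show_image"},
    },
    "links": {
        "0": [],
        "1": [{"from_node_id": "0", "from_field": "image", "to_field": "image"}],
    },
}

session = requests.post(f"{BASE}/api/v1/sessions", json=graph).json()  # hypothetical path

sio = socketio.Client()

@sio.on("invocation_complete")  # hypothetical event name
def on_complete(data):
    print("node finished:", data)

sio.connect(BASE)
sio.emit("subscribe_session", {"session_id": session["id"]})  # hypothetical event name

# Queue all ready nodes for execution (hypothetical path/parameter).
requests.put(f"{BASE}/api/v1/sessions/{session['id']}/invoke", params={"all": True})
```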

What's left?

  • There are a number of features not currently covered by invocations. I kept the set of invocations small during core development in order to simplify refactoring as I went. Now that the invocation code has stabilized, I'd love some help filling those out!
  • There's no image metadata generated. It would be fairly straightforward (and would make good sense) to serialize either a session and node reference into an image, or the entire node into the image. There are a lot of questions to answer around source images, linked images, etc. though. This history is all stored in the session as well, and with complex sessions, the metadata in an image may lose its value. This needs some further discussion.
  • We need a list of features (both current and future) that would be difficult to implement without looping constructs so we can have a good conversation around it. I'm really hoping we can avoid needing looping/iteration in the graph execution, since it'll necessitate separating an execution of a graph into its own concept/system, and will further complicate the system.
  • The API likely needs further filling out to support the UI. I think using the new API for the current UI is possible, and potentially interesting, since it could work like the new/demo CLI in a "single operation at a time" workflow. I don't know how compatible that will be with our UI goals though. It would be nice to support only a single API though.
  • Deeper separation of systems. I intentionally tried to not touch Generate or other systems too much, but a lot could be gained by breaking those apart. Even breaking apart Args into two pieces (command line arguments and the parser for the current CLI) would make it easier to maintain. This is probably in the future though.

@lstein (Collaborator) commented Oct 11, 2022

Great work! I love the pipe syntax in the CLI.

I'm worried about the comment that looping constructs are not supported and would like to dig into this limitation a bit more deeply. Is there any reason that what the user sees is this:

invoke> "a cat eating sushi" --iterations 10 --steps 20 --seed 1234 | upscale --strength 0.8 --scale 4

and what happens under the covers is this loop (pseudo code)?

for i in range(0,10):
    txt2img( --prompt "a cat eating sushi" --steps 20 --seed 1234) | upscale | save_image

Basically I want to preserve the user experience as much as possible. I'll happily deprecate the magic post processing switches (-U to upscale, etc) in favor of a pipe syntax, but the legacy commands still need to work.

@Kyle0654 (Contributor, Author) commented Oct 11, 2022

Upscale only accepts a single image as input, so you'd end up having to expand every node into a loop iteration until you either close the loop with something like a "gather" node or stop executing (leaving leaf nodes as results). This would be functionally equivalent to just setting up N copies of your txt2img -> upscale nodes (i.e. unwrapping the loop yourself).

I guess another way of putting it is: do we want looping supported in the core execution/state management, or can the UI/CLI handle that for us by manipulating the graph?

I think for things like grid I'll probably have to implement a gather type of mapping (i.e. allow Image to map to List[Image]). That may get difficult though, since I'd imagine we'd want some sort of order preserved, and there aren't really ordering guarantees in a graph (unless we either added some metadata or just sorted by node id).
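To make "unwrapping the loop yourself" concrete, a front end (CLI or UI) could expand an --iterations request into N copies of the txt2img -> upscale pair before queuing the graph. A rough sketch, using the field names from the session JSON example later in this thread:

```python
# Hypothetical graph expansion a CLI/UI could perform instead of a loop
# construct in the executor: duplicate the txt2img -> upscale pair N
# times, varying the seed, and let each copy run independently.
def unroll(prompt: str, base_seed: int, iterations: int) -> dict:
    invocations, links = {}, {}
    for i in range(iterations):
        gen_id, up_id = f"gen_{i}", f"up_{i}"
        invocations[gen_id] = {
            "id": gen_id, "type": "txt2img",
            "prompt": prompt, "seed": base_seed + i, "steps": 20,
        }
        invocations[up_id] = {"id": up_id, "type": "upscale"}
        links[up_id] = [
            {"from_node_id": gen_id, "from_field": "image", "to_field": "image"}
        ]
    return {"invocations": invocations, "links": links}
```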

@tildebyte (Contributor) commented Oct 11, 2022

@Kyle0654, @lstein;

There's no image metadata generated. It would be fairly straightforward (and would make good sense) to serialize either a session and node reference into an image, or the entire node into the image. There are a lot of questions to answer around source images, linked images, etc. though. This history is all stored in the session as well, and with complex sessions, the metadata in an image may lose its value. This needs some further discussion.

I had a brainwave when I read this. There's already a model of how to handle processing/editing/modification metadata ("capture the process") - it's called a sidecar file (a '.xmp' file next to the original image), and it's been used in Lightroom since forever. The advantages here are

  • Don't modify the original image unless necessary
  • Use a standard format (YAML/JSON/XML) unencumbered by what the image format supports
  • Total control over the content - you can have an entire "standard" section (e.g. EXIF), plus whatever wild-west stuff you need

Obviously that doesn't all map 1-to-1 to our use case, but I think it's analogous.

In my mind, the prompt, and basic settings (sample scheduler, steps, CFG scale, etc.) define the image. Upscaling, inpainting, outpainting, etc. are operations performed on the existing base generation.

Being able to download a small XMP-like text file and apply it to an existing image would be a UX win. Now that I think of it, it also solves the "preset" problem! - there's nothing that says that sidecar files must have an originating image.

Again, obviously, this doesn't cover ALL scenarios or corner cases, but I think it solves the problem in a relatively elegant way, without having to shoehorn tons of non-standard text data into text records in PNG files.
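For a rough sense of what such a sidecar could contain, here is a sketch only; the field names and the split between "generation" and "operations" are illustrative, not a proposed format:

```python
import json

# Hypothetical sidecar content, written next to the image it describes.
sidecar = {
    "image": "000123.a_cat_eating_sushi.png",
    "generation": {  # the settings that "define the image"
        "type": "txt2img", "prompt": "a cat eating sushi",
        "seed": 1234, "steps": 20, "cfg_scale": 7.5, "sampler_name": "k_lms",
    },
    "operations": [  # edits applied to the existing base generation
        {"type": "upscale", "level": 2},
    ],
}
with open("000123.a_cat_eating_sushi.json", "w") as f:
    json.dump(sidecar, f, indent=2)
```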

P.S. I can already hear "but now you have to manage TWO files"... If you're a "serious" artist you should already have a workflow, and for digital artists (I'd even consider "straight" photographers who do minimal retouching to be "digital artists" in this sense) that had better already include file management. Casual users probably don't care as much about this stuff...

@Kyle0654 (Contributor, Author)

Here's an example of a session json, which contains all of the metadata for an image generation:

{
   "id":"3wBcdcqdRdm95ZPShvhLww==",
   "invocations":{
      "0":{
         "id":"0",
         "type":"txt2img",
         "prompt":"a cat wearing a funny hat",
         "seed":0,
         "steps":10,
         "width":512,
         "height":512,
         "cfg_scale":7.5,
         "sampler_name":"k_lms",
         "seamless":false,
         "model":"",
         "progress_images":false
      },
      "1":{
         "id":"1",
         "type":"show_image",
         "image":{
            "image_type":"results",
            "image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
         }
      }
   },
   "links":{
      "0":[],
      "1":[
         {
            "from_node_id":"0",
            "from_field":"image",
            "to_field":"image"
         }
      ]
   },
   "invocation_results":{
      "0":{
         "invocation":{
            "id":"0",
            "type":"txt2img",
            "prompt":"a cat wearing a funny hat",
            "seed":0,
            "steps":10,
            "width":512,
            "height":512,
            "cfg_scale":7.5,
            "sampler_name":"k_lms",
            "seamless":false,
            "model":"",
            "progress_images":false
         },
         "outputs":{
            "type":"image",
            "image":{
               "image_type":"results",
               "image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
            }
         }
      },
      "1":{
         "invocation":{
            "id":"1",
            "type":"show_image",
            "image":{
               "image_type":"results",
               "image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
            }
         },
         "outputs":{
            "type":"image",
            "image":{
               "image_type":"results",
               "image_name":"3wBcdcqdRdm95ZPShvhLww==_0_1665464353.png"
            }
         }
      }
   },
   "history":[
      "0",
      "1"
   ]
}

This needs a bit more work (there's some duplication in there that's not necessary, and we may want to either pre-compute seed on a node or output the seed that was used), but this should have everything you'd need to re-create the image. This gets stored in a json file in the sessions output directory.

@oxysoft commented Oct 17, 2022

Please see this proposal for plugins in AUTOMATIC1111 for some ideas on this topic. InvokeAI is the next best project for this kind of middleware AI platform, as it has a reasonably large community as well.

I like the invocations, but we should make sure we group things by model; for example, the txt2img and img2img invocations should specify StableDiffusion, since we could support other diffusion models, or even VQGAN+CLIP.

I'm guessing the node-based invocation app is different from your main web UI? If so, do you plan on updating your web UI to work more as an invocation tool palette, as described in the proposal? Thus the pre-packaged web UI would act as the most minimal

@Kyle0654 (Contributor, Author)

I believe the plan is to have the node-based UI available for advanced use, with the crafted UI adapted to utilize the node backend, but still providing a fairly crafted experience.

I've anticipated different models on txt2img and img2img already, though it's also easy to add additional nodes to support different scenarios. The goal is to avoid nodes having too much functionality in a single node. While this will make some of them more complex to use, it will also make the system as a whole more powerful.

@oxysoft commented Oct 17, 2022

Can we provide installation procedures to our invocations? i.e. do the StableDiffusion nodes handle the installation for the model or does it still come as part of the core?

@Kyle0654 (Contributor, Author)

It's still part of core. Most of "core" needs refactoring though. I'd love to support plugins for nodes (and it should be pretty straightforward to support), but haven't got there yet.

@oxysoft commented Oct 17, 2022

Definitely something we have to look at. Installing models and having them usable is the kind of thing people only ever wanna do once in their lifetime. As a developer I would much rather use MiDaS as part of a repository that handles all its checkpoint management, model loading, cloning repos, that kind of shit. In the future, if I want to use the latest model or technique, it should be done through pulling an InvokeAI plugin, not by manually git cloning some repository, doing pip installs, and running some command line script.

@lstein (Collaborator) commented Oct 17, 2022 via email

@psychedelicious (Collaborator)

While working on moving my work on the UI to this branch, I ran into a very confusing error. Writing here in case anybody else runs into this.

The error:

python scripts/invoke-new.py --api --host 0.0.0.0
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 256, in _add_reader
    key = self._selector.get_key(fd)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/selectors.py", line 193, in get_key
    raise KeyError("{!r} is not registered".format(fileobj)) from None
KeyError: '7 is not registered'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/spencer/Documents/Code/stable-diffusion/scripts/invoke-new.py", line 20, in <module>
    main()
  File "/Users/spencer/Documents/Code/stable-diffusion/scripts/invoke-new.py", line 12, in main
    invoke_api()
  File "/Users/spencer/Documents/Code/stable-diffusion/ldm/invoke/app/api_app.py", line 152, in invoke_api
    loop = asyncio.new_event_loop()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/events.py", line 761, in new_event_loop
    return get_event_loop_policy().new_event_loop()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/events.py", line 659, in new_event_loop
    return self._loop_factory()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/unix_events.py", line 54, in __init__
    super().__init__(selector)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 56, in __init__
    self._make_self_pipe()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 107, in _make_self_pipe
    self._add_reader(self._ssock.fileno(), self._read_from_self)
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/selector_events.py", line 258, in _add_reader
    self._selector.register(fd, selectors.EVENT_READ,
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/selectors.py", line 523, in register
    self._selector.control([kev], 0, 0)
TypeError: changelist must be an iterable of select.kevent objects
Exception ignored in: <function BaseEventLoop.__del__ at 0x1054f9c10>
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/base_events.py", line 688, in __del__
    self.close()
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/invokeai/lib/python3.9/asyncio/unix_events.py", line 63, in close
    if self._signal_handlers:
AttributeError: '_UnixSelectorEventLoop' object has no attribute '_signal_handlers'

The solution is to pip uninstall eventlet. eventlet is needed for the current web UI server.

Apparently there is some issue with asyncio, where if eventlet is installed this error can happen. Doesn't even need to be imported (!) to cause the error. Confusing little gremlin this one was.

lstein and others added 15 commits October 31, 2022 10:47
- NEVER overwrite user's existing models.yaml
- Instead, merge its contents into new config file,
  and rename original to models.yaml.orig (with
  message)
- models.yaml has been removed from repository and renamed
  models.yaml.example
- Faster startup for command line switch processing
- Specify configuration file to modify using --config option:

  ./scripts/preload_models.py --config models/my-models-file.yaml
- fix model dl path for sd-v1-4.ckpt
- copy configs/models.yaml.example to configs/models.yaml
...to save some resources, since V1.5 is the default now
Complete re-write of the prompt parsing logic to be more readable and
logical, and therefore also hopefully easier to debug, maintain, and
augment.

In the process it has also become more robust to badly-formed prompts.

Squashed commit of the following:

commit 8fcfa88
Author: Damian at mba <[email protected]>
Date:   Sun Oct 30 17:05:57 2022 +0100

    further cleanup

commit 1a1fd78
Author: Damian at mba <[email protected]>
Date:   Sun Oct 30 16:07:57 2022 +0100

    cleanup and document

commit 099c965
Author: Damian at mba <[email protected]>
Date:   Sun Oct 30 15:54:58 2022 +0100

    works fully

commit 5e6887e
Author: Damian at mba <[email protected]>
Date:   Sun Oct 30 15:24:31 2022 +0100

    further...

commit 492fda1
Author: Damian at mba <[email protected]>
Date:   Sun Oct 30 14:08:57 2022 +0100

    getting there...

commit c6aab05
Author: Damian at mba <[email protected]>
Date:   Fri Oct 28 14:29:03 2022 +0200

    wip doesn't compile

commit 5e533f7
Author: Damian at mba <[email protected]>
Date:   Fri Oct 28 13:21:43 2022 +0200

    working with CrossAttentionCtonrol but no Attention support yet

commit 9678348
Author: Damian at mba <[email protected]>
Date:   Fri Oct 28 13:04:52 2022 +0200

    wip rebuiling prompt parser
mauwii and others added 20 commits November 9, 2022 12:53
... to address required changes
- remove realesrgan
- add git+https://github.com/invoke-ai/Real-ESRGAN.git
- remove git+https://github.com/CompVis/taming-transformers.git
- add taming-transformers-rom1504
- change TencentARC/GFPGAN to invoke-ai/GFPGAN
- Works best with runwayML inpainting model
- Numerous code changes required to propagate seed to final metadata.
  Original code predicated on the image being generated within InvokeAI.
- When outcropping an image you can now add a `--new_prompt` option, to specify
  a new prompt to be used instead of the original one used to generate the image.

- Similarly you can provide a new seed using `--seed` (or `-S`). A seed of zero
  will pick one randomly.

- This PR also fixes the crash that happened when trying to outcrop an image
  that does not contain InvokeAI metadata.
- Script will now offer the user the ability to create a
  minimal models.yaml and then gracefully exit.
- Closes #1420
Add/update documentation and rename results router module to images.

Adding additional invocations to support in-/out-painting

Updating outpainting test in test.html

Reducing outpainting expansion rate

Add linking to the CLI commands

Added 'history' command to node CLI

Adding CLI command to set default input values

Adding node-based invocation apps

Add/update documentation and rename results router module to images.

Reducing outpainting expansion rate

Adding CLI command to set default input values

More outpainting support nodes

Adding node-based invocation apps

Fixing some differences from development

Adding some prep notes for loop support

Add image upload API

Add image upload api
@Kyle0654 (Contributor, Author) commented Dec 1, 2022

Superseded by #1650

@lstein deleted the development-invoke branch December 3, 2022 18:43
blessedcoolant added a commit that referenced this pull request Feb 25, 2023
This PR adds the core of the node-based invocation system first
discussed in https://github.com/invoke-ai/InvokeAI/discussions/597 and
implements it through a basic CLI and API. This supersedes #1047, which
was too far behind to rebase.
