[Docs]: Services without a gateway (dstackai#2011)

jvstme · pranitnaik · commit f8490b42ba14 · 2024-12-20T17:12:51.000+05:30
- Update all the spots where gateway was mentioned
  as being required for services
- Do not emphasize that gateways are needed for
  auto-scaling. This is a temporary restriction
  and users will see a clear error message if they
  attempt to use auto-scaling without a gateway.
  Emphasize that gateways provide a custom domain
  and HTTPS, which is their main value.
- In Protips, add a comparison of running web apps
  as Tasks vs Services without a gateway vs
  Services with a gateway
- Add a trailing slash in `/proxy/services/.../`
  to be consistent with CLI and avoid redirects
- Other minor edits: fix typos and broken links,
  remove outdated notes, and ignore `/site/` in
  `.gitignore`
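
For reviewers, a minimal sketch of the URL shapes the updated docs describe
when no gateway is configured. The helper names `service_url` and `models_url`
are hypothetical and are not part of `dstack`; only the path layouts
(`/proxy/services/<project name>/<run name>/` with a trailing slash and
`/proxy/models/<project name>`) are taken from the docs changed in this commit.

```python
from urllib.parse import quote


def service_url(server_url: str, project: str, run_name: str) -> str:
    """Build the in-server proxy URL of a service (hypothetical helper).

    The trailing slash matches what the CLI prints and avoids a redirect.
    """
    base = server_url.rstrip("/")
    return f"{base}/proxy/services/{quote(project, safe='')}/{quote(run_name, safe='')}/"


def models_url(server_url: str, project: str) -> str:
    """Build the OpenAI-compatible models endpoint URL (hypothetical helper)."""
    base = server_url.rstrip("/")
    return f"{base}/proxy/models/{quote(project, safe='')}"


if __name__ == "__main__":
    # Values taken from the quickstart output shown in the docs.
    print(service_url("http://localhost:3000", "main", "llama31-service"))
    print(models_url("http://localhost:3000", "main"))
```

With a gateway, the docs instead point to the custom domain, so these helpers
only apply to the gateway-less case.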
diff --git a/.gitignore b/.gitignore
@@ -4,6 +4,7 @@
 
 dist/
 venv/
+/site/
 /.cache/
 .pytest_cache/
 .coverage
diff --git a/docs/blog/posts/amd-on-runpod.md b/docs/blog/posts/amd-on-runpod.md
@@ -101,8 +101,8 @@ Once the configuration is ready, run `dstack apply -f <configuration file>`, and
 cloud resources and run the configuration.
 
 ??? info "Control plane"
-    If you specify `model` when running a service, `dstack` will automatically register the model on the gateway's global
-    endpoint and allow you to use it for chat via the control plane UI.
+    If you specify `model` when running a service, `dstack` will automatically register the model on
+    an OpenAI-compatible endpoint and allow you to use it for chat via the control plane UI.
     
     <img src="https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-control-plane-model-llama31.png?raw=true" width="750px" />
 
diff --git a/docs/blog/posts/dstack-sky.md b/docs/blog/posts/dstack-sky.md
@@ -77,7 +77,7 @@ With `dstack Sky` you can use all of `dstack`'s features, incl. [dev environment
 [tasks](../../docs/tasks.md), [services](../../docs/services.md), and 
 [fleets](../../docs/concepts/fleets.md).
 
-To use services, the open-source version requires setting up a gateway with your own domain. 
+To publish services, the open-source version requires setting up a gateway with your own domain. 
 `dstack Sky` comes with a pre-configured gateway.
 
 <div class="termy">
diff --git a/docs/blog/posts/tpu-on-gcp.md b/docs/blog/posts/tpu-on-gcp.md
@@ -120,8 +120,8 @@ and [vLLM :material-arrow-top-right-thin:{ .external }](https://github.com/vllm-
     </div>
 
 ??? info "Control plane"
-    If you specify `model` when running a service, `dstack` will automatically register the model on the gateway's global
-    endpoint and allow you to use it for chat via the control plane UI.
+    If you specify `model` when running a service, `dstack` will automatically register the model on
+    an OpenAI-compatible endpoint and allow you to use it for chat via the control plane UI.
     
     <img src="https://github.com/dstackai/static-assets/blob/main/static-assets/images/dstack-control-plane-model-llama31.png?raw=true" width="750px" />
 
diff --git a/docs/docs/concepts/fleets.md b/docs/docs/concepts/fleets.md
@@ -205,7 +205,7 @@ on the hosts specified in `ssh_config`.
 
 ### List fleets
 
-The [`dstack fleet`](../reference/cli/index.md#dstack-gateway-list) command lists fleet instances and theri status:
+The [`dstack fleet`](../reference/cli/index.md#dstack-fleet-list) command lists fleet instances and their status:
 
 <div class="termy">
 
diff --git a/docs/docs/concepts/gateways.md b/docs/docs/concepts/gateways.md
@@ -1,10 +1,9 @@
 # Gateways
 
-Gateways manage the ingress traffic of running services and provide them with an HTTPS endpoint mapped to your domain,
+Gateways manage the ingress traffic of running [services](../services.md)
+and provide them with an HTTPS endpoint mapped to your domain,
 handling authentication, load distribution, and auto-scaling.
 
-To run a service, you need at least one gateway set up.
-
 > If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
 > the gateway is already set up for you.
 
@@ -43,20 +42,22 @@ To create or update the gateway, simply call the [`dstack apply`](../reference/c
 <div class="termy">
 
 ```shell
-$ dstack apply . -f examples/deployment/gateway.dstack.yml
+$ dstack apply -f gateway.dstack.yml
 The example-gateway doesn't exist. Create it? [y/n]: y
 
  BACKEND  REGION     NAME             HOSTNAME  DOMAIN       DEFAULT  STATUS
  aws      eu-west-1  example-gateway            example.com  ✓        submitted
-
 ```
 
 </div>
 
 ## Update DNS records
 
 Once the gateway is assigned a hostname, go to your domain's DNS settings
-and add an `A` DNS record for `*.<gateway domain>` (e.g., `*.example.com`) pointing to the gateway's hostname.
+and add a DNS record for `*.<gateway domain>`, e.g. `*.example.com`.
+The record should point to the gateway's hostname shown in `dstack`
+and should be of type `A` if the hostname is an IP address (most cases),
+or of type `CNAME` if the hostname is another domain (some private gateways and Kubernetes).
 
 ## Manage gateways
 
diff --git a/docs/docs/guides/protips.md b/docs/docs/guides/protips.md
@@ -93,9 +93,26 @@ This allows you to access the remote `8501` port on `localhost:8501` while the C
     
     This will forward the remote `8501` port to `localhost:3000`.
 
-[Services](../services.md) require a gateway but they also provide additional features for
-production-grade service deployment not offered by tasks, such as HTTPS domains and auto-scaling.
-If you run a web app as a task and it works, go ahead and run it as a service.
+[Services](../services.md) provide additional features not offered by tasks,
+such as authorization, load balancing, auto-scaling, an OpenAI-compatible endpoint for models, etc.
+
+Unlike tasks, services are accessible throughout their lifetime, not only when the CLI is attached.
+By default, they are published at `<dstack server URL>/proxy/services/<project name>/<run name>/`.
+Additionally, if your project has a [gateway](../concepts/gateways.md),
+services can be published at a custom domain with HTTPS instead.
+
+So what should you choose for running a web app? Here are some suggestions:
+
+- If you are running a simple app that you only need temporarily, consider **tasks**.
+- If your app needs to be available at all times or if it needs to benefit from advanced features
+  such as authorization or load balancing, use **services**.
+    - If the service will only be accessed by you and other `dstack` users and supports running
+      behind a URL path prefix, **no gateway** is needed.
+    - If the service requires public access, a custom domain, HTTPS, or increased network throughput,
+      **create a gateway** first.
+
+??? info "Auto-scaling and WebSockets"
+    Services using WebSockets or auto-scaling currently require a gateway.
 
 ## Docker and Docker Compose
 
diff --git a/docs/docs/guides/troubleshooting.md b/docs/docs/guides/troubleshooting.md
@@ -84,8 +84,9 @@ was using spot instances and was interrupted. To address this, you can either se
 
 #### Gateway configuration
 
-The most common reason a service fails to start is either because you haven’t [created a gateway](../concepts/gateways.md) or haven’t set up the
-correct DNS record pointing to the gateway's hostname.
+If all services fail to start with a specific gateway, make sure a
+[correct DNS record](../concepts/gateways.md#update-dns-records)
+pointing to the gateway's hostname is configured.
 
 ### Service endpoint doesn't work 
 
@@ -94,10 +95,6 @@ correct DNS record pointing to the gateway's hostname.
 If the service endpoint returns a 403 error, it is likely because the [`Authorization`](../services.md#access-the-endpoint) 
 header with the correct `dstack` token was not provided.
 
-#### SSH fleets
-
-If you attempt to run a service on an SSH fleet, it won't work due to a [known issue :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues/1640){:target="_blank"} that is expected to be fixed soon.
-
 [//]: # (#### Other)
 [//]: # (TODO: Explain how to get the gateway logs)
 
diff --git a/docs/docs/index.md b/docs/docs/index.md
@@ -25,8 +25,8 @@ for AI workloads both in the cloud and on-prem, speeding up the development, tra
 * [Tasks](tasks.md) &mdash; for scheduling jobs, incl. distributed ones (or running web apps)
 * [Services](services.md) &mdash; for deploying models (or web apps)
 * [Fleets](concepts/fleets.md) &mdash; for managing cloud and on-prem clusters
-* [Volumes](concepts/volumes.md) &mdash; for managing instance and network volumes (to persist data)
-* [Gateway](concepts/fleets.md) &mdash; for handling auto-scaling and ingress traffic
+* [Volumes](concepts/volumes.md) &mdash; for managing network volumes (to persist data)
+* [Gateways](concepts/gateways.md) &mdash; for publishing services with a custom domain and HTTPS
 
 Configuration can be defined as YAML files within your repo.
 
diff --git a/docs/docs/quickstart.md b/docs/docs/quickstart.md
@@ -109,7 +109,7 @@ Your folder can be a regular local folder or a Git repo.
     </div>
 
     By default, tasks run on a single instance. To run a distributed task, specify 
-    [`nodes` and system environment variables](reference/dstack.yml/task.md#distributed-tasks), 
+    [`nodes`](reference/dstack.yml/task.md#distributed-tasks), 
     and `dstack` will run it on a cluster.
 
     ##### Run the configuration
@@ -119,16 +119,14 @@ Your folder can be a regular local folder or a Git repo.
     <div class="termy">
 
     ```shell
-    $ dstack apply -f streamlit.dstack.yml
+    $ dstack apply -f serve-task.dstack.yml
     
      #  BACKEND  REGION           RESOURCES                 SPOT  PRICE
      1  gcp      us-west4         2xCPU, 8GB, 100GB (disk)  yes   $0.010052
      2  azure    westeurope       2xCPU, 8GB, 100GB (disk)  yes   $0.0132
      3  gcp      europe-central2  2xCPU, 8GB, 100GB (disk)  yes   $0.013248
      
     Submit the run streamlit? [y/n]: y
-     
-    Continue? [y/n]: y
     
     Provisioning `streamlit`...
     ---> 100%
@@ -169,7 +167,7 @@ Your folder can be a regular local folder or a Git repo.
     # Expose the vllm server port
     port: 8000
 
-    # Specify a name if it's an Open-AI compatible model
+    # Specify a name if it's an OpenAI-compatible model
     model: meta-llama/Meta-Llama-3.1-8B-Instruct
     
     # Required resources
@@ -186,36 +184,31 @@ Your folder can be a regular local folder or a Git repo.
     <div class="termy">
 
     ```shell
-    $ dstack apply -f streamlit.dstack.yml
+    $ dstack apply -f service.dstack.yml
     
-     #  BACKEND  REGION           RESOURCES                 SPOT  PRICE
-     1  gcp      us-west4         2xCPU, 8GB, 100GB (disk)  yes   $0.010052
-     2  azure    westeurope       2xCPU, 8GB, 100GB (disk)  yes   $0.0132
-     3  gcp      europe-central2  2xCPU, 8GB, 100GB (disk)  yes   $0.013248
+     #  BACKEND  REGION     INSTANCE       RESOURCES                    SPOT  PRICE
+     1  aws      us-west-2  g5.4xlarge     16xCPU, 64GB, 1xA10G (24GB)  yes   $0.22
+     2  aws      us-east-2  g6.xlarge      4xCPU, 16GB, 1xL4 (24GB)     yes   $0.27
+     3  gcp      us-west1   g2-standard-4  4xCPU, 16GB, 1xL4 (24GB)     yes   $0.27
      
-    Submit the run streamlit? [y/n]: y
-     
-    Continue? [y/n]: y
+    Submit the run llama31-service? [y/n]: y
     
-    Provisioning `streamlit`...
+    Provisioning `llama31-service`...
     ---> 100%
 
     Service is published at: 
-      http://localhost:3000/proxy/services/main/llama31-service
+      http://localhost:3000/proxy/services/main/llama31-service/
     ```
     
     </div>
 
     If you specified `model`, the model will also be available via an OpenAI-compatible endpoint at
     `<dstack server URL>/proxy/models/<project name>`.
 
-    ??? info "Gateway"
-        By default, services run on a single instance. However, you can specify `replicas` and `target` to enable 
-        [auto-scaling](reference/dstack.yml/service.md#auto-scaling).
-
-        Note, to use auto-scaling, a custom domain, or HTTPS, set up a 
+    !!! info "Gateway"
+        To publish a service with a custom domain and HTTPS, set up a 
         [gateway](concepts/gateways.md) before running the service.
-        A gateway pre-configured for you if you are using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
+        A gateway is pre-configured for you if you are using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
 
 `dstack apply` automatically provisions instances, uploads the code from the current repo (incl. your local uncommitted changes).
 
diff --git a/docs/docs/reference/cli/index.md b/docs/docs/reference/cli/index.md
@@ -68,10 +68,6 @@ $ dstack delete --help
 
 </div>
 
-!!! info "NOTE:"
-    The `dstack delete` command currently supports only `gateway` configurations.
-    Support for other configuration types is coming soon.
-
 ### dstack ps
 
 This command shows the status of runs.
@@ -189,8 +185,7 @@ $ dstack fleet delete --help
 
 ### dstack gateway
 
-A gateway is required for running services. It handles ingress traffic, authorization, domain mapping, model mapping
-for the OpenAI-compatible endpoint, and so on.
+A gateway allows publishing services at a custom domain with HTTPS.
 
 ##### dstack gateway list
 
@@ -427,7 +422,7 @@ $ dstack pool delete --help
 
 ??? info "Internal environment variables"
      * `DSTACK_SERVER_ROOT_LOG_LEVEL` – (Optional) Sets root logger log level. Defaults to `ERROR`.
-     * `DSTACK_SERVER_LOG_FORMAT` – (Optional) Sets format of log output. Can be `rich`, `standard`, `json`.. Defaults to `rich`.
+     * `DSTACK_SERVER_LOG_FORMAT` – (Optional) Sets format of log output. Can be `rich`, `standard`, `json`. Defaults to `rich`.
      * `DSTACK_SERVER_UVICORN_LOG_LEVEL` – (Optional) Sets uvicorn logger log level. Defaults to `ERROR`.
      * `DSTACK_PROFILE` – (Optional) Has the same effect as `--profile`. Defaults to `None`.
      * `DSTACK_PROJECT` – (Optional) Has the same effect as `--project`. Defaults to `None`.
diff --git a/docs/docs/reference/dstack.yml/gateway.md b/docs/docs/reference/dstack.yml/gateway.md
@@ -1,6 +1,6 @@
 # gateway
 
-The `gateway` configuration type allows creating and updating [gateways](../../services.md).
+The `gateway` configuration type allows creating and updating [gateways](../../concepts/gateways.md).
 
 > Configuration files must be inside the project repo, and their names must end with `.dstack.yml` 
 > (e.g. `.dstack.yml` or `gateway.dstack.yml` are both acceptable).
diff --git a/docs/docs/reference/dstack.yml/service.md b/docs/docs/reference/dstack.yml/service.md
@@ -108,13 +108,11 @@ If you want, you can specify your own Docker image via `image`.
     All backends except `runpod`, `vastai` and `kubernetes` also allow to use [Docker and Docker Compose](../../guides/protips.md#docker-and-docker-compose) 
     inside `dstack` runs.
 
-### Model gateway { #model-mapping }
-
-By default, if you run a service, its endpoint is accessible at `https://<run name>.<gateway domain>`.
+### Models { #model-mapping }
 
 If you are running a chat model with an OpenAI-compatible interface,
-you can optionally set the [`model`](#model) property to make the model accessible via
-the model gateway provided by `dstack`.
+set the [`model`](#model) property to make the model accessible via
+the OpenAI-compatible endpoint provided by `dstack`.
 
 <div editor-title="service.dstack.yml"> 
 
@@ -138,7 +136,7 @@ resources:
   # Change to what is required
   gpu: 24GB
 
-# Make the model accessible at https://gateway.<gateway domain>
+# Register the model
 model: meta-llama/Meta-Llama-3.1-8B-Instruct
 
 # Alternatively, use this syntax to set more model settings:
@@ -151,8 +149,9 @@ model: meta-llama/Meta-Llama-3.1-8B-Instruct
 
 </div>
 
-With such a configuration, once the service is up, you'll be able to access the model at
-`https://gateway.<gateway domain>` via the OpenAI-compatible interface.
+Once the service is up, the model will be available via the OpenAI-compatible endpoint
+at `<dstack server URL>/proxy/models/<project name>`
+or at `https://gateway.<gateway domain>` if your project has a gateway.
 
 ### Auto-scaling
 
@@ -199,6 +198,11 @@ The [`replicas`](#replicas) property can be a number or a range.
 
 Setting the minimum number of replicas to `0` allows the service to scale down to zero when there are no requests.
 
+!!! info "Gateway"
+    Services with a fixed number of replicas are supported both with and without a
+    [gateway](../../concepts/gateways.md).
+    Auto-scaling is currently only supported for services running with a gateway.
+
 ### Resources { #_resources }
 
 If you specify memory size, you can either specify an explicit size (e.g. `24GB`) or a 
diff --git a/docs/docs/services.md b/docs/docs/services.md
@@ -28,7 +28,7 @@ commands:
 # Expose the vllm server port
 port: 8000
 
-# Specify a name if it's an Open-AI compatible model
+# Specify a name if it's an OpenAI-compatible model
 model: meta-llama/Meta-Llama-3.1-8B-Instruct
 
 # Use either spot or on-demand instances
@@ -47,12 +47,9 @@ If you don't specify your Docker image, `dstack` uses the [base](https://hub.doc
 Note, the `model` property is optional and not needed when deploying a non-OpenAI-compatible model or a regular web app.
 
 !!! info "Gateway"
-    By default, services run on a single instance. However, you can specify `replicas` and `target` to enable 
-    [auto-scaling](reference/dstack.yml/service.md#auto-scaling).
-
-    Note, to use auto-scaling, a custom domain, or HTTPS, set up a 
+    To publish a service with a custom domain and HTTPS, set up a 
     [gateway](concepts/gateways.md) before running the service.
-    A gateway pre-configured for you if you are using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
+    A gateway is pre-configured for you if you are using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
 
 !!! info "Reference"
     See [.dstack.yml](reference/dstack.yml/service.md) for all the options supported by
@@ -80,7 +77,7 @@ Provisioning...
 ---> 100%
 
 Service is published at: 
-  http://localhost:3000/proxy/services/main/llama31-service
+  http://localhost:3000/proxy/services/main/llama31-service/
 ```
 
 </div>
@@ -92,8 +89,8 @@ To avoid uploading large files, ensure they are listed in `.gitignore`.
 
 ### Service
 
-If no gateway is created, the service’s endpoint will be accessible at `<dstack server URL>
-/proxy/services/<project name>/<run name>`.
+If no gateway is created, the service’s endpoint will be accessible at
+`<dstack server URL>/proxy/services/<project name>/<run name>/`.
 
 By default, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`.
 
diff --git a/examples/deployment/nim/README.md b/examples/deployment/nim/README.md
@@ -68,8 +68,9 @@ Provisioning...
 ```
 </div>
 
-If no gateway is created, the service’s endpoint will be accessible at 
-`<dstack server URL>/proxy/services/<project name>/<run name>`.
+Once the service is up, the model will be available via the OpenAI-compatible endpoint
+at `<dstack server URL>/proxy/models/<project name>`
+or at `https://gateway.<gateway domain>` if your project has a gateway.
 
 <div class="termy">
 
diff --git a/examples/deployment/tgi/README.md b/examples/deployment/tgi/README.md
@@ -69,8 +69,9 @@ Provisioning...
 ```
 </div>
 
-If no gateway is created, the service’s endpoint will be accessible at 
-`<dstack server URL>/proxy/services/<project name>/<run name>`.
+Once the service is up, the model will be available via the OpenAI-compatible endpoint
+at `<dstack server URL>/proxy/models/<project name>`
+or at `https://gateway.<gateway domain>` if your project has a gateway.
 
 <div class="termy">
 
diff --git a/examples/deployment/vllm/README.md b/examples/deployment/vllm/README.md
diff --git a/examples/llms/llama31/README.md b/examples/llms/llama31/README.md
diff --git a/examples/llms/llama32/README.md b/examples/llms/llama32/README.md
diff --git a/src/dstack/_internal/core/models/configurations.py b/src/dstack/_internal/core/models/configurations.py