Commit 377d699

peterschmidt85 authored and pranitnaik committed

[Docs] Update services docs to reflect that gateway is now optional (dstackai#2005)

* [Docs] Updated services docs to reflect that gateway is now optional
* [Examples] Updated Llama 3.1 to replace tasks with services
* [Examples] Updated Llama 3.2 to replace tasks with services
* [Examples] Updated Llama 3.2 to replace tasks with services (bugfix)

1 parent: ae43c7c

15 files changed: +190 −315 lines

docs/docs/index.md

Lines changed: 4 additions & 2 deletions

````diff
@@ -22,9 +22,11 @@ for AI workloads both in the cloud and on-prem, speeding up the development, tra
 `dstack` supports the following configurations:
 
 * [Dev environments](dev-environments.md) — for interactive development using a desktop IDE
-* [Tasks](tasks.md) — for scheduling jobs (incl. distributed jobs) or running web apps
-* [Services](services.md) — for deployment of models and web apps (with auto-scaling and authorization)
+* [Tasks](tasks.md) — for scheduling jobs, incl. distributed ones (or running web apps)
+* [Services](services.md) — for deploying models (or web apps)
 * [Fleets](concepts/fleets.md) — for managing cloud and on-prem clusters
+* [Volumes](concepts/volumes.md) — for managing instance and network volumes (to persist data)
+* [Gateway](concepts/fleets.md) — for handling auto-scaling and ingress traffic
 
 Configuration can be defined as YAML files within your repo.
 
````
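The commit adds gateways to the configuration list as the component that handles auto-scaling and ingress traffic. For illustration, a minimal gateway configuration might look like the following sketch (not part of this commit; the name, backend, region, and domain values are hypothetical placeholders):

```yaml
type: gateway
# Hypothetical gateway name (placeholder)
name: example-gateway
# Backend and region to provision the gateway in (placeholders)
backend: aws
region: eu-west-1
# Wildcard domain to map to the gateway (placeholder)
domain: example.com
```

With such a gateway created, services become reachable at `https://<run name>.<gateway domain>` instead of the server's built-in `/proxy` endpoint.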

docs/docs/quickstart.md

Lines changed: 44 additions & 53 deletions

````diff
@@ -21,8 +21,7 @@ Your folder can be a regular local folder or a Git repo.
 
 === "Dev environment"
 
-    A dev environment lets you provision a remote machine with your code, dependencies, and resources, and access it
-    with your desktop IDE.
+    A dev environment lets you provision an instance and access it with your desktop IDE.
 
     ##### Define a configuration
 
@@ -32,18 +31,14 @@ Your folder can be a regular local folder or a Git repo.
 
     ```yaml
     type: dev-environment
-    # The name is optional, if not specified, generated randomly
     name: vscode
 
+    # If `image` is not specified, dstack uses its default image
     python: "3.11"
-    # Uncomment to use a custom Docker image
     #image: dstackai/base:py3.13-0.6-cuda-12.1
 
     ide: vscode
 
-    # Use either spot or on-demand instances
-    spot_policy: auto
-
     # Uncomment to request resources
     #resources:
     #  gpu: 24GB
@@ -78,24 +73,24 @@ Your folder can be a regular local folder or a Git repo.
 
     Open the link to access the dev environment using your desktop IDE.
 
+    Alternatively, you can access it via `ssh <run name>`.
+
 === "Task"
 
-    A task allows you to schedule a job or run a web app. It lets you configure
-    dependencies, resources, ports, the number of nodes (if you want to run the task on a cluster), etc.
+    A task allows you to schedule a job or run a web app. Tasks can be distributed and can forward ports.
 
     ##### Define a configuration
 
     Create the following configuration file inside the repo:
 
-    <div editor-title="streamlit.dstack.yml">
+    <div editor-title="examples/misc/streamlit/serve-task.dstack.yml">
 
     ```yaml
     type: task
-    # The name is optional, if not specified, generated randomly
     name: streamlit
 
+    # If `image` is not specified, dstack uses its default image
     python: "3.11"
-    # Uncomment to use a custom Docker image
     #image: dstackai/base:py3.13-0.6-cuda-12.1
 
     # Commands of the task
@@ -106,16 +101,17 @@ Your folder can be a regular local folder or a Git repo.
     ports:
       - 8501
 
-    # Use either spot or on-demand instances
-    spot_policy: auto
-
     # Uncomment to request resources
     #resources:
     #  gpu: 24GB
     ```
 
     </div>
 
+    By default, tasks run on a single instance. To run a distributed task, specify
+    [`nodes` and system environment variables](reference/dstack.yml/task.md#distributed-tasks),
+    and `dstack` will run it on a cluster.
+
     ##### Run the configuration
 
     Run the configuration via [`dstack apply`](reference/cli/index.md#dstack-apply):
@@ -144,50 +140,41 @@ Your folder can be a regular local folder or a Git repo.
 
     </div>
 
-    `dstack apply` automatically forwards the remote ports to `localhost` for convenient access.
+    If you specified `ports`, they will be automatically forwarded to `localhost` for convenient access.
 
 === "Service"
 
-    A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure
-    dependencies, resources, authorization, auto-scaling rules, etc.
-
-    ??? info "Prerequisites"
-        If you're using the open-source server, you must set up a [gateway](concepts/gateways.md) before you can run a service.
-
-        If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
-        the gateway is already set up for you.
+    A service allows you to deploy a model or any web app as an endpoint.
 
     ##### Define a configuration
 
     Create the following configuration file inside the repo:
 
-    <div editor-title="streamlit-service.dstack.yml">
+    <div editor-title="examples/deployment/vllm/service.dstack.yml">
 
     ```yaml
    type: service
-    # The name is optional, if not specified, generated randomly
-    name: streamlit-service
+    name: llama31-service
 
+    # If `image` is not specified, dstack uses its default image
     python: "3.11"
-    # Uncomment to use a custom Docker image
     #image: dstackai/base:py3.13-0.6-cuda-12.1
 
-    # Commands of the service
+    # Required environment variables
+    env:
+      - HF_TOKEN
     commands:
-      - pip install streamlit
-      - streamlit hello
-    # Port of the service
-    port: 8501
-
-    # Comment to enable authorization
-    auth: False
+      - pip install vllm
+      - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
+    # Expose the vllm server port
+    port: 8000
 
-    # Use either spot or on-demand instances
-    spot_policy: auto
+    # Specify a name if it's an Open-AI compatible model
+    model: meta-llama/Meta-Llama-3.1-8B-Instruct
 
-    # Uncomment to request resources
-    #resources:
-    #  gpu: 24GB
+    # Required resources
+    resources:
+      gpu: 24GB
     ```
 
     </div>
@@ -213,28 +200,32 @@ Your folder can be a regular local folder or a Git repo.
     Provisioning `streamlit`...
     ---> 100%
 
-    Welcome to Streamlit. Check out our demo in your browser.
-
-    Local URL: https://streamlit-service.example.com
+    Service is published at:
+      http://localhost:3000/proxy/services/main/llama31-service
     ```
 
     </div>
 
-    Once the service is up, its endpoint is accessible at `https://<run name>.<gateway domain>`.
+    If you specified `model`, the model will also be available via an OpenAI-compatible endpoint at
+    `<dstack server URL>/proxy/models/<project name>`.
+
+    ??? info "Gateway"
+        By default, services run on a single instance. However, you can specify `replicas` and `target` to enable
+        [auto-scaling](reference/dstack.yml/service.md#auto-scaling).
 
-    > `dstack apply` automatically uploads the code from the current repo, including your local uncommitted changes.
+        Note, to use auto-scaling, a custom domain, or HTTPS, set up a
+        [gateway](concepts/gateways.md) before running the service.
+        A gateway is pre-configured for you if you are using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
+
+    `dstack apply` automatically provisions instances, uploads the code from the current repo (incl. your local uncommitted changes).
 
 ## Troubleshooting
 
-Something not working? Make sure to check out the [troubleshooting](guides/troubleshooting.md) guide.
+Something not working? See the [troubleshooting](guides/troubleshooting.md) guide.
 
 ## What's next?
 
 1. Read about [dev environments](dev-environments.md), [tasks](tasks.md),
    [services](services.md), and [fleets](concepts/fleets.md)
-2. Browse [examples](https://dstack.ai/examples)
-3. Join the community via [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
-
-!!! info "Examples"
-    To see how dev environments, tasks, services, and fleets can be used for
-    training and deploying AI models, check out the [examples](examples/index.md).
+2. Join [Discord :material-arrow-top-right-thin:{ .external }](https://discord.gg/u8SmfwPpMd)
+3. Browse [examples](https://dstack.ai/examples)
````
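The quickstart changes above note that specifying `nodes` turns a task into a distributed task run on a cluster. For illustration, such a configuration might look like this sketch (not part of this commit; the node count, command, and resources are hypothetical placeholders, and the `DSTACK_*` variables are the system environment variables the docs refer to):

```yaml
type: task
name: train-distrib
# Number of instances to provision as a cluster (placeholder)
nodes: 2
python: "3.11"
commands:
  # dstack sets DSTACK_NODES_NUM, DSTACK_NODE_RANK, and
  # DSTACK_MASTER_NODE_IP on each node of the cluster
  - torchrun
    --nnodes=$DSTACK_NODES_NUM
    --node-rank=$DSTACK_NODE_RANK
    --master-addr=$DSTACK_MASTER_NODE_IP
    --master-port=29500
    train.py
resources:
  gpu: 24GB
```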

docs/docs/services.md

Lines changed: 30 additions & 25 deletions

````diff
@@ -1,17 +1,8 @@
 # Services
 
-A service allows you to deploy a web app or a model as a scalable endpoint. It lets you configure
+A service allows you to deploy a model or a web app as an endpoint. It lets you configure
 dependencies, resources, authorization, auto-scaling rules, etc.
 
-Services are provisioned behind a [gateway](concepts/gateways.md) which provides an HTTPS endpoint mapped to your domain,
-handles authentication, distributes load, and performs auto-scaling.
-
-??? info "Gateways"
-    If you're using the open-source server, you must set up a [gateway](concepts/gateways.md) before you can run a service.
-
-    If you're using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"},
-    the gateway is already set up for you.
-
 ## Define a configuration
 
 First, create a YAML file in your project folder. Its name must end with `.dstack.yml` (e.g. `.dstack.yml` or
@@ -26,7 +17,7 @@ type: service
 name: llama31-service
 
 # If `image` is not specified, dstack uses its default image
-python: "3.10"
+python: "3.11"
 
 # Required environment variables
 env:
@@ -37,26 +28,31 @@ commands:
 # Expose the vllm server port
 port: 8000
 
+# Specify a name if it's an Open-AI compatible model
+model: meta-llama/Meta-Llama-3.1-8B-Instruct
+
 # Use either spot or on-demand instances
 spot_policy: auto
 
+# Required resources
 resources:
-  # Change to what is required
   gpu: 24GB
-
-# Comment out if you won't access the model via https://gateway.<gateway domain>
-model: meta-llama/Meta-Llama-3.1-8B-Instruct
 ```
 
 </div>
 
 If you don't specify your Docker image, `dstack` uses the [base](https://hub.docker.com/r/dstackai/base/tags) image
 (pre-configured with Python, Conda, and essential CUDA drivers).
 
-!!! info "Auto-scaling"
-    By default, the service is deployed to a single instance. However, you can specify the
-    [number of replicas and scaling policy](reference/dstack.yml/service.md#auto-scaling).
-    In this case, `dstack` auto-scales it based on the load.
+Note, the `model` property is optional and not needed when deploying a non-OpenAI-compatible model or a regular web app.
+
+!!! info "Gateway"
+    By default, services run on a single instance. However, you can specify `replicas` and `target` to enable
+    [auto-scaling](reference/dstack.yml/service.md#auto-scaling).
+
+    Note, to use auto-scaling, a custom domain, or HTTPS, set up a
+    [gateway](concepts/gateways.md) before running the service.
+    A gateway is pre-configured for you if you are using [dstack Sky :material-arrow-top-right-thin:{ .external }](https://sky.dstack.ai){:target="_blank"}.
 
 !!! info "Reference"
     See [.dstack.yml](reference/dstack.yml/service.md) for all the options supported by
@@ -83,7 +79,8 @@ Submit the run llama31-service? [y/n]: y
 Provisioning...
 ---> 100%
 
-Service is published at https://llama31-service.example.com
+Service is published at:
+  http://localhost:3000/proxy/services/main/llama31-service
 ```
 
 </div>
@@ -93,14 +90,17 @@ To avoid uploading large files, ensure they are listed in `.gitignore`.
 
 ## Access the endpoint
 
-One the service is up, its endpoint is accessible at `https://<run name>.<gateway domain>`.
+### Service
+
+If no gateway is created, the service's endpoint will be accessible at `<dstack server URL>
+/proxy/services/<project name>/<run name>`.
 
 By default, the service endpoint requires the `Authorization` header with `Bearer <dstack token>`.
 
 <div class="termy">
 
 ```shell
-$ curl https://llama31-service.example.com/v1/chat/completions \
+$ curl http://localhost:3000/proxy/services/main/llama31-service/v1/chat/completions \
   -H 'Content-Type: application/json' \
   -H 'Authorization: Bearer <dstack token>' \
   -d '{
@@ -119,10 +119,15 @@ $ curl https://llama31-service.example.com/v1/chat/completions \
 Authorization can be disabled by setting [`auth`](reference/dstack.yml/service.md#authorization) to `false` in the
 service configuration file.
 
-### Gateway endpoint
+> When a [gateway](concepts/gateways.md) is configured, the service endpoint will be accessible at `https://<run name>.<gateway domain>`.
+
+### Model
+
+If the service defines the `model` property, the model can be accessed with
+the OpenAI-compatible endpoint at `<dstack server URL>/proxy/models/<project name>`,
+or via the control plane UI's playground.
 
-In case the service has the [model mapping](reference/dstack.yml/service.md#model-mapping) configured, you will also be
-able to access the model at `https://gateway.<gateway domain>` via the OpenAI-compatible interface.
+> When a [gateway](concepts/gateways.md) is configured, the OpenAI-compatible endpoint is available at `https://gateway.<gateway domain>`.
 
 ## Manage runs
````

examples/llms/llama31/.dstack.yml

Lines changed: 0 additions & 20 deletions
This file was deleted.
