[Feature]: GCP Hyperdisk Volumes #2588

Open
dakota-shop opened this issue Apr 30, 2025 · 5 comments
@dakota-shop

dakota-shop commented Apr 30, 2025

Problem

Currently, Dstack only supports using pd-balanced GCE Disks. For higher performance persistent storage on GCP, we want the ability to specify the underlying disk type (e.g. hyperdisk-balanced).

Solution

Add the ability to specify the Volume type that the Backend can interpolate into the GCE disk type (instead of hardcoding pd-balanced).

Workaround

No response

Would you like to help us implement this feature by sending a PR?

Yes

@dakota-shop
Author

Simple Proposal

  • VolumeConfiguration gets a new volume_type attribute of type VolumeType, which is a new Enum class containing non-default volume types. Entries should be prefixed with the backend name (e.g. gcp_hyperdisk_balanced) since each volume type will require backend-specific support.
  • Example:
type: volume
name: myhypervol
backend: gcp
region: us-central1
size: 100GB
volume_type: gcp_hyperdisk_balanced
  • volume_type is optional and defaults to None, which means "use the backend's default volume implementation".
  • Add validation logic on the Volume that prevents VolumeType entries from being specified on an incompatible backend. There are many possible approaches for this:
    • Validate based on the enum value prefix (or VolumeType could be implemented as a richer datatype that refers to the Backend Enum)
    • Delegate VolumeType compatibility validation to the Backend ABC - the default implementation supports no VolumeTypes besides the backend-specific default (volume_type = None).
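
A rough sketch of the prefix-based validation option, in case it helps the discussion. The function name validate_volume_type and the string-valued backend argument are assumptions made for illustration, not existing Dstack code:

```python
from enum import Enum
from typing import Optional


class VolumeType(str, Enum):
    # Only the entry named in the proposal; other backends would add their own.
    GCP_HYPERDISK_BALANCED = "gcp_hyperdisk_balanced"


def validate_volume_type(backend: str, volume_type: Optional[VolumeType]) -> None:
    """Reject a volume type whose name prefix does not match the backend (hypothetical helper)."""
    if volume_type is None:
        # None means "use the backend's default volume implementation".
        return
    prefix = volume_type.value.split("_", 1)[0]
    if prefix != backend:
        raise ValueError(
            f"volume_type {volume_type.value!r} is not supported by backend {backend!r}"
        )
```

With this shape, validate_volume_type("gcp", VolumeType.GCP_HYPERDISK_BALANCED) passes, while the same volume type on backend "aws" raises ValueError.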

This is easy to implement, but I wonder whether there is a more robust, generalizable way to specify backend-specific resource options.

Extensible Proposal

  • Create a field containing backend-specific volume options.
  • Example:
type: volume
name: myhypervol
backend: gcp
region: us-central1
size: 100GB
backend_options:
  gcp:
    volume_type: hyperdisk-balanced
  • Because options are explicitly backend-specific, it is easy to delegate options validation to the Backend implementation.
  • The data type of the proposed backend_options field could be Dict[str, BackendSpecificVolumeOptions], where BackendSpecificVolumeOptions could either be truly backend-specific (i.e. treated as text or similar by Dstack Core) or could be implemented naively as Dict[str, str].
  • This approach has the advantage of potentially being reusable for other resource types.
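
A minimal sketch of the naive Dict[str, str] variant with validation delegated to the backend. The helper resolve_gcp_disk_type and the allow-list below are assumptions for illustration; the real set of supported disk types would live in the GCP backend:

```python
from typing import Dict, Optional

# Naive representation from the proposal: backend name -> {option name -> value}.
BackendOptions = Dict[str, Dict[str, str]]

# Hypothetical allow-list, used here only to show where validation would sit.
GCP_SUPPORTED_VOLUME_TYPES = {"pd-balanced", "hyperdisk-balanced"}


def resolve_gcp_disk_type(backend_options: Optional[BackendOptions]) -> str:
    """Return the GCE disk type, falling back to the current pd-balanced default."""
    gcp_options = (backend_options or {}).get("gcp", {})
    unknown = set(gcp_options) - {"volume_type"}
    if unknown:
        raise ValueError(f"unknown gcp volume options: {sorted(unknown)}")
    disk_type = gcp_options.get("volume_type", "pd-balanced")
    if disk_type not in GCP_SUPPORTED_VOLUME_TYPES:
        raise ValueError(f"unsupported gcp volume_type: {disk_type!r}")
    return disk_type
```

Options addressed to other backends are ignored here, so the same configuration shape could carry entries for several backends.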

@r4victor
Collaborator

r4victor commented May 5, 2025

@dakota-shop, thanks for a detailed proposal! We should go with Extensible since we plan to support more backend-specific volume options (e.g. various parameters for AWS EBS volumes), and it would be much better covered by Extensible than Simple.

Some comments on how I'd implement it:

  • Rename backend_options to options to keep it concise. All non-backend-specific parameters are top-level anyway.

  • We don't have backend-specific options in other configurations yet, but we do have different backend configs, and that can be taken for reference. Every backend can define *VolumeOptions models with type/volume_type:

    options:
      type: hyperdisk
      variant: balanced

    These are aggregated into a VolumeOptions union, which is parsed using options.type as the discriminator. The dstack core needs to know about the backend-specific *VolumeOptions models so that the API reference and JSON schema (configuration autocomplete) automatically document the options structure.

  • options.type must be unique, so we can prefix it with the backend name, e.g. type: gcp_hyperdisk. But for major clouds, they have different names for their persistence solutions, so we can go without a prefix as well.

  • For volumes in particular, we don't need to specify multiple options.type for different backends (at least for now) since every configuration specifies one backend. But it may be needed in the future or for other configuration types. We can then extend options to a list:

      options:
        - type: hyperdisk
          variant: balanced
        - type: ebs
          variant: io2

    A backend should have no problem filtering out its options.

  • Not sure if it should be type: hyperdisk with subtypes like variant: balanced or type: hyperdisk_balanced, but I'm inclined toward the former since we'll support different types like persistent_disk, hyperdisk, filestore, etc, and I think it would be easier to reason about than having multiple variations of each in type.
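
To make the discriminator idea concrete, here is a standard-library sketch that dispatches on options.type by hand. In dstack this would presumably be handled by the model layer instead; the registry, the parse_volume_options helper, and the AWSEBSVolumeOptions class are hypothetical and only illustrate the shape:

```python
from dataclasses import dataclass
from typing import Any, Dict, Type, Union


@dataclass
class GCPHyperdiskVolumeOptions:
    variant: str
    type: str = "hyperdisk"


@dataclass
class AWSEBSVolumeOptions:  # hypothetical second union member
    variant: str
    type: str = "ebs"


AnyVolumeOptions = Union[GCPHyperdiskVolumeOptions, AWSEBSVolumeOptions]

# options.type acts as the discriminator: it selects which model parses the rest.
_OPTIONS_BY_TYPE: Dict[str, Type[Any]] = {
    "hyperdisk": GCPHyperdiskVolumeOptions,
    "ebs": AWSEBSVolumeOptions,
}


def parse_volume_options(raw: Dict[str, Any]) -> AnyVolumeOptions:
    """Pick the options model from raw["type"] and build it from the remaining keys."""
    try:
        model = _OPTIONS_BY_TYPE[raw["type"]]
    except KeyError:
        raise ValueError(f"unknown or missing options.type in {raw!r}") from None
    return model(**raw)
```

Keeping type and variant as separate fields means each storage family gets one model, and the variants stay ordinary field values rather than extra union members.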

@r4victor
Collaborator

r4victor commented May 5, 2025

@dakota-shop, speaking of hyperdisks specifically, are you planning to support only Hyperdisk Balanced or other types like Hyperdisk ML as well? Since one of the core features of hyperdisks is configurable IOPS/throughput, do you want to make it configurable via options or start with some sane defaults?

@dakota-shop
Author

> we don't need to specify multiple options.type for different backends (at least for now) since every configuration specifies one backend. But it may be needed in the future or for other configuration types. We can then extend options to a list

Just to confirm, I think you're suggesting we have VolumeOptions support a single type for now and refactor this later if we need to specify multiple; did I understand correctly?

> But for major clouds, they have different names for their persistence solutions, so we can go without a prefix as well

If we only implement VolumeOptions for some major providers in the short term, this may not be an issue we need to deal with right now; but I wonder how annoying it will be to refactor in the future (or support aliases, etc) if Dstack wants to add support for volume types on 2 clouds that happen to have named their special disk solution the same thing.

> Since one of the core features of hyperdisks is configurable IOPS/throughput, do you want to make it configurable via options or start with some sane defaults?

I think it makes sense to expose Hyperdisk options in the VolumeOptions.

> A backend should have no problem filtering out its options.

I'm curious if you had a preferred implementation in mind for this. I can imagine adding a method to the ABC for Backend that returns the types that backend supports. I can also imagine adding some static information to e.g. the hyperdisk-specific union type member. I'm thinking the latter is cleaner, but an explicitly-typed approach might require that each union type member extends from some ABC that specifies a getter for the affiliated Backend.
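
A sketch of the second idea (static information on each union member), just to show its shape. The base class BackendVolumeOptions, its attributes, and the AWS member are all hypothetical names, not existing Dstack code:

```python
from abc import ABC
from typing import ClassVar, List


class BackendVolumeOptions(ABC):
    """Hypothetical base class: each options model names its backend statically."""

    backend: ClassVar[str]
    type: ClassVar[str]


class GCPHyperdiskVolumeOptions(BackendVolumeOptions):
    backend = "gcp"
    type = "hyperdisk"


class AWSEBSVolumeOptions(BackendVolumeOptions):  # hypothetical second member
    backend = "aws"
    type = "ebs"


def supported_option_types(backend: str) -> List[str]:
    """List the options.type values whose models declare the given backend."""
    return sorted(
        model.type
        for model in BackendVolumeOptions.__subclasses__()
        if model.backend == backend
    )
```

This keeps the backend affiliation next to the options model instead of adding a method to the Backend ABC, at the cost of requiring every union member to extend the shared base class.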

@r4victor
Collaborator

r4victor commented May 7, 2025

> Just to confirm, I think you're suggesting we have VolumeOptions support a single type for now and refactor this later if we need to specify multiple; did I understand correctly?

Yes, we'd support new types by adding *VolumeOptions models to the Union in backends/models.py:

from dstack._internal.core.backends.gcp.models import (
    GCPHyperdiskVolumeOptions,
)

AnyVolumeOptions = Union[GCPHyperdiskVolumeOptions, ...]
# If there is only one type, use a type alias instead of a Union:
# AnyVolumeOptions = GCPHyperdiskVolumeOptions

In gcp/models.py, you'd define:

class GCPHyperdiskVolumeOptions(CoreModel):
    type: Literal["hyperdisk"] = "hyperdisk"
    # iops, throughput, etc

> but I wonder how annoying it will be to refactor in the future (or support aliases, etc) if Dstack wants to add support for volume types on 2 clouds that happen to have named their special disk solution the same thing.

We could work around this later by using prefixes only when they are necessary and/or adding aliases for those without prefixes. But to make the implementation more straightforward, let's always use prefixes like "gcp_hyperdisk", "gcp_persistent_disk", "aws_ebs", etc – I don't have a strong stance on this.

> A backend should have no problem filtering out its options.

> I'm curious if you had a preferred implementation in mind for this. I can imagine adding a method to the ABC for Backend that returns the types that backend supports.

For volumes, for now, we don't need to introduce any new interfaces. The backend will accept VolumeConfiguration.options in create_volume() and will simply check that the type does not belong to another backend, raising an error if necessary.

If options becomes a list in the future, the backend would simply iterate over the list to find its options, e.g. in create_volume().
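
Roughly, the two cases could look like the sketch below. The VolumeOptions stand-in, the GCP_OPTION_TYPES set, and both helper names are assumptions for illustration; in practice this logic would sit inside the backend's create_volume():

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VolumeOptions:
    # Stand-in for the union member; values follow the prefix convention above.
    type: str


# Hypothetical set of option types the GCP backend understands.
GCP_OPTION_TYPES = {"gcp_hyperdisk", "gcp_persistent_disk"}


def check_gcp_options(options: Optional[VolumeOptions]) -> Optional[VolumeOptions]:
    """Single-options case: reject options that belong to another backend."""
    if options is not None and options.type not in GCP_OPTION_TYPES:
        raise ValueError(
            f"options.type {options.type!r} is not supported by the gcp backend"
        )
    return options


def find_gcp_options(options: Sequence[VolumeOptions]) -> Optional[VolumeOptions]:
    """List case: return the first entry meant for the gcp backend, ignoring the rest."""
    return next((o for o in options if o.type in GCP_OPTION_TYPES), None)
```

The single-options check is strict because a volume configuration names exactly one backend, while the list variant skips foreign entries so one list can serve several backends.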
