Deployment functionality #222
Comments
Hi, @vishnubob. You're correct that Replicate doesn't currently expose any APIs for managing deployments. However, you can configure your deployment with a min / max number of concurrent predictions to handle, and the autoscaler will spin model instances up and down based on inbound requests.
Hi @mattt, thanks for your response. I am using Replicate for an interactive photobooth, so my use case is a bit unusual. Since the installation is temporary, I only need the deployment while the installation is open. To reduce latency, I stand up a single-node deployment while the installation is running and spin the nodes down when I strike. However, it's a complicated installation, and I sometimes forget to spin down the deployments during strike, so I end up paying for idle deployments. Being able to automate the deployment from the software would be a huge win. For now, I have moved this part of the project to Tailscale, which lets me use my own server at home, but if I could automate the deployment, I would switch back to Replicate.
Closing the loop on this: Replicate's API now supports creating and modifying deployments. Support for these endpoints was added to the Python client by #258 and is available in recent versions.
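For the photobooth case above, scaling a deployment up before opening and back down at strike could look roughly like the sketch below. It talks to the HTTP API directly with the standard library rather than through the Python client. The `PATCH /v1/deployments/{owner}/{name}` path and the `min_instances` / `max_instances` fields are my reading of Replicate's HTTP API reference, so check them against the current docs. The owner `acme`, the deployment name `photobooth`, and the helper names are hypothetical.

```python
import json
import os
import urllib.request

API_ROOT = "https://api.replicate.com/v1"


def build_scale_request(owner, name, min_instances, max_instances, token):
    """Build (but do not send) a PATCH request that changes a deployment's scaling.

    Assumption: the endpoint path and JSON field names follow Replicate's
    HTTP API reference for updating a deployment.
    """
    if min_instances < 0 or max_instances < min_instances:
        raise ValueError("need 0 <= min_instances <= max_instances")
    body = json.dumps(
        {"min_instances": min_instances, "max_instances": max_instances}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_ROOT}/deployments/{owner}/{name}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def scale(owner, name, min_instances, max_instances):
    """Send the update and return the decoded JSON response."""
    token = os.environ["REPLICATE_API_TOKEN"]
    request = build_scale_request(owner, name, min_instances, max_instances, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical usage for a temporary installation:
#   scale("acme", "photobooth", 1, 1)   # before opening: keep one instance warm
#   scale("acme", "photobooth", 0, 1)   # at strike: allow scaling to zero when idle
```

Calling the scale-down step from the installation's own shutdown script would remove the "forgot to spin it down" failure mode, since nothing depends on remembering to do it by hand.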
I would like to be able to spin up and shut down deployments from the API. From looking over the API and Python client, this doesn't seem possible. Am I missing something, or would it be possible to add this functionality?
Thanks!