epic: Improve Cortex Engine Management #1416
User Journey
|
@dan-homebrew, here's my opinion about this task. I think we should consolidate both #1453 and #1454 into this ticket.
So I think we need a solution that, at runtime, detects the current hardware state and automatically tries to choose the best engine possible. We also have to provide users a way to specify the engine version and variant they want to run their model with.
Approach
I fully agree with option 1 you provided over at #1453.
File system structure
Tasks breakdown
Edge cases
|
@namchuai Merging "Cortex handles Engine Variants" into this issue: Cortex handles Engine Variants
Tasklist
Questions
Scenario
Cortex needs an elegant way to handle different engine versions + variants, without confusing the user. From my naive perspective, there are two key approaches.
Option 1: Every engine is versioned, and maintains a list of variants that it can use
> cortex engines get llama.cpp
{
version: b3919
...
}
> cortex engines llama.cpp variants list
llama-b3912-bin-win-hip-x64-gfx1030
llama-b3912-bin-win-cuda-cu11.7.1-x64
> cortex engines llama.cpp use llama-b3912-bin-win-cuda-cu11.7.1
Option 2: Every engine version/variant is a first-class Engine citizen
> cortex engines list
llama.cpp-b3919-cuda
llama.cpp-b3821-vulkan
|
@namchuai Merging "Cortex handles Engine Versions": Cortex handles Engine Versions
Tasklist
Design
API / CLI
# CLI
> cortex engines update llama.cpp
# API
POST /engines/{engine}/update
Open question: should we allow users to run different versions of llama.cpp?
> cortex engines llama.cpp versions
1. b3919
2. b3909
> cortex engines
Release Management
Cortex Stable and Nightly each define a llama.cpp version that they support
Cortex Nightly automatically pulls the latest llama.cpp, and forces us to fix it?
|
Here's a draft. Will update it from time to time.
Engine install
Will list stable releases from
Requirements
Sample output
Selected llama-cpp version: v0.1.36
Enter number to select: _
Question 1: how to know which engine is being set as used?
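As a hedged sketch of the version-listing step (assuming the installer queries the public GitHub Releases API for janhq/cortex.llamacpp, which the release URLs in the API section below point at; the jq filter is illustrative, not the final implementation):

```sh
# List stable (non-draft, non-prerelease) cortex.llamacpp releases,
# which the CLI could then number for the "Enter number to select" prompt.
curl -s "https://api.github.com/repos/janhq/cortex.llamacpp/releases" \
  | jq -r '.[] | select(.draft == false and .prerelease == false) | .tag_name'
```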
|
@namchuai For this issue, can you make sure we come up with a clear API first (e.g.
|
@dan-homebrew @gabrielle-ong, here are the APIs that I think we will have.
Engine Management API Documentation
Basic Engine Operations
Install Engine Variant
POST /engines/{engine_type}/{version}/{variant}
Uninstall Engine Variant
DELETE /engines/{engine_type}/{version}/{variant}
List Installed Engine Variants
GET /engines/{engine_type}
Response:
[
{
"engine": "llama-cpp",
"name": "mac-arm64",
"version": "0.1.35-28.10.24"
},
{
"engine": "llama-cpp",
"name": "linux-amd64-avx",
"version": "0.1.35-27.10.24"
}
]
Release Information
List Released Engine Versions
GET /engines/{engine_type}?release=true
Response:
[
{
"draft": false,
"name": "v0.1.37",
"prerelease": true,
"published_at": "2024-10-30T03:39:23Z",
"url": "https://api.github.com/repos/janhq/cortex.llamacpp/releases/182594588"
},
{
"draft": false,
"name": "v0.1.35-28.10.24",
"prerelease": true,
"published_at": "2024-10-28T17:30:48Z",
"url": "https://api.github.com/repos/janhq/cortex.llamacpp/releases/182309346"
}
]
List Released Engine Variants
GET /engines/{engine_type}/{version}
Response:
[
{
"created_at": "2024-10-28T17:35:51Z",
"download_count": 0,
"name": "linux-amd64-avx-cuda-11-7",
"size": 151240428
},
{
"created_at": "2024-10-28T17:34:05Z",
"download_count": 0,
"name": "linux-amd64-avx",
"size": 1548720
}
]
Default Engine Management
Get Default Engine Variant
GET /engines/{engine_type}/default
Set Default Engine Variant
POST /engines/{engine_type}/{version}/{variant}/default
Engine Runtime Operations
Load Engine
POST /engines/{engine_type}/load
Uses the variant set as default.
Unload Engine
DELETE /engines/{engine_type}/load
Uses the variant set as default.
Update Engine
Updates the current (default) engine variant to the latest version.
POST /engines/{engine_type}/update
Success response:
{
"engine": "cortex.llamacpp",
"from": "v0.1.35-28.10.24",
"to": "0.1.35",
"variant": "mac-arm64"
}
Failed response:
{
"message": "Engine cortex.llamacpp, mac-arm64 is already up-to-date! Version v0.1.35"
}
List All Engines and Variants
Get All Engines
GET /engines
Response:
{
"llama-cpp": [
{
"engine": "llama-cpp",
"name": "mac-arm64",
"version": "0.1.35-28.10.24"
},
{
"engine": "llama-cpp",
"name": "linux-amd64-avx",
"version": "0.1.35-27.10.24"
},
{
"engine": "llama-cpp",
"name": "linux-amd64-avx",
"version": "0.1.36"
},
{
"engine": "llama-cpp",
"name": "linux-amd64-avx2-cuda-12-0",
"version": "0.1.36"
}
],
"onnxruntime": [],
"tensorrt-llm": []
}
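A hedged end-to-end sketch of driving these endpoints with curl. The routes are the ones proposed above; the base URL and the concrete version/variant values are assumptions for a local Cortex server, taken from the sample responses:

```sh
# Illustrative local walkthrough; base URL and values are assumptions.
BASE=http://127.0.0.1:39281

# Install a specific llama-cpp variant
curl -X POST "$BASE/engines/llama-cpp/v0.1.35-28.10.24/mac-arm64"

# Make it the default variant for llama-cpp
curl -X POST "$BASE/engines/llama-cpp/v0.1.35-28.10.24/mac-arm64/default"

# Load the default variant into the runtime
curl -X POST "$BASE/engines/llama-cpp/load"

# Later: update the default variant to the latest release
curl -X POST "$BASE/engines/llama-cpp/update"
```
|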
@namchuai, @dan-homebrew: Issue:
|
Hi @namchuai, |
Hi @namchuai, summarising the list of issues:
I'll work on the docs: |
Sorry @gabrielle-ong @TC117 for the late response, I will work on this list. |
Thanks @namchuai! |
Also tracking this follow-up task for the engines API endpoint (move params to the body instead of the path), sketched below.
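As a hedged sketch of what that follow-up could look like (the /engines/install route and the JSON field names are hypothetical, not a settled design):

```sh
# Today: parameters travel in the path
#   POST /engines/llama-cpp/v0.1.36/linux-amd64-avx
# Proposed: parameters travel in the JSON body instead
# (base URL is an illustrative local server address)
curl -X POST "http://127.0.0.1:39281/engines/install" \
  -H "Content-Type: application/json" \
  -d '{"engine": "llama-cpp", "version": "v0.1.36", "variant": "linux-amd64-avx"}'
```
|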
Thanks @namchuai, marking as complete - released with Cortex 1.0.3 and Jan 0.5.9. |
Goal
Tasklist
Open Questions
Appendix
Improvements to #1072
