
bug: cortex pull from hf times out #1017


Closed
Tracked by #1077
oatmealm opened this issue Aug 14, 2024 · 8 comments
Labels
category: model management (Model pull, yaml, model state)
P1: important (Important feature / fix)
type: bug (Something isn't working)

Comments

@oatmealm

Describe the bug
When pulling the BAAI/bge-reranker-v2-m3 reranking model, the progress bar stays at 0% indefinitely.

To Reproduce

cortex pull BAAI/* (any model)

Expected behavior

I expect it to download the model and make it available locally.

Desktop:

Both macOS M2 and Linux Fedora

@oatmealm oatmealm added the type: bug label Aug 14, 2024
@oatmealm
Author

Just saw this:

cortex pull BAAI/bge-m3                 
✔ Dependencies loaded in 274ms
✔ API server is online
Downloading model...
✔ Model downloaded
 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% | ETA: 0s | 0/100TypeError: terminated
    at Fetch.onAborted (node:internal/deps/undici/undici:10815:53)
    at Fetch.emit (node:events:520:28)
    at Fetch.terminate (node:internal/deps/undici/undici:9973:14)
    at Object.onError (node:internal/deps/undici/undici:10927:38)
    at Request.onError (node:internal/deps/undici/undici:2079:31)
    at Object.errorRequest (node:internal/deps/undici/undici:1576:17)
    at Socket.<anonymous> (node:internal/deps/undici/undici:6045:16)
    at Socket.emit (node:events:532:35)
    at TCP.<anonymous> (node:net:337:12)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
  [cause]: BodyTimeoutError: Body Timeout Error
      at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:5979:32)
      at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:2356:17)
      at listOnTimeout (node:internal/timers:581:17)
      at process.processTimers (node:internal/timers:519:7) {
    code: 'UND_ERR_BODY_TIMEOUT'
  }
}

@oatmealm
Author

I missed the point about GGUF, but this seems to be an issue with some GGUF models as well:

cortex pull pervll/bge-reranker-v2-gemma-Q4_K_M-GGUF    
✔ Dependencies loaded in 438ms
✔ API server is online
Downloading model...
✔ Model downloaded
 ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% | ETA: 0s | 0/100

@simboonlong

The model I was pulling is https://huggingface.co/cortexso/codestral, and I encountered a similar problem:

[Screenshot: model-timeout]

It errors out around the 15-minute mark. I notice UND_ERR_BODY_TIMEOUT only happens with huge .gguf files; cortexso/codestral is around 13 GB. Perhaps a longer timeout is needed in @nestjs/axios?

@imtuyethan imtuyethan moved this to Icebox in Menlo Aug 28, 2024
@imtuyethan imtuyethan moved this from Icebox to Need Investigation in Menlo Sep 2, 2024
@freelerobot freelerobot added the P1: important and category: model management labels Sep 6, 2024
@freelerobot freelerobot changed the title bug: cortex pull of some hf models does nothing bug: cortex pull from hf timesout Sep 6, 2024
@freelerobot freelerobot changed the title bug: cortex pull from hf timesout bug: cortex pull from hf times out Sep 6, 2024
@freelerobot
Contributor

Related: #3521 #3519

@github-project-automation github-project-automation bot moved this from Need Investigation to Completed in Menlo Sep 6, 2024
@dan-menlo dan-menlo reopened this Sep 6, 2024
@github-project-automation github-project-automation bot moved this from Completed to Scheduled in Menlo Sep 6, 2024
@dan-menlo
Contributor

@namchuai I am scheduling this in Sprint 20. I know this is a Cortex Platform issue, but we should make sure that this behavior does not occur in cortex.cpp.

Once verified, we can close it.

@dan-menlo
Contributor

Linking this to the main issue #1077 and queuing it for Sprint 20.

@freelerobot
Contributor

Related: #1288

@dan-menlo
Contributor

I am closing this, as it's covered by #1288.

@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Sep 30, 2024
@gabrielle-ong gabrielle-ong added this to the v1.0.0 milestone Oct 3, 2024
Projects
Archived in project
Development

No branches or pull requests

7 participants