Support Adept Persimmon 8b #3410

Conversation

phillip-kravtsov
Contributor

phillip-kravtsov commented Sep 29, 2023

  • Adds Persimmon 8B, which is, architecturally, a standard dense transformer with:
    • Q/K layernorm
    • squared ReLU activations
    • partial RoPE
    • a very large vocab size (most of it unused for text)

To support partial RoPE and squared ReLU, this PR adds concat and square kernels for Metal.
I've confirmed that the GGML and HF implementations agree, up to the tensor values in the last layer.
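For readers unfamiliar with the two ops, here is a rough CPU-side sketch in plain C++ of what the new Metal kernels support. The function names (`squared_relu`, `partial_rope`) and layout are illustrative, not the PR's actual kernels, and the standard RoPE frequency schedule is assumed:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Squared ReLU, applied elementwise in the MLP: y = max(x, 0)^2.
// With a square kernel this can be expressed as relu followed by sqr.
float squared_relu(float x) {
    const float r = std::max(x, 0.0f);
    return r * r;
}

// Partial RoPE: only the first n_rot dimensions of each head are rotated;
// the remaining dimensions pass through unchanged and are concatenated
// back on afterwards, which is what the concat kernel is for.
void partial_rope(std::vector<float> & head, int n_rot, int pos) {
    const float theta_base = 10000.0f;
    for (int i = 0; i + 1 < n_rot; i += 2) {
        const float theta = pos * std::pow(theta_base, -(float) i / n_rot);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = head[i];
        const float x1 = head[i + 1];
        head[i]     = x0 * c - x1 * s;
        head[i + 1] = x0 * s + x1 * c;
    }
    // dims [n_rot, head.size()) are deliberately left untouched
}

int main() {
    std::vector<float> head = {1, 0, 1, 0, 1, 0, 1, 0};
    partial_rope(head, /*n_rot=*/4, /*pos=*/3); // rotate only the first 4 dims
    return 0;
}
```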

ggerganov added the high priority (Very important issue) and model (Model specific) labels on Sep 30, 2023
@ggerganov
Member

Let's resolve the CI failures and merge

phillip-kravtsov force-pushed the phillip-kravtsov/support-adept-persimmon-8b branch from 92acb44 to 5d259d3 on October 5, 2023 at 18:04
ggerganov merged commit 0e797c2 into ggml-org:master on Oct 7, 2023
@slaren
Member

slaren commented Oct 7, 2023

The switches in llm_load_hparams and llama_build_graph are missing breaks, so Persimmon should be falling through to the Refact graph. Does this currently work?
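To make the failure mode concrete, here is a minimal, self-contained sketch of the bug; the enum values and `build_graph` are hypothetical stand-ins, not the actual llama.cpp symbols. Without a `break`, the PERSIMMON case falls through into the next case and the Refact graph wins:

```cpp
#include <cstdio>

// Hypothetical stand-ins for the real llama.cpp switch, to show the effect
// of the missing `break`.
enum llm_arch { LLM_ARCH_PERSIMMON, LLM_ARCH_REFACT };

static const char * build_graph(llm_arch arch) {
    const char * graph = "none";
    switch (arch) {
        case LLM_ARCH_PERSIMMON:
            graph = "persimmon";
            // BUG: missing `break;` -- control falls through to the next case
        case LLM_ARCH_REFACT:
            graph = "refact";
            break;
    }
    return graph;
}

int main() {
    // Prints "refact" even though the Persimmon arch was requested.
    printf("%s\n", build_graph(LLM_ARCH_PERSIMMON));
    return 0;
}
```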

@ggerganov
Member

@phillip-kravtsov please take a look at @slaren's comment and fix as necessary

@KerfuffleV2
Collaborator

I got tired of seeing the compiler warning and created #3535 (not sure if there are any other issues, haven't had a chance to test it yet).

@phillip-kravtsov
Contributor Author

Thanks for the fix, @KerfuffleV2 -- that PR should be sufficient.

joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 12, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  py : change version of numpy requirement to 1.24.4 (ggml-org#3515)
  quantize : fail fast on write errors (ggml-org#3521)
  metal : support default.metallib load & reuse code for swift package (ggml-org#3522)
  llm : support Adept Persimmon 8B (ggml-org#3410)
  Fix for ggml-org#3454 (ggml-org#3455)
  readme : update models, cuda + ppl instructions (ggml-org#3510)
  server : docs fix default values and add n_probs (ggml-org#3506)