[Bug report] Performance deterioration of LLaMA-2 model due to hardcoded rms_norm_eps  #2373

Closed
@xx205

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [Yes] I carefully followed the README.md.
  • [Yes] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [Yes] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

When running a converted ggml model, the eps used in RMSNorm should be consistent with the original model definition.

Current Behavior

The norm_eps used in RMSNorm is hardcoded to 1e-6 in all backends (x86, CUDA, Metal).
Related commit: Change RMSNorm eps to 1e-6 #173 (22213a1)
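For context on where eps enters the computation: RMSNorm divides by the root mean square of the activations, with eps added inside the square root for numerical stability. A minimal NumPy sketch of the operation (an illustration only, not the llama.cpp C implementation):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: x / sqrt(mean(x^2) + eps) * weight
    # eps should come from the model config (rms_norm_eps),
    # not be hardcoded -- LLaMA-1 uses 1e-6, LLaMA-2 uses 1e-5.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

With eps=1e-5 vs 1e-6 the per-token outputs differ only slightly, but the mismatch compounds across every layer of the model.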

Environment and Context

Recently I wanted to evaluate LLaMA-1 and LLaMA-2 models on the MMLU test set (Measuring Massive Multitask Language Understanding, https://github.com/hendrycks/test), and I chose llama.cpp as the inference engine.
The performance of the LLaMA-1 models is nearly the same as the paper reports, but the LLaMA-2 7B and 13B models only reached LLaMA-1 7B level scores.
I then checked the model definitions of LLaMA-2 7B and 13B and found that "rms_norm_eps" in config.json is 1e-5 instead of 1e-6.
After recompiling the source with eps changed to 1e-5, the test results of the LLaMA-2 models finally look good.
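Rather than hardcoding the constant, the converter could pick the value up from the model's HF-style config.json, falling back to the LLaMA-1 default when the key is absent. A hypothetical sketch (the key name "rms_norm_eps" matches the HF LLaMA config; the fallback choice is an assumption):

```python
import json

# Hypothetical sketch: read rms_norm_eps from a HF-style config.json
# instead of hardcoding 1e-6 at compile time.
config_text = '{"rms_norm_eps": 1e-05}'   # LLaMA-2 configs ship 1e-5 here
cfg = json.loads(config_text)
eps = cfg.get("rms_norm_eps", 1e-6)       # fall back to LLaMA-1's 1e-6
```

Storing the value in the ggml file header at conversion time would then let all backends use the correct eps without recompilation.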

Related issue:
GGML model showing noticeable quality issues when compared to HF model #2354

Affected discussions:
LLaMA-2 Perplexities #2352
Presentation on llama.cpp on 25.07.2023 at karlsruhe.ai #2281
