fix: don't add space after special tokens in SPM #7697
Conversation
@jaime-m-p I found that this issue was introduced in #7375; I think this was a mistake.
@giladgd Indeed, this fixes the problem for this concrete issue, but unfortunately it breaks other models. I already have a PR trying to fix this: #7685. Also note that the extra space when detokenizing is the expected result:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./models/tokenizers/functionary-small-v2.2/")
text = "<|from|>system"
ids = tokenizer.encode(text, add_special_tokens=False)
re = tokenizer.decode(ids)
print(text)
print(re)
Output:
<|from|>system
<|from|> system
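For reference, inspecting the token pieces shows where that space comes from (same transformers API and local tokenizer path as above): SPM marks a leading space with the "▁" prefix, and the text segment after the special token is encoded with that dummy prefix.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./models/tokenizers/functionary-small-v2.2/")
ids = tokenizer.encode("<|from|>system", add_special_tokens=False)
# Expected pieces, e.g.: ['<|from|>', '▁system'] - the '▁' decodes to the extra space.
print(tokenizer.convert_ids_to_tokens(ids))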
@jaime-m-p I see what you mean.
@jaime-m-p I've just checked it again with the old
@giladgd, @snichols Sorry, I think I'm missing something. Maybe I'm making some mistake, but it seems the AutoTokenizer output is
I see. If AutoTokenizer is not our ground truth, do you know what the ground truth implementation is, so I can compare and try to match outputs? I generated vocab files from the link you provided.
Looking for ground truth is not the way. Theoretically, the prompt format/tokenization should match the one used for fine-tuning. That principle applies to space insertion too, and it may vary between fine-tunes of the same base model. But ultimately, what matters is which format works best with the given model; sometimes a different format works better than the model's original one. To get the best results, one has to test the model, preferably on their particular use case, with and without spaces in different places. So it is best to leave the choice to the user.

Now the problem is that if the space is inserted by llama.cpp but is not needed, it is difficult to get rid of it. I used to have a workaround to remove the unwanted space, but it was so ugly and annoying that at some point I just uprooted the piece of code that inserts the space, and now I add it to the prompt on the client side when needed, as sketched below.
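A minimal sketch of that client-side approach (illustrative names, not code from this PR): keep the space out of the tokenizer's hands entirely and let the prompt builder decide where it goes.

def build_prompt(turns, space_after_special=False):
    # turns is a list of (special_token, text) pairs; the caller decides
    # whether a space separates each special token from its text.
    sep = " " if space_after_special else ""
    return "".join(f"{tok}{sep}{text}" for tok, text in turns)

turns = [("<|from|>", "system"), ("<|content|>", "hello")]
print(build_prompt(turns))        # <|from|>system<|content|>hello
print(build_prompt(turns, True))  # <|from|> system<|content|> hello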
We want to match AutoTokenizer as much as possible - it seems to be the best choice in the current ecosystem. Of course, if some fine-tune chooses some arbitrary tokenization, then
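For what it's worth, one way to compare the two side by side is a sketch like the following, assuming the llama-cpp-python bindings and a local GGUF conversion of the same model (both paths are placeholders; vocab_only skips loading the weights):

from llama_cpp import Llama
from transformers import AutoTokenizer

hf = AutoTokenizer.from_pretrained("./models/tokenizers/functionary-small-v2.2/")
llm = Llama(model_path="./models/functionary-small-v2.2.gguf", vocab_only=True)

text = "<|from|>system"
hf_ids = hf.encode(text, add_special_tokens=False)
# special=True asks llama.cpp to parse special tokens in the input text.
cpp_ids = llm.tokenize(text.encode("utf-8"), add_bos=False, special=True)

# If the two tokenizers agree, the id sequences match exactly.
print(hf_ids)
print(cpp_ids)
print(hf_ids == cpp_ids)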
I originally considered the change to the default tokenization a breaking change since it affected the generated outputs of older models, but I now understand that the previous default tokenization behavior was indeed incorrect. @jaime-m-p Thanks for your help as I stumbled through this :)
There's an issue where tokenizing text with special tokens introduces an additional space after each special token. For example, using this model to tokenize <|from|>system with special tokens enabled and then detokenizing it back resulted in <|from|> system. This PR fixes that.
Fixes #7629
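A round-trip check for the behavior this PR fixes (a sketch using the llama-cpp-python bindings; the GGUF path is a placeholder):

from llama_cpp import Llama

llm = Llama(model_path="./models/functionary-small-v2.2.gguf", vocab_only=True)

text = "<|from|>system"
# special=True parses special tokens in the input text.
ids = llm.tokenize(text.encode("utf-8"), add_bos=False, special=True)
out = llm.detokenize(ids).decode("utf-8")

# With this fix, no space is injected after the special token.
assert out == text, f"round-trip mismatch: {out!r}"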