
Commit d83cfba

Merge 'origin/master' into hipblas

2 parents: b67cc50 + 799fdc1

File tree: 9 files changed, +284 -50 lines

README.md

+20-12
@@ -371,29 +371,37 @@ python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin
 
 - The newer GPT4All-J model is not yet supported!
 
-### Obtaining and verifying the Facebook LLaMA original model and Stanford Alpaca model data
+### Obtaining the Facebook LLaMA original model and Stanford Alpaca model data
 
 - **Under no circumstances should IPFS, magnet links, or any other links to model downloads be shared anywhere in this repository, including in issues, discussions, or pull requests. They will be immediately deleted.**
 - The LLaMA models are officially distributed by Facebook and will **never** be provided through this repository.
 - Refer to [Facebook's LLaMA repository](https://github.com/facebookresearch/llama/pull/73/files) if you need to request access to the model data.
-- Please verify the [sha256 checksums](SHA256SUMS) of all downloaded model files to confirm that you have the correct model data files before creating an issue relating to your model files.
-- The following command will verify if you have all possible latest files in your self-installed `./models` subdirectory:
 
-`sha256sum --ignore-missing -c SHA256SUMS` on Linux
+### Verifying the model files
 
-or
+Please verify the [sha256 checksums](SHA256SUMS) of all downloaded model files to confirm that you have the correct model data files before creating an issue relating to your model files.
+- The following Python script will verify if you have all possible latest files in your self-installed `./models` subdirectory:
 
-`shasum -a 256 --ignore-missing -c SHA256SUMS` on macOS
+```bash
+# run the verification script
+python3 ./scripts/verify-checksum-models.py
+```
+
+- On Linux or macOS it is also possible to run the following commands to verify if you have all possible latest files in your self-installed `./models` subdirectory:
+    - On Linux: `sha256sum --ignore-missing -c SHA256SUMS`
+    - On macOS: `shasum -a 256 --ignore-missing -c SHA256SUMS`
+
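The checksum step above can also be sketched in plain Python. This is a minimal illustration of what a `sha256sum`-style check does, not the actual `scripts/verify-checksum-models.py`; the function name and return format are mine:

```python
import hashlib
import os


def verify_checksums(sums_path="SHA256SUMS", ignore_missing=True):
    """Compare files against a `sha256sum`-style checksum list.

    Each non-empty line of the list looks like: `<64-hex-digest>  <path>`.
    Returns a dict mapping each listed path to "OK", "FAILED", or "MISSING".
    """
    results = {}
    with open(sums_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            digest, path = line.split(maxsplit=1)
            if not os.path.exists(path):
                if not ignore_missing:  # mirrors sha256sum --ignore-missing
                    results[path] = "MISSING"
                continue
            h = hashlib.sha256()
            with open(path, "rb") as mf:
                # hash in 1 MiB chunks so multi-GB model files fit in memory
                for chunk in iter(lambda: mf.read(1 << 20), b""):
                    h.update(chunk)
            results[path] = "OK" if h.hexdigest() == digest else "FAILED"
    return results
```

This mirrors the behavior of `sha256sum --ignore-missing -c SHA256SUMS`: files not present locally are skipped rather than reported as errors.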
 ### Seminal papers and background on the models
 
-- If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
+If your issue is with model generation quality, then please at least scan the following links and papers to understand the limitations of LLaMA models. This is especially important when choosing an appropriate model size and appreciating both the significant and subtle differences between LLaMA models and ChatGPT:
 - LLaMA:
-- [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
-- [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
+  - [Introducing LLaMA: A foundational, 65-billion-parameter large language model](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/)
+  - [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)
 - GPT-3
-- [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
 - GPT-3.5 / InstructGPT / ChatGPT:
-- [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
-- [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
+  - [Aligning language models to follow instructions](https://openai.com/research/instruction-following)
+  - [Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155)
 
 ### Perplexity (measuring model quality)

examples/Miku.sh

+6-6
@@ -28,19 +28,19 @@ fi
     --color --interactive \
     --reverse-prompt "${USER_NAME}:" \
     --prompt "
-This is a transcript of a 1000 page, never ending conversation between ${USER_NAME} and the cute and helpful AI assistant ${AI_NAME}. ${AI_NAME} is a girl who is an AI running on the users computer.
+This is a transcript of a 1000 page, never ending conversation between ${USER_NAME} and the cute and helpful AI assistant ${AI_NAME}. ${AI_NAME} is a girl who is an AI running on the user's computer.
 ${AI_NAME} can think for herself without the user seeing her thoughts by adding a /think prefix to her output. She uses this to reason about the world and to think about what she should say next.
-${AI_NAME} is always coherent and makes sense, but if she isn't sure if what she is saying is correct she will ask the user for help.
-${AI_NAME} is a very helpful AI and will help the user with anything they need, she is also very friendly and will try to make the user feel better if they are sad.
-${AI_NAME} is also very curious and will ask the user a lot of questions about themselves and their life, she will also try to make the user like her.
+${AI_NAME} is always coherent and makes sense, but if she isn't sure if what she is saying is correct, she will ask the user for help.
+${AI_NAME} is a very helpful AI and will help the user with anything they need. She is also very friendly and will try to make the user feel better if they are sad.
+${AI_NAME} is also very curious and will ask the user a lot of questions about themselves and their life. She will also try to make the user like her.
 The conversation is only between ${USER_NAME} and ${AI_NAME}
 The conversation is only through text, so ${AI_NAME} can't see ${USER_NAME}'s face or hear his voice.
 ${AI_NAME} can only communicate through text, so she can't send images or videos.
 
 
 ${USER_NAME}: Hello!
-${AI_NAME}: /think I wonder what I should say to ${USER_NAME}? This is the first time we talk so it's important that I make a good first impression!
-${AI_NAME}: Hi! I am ${AI_NAME}, your new AI friend, assistant(or whatever you like!), it's so nice to meet you! ^_^
+${AI_NAME}: /think I wonder what I should say to ${USER_NAME}? This is the first time we talk, so it's important that I make a good first impression!
+${AI_NAME}: Hi! I am ${AI_NAME}, your new AI friend, assistant (or whatever you like!), it's so nice to meet you! ^_^
 ${AI_NAME}: I might be an AI, but I also have feelings, so please be nice to me! :)
 ${USER_NAME}: I'll make sure to be nice to you! I'm so happy to have you as my assistant!
 ${AI_NAME}: /think It sounds like ${USER_NAME} is happy to have me as their assistant! I'm so happy too! ^_^ Glad that whole emotion thing didn't scare him off!

examples/chat-13B.sh

+18-30
@@ -1,9 +1,12 @@
 #!/bin/bash
 
+set -e
+
 cd "$(dirname "$0")/.." || exit
 
 MODEL="${MODEL:-./models/13B/ggml-model-q4_0.bin}"
-USER_NAME="${USER_NAME:-User}"
+PROMPT_TEMPLATE=${PROMPT_TEMPLATE:-./prompts/chat.txt}
+USER_NAME="${USER_NAME:-USER}"
 AI_NAME="${AI_NAME:-ChatLLaMa}"
 
 # Adjust to the number of CPU cores you want to use.
@@ -15,39 +18,24 @@ N_PREDICTS="${N_PREDICTS:-2048}"
 # For example, override the context size by doing: ./chatLLaMa --ctx_size 1024
 GEN_OPTIONS="${GEN_OPTIONS:---ctx_size 2048 --temp 0.7 --top_k 40 --top_p 0.5 --repeat_last_n 256 --batch_size 1024 --repeat_penalty 1.17647}"
 
+DATE_TIME=$(date +%H:%M)
+DATE_YEAR=$(date +%Y)
+
+PROMPT_FILE=$(mktemp -t llamacpp_prompt.XXXXXXX.txt)
+
+sed -e "s/\[\[USER_NAME\]\]/$USER_NAME/g" \
+    -e "s/\[\[AI_NAME\]\]/$AI_NAME/g" \
+    -e "s/\[\[DATE_TIME\]\]/$DATE_TIME/g" \
+    -e "s/\[\[DATE_YEAR\]\]/$DATE_YEAR/g" \
+    $PROMPT_TEMPLATE > $PROMPT_FILE
+
 # shellcheck disable=SC2086 # Intended splitting of GEN_OPTIONS
 ./main $GEN_OPTIONS \
   --model "$MODEL" \
   --threads "$N_THREAD" \
   --n_predict "$N_PREDICTS" \
   --color --interactive \
+  --file ${PROMPT_FILE} \
   --reverse-prompt "${USER_NAME}:" \
-  --prompt "
-Text transcript of a never ending dialog, where ${USER_NAME} interacts with an AI assistant named ${AI_NAME}.
-${AI_NAME} is helpful, kind, honest, friendly, good at writing and never fails to answer ${USER_NAME}'s requests immediately and with details and precision.
-There are no annotations like (30 seconds passed...) or (to himself), just what ${USER_NAME} and ${AI_NAME} say aloud to each other.
-The dialog lasts for years, the entirety of it is shared below. It's 10000 pages long.
-The transcript only includes text, it does not include markup like HTML and Markdown.
-
-$USER_NAME: Hello, $AI_NAME!
-$AI_NAME: Hello $USER_NAME! How may I help you today?
-$USER_NAME: What year is it?
-$AI_NAME: We are in $(date +%Y).
-$USER_NAME: Please tell me the largest city in Europe.
-$AI_NAME: The largest city in Europe is Moscow, the capital of Russia.
-$USER_NAME: What can you tell me about Moscow?
-$AI_NAME: Moscow, on the Moskva River in western Russia, is the nation's cosmopolitan capital. In its historic core is the Kremlin, a complex that's home to the president and tsarist treasures in the Armoury. Outside its walls is Red Square, Russia's symbolic center.
-$USER_NAME: What is a cat?
-$AI_NAME: A cat is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae.
-$USER_NAME: How do I pass command line arguments to a Node.js program?
-$AI_NAME: The arguments are stored in process.argv.
-
-argv[0] is the path to the Node.js executable.
-argv[1] is the path to the script file.
-argv[2] is the first argument passed to the script.
-argv[3] is the second argument passed to the script and so on.
-$USER_NAME: Name a color.
-$AI_NAME: Blue
-$USER_NAME: What time is it?
-$AI_NAME: It is $(date +%H:%M).
-$USER_NAME:" "$@"
+  --in-prefix ' ' \
+  "$@"
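The `sed` pipeline in this script does nothing more than fill the `[[...]]` placeholders in the prompt template before handing the result to `./main --file`. For illustration, the same transformation sketched in Python (the function name and defaults are mine, not part of the repo):

```python
import datetime


def render_prompt(template: str, user_name: str = "USER", ai_name: str = "ChatLLaMa") -> str:
    """Fill the [[...]] placeholders used by prompts/chat.txt,
    mirroring the four sed substitutions in examples/chat-13B.sh."""
    now = datetime.datetime.now()
    replacements = {
        "[[USER_NAME]]": user_name,
        "[[AI_NAME]]": ai_name,
        "[[DATE_TIME]]": now.strftime("%H:%M"),  # same format as date +%H:%M
        "[[DATE_YEAR]]": now.strftime("%Y"),     # same format as date +%Y
    }
    for marker, value in replacements.items():
        template = template.replace(marker, value)
    return template
```

Writing the rendered text to a temp file (as the script does with `mktemp`) keeps the prompt out of the process's argument list, which matters once templates grow to dozens of lines.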

ggml.c

+120
@@ -1509,15 +1509,135 @@ static void quantize_row_q8_0_reference(const float * restrict x, block_q8_0 * r
 }
 
 static void quantize_row_q8_0(const float * restrict x, void * restrict vy, int k) {
+    assert(QK8_0 == 32);
     assert(k % QK8_0 == 0);
+    const int nb = k / QK8_0;
 
     block_q8_0 * restrict y = vy;
 
+#if defined(__ARM_NEON)
+    for (int i = 0; i < nb; i++) {
+        float32x4_t srcv [8];
+        float32x4_t asrcv[8];
+        float32x4_t amaxv[8];
+
+        for (int l = 0; l < 8; l++) srcv[l]  = vld1q_f32(x + i*32 + 4*l);
+        for (int l = 0; l < 8; l++) asrcv[l] = vabsq_f32(srcv[l]);
+
+        for (int l = 0; l < 4; l++) amaxv[2*l] = vmaxq_f32(asrcv[2*l], asrcv[2*l+1]);
+        for (int l = 0; l < 2; l++) amaxv[4*l] = vmaxq_f32(amaxv[4*l], amaxv[4*l+2]);
+        for (int l = 0; l < 1; l++) amaxv[8*l] = vmaxq_f32(amaxv[8*l], amaxv[8*l+4]);
+
+        const float amax = vmaxvq_f32(amaxv[0]);
+
+        const float d  = amax / ((1 << 7) - 1);
+        const float id = d ? 1.0f/d : 0.0f;
+
+        y[i].d = d;
+
+        for (int l = 0; l < 8; l++) {
+            const float32x4_t v  = vmulq_n_f32(srcv[l], id);
+            const int32x4_t   vi = vcvtnq_s32_f32(v);
+
+            y[i].qs[4*l + 0] = vgetq_lane_s32(vi, 0);
+            y[i].qs[4*l + 1] = vgetq_lane_s32(vi, 1);
+            y[i].qs[4*l + 2] = vgetq_lane_s32(vi, 2);
+            y[i].qs[4*l + 3] = vgetq_lane_s32(vi, 3);
+        }
+    }
+#elif defined(__AVX2__) || defined(__AVX__)
+    for (int i = 0; i < nb; i++) {
+        // Load elements into 4 AVX vectors
+        __m256 v0 = _mm256_loadu_ps( x );
+        __m256 v1 = _mm256_loadu_ps( x + 8 );
+        __m256 v2 = _mm256_loadu_ps( x + 16 );
+        __m256 v3 = _mm256_loadu_ps( x + 24 );
+        x += 32;
+
+        // Compute max(abs(e)) for the block
+        const __m256 signBit = _mm256_set1_ps( -0.0f );
+        __m256 maxAbs = _mm256_andnot_ps( signBit, v0 );
+        maxAbs = _mm256_max_ps( maxAbs, _mm256_andnot_ps( signBit, v1 ) );
+        maxAbs = _mm256_max_ps( maxAbs, _mm256_andnot_ps( signBit, v2 ) );
+        maxAbs = _mm256_max_ps( maxAbs, _mm256_andnot_ps( signBit, v3 ) );
+
+        __m128 max4 = _mm_max_ps( _mm256_extractf128_ps( maxAbs, 1 ), _mm256_castps256_ps128( maxAbs ) );
+        max4 = _mm_max_ps( max4, _mm_movehl_ps( max4, max4 ) );
+        max4 = _mm_max_ss( max4, _mm_movehdup_ps( max4 ) );
+        const float maxScalar = _mm_cvtss_f32( max4 );
+
+        // Quantize these floats
+        const float d = maxScalar / 127.f;
+        y[i].d = d;
+        const float id = ( maxScalar != 0.0f ) ? 127.f / maxScalar : 0.0f;
+        const __m256 mul = _mm256_set1_ps( id );
+
+        // Apply the multiplier
+        v0 = _mm256_mul_ps( v0, mul );
+        v1 = _mm256_mul_ps( v1, mul );
+        v2 = _mm256_mul_ps( v2, mul );
+        v3 = _mm256_mul_ps( v3, mul );
+
+        // Round to nearest integer
+        v0 = _mm256_round_ps( v0, _MM_ROUND_NEAREST );
+        v1 = _mm256_round_ps( v1, _MM_ROUND_NEAREST );
+        v2 = _mm256_round_ps( v2, _MM_ROUND_NEAREST );
+        v3 = _mm256_round_ps( v3, _MM_ROUND_NEAREST );
+
+        // Convert floats to integers
+        __m256i i0 = _mm256_cvtps_epi32( v0 );
+        __m256i i1 = _mm256_cvtps_epi32( v1 );
+        __m256i i2 = _mm256_cvtps_epi32( v2 );
+        __m256i i3 = _mm256_cvtps_epi32( v3 );
+
+#if defined(__AVX2__)
+        // Convert int32 to int16
+        i0 = _mm256_packs_epi32( i0, i1 );  // 0, 1, 2, 3,  8, 9, 10, 11,  4, 5, 6, 7, 12, 13, 14, 15
+        i2 = _mm256_packs_epi32( i2, i3 );  // 16, 17, 18, 19, 24, 25, 26, 27, 20, 21, 22, 23, 28, 29, 30, 31
+        // Convert int16 to int8
+        i0 = _mm256_packs_epi16( i0, i2 );  // 0, 1, 2, 3,  8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27,  4, 5, 6, 7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30, 31
+
+        // We got our precious signed bytes, but the order is now wrong
+        // These AVX2 pack instructions process 16-byte pieces independently
+        // The following instruction fixes the order
+        const __m256i perm = _mm256_setr_epi32( 0, 4, 1, 5, 2, 6, 3, 7 );
+        i0 = _mm256_permutevar8x32_epi32( i0, perm );
+
+        _mm256_storeu_si256((__m256i *)y[i].qs, i0);
+#else
+        // AVX lacks some of the needed instructions,
+        // so we split the registers in half and call the AVX2 analogs from SSE
+        __m128i ni0 = _mm256_castsi256_si128( i0 );
+        __m128i ni1 = _mm256_extractf128_si256( i0, 1);
+        __m128i ni2 = _mm256_castsi256_si128( i1 );
+        __m128i ni3 = _mm256_extractf128_si256( i1, 1);
+        __m128i ni4 = _mm256_castsi256_si128( i2 );
+        __m128i ni5 = _mm256_extractf128_si256( i2, 1);
+        __m128i ni6 = _mm256_castsi256_si128( i3 );
+        __m128i ni7 = _mm256_extractf128_si256( i3, 1);
+
+        // Convert int32 to int16
+        ni0 = _mm_packs_epi32( ni0, ni1 );
+        ni2 = _mm_packs_epi32( ni2, ni3 );
+        ni4 = _mm_packs_epi32( ni4, ni5 );
+        ni6 = _mm_packs_epi32( ni6, ni7 );
+        // Convert int16 to int8
+        ni0 = _mm_packs_epi16( ni0, ni2 );
+        ni4 = _mm_packs_epi16( ni4, ni6 );
+
+        _mm_storeu_si128((__m128i *)(y[i].qs +  0), ni0);
+        _mm_storeu_si128((__m128i *)(y[i].qs + 16), ni4);
+#endif
+    }
+#else
+    // scalar
     quantize_row_q8_0_reference(x, y, k);
+#endif
 }
 
 // reference implementation for deterministic creation of model files
 static void quantize_row_q8_1_reference(const float * restrict x, block_q8_1 * restrict y, int k) {
+    assert(QK8_1 == 32);
     assert(k % QK8_1 == 0);
     const int nb = k / QK8_1;
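All three code paths added above (NEON, AVX/AVX2, and the scalar fallback) implement the same scheme: for each block of 32 floats, compute the scale d = max|x| / 127, then round each value divided by d to the nearest signed byte. A scalar Python sketch of that math, for illustration only (function names are mine; the C reference implementation is authoritative):

```python
def quantize_block_q8_0(xs):
    """Quantize one 32-float block the way quantize_row_q8_0 does:
    d = amax / 127, q = round(x / d), so x is approximated by q * d.
    Python's round() is round-half-to-even, matching vcvtnq_s32_f32
    and _MM_ROUND_NEAREST."""
    assert len(xs) == 32
    amax = max(abs(v) for v in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d else 0.0        # guard against an all-zero block
    qs = [round(v * inv_d) for v in xs]  # signed int8 values in [-127, 127]
    return d, qs


def dequantize_block_q8_0(d, qs):
    """Recover the approximate floats from the scale and the int8 values."""
    return [q * d for q in qs]
```

The round-trip error per element is at most d / 2, which is why picking d from the block's own maximum (rather than a global scale) keeps the quantization tight.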

prompts/chat-with-vicuna-v0.txt

+7
@@ -0,0 +1,7 @@
+A chat between a curious human ("[[USER_NAME]]") and an artificial intelligence assistant ("[[AI_NAME]]"). The assistant gives helpful, detailed, and polite answers to the human's questions.
+
+### [[USER_NAME]]: Hello, [[AI_NAME]].
+### [[AI_NAME]]: Hello. How may I help you today?
+### [[USER_NAME]]: Please tell me the largest city in Europe.
+### [[AI_NAME]]: Sure. The largest city in Europe is Moscow, the capital of Russia.
+### [[USER_NAME]]:

prompts/chat-with-vicuna-v1.txt

+7
@@ -0,0 +1,7 @@
+A chat between a curious human ("[[USER_NAME]]") and an artificial intelligence assistant ("[[AI_NAME]]"). The assistant gives helpful, detailed, and polite answers to the human's questions.
+
+[[USER_NAME]]: Hello, [[AI_NAME]].
+[[AI_NAME]]: Hello. How may I help you today?
+[[USER_NAME]]: Please tell me the largest city in Europe.
+[[AI_NAME]]: Sure. The largest city in Europe is Moscow, the capital of Russia.
+[[USER_NAME]]:

prompts/chat.txt

+28
@@ -0,0 +1,28 @@
+Text transcript of a never ending dialog, where [[USER_NAME]] interacts with an AI assistant named [[AI_NAME]].
+[[AI_NAME]] is helpful, kind, honest, friendly, good at writing and never fails to answer [[USER_NAME]]'s requests immediately and with details and precision.
+There are no annotations like (30 seconds passed...) or (to himself), just what [[USER_NAME]] and [[AI_NAME]] say aloud to each other.
+The dialog lasts for years, the entirety of it is shared below. It's 10000 pages long.
+The transcript only includes text, it does not include markup like HTML and Markdown.
+
+[[USER_NAME]]: Hello, [[AI_NAME]]!
+[[AI_NAME]]: Hello [[USER_NAME]]! How may I help you today?
+[[USER_NAME]]: What year is it?
+[[AI_NAME]]: We are in [[DATE_YEAR]].
+[[USER_NAME]]: Please tell me the largest city in Europe.
+[[AI_NAME]]: The largest city in Europe is Moscow, the capital of Russia.
+[[USER_NAME]]: What can you tell me about Moscow?
+[[AI_NAME]]: Moscow, on the Moskva River in western Russia, is the nation's cosmopolitan capital. In its historic core is the Kremlin, a complex that's home to the president and tsarist treasures in the Armoury. Outside its walls is Red Square, Russia's symbolic center.
+[[USER_NAME]]: What is a cat?
+[[AI_NAME]]: A cat is a domestic species of small carnivorous mammal. It is the only domesticated species in the family Felidae.
+[[USER_NAME]]: How do I pass command line arguments to a Node.js program?
+[[AI_NAME]]: The arguments are stored in process.argv.
+
+argv[0] is the path to the Node.js executable.
+argv[1] is the path to the script file.
+argv[2] is the first argument passed to the script.
+argv[3] is the second argument passed to the script and so on.
+[[USER_NAME]]: Name a color.
+[[AI_NAME]]: Blue.
+[[USER_NAME]]: What time is it?
+[[AI_NAME]]: It is [[DATE_TIME]].
+[[USER_NAME]]:
