Prerequisites

Release: b1471
Expected Behavior

When I pass the command below with --gpu-layers 1, finetune should offload to the GPU.

/Volumes/d/apps/llama.cpp/llama.cpp/finetune --model-base /Volumes/d/apps/aimodels/others/openllama-3b-v2/openllama-3b-v2.q8_0.gguf --gpu-layers 1 --checkpoint-in chk-ol3b-shakespeare-LATEST.gguf --checkpoint-out chk-ol3b-shakespeare-ITERATION.gguf --lora-out lora-ol3b-shakespeare-ITERATION.bin --train-data shakespeare.txt --save-every 10 --threads 4 --adam-iter 30 --batch 4 --ctx 64 --use-checkpointing
Current Behavior

No GPU is used. All P-cores are maxed out and temperatures are soaring. See the asitop output.
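For reference, a quick way to watch CPU-cluster and GPU utilization on Apple Silicon is asitop (a sketch, assuming it is installed from PyPI; it needs sudo because it reads powermetrics):

pip install asitop
sudo asitop

While finetune runs, asitop shows the P-cluster pegged and GPU utilization near zero.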
Environment and Context

Mac Mini, M2, 24 GB memory
Failure Information (for bugs)

Steps to Reproduce

Run the finetune command above with --gpu-layers set to a non-zero value and watch CPU/GPU utilization.

Failure Logs
/Volumes/d/apps/llama.cpp/llama.cpp/finetune --model-base /Volumes/d/apps/aimodels/others/openllama-3b-v2/openllama-3b-v2.q8_0.gguf --gpu-layers 25 --checkpoint-in chk-ol3b-shakespeare-LATEST.gguf --checkpoint-out chk-ol3b-shakespeare-ITERATION.gguf --lora-out lora-ol3b-shakespeare-ITERATION.bin --train-data shakespeare.txt --save-every 10 --threads 4 --adam-iter 30 --batch 4 --ctx 128 --use-checkpointing

main: seed: 1698931925
main: model base = '/Volumes/d/apps/aimodels/others/openllama-3b-v2/openllama-3b-v2.q8_0.gguf'
llama_model_loader: loaded meta data with 19 key-value pairs and 237 tensors from /Volumes/d/apps/aimodels/others/openllama-3b-v2/openllama-3b-v2.q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor 0: token_embd.weight q8_0 [ 3200, 32000, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 19: blk.2.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 20: blk.2.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.3.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 29: blk.3.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 37: blk.4.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 38: blk.4.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 44: blk.4.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 45: blk.4.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 46: blk.5.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 47: blk.5.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 48: blk.5.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 49: blk.5.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 50: blk.5.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 51: blk.5.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 52: blk.5.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 53: blk.5.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 54: blk.5.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 55: blk.6.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 56: blk.6.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 57: blk.6.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 58: blk.6.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 59: blk.6.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 60: blk.6.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 61: blk.6.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 62: blk.6.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 63: blk.6.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.7.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 65: blk.7.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 66: blk.7.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 67: blk.7.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 68: blk.7.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 69: blk.7.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 70: blk.7.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 71: blk.7.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 72: blk.7.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 73: blk.8.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 74: blk.8.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 75: blk.8.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 76: blk.8.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 77: blk.8.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 78: blk.8.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 79: blk.8.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 80: blk.8.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 81: blk.8.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.9.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 83: blk.9.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 84: blk.9.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 85: blk.9.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 86: blk.9.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 87: blk.9.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 88: blk.9.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 89: blk.9.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 90: blk.9.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 91: blk.10.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 92: blk.10.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 93: blk.10.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 94: blk.10.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 95: blk.10.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 96: blk.10.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 97: blk.10.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 98: blk.10.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 99: blk.10.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.11.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 101: blk.11.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 102: blk.11.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 103: blk.11.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 104: blk.11.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 105: blk.11.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 106: blk.11.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 107: blk.11.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 108: blk.11.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 109: blk.12.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 110: blk.12.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 111: blk.12.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 112: blk.12.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 113: blk.12.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 114: blk.12.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 115: blk.12.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 116: blk.12.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 117: blk.12.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 118: blk.13.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 119: blk.13.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 120: blk.13.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 121: blk.13.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 122: blk.13.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 123: blk.13.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 124: blk.13.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 125: blk.13.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 126: blk.13.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 127: blk.14.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 128: blk.14.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 129: blk.14.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 130: blk.14.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 131: blk.14.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 132: blk.14.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 133: blk.14.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 134: blk.14.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 135: blk.14.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 136: blk.15.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 137: blk.15.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 145: blk.16.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 154: blk.17.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 163: blk.18.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.19.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 181: blk.20.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 190: blk.21.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 199: blk.22.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 200: blk.22.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 208: blk.23.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 209: blk.23.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 217: blk.24.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 218: blk.24.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 226: blk.25.attn_q.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 227: blk.25.attn_k.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.attn_v.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.attn_output.weight q8_0 [ 3200, 3200, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.ffn_gate.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.ffn_down.weight q8_0 [ 8640, 3200, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.ffn_up.weight q8_0 [ 3200, 8640, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.attn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.ffn_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 235: output_norm.weight f32 [ 3200, 1, 1, 1 ]
llama_model_loader: - tensor 236: output.weight q8_0 [ 3200, 32000, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.padding_token_id u32
llama_model_loader: - kv 18: general.quantization_version u32
llama_model_loader: - type f32: 53 tensors
llama_model_loader: - type q8_0: 184 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 3200
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 26
llm_load_print_meta: n_rot = 100
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 8640
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = mostly Q8_0
llm_load_print_meta: model params = 3.43 B
llm_load_print_meta: model size = 3.39 GiB (8.50 BPW)
llm_load_print_meta: general.name = others
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.08 MB
llm_load_tensors: mem required = 3472.53 MB
.................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 162.50 MB
llama_build_graph: non-view tensors processed: 602/602
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2
ggml_metal_init: picking default device: Apple M2
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: loading '/Volumes/d/apps/llama.cpp/llama.cpp/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M2
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 16384.02 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 74.88 MB
llama_new_context_with_model: max tensor size = 103.76 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 3473.17 MB, ( 3473.80 / 16384.02)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 162.52 MB, ( 3636.31 / 16384.02)
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 68.77 MB, ( 3705.08 / 16384.02)
main: init model
print_params: n_vocab: 32000
print_params: n_ctx: 128
print_params: n_embd: 3200
print_params: n_ff: 8640
print_params: n_head: 32
print_params: n_head_kv: 32
print_params: n_layer: 26
print_params: norm_rms_eps : 0.000001
print_params: rope_freq_base : 10000.000000
print_params: rope_freq_scale : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq : 4
print_lora_params: n_rank_wk : 4
print_lora_params: n_rank_wv : 4
print_lora_params: n_rank_wo : 4
print_lora_params: n_rank_ffn_norm : 1
print_lora_params: n_rank_w1 : 4
print_lora_params: n_rank_w2 : 4
print_lora_params: n_rank_w3 : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm : 1
print_lora_params: n_rank_output : 4
main: total train_iterations 0
main: seen train_samples 0
main: seen train_tokens 0
main: completed train_epochs 0
main: lora_size = 54798560 bytes (52.3 MB)
main: opt_size = 81693904 bytes (77.9 MB)
main: opt iter 0
main: input_size = 65538080 bytes (62.5 MB)
main: compute_size = 5092979808 bytes (4857.0 MB)
main: evaluation order = RIGHT_TO_LEFT
main: tokenize training data
tokenize_file: total number of samples: 26702
main: number of training tokens: 26830
main: number of unique tokens: 3320
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 512240 bytes (0.5 MB)
train_opt_callback: iter= 0 sample=1/26702 sched=0.000000 loss=0.000000 |->
I think currently only f16 and f32 base models are supported for GPU offloading #3762 (comment)
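If that's the cause, one possible workaround is to finetune against an f16 base instead of the q8_0 file. A sketch, assuming the original Hugging Face checkpoint is available locally (paths and directory names are illustrative, not from the report):

# convert the original HF weights to an f16 GGUF base model
python convert.py /path/to/openllama-3b-v2-hf --outtype f16 --outfile openllama-3b-v2.f16.gguf

# then point finetune at the f16 base instead of the q8_0 file
/Volumes/d/apps/llama.cpp/llama.cpp/finetune --model-base openllama-3b-v2.f16.gguf --gpu-layers 1 ...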
Oh, I hope they add support for quantized models; actually, it seems f16 still doesn't work either.
This issue was closed because it has been inactive for 14 days since being marked as stale.