An error while loading 30b with alpaca.cpp.
Am I doing something wrong?
main: seed = 1679388768
llama_model_load: loading model from 'D:\alpaca\ggml-alpaca-30b-q4.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
llama_model_load: memory_size = 6240.00 MB, n_mem = 122880
llama_model_load: loading model part 1/4 from 'D:\alpaca\ggml-alpaca-30b-q4.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from 'D:\alpaca\ggml-alpaca-30b-q4.bin'
\alpaca>.\Release\chat.exe -m ggml-model-q4_0.bin -n 256 --repeat_penalty 1.0 --color -i -r "ROBOT:" -f -ins
main: seed = 1679403424
llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
llama_model_load: memory_size = 6240.00 MB, n_mem = 122880
llama_model_load: loading model part 1/4 from 'ggml-model-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from 'ggml-model-q4_0.bin'
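As a cross-check on the logs above: the n_mem and memory_size figures are consistent with 30B hyperparameters (60 layers, embedding width 6656), which suggests the model header was read correctly and the failure lies elsewhere. Below is a minimal sketch of that arithmetic. It assumes a 2048-token context (inferred from 122880 / 60, not stated in the logs) and 4-byte key/value cache elements. estimate_kv_cache is a hypothetical helper, not a function from alpaca.cpp.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of alpaca.cpp.
struct KvCacheEstimate {
    int64_t n_mem;      // number of cached positions across all layers
    double  memory_mb;  // size of the key cache plus the value cache, in MB
};

// Assumes the key and value caches each hold n_layer * n_ctx * n_embd
// elements of 4 bytes (float). That assumption reproduces the logged figure.
KvCacheEstimate estimate_kv_cache(int64_t n_layer, int64_t n_ctx, int64_t n_embd) {
    const int64_t n_mem      = n_layer * n_ctx;
    const int64_t n_elements = n_mem * n_embd;
    const double  bytes      = 2.0 * static_cast<double>(n_elements) * sizeof(float);
    return { n_mem, bytes / (1024.0 * 1024.0) };
}
```

Calling estimate_kv_cache(60, 2048, 6656) gives n_mem = 122880 and memory_mb = 6240, matching the two values printed by llama_model_load.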
Please see #3
https://huggingface.co/Pi3141/alpaca-30B-ggml/discussions/3
That didn't work for me. I tried running it in Colab; here is what I did and what I got.
Colab commands:
!mkdir 30b
!cd 30b && wget https://huggingface.co/Pi3141/alpaca-lora-30B-ggml/resolve/main/ggml-model-q4_0.bin
!git clone https://github.com/ItsPi3141/alpaca.cpp.git nalpaca
!cd nalpaca && make main
!cd nalpaca && ./main -m ../30b/ggml-model-q4_0.bin
Result:
main: seed = 1680018948
llama_model_load: loading model from '../30b/ggml-model-q4_0.bin' - please wait ...
llama_model_load: ggml ctx size = 25631.50 MB
llama_model_load: memory_size = 6240.00 MB, n_mem = 122880
llama_model_load: loading model part 1/1 from '../30b/ggml-model-q4_0.bin'
llama_model_load: ........................................^C
Can you help, please?
Notably, the same build works fine with alpaca-lora-7B-ggml.
From the results, it looks like you aborted it (^C). Did you use alpaca.cpp or llama.cpp?
I'm using alpaca.cpp.
The following settings (for 30B) work fine for both 13B and 30B.
// determine number of model parts based on the dimension
const map<int, int> LLAMA_N_PARTS = {
    // Model 30B: 13B (5120) and 30B (6656) set to 1 part for single-file models
    { 4096, 1 },
    { 5120, 1 },
    { 6656, 1 },
    { 8192, 8 },
};
// default hparams
struct llama_hparams {
    // Model 30B
    int32_t n_vocab = 32000;
    int32_t n_ctx = 512; // max length of the input prompt (default; 1000 also works)
    int32_t n_embd = 6656;
    int32_t n_mult = 256;
    int32_t n_head = 52;
    int32_t n_layer = 60;
    int32_t n_rot = 64;
    int32_t f16 = 1;
};
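To illustrate why this change matters: the loader picks the number of model files from the embedding width, so a single-file 30B model (n_embd = 6656) is treated as part 1 of 4 unless the table says otherwise. That matches the "loading model part 1/4" line in the failing logs and the "part 1/1" line in the Colab run. Here is a minimal sketch of the lookup. STOCK_N_PARTS reflects what I believe the unmodified defaults to be (an assumption; check your own checkout), and n_parts_for is a hypothetical helper, not a function from alpaca.cpp.

```cpp
#include <map>

// Assumed unmodified defaults: larger models are expected to be split
// across several files (this table is an assumption, verify it locally).
static const std::map<int, int> STOCK_N_PARTS = {
    { 4096, 1 }, { 5120, 2 }, { 6656, 4 }, { 8192, 8 },
};

// Table from the post above: 13B and 30B are treated as single-file models.
static const std::map<int, int> PATCHED_N_PARTS = {
    { 4096, 1 }, { 5120, 1 }, { 6656, 1 }, { 8192, 8 },
};

// Hypothetical helper: returns the number of parts the loader would expect
// for a given embedding width, or -1 if the width is not in the table.
int n_parts_for(const std::map<int, int>& table, int n_embd) {
    const auto it = table.find(n_embd);
    return it == table.end() ? -1 : it->second;
}
```

With these tables, n_parts_for(STOCK_N_PARTS, 6656) returns 4 and n_parts_for(PATCHED_N_PARTS, 6656) returns 1. If the part count does not match how the .bin file was actually produced, the per-tensor size check is a likely place for the load to fail.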
The 30B is awesome.
o7