Converting a HuggingFace Model to GGUF Format with llama.cpp
Clone the git repo
git clone https://github.com/ggerganov/llama.cpp.git
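The llama-quantize-stats and llama-quantize commands used later in this post are native binaries from this repo. If they are not already on your PATH (for example installed via a package manager), they can be built with CMake; a minimal sketch, which may need adjusting for your llama.cpp version:
cd llama.cpp
cmake -B build
cmake --build build --config Release
# the tools, including llama-quantize, end up under build/bin/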
Install Python dependencies
Change into the llama.cpp folder, then install the requirements:
pip install -r requirements.txt
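If you prefer not to touch the system Python packages, the same requirements can be installed into a virtual environment first (an optional sketch, not part of the original steps):
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt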
convert_hf_to_gguf
Run the convert_hf_to_gguf.py conversion script; its argument is the model directory.
python llama.cpp/convert_hf_to_gguf.py PULSE-7bv5
Output
❯ python llama.cpp/convert_hf_to_gguf.py PULSE-7bv5
INFO:hf-to-gguf:Loading model: PULSE-7bv5
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00002.bin'
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {4096, 250880}
INFO:hf-to-gguf:token_embd_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:token_embd_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.0.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.0.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.0.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.0.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.0.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.1.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.1.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.1.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.1.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.1.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.2.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.2.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.2.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.2.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.2.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.3.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.3.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.3.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.3.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.3.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.4.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.4.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.4.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.4.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.4.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.5.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.5.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.5.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.5.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.5.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.6.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.6.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.6.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.6.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.6.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.7.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.7.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.7.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.7.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.7.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.8.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.8.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.8.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.8.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.8.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.9.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.9.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.9.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.9.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.9.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.10.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.10.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.10.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.10.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.10.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.11.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.11.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.11.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.11.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.11.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.12.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.12.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.12.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.12.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.12.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.13.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.13.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.13.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.13.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.13.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.14.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.14.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.14.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.14.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.14.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.15.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.15.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.15.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.15.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.15.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.16.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.16.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.16.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.16.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.16.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.16.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.16.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.17.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.17.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.17.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.17.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.17.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.17.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.17.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.18.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.18.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.18.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.18.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.18.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.18.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.18.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.19.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.19.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.19.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.19.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.19.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00002-of-00002.bin'
INFO:hf-to-gguf:blk.19.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.19.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.20.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.20.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.20.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.20.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.20.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.20.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.20.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.21.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.21.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.21.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.21.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.21.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.21.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.21.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.22.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.22.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.22.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.22.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.22.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.22.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.22.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.23.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.23.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.23.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.23.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.23.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.23.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.23.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.24.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.24.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.24.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.24.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.24.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.24.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.24.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.25.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.25.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.25.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.25.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.25.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.25.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.25.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.26.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.26.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.26.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.26.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.26.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.26.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.26.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.27.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.27.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.27.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.27.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.27.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.27.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.27.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.28.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.28.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.28.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.28.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.28.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.28.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.28.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.28.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.attn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:re-format attention.linear_qkv.weight
INFO:hf-to-gguf:blk.29.attn_qkv.weight, torch.bfloat16 --> F16, shape = {4096, 12288}
INFO:hf-to-gguf:re-format attention.linear_qkv.bias
INFO:hf-to-gguf:blk.29.attn_qkv.bias, torch.bfloat16 --> F32, shape = {12288}
INFO:hf-to-gguf:blk.29.attn_output.weight, torch.bfloat16 --> F16, shape = {4096, 4096}
INFO:hf-to-gguf:blk.29.attn_output.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.ffn_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:blk.29.ffn_up.weight, torch.bfloat16 --> F16, shape = {4096, 16384}
INFO:hf-to-gguf:blk.29.ffn_up.bias, torch.bfloat16 --> F32, shape = {16384}
INFO:hf-to-gguf:blk.29.ffn_down.weight, torch.bfloat16 --> F16, shape = {16384, 4096}
INFO:hf-to-gguf:blk.29.ffn_down.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:output_norm.bias, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> F16, shape = {4096, 250880}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:Set model tokenizer
INFO:gguf.vocab:Adding 250434 merge(s).
INFO:gguf.vocab:Setting special token type bos to 1
INFO:gguf.vocab:Setting special token type eos to 2
INFO:gguf.vocab:Setting special token type unk to 0
INFO:gguf.vocab:Setting special token type pad to 3
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:PULSE-7bv5/bloomz-7.1B-mt-F16.gguf: n_tensors = 366, total_size = 16.2G
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Writing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16.2G/16.2G [00:45<00:00, 354Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to PULSE-7bv5/bloomz-7.1B-mt-F16.gguf
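By default the script writes an F16 GGUF next to the model files, as shown above. It also accepts options for the output path and precision (per its --help; worth re-checking against your checkout), so, as a sketch with a hypothetical file name, an 8-bit export could be produced in one step instead of converting to F16 and quantizing separately:
python llama.cpp/convert_hf_to_gguf.py PULSE-7bv5 --outtype q8_0 --outfile PULSE-7bv5/PULSE-7bv5-Q8_0.gguf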
Quantization analysis
Analyze the converted GGUF with llama-quantize-stats:
llama-quantize-stats -m ~/ollama/PULSE-7bv5/bloomz-7.1B-mt-F16.gguf
Output
❯ llama-quantize-stats -m ~/ollama/PULSE-7bv5/bloomz-7.1B-mt-F16.gguf
main: build = 3896 (63747437)
main: built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0
Loading model
llama_model_loader: loaded meta data with 28 key-value pairs and 366 tensors from /Users/fendoudebb/ollama/PULSE-7bv5/bloomz-7.1B-mt-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bloom
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Bloomz 7b1 Mt
llama_model_loader: - kv 3: general.organization str = Bigscience
llama_model_loader: - kv 4: general.finetune str = mt
llama_model_loader: - kv 5: general.basename str = bloomz
llama_model_loader: - kv 6: general.size_label str = 7.1B
llama_model_loader: - kv 7: general.license str = agpl-3.0
llama_model_loader: - kv 8: general.tags arr[str,2] = ["PULSE", "llm"]
llama_model_loader: - kv 9: general.languages arr[str,1] = ["zh"]
llama_model_loader: - kv 10: bloom.context_length u32 = 2048
llama_model_loader: - kv 11: bloom.embedding_length u32 = 4096
llama_model_loader: - kv 12: bloom.feed_forward_length u32 = 16384
llama_model_loader: - kv 13: bloom.block_count u32 = 30
llama_model_loader: - kv 14: bloom.attention.head_count u32 = 32
llama_model_loader: - kv 15: bloom.attention.head_count_kv u32 = 32
llama_model_loader: - kv 16: bloom.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 19: tokenizer.ggml.pre str = bloom
llama_model_loader: - kv 20: tokenizer.ggml.tokens arr[str,250880] = ["<unk>", "<s>", "</s>", "<pad>", "!"...
llama_model_loader: - kv 21: tokenizer.ggml.token_type arr[i32,250880] = [3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 22: tokenizer.ggml.merges arr[str,250434] = ["à ¤", "à ¦", "Ġ à", "à ¥", ...
llama_model_loader: - kv 23: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 24: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 25: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 3
llama_model_loader: - kv 27: general.quantization_version u32 = 2
llama_model_loader: - type f32: 244 tensors
llama_model_loader: - type f16: 122 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 6
llm_load_vocab: token to piece cache size = 2.0858 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = bloom
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 250880
llm_load_print_meta: n_merges = 250434
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_layer = 30
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 4096
llm_load_print_meta: n_embd_v_gqa = 4096
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 8.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 16384
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = -1
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 8.10 B
llm_load_print_meta: model size = 15.08 GiB (16.00 BPW)
llm_load_print_meta: general.name = Bloomz 7b1 Mt
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 3 '<pad>'
llm_load_print_meta: LF token = 130 'Ä'
llm_load_print_meta: EOG token = 2 '</s>'
llm_load_print_meta: max token length = 900
llm_load_tensors: ggml ctx size = 0.32 MiB
ggml_backend_metal_log_allocated_size: allocated buffer, size = 13486.17 MiB, (13486.23 / 21845.34)
llm_load_tensors: offloading 30 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 31/31 layers to GPU
llm_load_tensors: Metal buffer size = 13486.17 MiB
llm_load_tensors: CPU buffer size = 1960.00 MiB
.............................................................................
llama_new_context_with_model: n_ctx = 256
llama_new_context_with_model: n_batch = 256
llama_new_context_with_model: n_ubatch = 256
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: using embedded metal library
ggml_metal_init: GPU name: Apple M1 Max
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3 (5001)
ggml_metal_init: simdgroup reduction support = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 22906.50 MB
llama_kv_cache_init: Metal KV buffer size = 120.00 MiB
llama_new_context_with_model: KV self size = 120.00 MiB, K (f16): 60.00 MiB, V (f16): 60.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.96 MiB
llama_new_context_with_model: Metal compute buffer size = 253.00 MiB
llama_new_context_with_model: CPU compute buffer size = 4.25 MiB
llama_new_context_with_model: graph nodes = 1120
llama_new_context_with_model: graph splits = 2
note: source model is f16
testing 366 layers with max size 1027604480
f16 : rmse 0.00000000, maxerr 0.00000003, 95pct<0.0002, median<0.0002
q4_0 : rmse 0.00138636, maxerr 1.10937500, 95pct<0.0026, median<0.0010
q4_1 : rmse 0.00121247, maxerr 0.54296875, 95pct<0.0024, median<0.0010
q5_0 : rmse 0.00069026, maxerr 0.48437500, 95pct<0.0014, median<0.0006
q5_1 : rmse 0.00058682, maxerr 0.28515625, 95pct<0.0012, median<0.0006
q8_0 : rmse 0.00008670, maxerr 0.06542969, 95pct<0.0002, median<0.0002
q2_K : rmse 0.00458848, maxerr 2.14111328, 95pct<0.0092, median<0.0028
q3_K : rmse 0.00237759, maxerr 1.82031250, 95pct<0.0046, median<0.0016
q4_K : rmse 0.00111348, maxerr 0.54852295, 95pct<0.0022, median<0.0008
q5_K : rmse 0.00056436, maxerr 0.27148438, 95pct<0.0012, median<0.0004
q6_K : rmse 0.00028501, maxerr 0.23364258, 95pct<0.0006, median<0.0002
iq3_xxs : rmse 0.00342342, maxerr 5.26354980, 95pct<0.0070, median<0.0018
iq4_nl : rmse 0.00127413, maxerr 0.86791992, 95pct<0.0024, median<0.0010
iq3_s : rmse 0.00274513, maxerr 2.35723877, 95pct<0.0054, median<0.0014
iq2_s : rmse 0.00426713, maxerr 5.88697815, 95pct<0.0082, median<0.0026
iq4_xs : rmse 0.00122807, maxerr 0.88183594, 95pct<0.0024, median<0.0010
bf16 : rmse 0.00000000, maxerr 0.00000000, 95pct<0.0002, median<0.0002
tq1_0 : rmse 0.01272682, maxerr 8.12500000, 95pct<0.0240, median<0.0084
tq2_0 : rmse 0.01272682, maxerr 8.12500000, 95pct<0.0240, median<0.0084
ggml_metal_free: deallocating
main: total time = 1215216.45 ms
Quantize the model
The converted GGUF model is still in F16 precision, which is fairly resource-hungry to deploy on a local machine. Quantize it to int4 (Q4_0).
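As a rough back-of-the-envelope estimate (assumed arithmetic, not from the original post): Q4_0 stores each block of 32 weights as 4-bit values plus an f16 scale, i.e. about 4.5 bits per weight, so the 8.10 B parameters come to roughly 8.10e9 × 4.5 / 8 ≈ 4.6 GB (about 4.2 GiB), versus 15.08 GiB at F16; the real file lands a bit higher since some tensors are typically kept at higher precision. Run the quantization: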
llama-quantize ~/ollama/PULSE-7bv5/bloomz-7.1B-mt-F16.gguf ~/ollama/PULSE-7bv5/PULSE-7bv5-Q4_0.gguf Q4_0
Output
❯ llama-quantize ~/ollama/PULSE-7bv5/bloomz-7.1B-mt-F16.gguf ~/ollama/PULSE-7bv5/PULSE-7bv5-Q4_0.gguf Q4_0
main: build = 3896 (63747437)
main: built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.6.0
main: quantizing '/Users/fendoudebb/ollama/PULSE-7bv5/bloomz-7.1B-mt-F16.gguf' to '/Users/fendoudebb/ollama/PULSE-7bv5/PULSE-7bv5-Q4_0.gguf' as Q4_0
llama_model_loader: loaded meta data with 28 key-value pairs and 366 tensors from /Users/fendoudebb/ollama/PULSE-7bv5/bloomz-7.1B-mt-F16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bloom
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Bloomz 7b1 Mt
llama_model_loader: - kv 3: general.organization str = Bigscience
llama_model_loader: - kv 4: general.finetune str = mt
llama_model_loader: - kv 5: general.basename str = bloomz
llama_model_loader: - kv 6: general.size_label str = 7.1B
llama_model_loader: - kv 7: general.license str = agpl-3.0
llama_model_loader: - kv 8: general.tags arr[str,2] = ["PULSE", "llm"]
llama_model_loader: - kv 9: general.languages arr[str,1] = ["zh"]
llama_model_loader: - kv 10: bloom.context_length u32 = 2048
llama_model_loader: - kv 11: bloom.embedding_length u32 = 4096
llama_model_loader: - kv 12: bloom.feed_forward_length u32 = 16384
llama_model_loader: - kv 13: bloom.block_count u32 = 30
llama_model_loader: - kv 14: bloom.attention.head_count u32 = 32
llama_model_loader: - kv 15: bloom.attention.head_count_kv u32 = 32
llama_model_loader: - kv 16: bloom.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 19: tokenizer.ggml.pre str = bloom
llama_model_loader: - kv 20: tokenizer.ggml.tokens arr[str,250880] = ["<unk>", "<s>", "</s>", "<pad>", "!"...
llama_model_loader: - kv 21: tokenizer.ggml.token_type arr[i32,250880] = [3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 22: tokenizer.ggml.merges arr[str,250434] = ["à ¤", "à ¦", "Ġ à", "à ¥", ...
llama_model_loader: - kv 23: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 24: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 25: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 3
llama_model_loader: - kv 27: general.quantization_version u32 = 2
llama_model_loader: - type f32: 244 tensors
llama_model_loader: - type f16: 122 tensors
[ 1/ 366] token_embd.weight - [ 4096, 250880, 1, 1], type = f16, converting to q4_0 .. size = 1960.00 MiB -> 551.25 MiB
[ 2/ 366] token_embd_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 3/ 366] token_embd_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 4/ 366] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 5/ 366] blk.0.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 6/ 366] blk.0.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 7/ 366] blk.0.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 8/ 366] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 9/ 366] blk.0.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 10/ 366] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 11/ 366] blk.0.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 12/ 366] blk.0.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 13/ 366] blk.0.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 14/ 366] blk.0.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 15/ 366] blk.0.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 16/ 366] blk.1.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 17/ 366] blk.1.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 18/ 366] blk.1.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 19/ 366] blk.1.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 20/ 366] blk.1.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 21/ 366] blk.1.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 22/ 366] blk.1.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 23/ 366] blk.1.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 24/ 366] blk.1.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 25/ 366] blk.1.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 26/ 366] blk.1.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 27/ 366] blk.1.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 28/ 366] blk.2.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 29/ 366] blk.2.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 30/ 366] blk.2.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 31/ 366] blk.2.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 32/ 366] blk.2.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 33/ 366] blk.2.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 34/ 366] blk.2.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 35/ 366] blk.2.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 36/ 366] blk.2.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 37/ 366] blk.2.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 38/ 366] blk.2.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 39/ 366] blk.2.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 40/ 366] blk.3.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 41/ 366] blk.3.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 42/ 366] blk.3.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 43/ 366] blk.3.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 44/ 366] blk.3.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 45/ 366] blk.3.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 46/ 366] blk.3.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 47/ 366] blk.3.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 48/ 366] blk.3.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 49/ 366] blk.3.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 50/ 366] blk.3.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 51/ 366] blk.3.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 52/ 366] blk.4.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 53/ 366] blk.4.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 54/ 366] blk.4.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 55/ 366] blk.4.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 56/ 366] blk.4.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 57/ 366] blk.4.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 58/ 366] blk.4.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 59/ 366] blk.4.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 60/ 366] blk.4.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 61/ 366] blk.4.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 62/ 366] blk.4.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 63/ 366] blk.4.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 64/ 366] blk.5.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 65/ 366] blk.5.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 66/ 366] blk.5.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 67/ 366] blk.5.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 68/ 366] blk.5.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 69/ 366] blk.5.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 70/ 366] blk.5.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 71/ 366] blk.5.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 72/ 366] blk.5.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 73/ 366] blk.5.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 74/ 366] blk.5.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 75/ 366] blk.5.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 76/ 366] blk.6.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 77/ 366] blk.6.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 78/ 366] blk.6.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 79/ 366] blk.6.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 80/ 366] blk.6.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 81/ 366] blk.6.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 82/ 366] blk.6.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 83/ 366] blk.6.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 84/ 366] blk.6.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 85/ 366] blk.6.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 86/ 366] blk.6.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 87/ 366] blk.6.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 88/ 366] blk.7.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 89/ 366] blk.7.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 90/ 366] blk.7.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 91/ 366] blk.7.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 92/ 366] blk.7.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 93/ 366] blk.7.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 94/ 366] blk.7.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 95/ 366] blk.7.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 96/ 366] blk.7.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 97/ 366] blk.7.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 98/ 366] blk.7.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 99/ 366] blk.7.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 100/ 366] blk.8.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 101/ 366] blk.8.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 102/ 366] blk.8.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 103/ 366] blk.8.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 104/ 366] blk.8.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 105/ 366] blk.8.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 106/ 366] blk.8.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 107/ 366] blk.8.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 108/ 366] blk.8.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 109/ 366] blk.8.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 110/ 366] blk.8.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 111/ 366] blk.8.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 112/ 366] blk.9.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 113/ 366] blk.9.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 114/ 366] blk.9.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 115/ 366] blk.9.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 116/ 366] blk.9.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 117/ 366] blk.9.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 118/ 366] blk.9.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 119/ 366] blk.9.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 120/ 366] blk.9.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 121/ 366] blk.9.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 122/ 366] blk.9.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 123/ 366] blk.9.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 124/ 366] blk.10.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 125/ 366] blk.10.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 126/ 366] blk.10.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 127/ 366] blk.10.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 128/ 366] blk.10.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 129/ 366] blk.10.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 130/ 366] blk.10.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 131/ 366] blk.10.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 132/ 366] blk.10.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 133/ 366] blk.10.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 134/ 366] blk.10.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 135/ 366] blk.10.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 136/ 366] blk.11.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 137/ 366] blk.11.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 138/ 366] blk.11.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 139/ 366] blk.11.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 140/ 366] blk.11.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 141/ 366] blk.11.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 142/ 366] blk.11.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 143/ 366] blk.11.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 144/ 366] blk.11.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 145/ 366] blk.11.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 146/ 366] blk.11.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 147/ 366] blk.11.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 148/ 366] blk.12.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 149/ 366] blk.12.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 150/ 366] blk.12.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 151/ 366] blk.12.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 152/ 366] blk.12.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 153/ 366] blk.12.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 154/ 366] blk.12.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 155/ 366] blk.12.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 156/ 366] blk.12.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 157/ 366] blk.12.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 158/ 366] blk.12.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 159/ 366] blk.12.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 160/ 366] blk.13.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 161/ 366] blk.13.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 162/ 366] blk.13.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 163/ 366] blk.13.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 164/ 366] blk.13.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 165/ 366] blk.13.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 166/ 366] blk.13.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 167/ 366] blk.13.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 168/ 366] blk.13.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 169/ 366] blk.13.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 170/ 366] blk.13.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 171/ 366] blk.13.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 172/ 366] blk.14.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 173/ 366] blk.14.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 174/ 366] blk.14.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 175/ 366] blk.14.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 176/ 366] blk.14.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 177/ 366] blk.14.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 178/ 366] blk.14.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 179/ 366] blk.14.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 180/ 366] blk.14.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 181/ 366] blk.14.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 182/ 366] blk.14.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 183/ 366] blk.14.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 184/ 366] blk.15.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 185/ 366] blk.15.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 186/ 366] blk.15.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 187/ 366] blk.15.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 188/ 366] blk.15.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 189/ 366] blk.15.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 190/ 366] blk.15.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 191/ 366] blk.15.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 192/ 366] blk.15.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 193/ 366] blk.15.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 194/ 366] blk.15.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 195/ 366] blk.15.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 196/ 366] blk.16.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 197/ 366] blk.16.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 198/ 366] blk.16.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 199/ 366] blk.16.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 200/ 366] blk.16.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 201/ 366] blk.16.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 202/ 366] blk.16.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 203/ 366] blk.16.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 204/ 366] blk.16.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 205/ 366] blk.16.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 206/ 366] blk.16.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 207/ 366] blk.16.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 208/ 366] blk.17.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 209/ 366] blk.17.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 210/ 366] blk.17.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 211/ 366] blk.17.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 212/ 366] blk.17.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 213/ 366] blk.17.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 214/ 366] blk.17.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 215/ 366] blk.17.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 216/ 366] blk.17.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 217/ 366] blk.17.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 218/ 366] blk.17.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 219/ 366] blk.17.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 220/ 366] blk.18.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 221/ 366] blk.18.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 222/ 366] blk.18.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 223/ 366] blk.18.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 224/ 366] blk.18.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 225/ 366] blk.18.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 226/ 366] blk.18.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 227/ 366] blk.18.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 228/ 366] blk.18.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 229/ 366] blk.18.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 230/ 366] blk.18.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 231/ 366] blk.18.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 232/ 366] blk.19.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 233/ 366] blk.19.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 234/ 366] blk.19.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 235/ 366] blk.19.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 236/ 366] blk.19.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 237/ 366] blk.19.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 238/ 366] blk.19.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 239/ 366] blk.19.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 240/ 366] blk.19.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 241/ 366] blk.19.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 242/ 366] blk.19.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 243/ 366] blk.19.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 244/ 366] blk.20.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 245/ 366] blk.20.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 246/ 366] blk.20.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 247/ 366] blk.20.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 248/ 366] blk.20.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 249/ 366] blk.20.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 250/ 366] blk.20.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 251/ 366] blk.20.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 252/ 366] blk.20.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 253/ 366] blk.20.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 254/ 366] blk.20.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 255/ 366] blk.20.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 256/ 366] blk.21.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 257/ 366] blk.21.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 258/ 366] blk.21.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 259/ 366] blk.21.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 260/ 366] blk.21.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 261/ 366] blk.21.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 262/ 366] blk.21.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 263/ 366] blk.21.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 264/ 366] blk.21.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 265/ 366] blk.21.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 266/ 366] blk.21.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 267/ 366] blk.21.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 268/ 366] blk.22.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 269/ 366] blk.22.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 270/ 366] blk.22.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 271/ 366] blk.22.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 272/ 366] blk.22.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 273/ 366] blk.22.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 274/ 366] blk.22.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 275/ 366] blk.22.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 276/ 366] blk.22.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 277/ 366] blk.22.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 278/ 366] blk.22.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 279/ 366] blk.22.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 280/ 366] blk.23.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 281/ 366] blk.23.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 282/ 366] blk.23.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 283/ 366] blk.23.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 284/ 366] blk.23.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 285/ 366] blk.23.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 286/ 366] blk.23.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 287/ 366] blk.23.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 288/ 366] blk.23.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 289/ 366] blk.23.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 290/ 366] blk.23.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 291/ 366] blk.23.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 292/ 366] blk.24.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 293/ 366] blk.24.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 294/ 366] blk.24.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 295/ 366] blk.24.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 296/ 366] blk.24.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 297/ 366] blk.24.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 298/ 366] blk.24.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 299/ 366] blk.24.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 300/ 366] blk.24.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 301/ 366] blk.24.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 302/ 366] blk.24.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 303/ 366] blk.24.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 304/ 366] blk.25.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 305/ 366] blk.25.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 306/ 366] blk.25.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 307/ 366] blk.25.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 308/ 366] blk.25.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 309/ 366] blk.25.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 310/ 366] blk.25.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 311/ 366] blk.25.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 312/ 366] blk.25.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 313/ 366] blk.25.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 314/ 366] blk.25.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 315/ 366] blk.25.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 316/ 366] blk.26.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 317/ 366] blk.26.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 318/ 366] blk.26.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 319/ 366] blk.26.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 320/ 366] blk.26.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 321/ 366] blk.26.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 322/ 366] blk.26.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 323/ 366] blk.26.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 324/ 366] blk.26.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 325/ 366] blk.26.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 326/ 366] blk.26.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 327/ 366] blk.26.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 328/ 366] blk.27.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 329/ 366] blk.27.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 330/ 366] blk.27.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 331/ 366] blk.27.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 332/ 366] blk.27.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 333/ 366] blk.27.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 334/ 366] blk.27.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 335/ 366] blk.27.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 336/ 366] blk.27.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 337/ 366] blk.27.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 338/ 366] blk.27.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 339/ 366] blk.27.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 340/ 366] blk.28.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 341/ 366] blk.28.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 342/ 366] blk.28.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 343/ 366] blk.28.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 344/ 366] blk.28.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 345/ 366] blk.28.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 346/ 366] blk.28.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 347/ 366] blk.28.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 348/ 366] blk.28.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 349/ 366] blk.28.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 350/ 366] blk.28.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 351/ 366] blk.28.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 352/ 366] blk.29.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 353/ 366] blk.29.attn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 354/ 366] blk.29.attn_qkv.weight - [ 4096, 12288, 1, 1], type = f16, converting to q4_0 .. size = 96.00 MiB -> 27.00 MiB
[ 355/ 366] blk.29.attn_qkv.bias - [12288, 1, 1, 1], type = f32, size = 0.047 MB
[ 356/ 366] blk.29.attn_output.weight - [ 4096, 4096, 1, 1], type = f16, converting to q4_0 .. size = 32.00 MiB -> 9.00 MiB
[ 357/ 366] blk.29.attn_output.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 358/ 366] blk.29.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 359/ 366] blk.29.ffn_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 360/ 366] blk.29.ffn_up.weight - [ 4096, 16384, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 361/ 366] blk.29.ffn_up.bias - [16384, 1, 1, 1], type = f32, size = 0.062 MB
[ 362/ 366] blk.29.ffn_down.weight - [16384, 4096, 1, 1], type = f16, converting to q4_0 .. size = 128.00 MiB -> 36.00 MiB
[ 363/ 366] blk.29.ffn_down.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 364/ 366] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 365/ 366] output_norm.bias - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 366/ 366] output.weight - [ 4096, 250880, 1, 1], type = f16, converting to q6_K .. size = 1960.00 MiB -> 803.91 MiB
llama_model_quantize_internal: model size = 15446.16 MB
llama_model_quantize_internal: quant size = 4601.31 MB
main: quantize time = 15078.93 ms
main: total time = 15078.93 ms
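The summary lines show the overall effect of Q4_0 quantization: the F16 weights (about 15.4 GB) shrink to roughly 4.6 GB. Before importing the file into Ollama, you can optionally run it directly with llama.cpp as a quick sanity check. This is a minimal sketch; the llama-cli binary name assumes a recent llama.cpp build, and the file name assumes the quantized output is PULSE-7bv5-Q4_0.gguf:
./llama-cli -m PULSE-7bv5-Q4_0.gguf -p "Hello" -n 64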
Import into Ollama
Write a Modelfile
FROM ./PULSE-7bv5-Q4_0.gguf
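A single FROM line is the minimum needed to import the GGUF file. Ollama's Modelfile also accepts optional directives such as PARAMETER and SYSTEM; the values below are illustrative assumptions, not settings taken from the PULSE model card:
FROM ./PULSE-7bv5-Q4_0.gguf
# illustrative sampling and context settings; tune for your use case
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
# optional system prompt
SYSTEM "You are a helpful assistant."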
Create the model
ollama create PULSE-7bv5 -f Modelfile
Output
❯ ollama create PULSE-7bv5 -f Modelfile
transferring model data 100%
using existing layer sha256:511c2549629fbeaaa9dca58493dad4681383cca63ceea063803cb9a8d5f21269
creating new layer sha256:d98f74b937eb4bd3e01a558697c988c038ec91556779b9a938435dec3650e173
writing manifest
success
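Once the create step reports success, you can confirm the model is registered and start a chat session with it using the standard Ollama commands:
ollama list
ollama run PULSE-7bv5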