PEFT · Safetensors · English
jinjieyuan committed · Commit 79bbdac · verified · 1 Parent(s): 5f82b06

Add instruction for the sparse base model

Files changed (1)
  1. README.md +29 -3
README.md CHANGED
@@ -12,12 +12,38 @@ The heuristic adapter discovered from the [super-adapter](https://huggingface.co
 ### Information
 
 - **Model name:** shears-llama-13b-50-math-heuristic-adapter
- - **Base model:** [IntelLabs/shears-llama-13b-50-base](https://huggingface.co/IntelLabs/shears-llama-13b-50-base)
+ - **Base model:** Sparsified [LLaMA-13B](https://huggingface.co/yahma/llama-13b-hf)
 - **Sparsity:** 50%
 - **Domain:** Math
 - **Subnetwork version:** Heuristic
 - **NNCF Configuration:** [nncf_shears_llama.json](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears/nncf_config/nncf_shears_llama.json)
 
+ ### Sparsified Base Model
+
+ Shears employs [Wanda](https://arxiv.org/abs/2306.11695), a simple but effective pruning approach, to sparsify the language model that serves as the base model.
+ Clone the [Wanda](https://github.com/locuslab/wanda) repo:
+
+ ```bash
+ git clone https://github.com/locuslab/wanda.git && cd wanda && git checkout 8e8fc87 && cd ..
+ ```
+
+ The following command prunes LLaMA-13B with Wanda to 50% unstructured sparsity:
+
+ ```bash
+ python wanda/main.py \
+     --model yahma/llama-13b-hf \
+     --prune_method wanda \
+     --sparsity_ratio 0.5 \
+     --sparsity_type unstructured \
+     --save wanda_out \
+     --save_model shears-llama-13b-50-base
+ ```
+ - `--model`: the model identifier on the Hugging Face Hub, or a local path.
+ - `--sparsity_ratio`: the target fraction of weights to prune.
+ - `--save_model`: the directory where the pruned language model is stored.
+
+ Refer to our [repo](https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning/tree/main/Shears#setup) for the environment setup required to run this command.
+
 ### Adapter Configuration
 
 - **LoRA rank:** 32 (24 in the heuristic subnetwork)
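
For a quick sanity check after the pruning step, a minimal sketch along the following lines reports the realized sparsity of the saved base model. It assumes the Wanda command above wrote the model to `shears-llama-13b-50-base` via `--save_model`, and that only the weight matrices inside the transformer blocks were pruned:

```python
# Minimal sketch: measure the sparsity of the pruned base model.
# Assumes the Wanda command above saved the model to "shears-llama-13b-50-base".
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "shears-llama-13b-50-base", torch_dtype=torch.float16
)

zero, total = 0, 0
for name, param in model.named_parameters():
    # Wanda prunes the linear layers inside the transformer blocks;
    # embeddings, norms, and the LM head are left dense.
    if ".layers." in name and param.dim() == 2:
        zero += (param == 0).sum().item()
        total += param.numel()

print(f"Sparsity over pruned weight matrices: {zero / total:.2%}")
```
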
@@ -62,14 +88,14 @@ def generate_prompt(instruction):
 ### Response:
 """
 
- base_model = AutoModelForCausalLM.from_pretrained("IntelLabs/shears-llama-13b-50-base")
+ base_model = AutoModelForCausalLM.from_pretrained("shears-llama-13b-50-base")
 model = PeftModel.from_pretrained(base_model, "IntelLabs/shears-llama-13b-50-math-heuristic-adapter")
 model.eval()
 
 non_zero_params = sum([(param.data != 0).sum().item() for _, param in model.named_parameters()])
 print(f"Number of all non-zero parameters: {non_zero_params}")
 
- tokenizer = AutoTokenizer.from_pretrained("IntelLabs/shears-llama-13b-50-base")
+ tokenizer = AutoTokenizer.from_pretrained("shears-llama-13b-50-base")
 
 instruction = "Edgar eats 18 pretzels a day. If his brother eats 1/2 as many, how many does his brother eat in a week?"
 prompt = generate_prompt(instruction)
 
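
The diff above shows only the fragment of the usage example that changed. For context, a self-contained sketch of the full flow might look like the following; the prompt template and generation settings are illustrative assumptions, not necessarily the card's exact values:

```python
# Self-contained sketch of the usage flow around the changed lines.
# The prompt template and generation settings are assumptions for illustration.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate_prompt(instruction):
    # Alpaca-style template (assumed); the card's actual template may differ.
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
"""


# Load the locally pruned base model, then attach the heuristic adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    "shears-llama-13b-50-base", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(
    base_model, "IntelLabs/shears-llama-13b-50-math-heuristic-adapter"
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("shears-llama-13b-50-base")

instruction = "Edgar eats 18 pretzels a day. If his brother eats 1/2 as many, how many does his brother eat in a week?"
inputs = tokenizer(generate_prompt(instruction), return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```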