keremturgutlu committed
Commit 7930d97 (1 parent: 6501670)
Create README.md

README.md ADDED
@@ -0,0 +1,5 @@
+Transformers: https://github.com/huggingface/transformers/blob/47fedc1665f14a60260b1b8357e682669300093a/src/transformers/models/llama/modeling_llama.py
+
+Training Script: https://github.com/AnswerDotAI/fsdp_qlora/blob/3f7c583e985ff35e37a7b7497a7d4fedb77df695/experiments/cla/train.sh
+
+This model shares KV activations across each pair of adjacent layers: layer 1 reuses the KV activations computed by layer 0, layer 3 reuses those computed by layer 2, and so on.
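
The sharing pattern described in the README can be sketched as a mapping from each layer to the layer whose KV activations it reuses. This is a minimal illustration only, not code taken from the linked modeling file or training script: the function names `kv_source_layer`, `kv_sharing_map`, and `num_kv_producers`, and the `sharing_factor` parameter, are hypothetical, and the sketch assumes 0-indexed layers grouped in consecutive blocks of 2.

```python
def kv_source_layer(layer_idx: int, sharing_factor: int = 2) -> int:
    """Return the index of the layer whose KV activations `layer_idx` uses.

    Hypothetical helper: assumes layers are grouped in consecutive blocks of
    `sharing_factor`, and the first layer of each block computes the KV
    activations that the rest of the block reuses.
    """
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    if sharing_factor < 1:
        raise ValueError("sharing_factor must be at least 1")
    # Round down to the first layer of the block this layer belongs to.
    return layer_idx - (layer_idx % sharing_factor)


def kv_sharing_map(num_layers: int, sharing_factor: int = 2) -> dict:
    """Map every layer index to the layer that produces its KV activations."""
    return {i: kv_source_layer(i, sharing_factor) for i in range(num_layers)}


def num_kv_producers(num_layers: int, sharing_factor: int = 2) -> int:
    """Count the layers that compute (rather than reuse) KV activations."""
    return len(set(kv_sharing_map(num_layers, sharing_factor).values()))


if __name__ == "__main__":
    # Layers 0, 2, 4 compute KV activations; layers 1, 3, 5 reuse them.
    print(kv_sharing_map(6))
    print(num_kv_producers(6))
```

Under these assumptions, only every second layer needs to compute and store its own keys and values, so the number of layers contributing to the KV cache is halved for an even layer count.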