yasmineee committed
Commit dec0cc5 · verified · 1 Parent(s): aee8c85

finetune-NLLB-600M-on-opus100-Ar2En-with-Qlora

README.md CHANGED
@@ -2,6 +2,9 @@
 base_model: facebook/nllb-200-distilled-600M
 library_name: peft
 license: cc-by-nc-4.0
+metrics:
+- bleu
+- rouge
 tags:
 - generated_from_trainer
 model-index:
@@ -12,11 +15,15 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/FinalProject_/NLLB_2/runs/4zuxh06b)
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/FinalProject_/NLLB_2/runs/4zuxh06b)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/FinalProject_/NLLB/runs/li2er79u)
 # NLLB_QLoRA
 
 This model is a fine-tuned version of [facebook/nllb-200-distilled-600M](https://huggingface.co/facebook/nllb-200-distilled-600M) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.3340
+- Bleu: 31.5945
+- Rouge: 0.5906
+- Gen Len: 17.338
 
 ## Model description
 
@@ -36,13 +43,24 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 2e-05
-- train_batch_size: 1
-- eval_batch_size: 1
+- train_batch_size: 2
+- eval_batch_size: 2
 - seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 8
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
 - num_epochs: 3
 
+### Training results
+
+| Training Loss | Epoch | Step | Validation Loss | Bleu    | Rouge  | Gen Len |
+|:-------------:|:-----:|:----:|:---------------:|:-------:|:------:|:-------:|
+| 2.858         | 1.0   | 875  | 1.4023          | 30.5493 | 0.5771 | 17.3705 |
+| 1.4649        | 2.0   | 1750 | 1.3447          | 31.343  | 0.5886 | 17.284  |
+| 1.4247        | 3.0   | 2625 | 1.3340          | 31.5945 | 0.5906 | 17.338  |
+
+
 ### Framework versions
 
 - PEFT 0.12.0
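The updated card doubles the per-device batch size (1 → 2) and adds gradient accumulation over 4 steps, giving the effective total_train_batch_size of 8. The training script itself is not part of this commit, so the following is only a minimal sketch of how these hyperparameters would map onto `Seq2SeqTrainingArguments` from `transformers`; the `output_dir` and the per-epoch evaluation strategy are assumptions inferred from the results table:

```python
# Hypothetical reconstruction of the training arguments implied by the card;
# the commit does not include the training script, so details are assumed.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="NLLB_QLoRA",            # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=2,      # train_batch_size: 2
    per_device_eval_batch_size=2,       # eval_batch_size: 2
    gradient_accumulation_steps=4,      # effective batch size: 2 * 4 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="epoch",        # matches the per-epoch results table
    predict_with_generate=True,         # needed for Bleu / Rouge / Gen Len
)
```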
adapter_config.json CHANGED
@@ -20,8 +20,8 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "q_proj",
-    "v_proj"
+    "v_proj",
+    "q_proj"
   ],
   "task_type": "SEQ_2_SEQ_LM",
   "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a26751dbfa8d885560b15be1cff4a1f08656bb2167688a39abf0562d9060ddd3
+oid sha256:011351902334d673e7fe8b147f7d8ae89120a99995e71569b097f7019cdbb34a
 size 4738744
tokenizer_config.json CHANGED
@@ -1869,7 +1869,6 @@
   },
   "eos_token": "</s>",
   "legacy_behaviour": false,
-  "load_in_8bit": true,
   "mask_token": "<mask>",
   "model_max_length": 1024,
   "pad_token": "<pad>",
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e790f46e5748df50ab36487d940ffc8dfc404426e5282d687489385e0967fb19
+oid sha256:b94b51c0adeaef8dd9aa1b43760a5a2c72e0b5115b9a07da9847091ef443583a
 size 5304
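The Bleu, Rouge, and Gen Len columns in the results table imply a `compute_metrics` hook running under `predict_with_generate=True`. The evaluation code is not part of this commit, so the following is only an illustrative sketch using the `evaluate` library; the choice of `sacrebleu`, the rougeL variant, and the module-level `tokenizer` closure are all assumptions:

```python
# Illustrative compute_metrics for Bleu / Rouge / Gen Len; the commit does not
# include the evaluation code, so the details here are assumed.
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # Replace the -100 padding used for loss masking before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    bleu_score = bleu.compute(predictions=decoded_preds,
                              references=[[ref] for ref in decoded_labels])
    rouge_score = rouge.compute(predictions=decoded_preds,
                                references=decoded_labels)
    # Mean generated length in tokens, excluding padding
    gen_len = np.mean([np.count_nonzero(p != tokenizer.pad_token_id) for p in preds])
    return {"bleu": bleu_score["score"],     # sacrebleu reports a 0-100 score
            "rouge": rouge_score["rougeL"],  # rougeL is a 0-1 fraction
            "gen_len": gen_len}
```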