# 2200-llama-3.2-lora
This model is a LoRA fine-tuned version of meta-llama/Llama-3.2-1B on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 0.1813
## Model description
More information needed
## Intended uses & limitations
More information needed
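No usage example is provided in the card. The sketch below shows one way the adapter could be loaded for inference; it assumes this repository hosts a PEFT LoRA adapter on top of meta-llama/Llama-3.2-1B and that the repository id `cs6220-ai-gradescope-grader/2200-llama-3.2-lora` is correct. The prompt is purely illustrative.

```python
# Minimal inference sketch (assumes this repo is a PEFT LoRA adapter for Llama-3.2-1B).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B"
adapter_id = "cs6220-ai-gradescope-grader/2200-llama-3.2-lora"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA adapter
model.eval()

inputs = tokenizer("Example prompt:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```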
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: PAGED_ADAMW with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 1
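For orientation only, the list above roughly maps onto a Transformers `TrainingArguments` configuration like the sketch below. The output directory, evaluation cadence, and exact paged-AdamW variant are assumptions, and the LoRA adapter settings (rank, alpha, target modules) are not recorded in this card.

```python
# Rough reconstruction of the training configuration from the hyperparameter list.
# Only the listed values come from the card; paths and the eval/logging cadence
# (every 50 steps, inferred from the results table) are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="2200-llama-3.2-lora",   # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,      # effective train batch size = 16
    num_train_epochs=1,
    seed=42,
    optim="paged_adamw_32bit",          # PAGED_ADAMW; 32-bit variant assumed
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    eval_strategy="steps",
    eval_steps=50,
    logging_steps=50,
)
```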
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.8507 | 0.0157 | 50 | 0.7212 |
0.4871 | 0.0315 | 100 | 0.4490 |
0.4467 | 0.0472 | 150 | 0.4026 |
0.3792 | 0.0629 | 200 | 0.3743 |
0.3867 | 0.0786 | 250 | 0.3572 |
0.3624 | 0.0944 | 300 | 0.3449 |
0.3387 | 0.1101 | 350 | 0.3331 |
0.3419 | 0.1258 | 400 | 0.3276 |
0.347 | 0.1415 | 450 | 0.3180 |
0.3147 | 0.1573 | 500 | 0.3084 |
0.2945 | 0.1730 | 550 | 0.3003 |
0.3042 | 0.1887 | 600 | 0.2941 |
0.3022 | 0.2044 | 650 | 0.2888 |
0.2775 | 0.2202 | 700 | 0.2857 |
0.2914 | 0.2359 | 750 | 0.2895 |
0.2687 | 0.2516 | 800 | 0.2794 |
0.2891 | 0.2673 | 850 | 0.2716 |
0.26 | 0.2831 | 900 | 0.2659 |
0.2838 | 0.2988 | 950 | 0.2631 |
0.2639 | 0.3145 | 1000 | 0.2582 |
0.2865 | 0.3302 | 1050 | 0.2587 |
0.256 | 0.3460 | 1100 | 0.2524 |
0.2471 | 0.3617 | 1150 | 0.2481 |
0.2222 | 0.3774 | 1200 | 0.2483 |
0.2543 | 0.3931 | 1250 | 0.2414 |
0.2556 | 0.4089 | 1300 | 0.2381 |
0.2456 | 0.4246 | 1350 | 0.2359 |
0.2475 | 0.4403 | 1400 | 0.2317 |
0.2382 | 0.4560 | 1450 | 0.2310 |
0.2548 | 0.4718 | 1500 | 0.2283 |
0.2225 | 0.4875 | 1550 | 0.2269 |
0.2314 | 0.5032 | 1600 | 0.2214 |
0.2304 | 0.5189 | 1650 | 0.2205 |
0.2206 | 0.5347 | 1700 | 0.2174 |
0.2341 | 0.5504 | 1750 | 0.2156 |
0.2217 | 0.5661 | 1800 | 0.2138 |
0.2358 | 0.5819 | 1850 | 0.2137 |
0.2292 | 0.5976 | 1900 | 0.2087 |
0.2208 | 0.6133 | 1950 | 0.2063 |
0.2013 | 0.6290 | 2000 | 0.2058 |
0.2179 | 0.6448 | 2050 | 0.2040 |
0.2136 | 0.6605 | 2100 | 0.2017 |
0.2202 | 0.6762 | 2150 | 0.1990 |
0.2008 | 0.6919 | 2200 | 0.1972 |
0.1937 | 0.7077 | 2250 | 0.1963 |
0.2022 | 0.7234 | 2300 | 0.1962 |
0.2092 | 0.7391 | 2350 | 0.1962 |
0.2047 | 0.7548 | 2400 | 0.1937 |
0.2259 | 0.7706 | 2450 | 0.1924 |
0.1745 | 0.7863 | 2500 | 0.1907 |
0.2 | 0.8020 | 2550 | 0.1892 |
0.196 | 0.8177 | 2600 | 0.1893 |
0.187 | 0.8335 | 2650 | 0.1881 |
0.2171 | 0.8492 | 2700 | 0.1867 |
0.1857 | 0.8649 | 2750 | 0.1862 |
0.1995 | 0.8806 | 2800 | 0.1848 |
0.1901 | 0.8964 | 2850 | 0.1846 |
0.1878 | 0.9121 | 2900 | 0.1833 |
0.1913 | 0.9278 | 2950 | 0.1828 |
0.1878 | 0.9435 | 3000 | 0.1822 |
0.1899 | 0.9593 | 3050 | 0.1818 |
0.1925 | 0.9750 | 3100 | 0.1816 |
0.1752 | 0.9907 | 3150 | 0.1813 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.3
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3