kenhktsui committed
Commit 1d0a076 · verified · 1 Parent(s): 91f62b2

Update README.md

Files changed (1)
  1. README.md +24 -23
README.md CHANGED
@@ -56,10 +56,11 @@ license: mit
 # Model Card for nano-phi-115M-v0.1
 
 Inspired by [Phi2](https://huggingface.co/microsoft/phi-2), and open source small language model attempts like [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
- Pre-trained with training 7B token from scratch, with application of quality filter to datasets resulting in 0.26B token.
- The control is [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1), where full dataset (0.6B) is used.
- Not much degradation in performance despite only using 42% of the data.
- It just took 2d 4h to train in Colab with a A100 40GB (~USD$ 100).
+ Pre-trained **from scratch** for 7B training tokens, with a quality filter applied to the datasets, resulting in a 0.26B-token dataset.
+ The control is [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1), where the full dataset (0.6B tokens) is used.
+ Not much degradation in performance despite using only **42%** of the data, thanks to the effective quality filter.
+ In fact, upon inspection, the 6000-step checkpoint achieves similar performance to this model, signaling **effective training due to high-quality data**.
+ It took just 1 day to train in Colab with an A100 40GB (**<USD$50**).
 It achieves quite competitive results in evaluation given its training token count and training data size.
 Yet, there are still large gaps (particularly in ARC, HellaSwag, MMLU and GSM8K) between nano-phi-115M-v0.1 and phi-2, which the author will attempt to narrow in the future.
 No alignment has been done yet.
@@ -79,25 +80,25 @@ No alignment has been done yet.
 
 ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 
- | Metric | kenhktsui/nano-phi-115M-v0.1 | [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1) | [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) |
- |------|------|------|------|
- | Model Para | 115M | 115M | 2.7B |
- | Dataset Size | 0.26B | 0.6B | 250B |
- | Training Token | 0.26B | 0.6B | 1.4T |
- | Context Length | 1024 | 1024 | 2048 |
- | Device | 1xA100-40G | 1xA100-40G | 96xA100-80G |
- | Training Time | 2d4h | 2d4h | 14d |
-
-
- | Metric | kenhktsui/nano-phi-115M-v0.1 | [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1) | [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) (Reproduced) |
- |------|------|------|------|
- | Avg. | 28.68 | 28.75 | 61.53 |
- | ARC (25-shot) | 21.93 | 21.67 | 61.52 |
- | HellaSwag (10-shot) | 27.87 | 26.89 | 75.13 |
- | MMLU (5-shot) | 25.30 | 24.76 | 58.23 |
- | TruthfulQA (0-shot) | 46.01 | 47.69 | 44.46 |
- | Winogrande (5-shot) | 50.99 | 51.46 | 74.51 |
- | GSM8K (5-shot) | 0.0 | 0.0 | 55.34 |
+ | Metric | kenhktsui/nano-phi-115M-v0.1 | kenhktsui/nano-phi-115M-v0.1 (6000 steps) | [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1) | [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) |
+ |------|------|------|------|------|
+ | Model Para | 115M | 115M | 115M | 2.7B |
+ | Dataset Size | 0.26B | 0.26B | 0.6B | 250B |
+ | Training Token | 7B | 3B | 7B | 1.4T |
+ | Context Length | 1024 | 1024 | 1024 | 2048 |
+ | Device | 1xA100-40G | 1xA100-40G | 1xA100-40G | 96xA100-80G |
+ | Training Time | 2d4h | 1d | 2d4h | 14d |
+
+
+ | Metric | kenhktsui/nano-phi-115M-v0.1 | kenhktsui/nano-phi-115M-v0.1 (6000 steps) | [kenhktsui/nano-phi-115M-control-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-control-v0.1) | [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) (Reproduced) |
+ |------|------|------|------|------|
+ | Avg. | 28.68 | 29.03 | 28.75 | 61.53 |
+ | ARC (25-shot) | 21.93 | 22.27 | 21.67 | 61.52 |
+ | HellaSwag (10-shot) | 27.87 | 26.88 | 26.89 | 75.13 |
+ | MMLU (5-shot) | 25.30 | 25.01 | 24.76 | 58.23 |
+ | TruthfulQA (0-shot) | 46.01 | 48.03 | 47.69 | 44.46 |
+ | Winogrande (5-shot) | 50.99 | 52.01 | 51.46 | 74.51 |
+ | GSM8K (5-shot) | 0.0 | 0.0 | 0.0 | 55.34 |
 
 Details:
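The updated text above credits an effective quality filter for trimming the corpus from 0.6B to 0.26B tokens (~42%) with little performance loss. The filter itself is not shown in this commit, so the snippet below is only a minimal, hypothetical sketch of the score-and-threshold pattern such a filter typically follows; the `quality_score` heuristic, the example corpus, and the 0.5 threshold are all illustrative assumptions, not the actual pipeline used for nano-phi-115M-v0.1.

```python
# Hypothetical sketch of a "quality filter": score each document and keep only
# those above a threshold. The scoring heuristic, corpus, and threshold are
# placeholders, NOT the actual filter used for nano-phi-115M-v0.1.
from datasets import load_dataset


def quality_score(text: str) -> float:
    """Toy heuristic: reward documents with reasonably long, complete sentences."""
    if not text.strip():
        return 0.0
    sentences = [s for s in text.split(".") if s.strip()]
    avg_words = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return min(avg_words / 20.0, 1.0)


raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")  # stand-in corpus
filtered = raw.filter(lambda ex: quality_score(ex["text"]) >= 0.5)

print(f"kept {len(filtered) / len(raw):.0%} of documents")  # the README reports keeping ~42% of tokens
```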
 
 
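The scores in the new tables follow the Open LLM Leaderboard task mix (25-shot ARC, 10-shot HellaSwag, 5-shot MMLU, 0-shot TruthfulQA, 5-shot Winogrande, 5-shot GSM8K). As a rough guide to checking one of those numbers locally, here is a hedged sketch using EleutherAI's lm-evaluation-harness, which backs the leaderboard; the harness version, dtype, and batch size are assumptions here, so small deviations from the reported figures are expected.

```python
# Sketch: reproduce one leaderboard score locally with lm-evaluation-harness
# (pip install lm-eval). Leaderboard-exact settings and version are assumptions.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=kenhktsui/nano-phi-115M-v0.1,dtype=float16",
    tasks=["arc_challenge"],  # the leaderboard's ARC column is 25-shot ARC-Challenge
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # compare accuracy metrics with the table above
```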