kenhktsui committed
Commit 90cd4a9 · verified · Parent(s): 68cbb10

Update README.md

Files changed (1): README.md (+14 -1)
README.md CHANGED
@@ -52,7 +52,7 @@ pipeline_tag: text-generation
 ---
 
 
-# Model Card for nano-phi-v0.1
+# Model Card for nano-phi-115M-v0.1
 
 Inspired by [Phi2](https://huggingface.co/microsoft/phi-2), and open-source small language model attempts like [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
 Pre-trained from scratch on 7B training tokens, with a high-quality dataset of 0.6B tokens.
@@ -60,6 +60,19 @@ It just took 2d 4h to train in Colab with a A100 40GB (~USD$ 100).
 It achieves quite competitive evaluation results given its training token count and training data size.
 No alignment has been done yet.
 
+## Some metrics
+- model
+  - hidden_size: 768
+  - num_key_value_heads: 8 (grouped query attention)
+  - num_attention_heads: 24
+  - num_hidden_layers: 6
+  - context length: 1024
+  - total params: 115M
+- training:
+  - global steps: 14,000
+
+
+
 ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 
 | Metric | Value |
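The "## Some metrics" list added in this commit pins down the attention layout and training length, and a few derived numbers follow from it. Below is a minimal sketch of that arithmetic, assuming the standard grouped-query-attention mapping in which consecutive query heads share a key/value head, and, for the last figure, fully packed 1024-token training sequences; neither assumption is stated in the card itself.

```python
# Hyperparameters from the model card's "Some metrics" list
hidden_size = 768
num_attention_heads = 24
num_key_value_heads = 8   # grouped query attention
num_hidden_layers = 6
context_length = 1024

# Derived per-head shapes
head_dim = hidden_size // num_attention_heads        # 32 dims per head
groups = num_attention_heads // num_key_value_heads  # 3 query heads per KV head

# GQA mapping (assumed standard): query head i reads KV head i // groups
kv_for_query_head = [i // groups for i in range(num_attention_heads)]
assert kv_for_query_head[:6] == [0, 0, 0, 1, 1, 1]
assert max(kv_for_query_head) == num_key_value_heads - 1

# Rough throughput implied by 7B training tokens over 14,000 global steps
tokens_per_step = 7_000_000_000 // 14_000               # 500,000 tokens/step
sequences_per_step = tokens_per_step // context_length  # 488, if fully packed
```

With 8 KV heads against 24 query heads, the KV cache is a third the size of full multi-head attention; the per-step figure is only an estimate, since the card does not say how sequences were batched or padded.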