kenhktsui committed
Commit 68cbb10 · verified · Parent(s): ecd3f14

Update README.md

Files changed (1): README.md (+70 −2)
README.md CHANGED
 
library_name: transformers
language:
- en
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.8
    repetition_penalty: 1.15
    no_repeat_ngram_size: 4
    eta_cutoff: 0.0006
    renormalize_logits: true
widget:
- text: My name is El Microondas the Wise, and
  example_title: El Microondas
- text: Kennesaw State University is a public
  example_title: Kennesaw State University
- text: >-
    Bungie Studios is an American video game developer. They are most famous
    for developing the award winning Halo series of video games. They also
    made Destiny. The studio was founded
  example_title: Bungie
- text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- text: >-
    The Harry Potter series, written by J.K. Rowling, begins with the book
    titled
  example_title: Harry Potter Series
- text: >-
    Question: I have cities, but no houses. I have mountains, but no trees. I
    have water, but no fish. What am I?

    Answer:
  example_title: Riddle
- text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- text: >-
    Jane went to the store to buy some groceries. She picked up apples,
    oranges, and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- text: >-
    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
    and another train leaves Station B at 10:00 AM and travels at 80 mph, when
    will they meet if the distance between the stations is 300 miles?

    To determine
  example_title: Math Problem
- text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
pipeline_tag: text-generation
---
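The `eta_cutoff: 0.0006` entry above enables eta sampling, a truncation scheme in which tokens below a dynamic, entropy-dependent probability threshold are dropped before sampling. As an illustration only (not this model card's code, and simplified relative to the actual library implementation), a minimal numpy sketch of that truncation:

```python
import numpy as np

def eta_truncate(logits, eta_cutoff=0.0006):
    """Sketch of eta sampling truncation (the `eta_cutoff` generation
    parameter): tokens whose probability falls below an entropy-dependent
    threshold are removed, then the distribution is renormalized."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    # Threshold shrinks for flat (high-entropy) distributions,
    # so more candidates survive when the model is uncertain.
    eta = min(eta_cutoff, np.sqrt(eta_cutoff) * np.exp(-entropy))
    keep = probs >= eta
    keep[np.argmax(probs)] = True  # always keep the top token
    truncated = np.where(keep, probs, 0.0)
    return truncated / truncated.sum()

# A peaked distribution: the low-probability tail tokens get cut.
logits = np.array([8.0, 7.5, 2.0, 0.1, -3.0])
print(eta_truncate(logits))
```

In practice these settings are applied automatically when the front-matter `inference.parameters` are passed to `model.generate(...)`.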
# Model Card for nano-phi-v0.1

Inspired by [Phi2](https://huggingface.co/microsoft/phi-2) and open-source small language model attempts such as [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
Pre-trained from scratch on 7B training tokens, drawn from a high-quality dataset of 0.6B tokens.
Training took just 2d 4h on a single A100 40GB in Colab (~USD 100).
It achieves quite competitive evaluation results given its training-token count and training-data size.
No alignment has been done yet.
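As a back-of-envelope check on the figures above (7B tokens in 2d 4h on one A100 40GB for about USD 100), the implied throughput and cost work out to:

```python
# Back-of-envelope arithmetic from the stated training figures.
tokens = 7e9                            # 7B training tokens
seconds = (2 * 24 + 4) * 3600           # 2d 4h = 52 h = 187,200 s
tokens_per_second = tokens / seconds
print(f"{tokens_per_second:,.0f} tokens/s")  # roughly 37,000 tokens/s

cost_per_billion = 100 / 7              # ~USD 14 per 1B tokens
print(f"~${cost_per_billion:.0f} per 1B tokens")
```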
 
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Metric              | Value |
|---------------------|------:|
| Avg.                | 28.68 |
| ARC (25-shot)       | 21.93 |
| HellaSwag (10-shot) | 27.87 |
| MMLU (5-shot)       | 25.30 |
| TruthfulQA (0-shot) | 46.01 |
| Winogrande (5-shot) | 50.99 |
| GSM8K (5-shot)      | 0.0   |
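The leaderboard average follows directly from the six per-task scores; a quick sanity check (not part of the evaluation pipeline):

```python
# Recompute the leaderboard "Avg." as the mean of the six task scores.
scores = {
    "ARC (25-shot)": 21.93,
    "HellaSwag (10-shot)": 27.87,
    "MMLU (5-shot)": 25.30,
    "TruthfulQA (0-shot)": 46.01,
    "Winogrande (5-shot)": 50.99,
    "GSM8K (5-shot)": 0.0,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 28.68, matching the table
```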
Details:
hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16

| Task       | Version | Metric | Value  |   | Stderr |
|------------|--------:|--------|-------:|---|-------:|
| …          |       … | …      |      … |   |      … |
| winogrande |       0 | acc    | 0.5099 | ± | 0.014  |

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16

| Task  | Version | Metric | Value |   | Stderr |
|-------|--------:|--------|------:|---|-------:|
| gsm8k |       0 | acc    |   0.0 | ± |    0.0 |

## Model Details