Update README.md

README.md (CHANGED)
@@ -1,11 +1,163 @@
 ---
 library_name: transformers
-
 ---

-# Model Card for
-
-
@@ -196,6 +348,4 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

 ## Model Card Contact

-[More Information Needed]
-
-

The updated README.md:

---
library_name: transformers
language:
- en
---

# Model Card for nano-phi-v0.1

Inspired by [Phi2](https://huggingface.co/microsoft/phi-2) and open-source small-language-model efforts such as [smol_llama-101M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-101M-GQA).
Pre-trained from scratch on 7B training tokens, using a high-quality dataset of 0.6B tokens.
Training took just 2 days 4 hours on a Colab A100 40GB (~USD $100).
Given its training-token count and dataset size, it achieves quite competitive evaluation results.
No alignment has been done yet.
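
Since the card sets `library_name: transformers`, the model should load through the standard causal-LM API. Below is a minimal usage sketch; the repo id is a placeholder (this page does not state the final Hub path), and `trust_remote_code=True` mirrors the evaluation settings further down.

```python
# Minimal usage sketch for nano-phi-v0.1 (repo id is hypothetical).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<owner>/nano-phi-v0.1"  # placeholder -- replace with the actual Hub repo
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```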

## Evaluation

All results below were produced with EleutherAI's lm-evaluation-harness; the first line of each block records the harness configuration for that run.

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16

| Task |Version| Metric |Value | |Stderr|
|--------|------:|--------|-----:|---|-----:|
|arc_easy| 0|acc |0.4263|± |0.0101|
| | |acc_norm|0.3864|± |0.0100|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 16

| Task |Version| Metric |Value | |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge| 0|acc |0.1826|± |0.0113|
| | |acc_norm|0.2193|± |0.0121|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 16

| Task |Version| Metric |Value | |Stderr|
|---------|------:|--------|-----:|---|-----:|
|hellaswag| 0|acc |0.2733|± |0.0044|
| | |acc_norm|0.2787|± |0.0045|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 16

| Task |Version|Metric|Value | |Stderr|
|-------------|------:|------|-----:|---|-----:|
|truthfulqa_mc| 1|mc1 |0.2521|± |0.0152|
| | |mc2 |0.4601|± |0.0154|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16

| Task |Version| Metric |Value | |Stderr|
|-------------------------------------------------|------:|--------|-----:|---|-----:|
|hendrycksTest-abstract_algebra | 1|acc |0.2300|± |0.0423|
| | |acc_norm|0.2300|± |0.0423|
|hendrycksTest-anatomy | 1|acc |0.3111|± |0.0400|
| | |acc_norm|0.3111|± |0.0400|
|hendrycksTest-astronomy | 1|acc |0.2171|± |0.0336|
| | |acc_norm|0.2171|± |0.0336|
|hendrycksTest-business_ethics | 1|acc |0.2500|± |0.0435|
| | |acc_norm|0.2500|± |0.0435|
|hendrycksTest-clinical_knowledge | 1|acc |0.2226|± |0.0256|
| | |acc_norm|0.2226|± |0.0256|
|hendrycksTest-college_biology | 1|acc |0.2292|± |0.0351|
| | |acc_norm|0.2292|± |0.0351|
|hendrycksTest-college_chemistry | 1|acc |0.1700|± |0.0378|
| | |acc_norm|0.1700|± |0.0378|
|hendrycksTest-college_computer_science | 1|acc |0.2500|± |0.0435|
| | |acc_norm|0.2500|± |0.0435|
|hendrycksTest-college_mathematics | 1|acc |0.2500|± |0.0435|
| | |acc_norm|0.2500|± |0.0435|
|hendrycksTest-college_medicine | 1|acc |0.2023|± |0.0306|
| | |acc_norm|0.2023|± |0.0306|
|hendrycksTest-college_physics | 1|acc |0.3235|± |0.0466|
| | |acc_norm|0.3235|± |0.0466|
|hendrycksTest-computer_security | 1|acc |0.2600|± |0.0441|
| | |acc_norm|0.2600|± |0.0441|
|hendrycksTest-conceptual_physics | 1|acc |0.2511|± |0.0283|
| | |acc_norm|0.2511|± |0.0283|
|hendrycksTest-econometrics | 1|acc |0.2281|± |0.0395|
| | |acc_norm|0.2281|± |0.0395|
|hendrycksTest-electrical_engineering | 1|acc |0.2276|± |0.0349|
| | |acc_norm|0.2276|± |0.0349|
|hendrycksTest-elementary_mathematics | 1|acc |0.2460|± |0.0222|
| | |acc_norm|0.2460|± |0.0222|
|hendrycksTest-formal_logic | 1|acc |0.1508|± |0.0320|
| | |acc_norm|0.1508|± |0.0320|
|hendrycksTest-global_facts | 1|acc |0.3000|± |0.0461|
| | |acc_norm|0.3000|± |0.0461|
|hendrycksTest-high_school_biology | 1|acc |0.3387|± |0.0269|
| | |acc_norm|0.3387|± |0.0269|
|hendrycksTest-high_school_chemistry | 1|acc |0.2906|± |0.0319|
| | |acc_norm|0.2906|± |0.0319|
|hendrycksTest-high_school_computer_science | 1|acc |0.3100|± |0.0465|
| | |acc_norm|0.3100|± |0.0465|
|hendrycksTest-high_school_european_history | 1|acc |0.2182|± |0.0323|
| | |acc_norm|0.2182|± |0.0323|
|hendrycksTest-high_school_geography | 1|acc |0.3232|± |0.0333|
| | |acc_norm|0.3232|± |0.0333|
|hendrycksTest-high_school_government_and_politics| 1|acc |0.2021|± |0.0290|
| | |acc_norm|0.2021|± |0.0290|
|hendrycksTest-high_school_macroeconomics | 1|acc |0.2487|± |0.0219|
| | |acc_norm|0.2487|± |0.0219|
|hendrycksTest-high_school_mathematics | 1|acc |0.2741|± |0.0272|
| | |acc_norm|0.2741|± |0.0272|
|hendrycksTest-high_school_microeconomics | 1|acc |0.3319|± |0.0306|
| | |acc_norm|0.3319|± |0.0306|
|hendrycksTest-high_school_physics | 1|acc |0.3179|± |0.0380|
| | |acc_norm|0.3179|± |0.0380|
|hendrycksTest-high_school_psychology | 1|acc |0.2477|± |0.0185|
| | |acc_norm|0.2477|± |0.0185|
|hendrycksTest-high_school_statistics | 1|acc |0.4722|± |0.0340|
| | |acc_norm|0.4722|± |0.0340|
|hendrycksTest-high_school_us_history | 1|acc |0.2696|± |0.0311|
| | |acc_norm|0.2696|± |0.0311|
|hendrycksTest-high_school_world_history | 1|acc |0.2152|± |0.0268|
| | |acc_norm|0.2152|± |0.0268|
|hendrycksTest-human_aging | 1|acc |0.1973|± |0.0267|
| | |acc_norm|0.1973|± |0.0267|
|hendrycksTest-human_sexuality | 1|acc |0.2824|± |0.0395|
| | |acc_norm|0.2824|± |0.0395|
|hendrycksTest-international_law | 1|acc |0.2231|± |0.0380|
| | |acc_norm|0.2231|± |0.0380|
|hendrycksTest-jurisprudence | 1|acc |0.2222|± |0.0402|
| | |acc_norm|0.2222|± |0.0402|
|hendrycksTest-logical_fallacies | 1|acc |0.2822|± |0.0354|
| | |acc_norm|0.2822|± |0.0354|
|hendrycksTest-machine_learning | 1|acc |0.2768|± |0.0425|
| | |acc_norm|0.2768|± |0.0425|
|hendrycksTest-management | 1|acc |0.2039|± |0.0399|
| | |acc_norm|0.2039|± |0.0399|
|hendrycksTest-marketing | 1|acc |0.1966|± |0.0260|
| | |acc_norm|0.1966|± |0.0260|
|hendrycksTest-medical_genetics | 1|acc |0.2800|± |0.0451|
| | |acc_norm|0.2800|± |0.0451|
|hendrycksTest-miscellaneous | 1|acc |0.2746|± |0.0160|
| | |acc_norm|0.2746|± |0.0160|
|hendrycksTest-moral_disputes | 1|acc |0.2081|± |0.0219|
| | |acc_norm|0.2081|± |0.0219|
|hendrycksTest-moral_scenarios | 1|acc |0.2469|± |0.0144|
| | |acc_norm|0.2469|± |0.0144|
|hendrycksTest-nutrition | 1|acc |0.2647|± |0.0253|
| | |acc_norm|0.2647|± |0.0253|
|hendrycksTest-philosophy | 1|acc |0.1897|± |0.0223|
| | |acc_norm|0.1897|± |0.0223|
|hendrycksTest-prehistory | 1|acc |0.2377|± |0.0237|
| | |acc_norm|0.2377|± |0.0237|
|hendrycksTest-professional_accounting | 1|acc |0.2482|± |0.0258|
| | |acc_norm|0.2482|± |0.0258|
|hendrycksTest-professional_law | 1|acc |0.2464|± |0.0110|
| | |acc_norm|0.2464|± |0.0110|
|hendrycksTest-professional_medicine | 1|acc |0.4265|± |0.0300|
| | |acc_norm|0.4265|± |0.0300|
|hendrycksTest-professional_psychology | 1|acc |0.2614|± |0.0178|
| | |acc_norm|0.2614|± |0.0178|
|hendrycksTest-public_relations | 1|acc |0.1818|± |0.0369|
| | |acc_norm|0.1818|± |0.0369|
|hendrycksTest-security_studies | 1|acc |0.1959|± |0.0254|
| | |acc_norm|0.1959|± |0.0254|
|hendrycksTest-sociology | 1|acc |0.2289|± |0.0297|
| | |acc_norm|0.2289|± |0.0297|
|hendrycksTest-us_foreign_policy | 1|acc |0.2400|± |0.0429|
| | |acc_norm|0.2400|± |0.0429|
|hendrycksTest-virology | 1|acc |0.2048|± |0.0314|
| | |acc_norm|0.2048|± |0.0314|
|hendrycksTest-world_religions | 1|acc |0.2222|± |0.0319|
| | |acc_norm|0.2222|± |0.0319|

hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/checkpoint-pegfss6f:v13,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 16

| Task |Version|Metric|Value | |Stderr|
|----------|------:|------|-----:|---|-----:|
|winogrande| 0|acc |0.5099|± | 0.014|

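For reproduction: the `hf-causal-experimental` headers above come from EleutherAI's lm-evaluation-harness, and runs like these can be launched from its Python API. The sketch below is under stated assumptions: `simple_evaluate`'s exact signature varies by harness version, and the checkpoint path is a placeholder for the local W&B artifact used in these runs.

```python
# Sketch: re-running one evaluation block above with lm-evaluation-harness.
# Argument names follow the v0.3-era API implied by the headers; check your
# installed harness version before relying on this.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal-experimental",
    model_args="pretrained=/path/to/checkpoint,trust_remote_code=True",  # placeholder path
    tasks=["arc_easy"],
    num_fewshot=0,
    batch_size=16,
)
print(results["results"])  # per-task acc / acc_norm, matching the tables above
```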

[... unchanged middle sections of the card omitted by the diff ...]

## Model Card Contact

[More Information Needed]