Files changed (1)
  1. README.md +11 -23
README.md CHANGED
@@ -6,22 +6,18 @@ language:
  - pt
  tags:
  - falcon3
- license: other
- license_name: falcon-llm-license
+ license: other
+ license_name: falcon-llm-license
  license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
- library_name: transformers
  ---

- <div align="center">
- <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
- </div>

  # Falcon3-1B-Base

  **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.

  This repository contains the **Falcon3-1B-Base**. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks.
- Falcon3-1B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 4K.
+ Falcon3-1B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
  It was pruned in terms of depth, width, number of heads, and embedding channels from a larger 3B Falcon model, and was efficiently trained on only 80 GT using a knowledge distillation objective.

  ⚠️ **This is a raw, pretrained model, which should be further finetuned using SFT, RLHF, continued pretraining, etc. for most use cases.**
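
For readers unfamiliar with the knowledge-distillation objective mentioned above, the sketch below shows the standard formulation: the pruned student is trained to match the teacher's output distribution through a temperature-scaled KL term blended with the usual next-token cross-entropy. The temperature and mixing weight are illustrative assumptions, not values documented for Falcon3-1B-Base.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,  # illustrative value, not from the card
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend of a soft (teacher-matching) KL term and a hard cross-entropy term."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # rescaled by T^2 so gradients keep a comparable magnitude.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Hard targets: ordinary next-token cross-entropy against ground-truth token ids
    # (logits and labels assumed already shifted/aligned for next-token prediction).
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))

    return alpha * kl + (1.0 - alpha) * ce
```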
@@ -29,14 +25,14 @@ It was pruned in terms of depth, width, number of heads, and embedding channels
  ## Model Details
  - Architecture
  - Transformer-based causal decoder-only architecture
- - 18 decoder blocks
+ - 22 decoder blocks
  - Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads
  - Wider head dimension: 256
  - High RoPE value to support long context understanding: 1000042
  - Uses SwiGLU and RMSNorm
- - 4K context length
+ - 32K context length
  - 131K vocab size
- - Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips
+ - Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of data comprising web, code, STEM, high-quality and multilingual sources, using 256 H100 GPU chips
  - Supports EN, FR, ES, PT
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
  - License: TII Falcon-LLM License 2.0
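
The architecture details listed above can be cross-checked directly against the published checkpoint. A minimal sketch with `transformers`, assuming the Llama-style config fields exposed by the Falcon3 checkpoints (attribute names may differ across library versions):

```python
from transformers import AutoConfig

# Pull the published config and print the fields that correspond to the bullets above.
cfg = AutoConfig.from_pretrained("tiiuae/Falcon3-1B-Base")

print("decoder blocks:       ", cfg.num_hidden_layers)
print("query heads:          ", cfg.num_attention_heads)
print("key-value heads (GQA):", cfg.num_key_value_heads)
print("head dimension:       ", cfg.hidden_size // cfg.num_attention_heads)
print("RoPE theta:           ", cfg.rope_theta)
print("context length:       ", cfg.max_position_embeddings)
print("vocab size:           ", cfg.vocab_size)
```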
@@ -67,10 +63,7 @@ print(response[0]['generated_text'])
  <br>

  ## Benchmarks
- We report in the following table our internal pipeline benchmarks.
- - We use [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
- - We report **raw scores**.
- - We use same batch-size across all models.
+ We report in the following table our internal pipeline benchmarks:
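
The benchmark methodology noted above uses EleutherAI's lm-evaluation-harness, reporting raw scores with the same batch size for all models. Below is a minimal sketch of a comparable run via the harness's Python API; the task list, dtype, and batch size are illustrative placeholders rather than the internal pipeline settings.

```python
import lm_eval  # pip install lm-eval

# Evaluate the base model on a few common tasks and print raw scores.
# Tasks, dtype, and batch size are placeholder choices, not the card's exact setup.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-1B-Base,dtype=bfloat16",
    tasks=["hellaswag", "arc_challenge", "mmlu"],
    batch_size=8,  # kept fixed across models, mirroring the note above
)

for task, metrics in results["results"].items():
    print(task, metrics)
```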
 
@@ -192,21 +185,16 @@ We report in the following table our internal pipeline benchmarks.
  </tbody>
  </table>

- ## Useful links
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
- - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or to interact with our researchers and developers.
-
  ## Technical Report
  Coming soon....

  ## Citation
- If the Falcon3 family of models were helpful to your work, feel free to give us a cite.
-
+ If the Falcon3 family of models was helpful to your work, feel free to cite us.
+
  ```
  @misc{Falcon3,
- title = {The Falcon 3 Family of Open Models},
- url = {https://huggingface.co/blog/falcon3},
- author = {Falcon-LLM Team},
+ title = {The Falcon 3 Family of Open Models},
+ author = {TII Team},
  month = {December},
  year = {2024}
  }