amanrangapur committed
Update README.md

README.md CHANGED
@@ -23,7 +23,28 @@ The core models released in this batch are the following:
 | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
 |------|-----------------|--------|-------------|-----------------|----------------|
 | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo-7B-0724-hf) | 4 Trillion | 32 | 4096 | 32 | 4096 |
-| [OLMo2- 13B July 2024](https://huggingface.co/allenai/OLMo-1B-0724-hf) | 5 Trillion |
+| [OLMo2-13B July 2024](https://huggingface.co/allenai/OLMo-1B-0724-hf) | 5 Trillion | 40 | 5120 | 42 | 4096 |
+
+## Inference
+
+Proceed as usual with HuggingFace:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
+tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
+message = ["Language modeling is "]
+inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
+# optional: run on CUDA
+# inputs = {k: v.to('cuda') for k, v in inputs.items()}
+# olmo = olmo.to('cuda')
+response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
+print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
+>> 'Language modeling is the first step to build natural language generation...'
+```
+
+Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
+The quantized model is more sensitive to dtypes and CUDA placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
 
 We have released checkpoints for these models, for every 1000 training steps.
 The naming convention is `stepXXX-tokensYYYB`.

@@ -40,6 +61,20 @@ out = list_repo_refs("allenai/OLMo2-13B-1124")
 branches = [b.name for b in out.branches]
 ```
 
+### Fine-tuning
+Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
+1. Fine-tune with the OLMo repository:
+```bash
+torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
+  --data.paths=[{path_to_data}/input_ids.npy] \
+  --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
+  --load_path={path_to_checkpoint} \
+  --reset_trainer_state
+```
+For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
+
+2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
+
 ### Model Description
 
 - **Developed by:** Allen Institute for AI (Ai2)
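As a worked version of the 8-bit note added in the first hunk above, here is a minimal sketch; it assumes `bitsandbytes` is installed and a CUDA device is available, and the prompt and sampling settings simply mirror the example in the diff.

```python
# Minimal sketch of the quantized loading path described above (not the card's own code).
# Assumptions: bitsandbytes installed, a CUDA device available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-7B-1124",
    torch_dtype=torch.float16,
    load_in_8bit=True,  # requires bitsandbytes
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")

inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
# Pass the ids tensor directly on CUDA, as the note above recommends for the quantized model.
response = olmo.generate(
    inputs.input_ids.to("cuda"),
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```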
@@ -63,41 +98,6 @@ branches = [b.name for b in out.branches]
 - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
 
 
-## Uses
-
-### Inference
-
-Proceed as usual with HuggingFace:
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
-tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
-message = ["Language modeling is "]
-inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
-# optional: run on CUDA
-# inputs = {k: v.to('cuda') for k, v in inputs.items()}
-# olmo = olmo.to('cuda')
-response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
-print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
->> 'Language modeling is the first step to build natural language generation...'
-```
-
-Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
-The quantized model is more sensitive to dtypes and CUDA placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
-
-### Fine-tuning
-Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
-1. Fine-tune with the OLMo repository:
-```bash
-torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
-  --data.paths=[{path_to_data}/input_ids.npy] \
-  --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
-  --load_path={path_to_checkpoint} \
-  --reset_trainer_state
-```
-For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
-
-2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
 
 <!-- TODO -->
 ## Evaluation
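The fine-tuning recipe in the hunks above points `--data.paths` and `--data.label_mask_paths` at two NumPy arrays. As an illustration only of what those two files contain, here is a rough sketch; the dtype, file layout, and tokenizer choice are assumptions, and the OLMo repository's own data-preparation tooling should be treated as the reference.

```python
# Illustrative only: build a token-id array and a same-length label mask.
# The exact on-disk format the OLMo trainer expects is defined in the OLMo
# repo; the uint16 dtype and flat concatenated layout here are assumptions.
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")

documents = [
    "Language modeling is fun.",
    "OLMo is a fully open language model.",
]
token_ids = [tokenizer(doc)["input_ids"] for doc in documents]

# One flat stream of token ids across all documents.
input_ids = np.concatenate([np.asarray(ids, dtype=np.uint16) for ids in token_ids])
# True wherever a token should contribute to the loss (here: everywhere).
label_mask = np.ones(len(input_ids), dtype=bool)

np.save("input_ids.npy", input_ids)
np.save("label_mask.npy", label_mask)
```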
@@ -120,7 +120,7 @@ Core model results for OLMo 7B models are found below.
 | GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 | 35.0 |
 | Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 | 64.2 |
 
-And for
+And for 13B models:
 
 | task | random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
 | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
@@ -229,4 +229,4 @@ Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha
 ## Model Card Contact
 
 
-For errors in this model card, contact Nathan, `{
+For errors in this model card, contact Nathan, `{amanr} at allenai dot org`.
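Finally, the checkpoint listing and the `stepXXX-tokensYYYB` naming mentioned earlier can be combined to load an intermediate checkpoint. A minimal sketch; the repo id comes from the snippet quoted in the hunk headers above, and the chosen branch is whichever name `list_repo_refs` actually returns rather than a specific revision guaranteed to exist.

```python
# Sketch: discover checkpoint branches, then load one as a revision.
# Branch names follow the stepXXX-tokensYYYB convention described above;
# pick one actually returned by list_repo_refs rather than hard-coding it.
from huggingface_hub import list_repo_refs
from transformers import AutoModelForCausalLM

out = list_repo_refs("allenai/OLMo2-13B-1124")
branches = [b.name for b in out.branches]
print(branches[:5])  # intermediate checkpoint branches plus main

checkpoint = next(b for b in branches if b.startswith("step"))
olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-13B-1124", revision=checkpoint)
```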