Safetensors · English · olmo2
amanrangapur committed · verified · Commit 7fe5381 · 1 Parent(s): af1b2e2

Update README.md

Files changed (1):
  1. README.md +38 -38
README.md CHANGED
@@ -23,7 +23,28 @@ The core models released in this batch are the following:
23
  | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
24
  |------|--------|---------|-------------|-----------------|----------------|
25
  | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo-7B-0724-hf) | 4 Trillion | 32 | 4096 | 32 | 4096 |
26
- | [OLMo2-13B July 2024](https://huggingface.co/allenai/OLMo-1B-0724-hf) | 5 Trillion | 42 | 5120 | 42 | 4096 |
27
 
28
  We have released checkpoints for these models, for every 1000 training steps.
29
  The naming convention is `stepXXX-tokensYYYB`.
@@ -40,6 +61,20 @@ out = list_repo_refs("allenai/OLMo2-13B-1124")
40
  branches = [b.name for b in out.branches]
41
  ```
42
 
43
  ### Model Description
44
 
45
  - **Developed by:** Allen Institute for AI (Ai2)
@@ -63,41 +98,6 @@ branches = [b.name for b in out.branches]
63
  - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
64
 
65
 
66
- ## Uses
67
-
68
- ### Inference
69
-
70
- Proceed as usual with HuggingFace:
71
- ```python
72
- from transformers import AutoModelForCausalLM, AutoTokenizer
73
- olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
74
- tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
75
- message = ["Language modeling is "]
76
- inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
77
- # optional verifying cuda
78
- # inputs = {k: v.to('cuda') for k,v in inputs.items()}
79
- # olmo = olmo.to('cuda')
80
- response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
81
- print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
82
- >> 'Language modeling is the first step to build natural language generation...'
83
- ```
84
-
85
- Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
86
- The quantized model is more sensitive to typing / cuda, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
87
-
88
- ### Fine-tuning
89
- Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
90
- 1. Fine-tune with the OLMo repository:
91
- ```bash
92
- torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
93
- --data.paths=[{path_to_data}/input_ids.npy] \
94
- --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
95
- --load_path={path_to_checkpoint} \
96
- --reset_trainer_state
97
- ```
98
- For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
99
-
100
- 2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
101
 
102
  <!-- TODO -->
103
  ## Evaluation
@@ -120,7 +120,7 @@ Core model results for OLMo 7B models are found below.
120
  | GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 | 35.0 |
121
  | Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 | 64.2 |
122
 
123
- And for 1B models:
124
 
125
  | task | random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
126
  | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ----------------- | --------- | -------------------------------------- | ------- | ------ |
@@ -229,4 +229,4 @@ Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., Jha
229
  ## Model Card Contact
230
 
231
 
232
- For errors in this model card, contact Nathan, `{nathanl} at allenai dot org`.
 
23
  | Size | Training Tokens | Layers | Hidden Size | Attention Heads | Context Length |
24
  |------|--------|---------|-------------|-----------------|----------------|
25
  | [OLMo2-7B July 2024](https://huggingface.co/allenai/OLMo-7B-0724-hf) | 4 Trillion | 32 | 4096 | 32 | 4096 |
26
+ | [OLMo2-13B July 2024](https://huggingface.co/allenai/OLMo-1B-0724-hf) | 5 Trillion | 40 | 5120 | 42 | 4096 |
27
+
28
+
29
+ ## Inference
30
+
31
+ Proceed as usual with Hugging Face Transformers:
32
+ ```python
33
+ from transformers import AutoModelForCausalLM, AutoTokenizer
34
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124")
35
+ tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
36
+ message = ["Language modeling is "]
37
+ inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
38
+ # optional: move inputs and model to CUDA
39
+ # inputs = {k: v.to('cuda') for k,v in inputs.items()}
40
+ # olmo = olmo.to('cuda')
41
+ response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
42
+ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
43
+ >> 'Language modeling is the first step to build natural language generation...'
44
+ ```
45
+
46
+ Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo2-7B-1124", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
47
+ The quantized model is more sensitive to data types and CUDA placement, so it is recommended to pass the inputs as `inputs.input_ids.to('cuda')` to avoid potential issues.
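
For reference, a minimal sketch of the quantized path described above, assuming a CUDA device and `bitsandbytes` installed (arguments follow the example above; exact behaviour may vary by `transformers` version):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# 8-bit quantized load as suggested above (requires `bitsandbytes` and a GPU).
olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-7B-1124",
    torch_dtype=torch.float16,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo2-7B-1124")
inputs = tokenizer(["Language modeling is "], return_tensors="pt", return_token_type_ids=False)
# Pass the input ids explicitly on the GPU, as recommended above.
response = olmo.generate(
    inputs.input_ids.to("cuda"),
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```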
48
 
49
  We have released checkpoints for these models, for every 1000 training steps.
50
  The naming convention is `stepXXX-tokensYYYB`.
 
61
  branches = [b.name for b in out.branches]
62
  ```
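
As a minimal sketch, an intermediate checkpoint can then be loaded by passing one of those branch names as `revision`; the branch name below is illustrative only, following the `stepXXX-tokensYYYB` convention:
```python
from transformers import AutoModelForCausalLM

# Illustrative branch name; pick a real entry from the `branches` list above.
olmo_ckpt = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo2-13B-1124",
    revision="step1000-tokens5B",
)
```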
63
 
64
+ ### Fine-tuning
65
+ Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or from many intermediate checkpoints. Two recipes for tuning are available.
66
+ 1. Fine-tune with the OLMo repository:
67
+ ```bash
68
+ torchrun --nproc_per_node=8 scripts/train.py {path_to_train_config} \
69
+ --data.paths=[{path_to_data}/input_ids.npy] \
70
+ --data.label_mask_paths=[{path_to_data}/label_mask.npy] \
71
+ --load_path={path_to_checkpoint} \
72
+ --reset_trainer_state
73
+ ```
74
+ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo?tab=readme-ov-file#fine-tuning).
75
+
76
+ 2. Further fine-tuning support is being developed in AI2's Open Instruct repository. Details are [here](https://github.com/allenai/open-instruct).
77
+
78
  ### Model Description
79
 
80
  - **Developed by:** Allen Institute for AI (Ai2)
 
98
  - **W&B Logs:** [pretraining](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B), [annealing](https://wandb.ai/ai2-llm/OLMo-7B/groups/OLMo-1.7-7B-anneal)
99
 
100
 
101
 
102
  <!-- TODO -->
103
  ## Evaluation
 
120
  | GSM8k | 10.0 | 12.0 | 4.0 | 4.5 | 8.5 | 25.0 | 29.0 | 35.0 |
121
  | Full average | 60.3 | 62.1 | 59.2 | 59.3 | 59.8 | 66.2 | 63.8 | 64.2 |
122
 
123
+ And for 13B models:
124
 
125
  | task | random | [StableLM 2 1.6b](https://huggingface.co/stabilityai/stablelm-2-1_6b)\* | [Pythia 1B](https://huggingface.co/EleutherAI/pythia-1b) | [TinyLlama 1.1B](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T) | [OLMo 1.0 1B](https://huggingface.co/allenai/OLMo-1B-hf) | **OLMo 1B July 2024** |
126
  | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ----------------- | --------- | -------------------------------------- | ------- | ------ |
 
229
  ## Model Card Contact
230
 
231
 
232
+ For errors in this model card, contact `{amanr} at allenai dot org`.