jacobmorrison committed
Update README.md

README.md CHANGED
@@ -94,7 +94,7 @@ inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
 # olmo = olmo.to('cuda')
 response = olmo.generate(input_ids=inputs.to(olmo.device), max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
 print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
->> '<|user|>\nWhat is language modeling?\n<|assistant|>\nLanguage modeling is a type of natural language processing (NLP) task
+>> '<|user|>\nWhat is language modeling?\n<|assistant|>\nLanguage modeling is a type of natural language processing (NLP) task that...'
 ```
 Alternatively, with the pipeline abstraction:
 ```python
@@ -103,7 +103,7 @@ import hf_olmo
 from transformers import pipeline
 olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B-SFT")
 print(olmo_pipe("What is language modeling?"))
->> '[{'generated_text': 'What is language modeling?\nLanguage modeling is a type of natural language processing (NLP) task...'}]'
+>> '[{'generated_text': 'What is language modeling?\nLanguage modeling is a type of natural language processing (NLP) task that...'}]'
 ```
 
 Or, you can make this slightly faster by quantizing the model, e.g. `AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-SFT", torch_dtype=torch.float16, load_in_8bit=True)` (requires `bitsandbytes`).
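For reference, a minimal, self-contained sketch of the script the first hunk is taken from. The model/tokenizer loading and the chat-style prompt are assumptions inferred from the `<|user|>`/`<|assistant|>` markers in the sample output, since those lines sit above the hunk's context window.

```python
# Hedged reconstruction of the surrounding README snippet; the prompt format
# is an assumption inferred from the sample output, not shown in the diff.
import hf_olmo  # registers the OLMo architecture with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-SFT")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-SFT")

prompt = "<|user|>\nWhat is language modeling?\n<|assistant|>\n"
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# olmo = olmo.to('cuda')  # optional GPU placement, kept commented as in the README
response = olmo.generate(
    input_ids=inputs.to(olmo.device),
    max_new_tokens=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
```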
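The closing line's quantization tip is only an inline call; here is a sketch of how it might slot into the same flow, assuming `bitsandbytes` (and `accelerate`, which the 8-bit path in `transformers` relies on) is installed.

```python
# Hedged sketch of the 8-bit loading variant from the README's closing line;
# assumes bitsandbytes and accelerate are installed and a GPU is available.
import hf_olmo  # registers the OLMo architecture with transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-SFT",
    torch_dtype=torch.float16,
    load_in_8bit=True,  # weights are quantized to int8 at load time
)
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-SFT")
# Generation then proceeds exactly as in the snippet above.
```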